commit c214ed2a4dda35b308b0b28eed804d7ae66401f9 Author: Saurav Kashyap Date: Fri Mar 15 12:44:27 2024 +0530 scsi: bnx2fc: Remove spin_lock_bh while releasing resources after upload The session resources are used by FW and driver when session is offloaded, once session is uploaded these resources are not used. The lock is not required as these fields won't be used any longer. The offload and upload calls are sequential, hence lock is not required. This will suppress following BUG_ON(): [ 449.843143] ------------[ cut here ]------------ [ 449.848302] kernel BUG at mm/vmalloc.c:2727! [ 449.853072] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 449.858712] CPU: 5 PID: 1996 Comm: kworker/u24:2 Not tainted 5.14.0-118.el9.x86_64 #1 Rebooting. [ 449.867454] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.3.4 11/08/2016 [ 449.876966] Workqueue: fc_rport_eq fc_rport_work [libfc] [ 449.882910] RIP: 0010:vunmap+0x2e/0x30 [ 449.887098] Code: 00 65 8b 05 14 a2 f0 4a a9 00 ff ff 00 75 1b 55 48 89 fd e8 34 36 79 00 48 85 ed 74 0b 48 89 ef 31 f6 5d e9 14 fc ff ff 5d c3 <0f> 0b 0f 1f 44 00 00 41 57 41 56 49 89 ce 41 55 49 89 fd 41 54 41 [ 449.908054] RSP: 0018:ffffb83d878b3d68 EFLAGS: 00010206 [ 449.913887] RAX: 0000000080000201 RBX: ffff8f4355133550 RCX: 000000000d400005 [ 449.921843] RDX: 0000000000000001 RSI: 0000000000001000 RDI: ffffb83da53f5000 [ 449.929808] RBP: ffff8f4ac6675800 R08: ffffb83d878b3d30 R09: 00000000000efbdf [ 449.937774] R10: 0000000000000003 R11: ffff8f434573e000 R12: 0000000000001000 [ 449.945736] R13: 0000000000001000 R14: ffffb83da53f5000 R15: ffff8f43d4ea3ae0 [ 449.953701] FS: 0000000000000000(0000) GS:ffff8f529fc80000(0000) knlGS:0000000000000000 [ 449.962732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 449.969138] CR2: 00007f8cf993e150 CR3: 0000000efbe10003 CR4: 00000000003706e0 [ 449.977102] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 449.985065] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 449.993028] Call Trace: [ 449.995756] __iommu_dma_free+0x96/0x100 [ 450.000139] bnx2fc_free_session_resc+0x67/0x240 [bnx2fc] [ 450.006171] bnx2fc_upload_session+0xce/0x100 [bnx2fc] [ 450.011910] bnx2fc_rport_event_handler+0x9f/0x240 [bnx2fc] [ 450.018136] fc_rport_work+0x103/0x5b0 [libfc] [ 450.023103] process_one_work+0x1e8/0x3c0 [ 450.027581] worker_thread+0x50/0x3b0 [ 450.031669] ? rescuer_thread+0x370/0x370 [ 450.036143] kthread+0x149/0x170 [ 450.039744] ? set_kthread_struct+0x40/0x40 [ 450.044411] ret_from_fork+0x22/0x30 [ 450.048404] Modules linked in: vfat msdos fat xfs nfs_layout_nfsv41_files rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver dm_service_time qedf qed crc8 bnx2fc libfcoe libfc scsi_transport_fc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp dcdbas rapl intel_cstate intel_uncore mei_me pcspkr mei ipmi_ssif lpc_ich ipmi_si fuse zram ext4 mbcache jbd2 loop nfsv3 nfs_acl nfs lockd grace fscache netfs irdma ice sd_mod t10_pi sg ib_uverbs ib_core 8021q garp mrp stp llc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt mxm_wmi fb_sys_fops cec crct10dif_pclmul ahci crc32_pclmul bnx2x drm ghash_clmulni_intel libahci rfkill i40e libata megaraid_sas mdio wmi sunrpc lrw dm_crypt dm_round_robin dm_multipath dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_zero dm_mod linear raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid6_pq libcrc32c crc32c_intel raid1 raid0 iscsi_ibft squashfs be2iscsi bnx2i cnic uio cxgb4i cxgb4 tls [ 450.048497] libcxgbi libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi edd ipmi_devintf ipmi_msghandler [ 450.159753] ---[ end trace 712de2c57c64abc8 ]--- Reported-by: Guangwu Zhang Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240315071427.31842-1-skashyap@marvell.com Signed-off-by: Martin K. Petersen commit f23a4d6e07570826fe95023ca1aa96a011fa9f84 Author: Guilherme G. Piccoli Date: Wed Mar 13 08:21:20 2024 -0300 scsi: core: Fix unremoved procfs host directory regression Commit fc663711b944 ("scsi: core: Remove the /proc/scsi/${proc_name} directory earlier") fixed a bug related to modules loading/unloading, by adding a call to scsi_proc_hostdir_rm() on scsi_remove_host(). But that led to a potential duplicate call to the hostdir_rm() routine, since it's also called from scsi_host_dev_release(). That triggered a regression report, which was then fixed by commit be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression"). The fix just dropped the hostdir_rm() call from dev_release(). But it happens that this proc directory is created on scsi_host_alloc(), and that function "pairs" with scsi_host_dev_release(), while scsi_remove_host() pairs with scsi_add_host(). In other words, it seems the reason for removing the proc directory on dev_release() was meant to cover cases in which a SCSI host structure was allocated, but the call to scsi_add_host() didn't happen. And that pattern happens to exist in some error paths, for example. Syzkaller causes that by using USB raw gadget device, error'ing on usb-storage driver, at usb_stor_probe2(). By checking that path, we can see that the BadDevice label leads to a scsi_host_put() after a SCSI host allocation, but there's no call to scsi_add_host() in such path. That leads to messages like this in dmesg (and a leak of the SCSI host proc structure): usb-storage 4-1:87.51: USB Mass Storage device detected proc_dir_entry 'scsi/usb-storage' already registered WARNING: CPU: 1 PID: 3519 at fs/proc/generic.c:377 proc_register+0x347/0x4e0 fs/proc/generic.c:376 The proper fix seems to still call scsi_proc_hostdir_rm() on dev_release(), but guard that with the state check for SHOST_CREATED; there is even a comment in scsi_host_dev_release() detailing that: such conditional is meant for cases where the SCSI host was allocated but there was no calls to {add,remove}_host(), like the usb-storage case. This is what we propose here and with that, the error path of usb-storage does not trigger the warning anymore. Reported-by: syzbot+c645abf505ed21f931b5@syzkaller.appspotmail.com Fixes: be03df3d4bfe ("scsi: core: Fix a procfs host directory removal regression") Cc: stable@vger.kernel.org Cc: Bart Van Assche Cc: John Garry Cc: Shin'ichiro Kawasaki Signed-off-by: Guilherme G. Piccoli Link: https://lore.kernel.org/r/20240313113006.2834799-1-gpiccoli@igalia.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 429846b4b6ce9853e0d803a2357bb2e55083adf0 Author: Shin'ichiro Kawasaki Date: Sat Mar 23 17:41:55 2024 +0900 scsi: mpi3mr: Avoid memcpy field-spanning write WARNING When the "storcli2 show" command is executed for eHBA-9600, mpi3mr driver prints this WARNING message: memcpy: detected field-spanning write (size 128) of single field "bsg_reply_buf->reply_buf" at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 (size 1) WARNING: CPU: 0 PID: 12760 at drivers/scsi/mpi3mr/mpi3mr_app.c:1658 mpi3mr_bsg_request+0x6b12/0x7f10 [mpi3mr] The cause of the WARN is 128 bytes memcpy to the 1 byte size array "__u8 replay_buf[1]" in the struct mpi3mr_bsg_in_reply_buf. The array is intended to be a flexible length array, so the WARN is a false positive. To suppress the WARN, remove the constant number '1' from the array declaration and clarify that it has flexible length. Also, adjust the memory allocation size to match the change. Suggested-by: Sathya Prakash Veerichetty Signed-off-by: Shin'ichiro Kawasaki Link: https://lore.kernel.org/r/20240323084155.166835-1-shinichiro.kawasaki@wdc.com Signed-off-by: Martin K. Petersen commit 0c76106cb97548810214def8ee22700bbbb90543 Author: Damien Le Moal Date: Tue Mar 19 16:12:09 2024 +0900 scsi: sd: Fix TCG OPAL unlock on system resume Commit 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") introduced the manage_system_start_stop scsi_device flag to allow libata to indicate to the SCSI disk driver that nothing should be done when resuming a disk on system resume. This change turned the execution of sd_resume() into a no-op for ATA devices on system resume. While this solved deadlock issues during device resume, this change also wrongly removed the execution of opal_unlock_from_suspend(). As a result, devices with TCG OPAL locking enabled remain locked and inaccessible after a system resume from sleep. To fix this issue, introduce the SCSI driver resume method and implement it with the sd_resume() function calling opal_unlock_from_suspend(). The former sd_resume() function is renamed to sd_resume_common() and modified to call the new sd_resume() function. For non-ATA devices, this result in no functional changes. In order for libata to explicitly execute sd_resume() when a device is resumed during system restart, the function scsi_resume_device() is introduced. libata calls this function from the revalidation work executed on devie resume, a state that is indicated with the new device flag ATA_DFLAG_RESUMING. Doing so, locked TCG OPAL enabled devices are unlocked on resume, allowing normal operation. Fixes: 3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management") Link: https://bugzilla.kernel.org/show_bug.cgi?id=218538 Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal Link: https://lore.kernel.org/r/20240319071209.1179257-1-dlemoal@kernel.org Signed-off-by: Martin K. Petersen commit 27f58c04a8f438078583041468ec60597841284d Author: Alexander Wetzel Date: Wed Mar 20 22:30:32 2024 +0100 scsi: sg: Avoid sg device teardown race sg_remove_sfp_usercontext() must not use sg_device_destroy() after calling scsi_device_put(). sg_device_destroy() is accessing the parent scsi_device request_queue which will already be set to NULL when the preceding call to scsi_device_put() removed the last reference to the parent scsi_device. The resulting NULL pointer exception will then crash the kernel. Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") Cc: stable@vger.kernel.org Signed-off-by: Alexander Wetzel Link: https://lore.kernel.org/r/20240320213032.18221-1-Alexander@wetzel-home.de Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 0fa215e5326b49fc7870e2f576bc4316017a23dd Author: Justin Tee Date: Tue Mar 5 12:05:03 2024 -0800 scsi: lpfc: Copyright updates for 14.4.0.1 patches Update copyrights to 2024 for files modified in the 14.4.0.1 patch set. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-13-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 01b6b70d98f2e0c9c7b8b5d962b5e22f74f60056 Author: Justin Tee Date: Tue Mar 5 12:05:02 2024 -0800 scsi: lpfc: Update lpfc version to 14.4.0.1 Update lpfc version to 14.4.0.1 Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-12-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 85d77f917a3b86872d3d52d5cea945a661858d20 Author: Justin Tee Date: Tue Mar 5 12:05:01 2024 -0800 scsi: lpfc: Define types in a union for generic void *context3 ptr In LPFC_MBOXQ_t, the void *context3 ptr is used for various paths. It is treated as a generic pointer, and is type casted during its usage. The issue with this is that it can sometimes get confusing when reading code as to what the context3 ptr is being used for and mistakenly be reused in a different context. Rename context3 to ctx_u, and declare it as a union of defined ptr types. From now on, the ctx_u ptr may be used only if users define the use case type. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-11-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 115d137aa918d879e3cca9605bbf59e0482aa734 Author: Justin Tee Date: Tue Mar 5 12:05:00 2024 -0800 scsi: lpfc: Define lpfc_dmabuf type for ctx_buf ptr In LPFC_MBOXQ_t, the ctx_buf ptr shouldn't be defined as a generic void *ptr. It is named ctx_buf and it should only be used as an lpfc_dmabuf *ptr. Due to the void* declaration, there have been abuses of ctx_buf for things not related to lpfc_dmabuf. So, set the ptr type for *ctx_buf as lpfc_dmabuf. Remove all type casts on ctx_buf because it is no longer a void *ptr. Convert the abuse of ctx_buf for something not related to lpfc_dmabuf to use the void *context3 ptr. A particular abuse of the ctx_buf warranted a new void *ext_buf ptr. However, the usage of this new void *ext_buf is not generic. It is intended to only hold virtual addresses for extended mailbox commands. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-10-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 18f7fe44bc79e67eccd4c118f10aa16647d446f8 Author: Justin Tee Date: Tue Mar 5 12:04:59 2024 -0800 scsi: lpfc: Define lpfc_nodelist type for ctx_ndlp ptr In LPFC_MBOXQ_t data structure, the ctx_ndlp ptr shouldn't be defined as a generic void *ptr. It is named ctx_ndlp and it should only be used as an lpfc_nodelist *ptr. Due to the void* declaration, there have been abuses of ctx_ndlp for things not related to ndlp. So, set the ptr type for *ctx_ndlp as lpfc_nodelist. Remove all type casts on ctx_ndlp because it is no longer a void *ptr. Convert the abuse of ctx_ndlp for things not related to ndlps to use the void *context3 ptr. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-9-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit f733a76ea0a9a84aee4ac41b81fad4d610ecbd8e Author: Justin Tee Date: Tue Mar 5 12:04:58 2024 -0800 scsi: lpfc: Use a dedicated lock for ras_fwlog state To reduce usage of and contention for hbalock, a separate dedicated lock is used to protect ras_fwlog state. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-8-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit ded20192dff31c91cef2a04f7e20e60e9bb887d3 Author: Justin Tee Date: Tue Mar 5 12:04:57 2024 -0800 scsi: lpfc: Release hbalock before calling lpfc_worker_wake_up() lpfc_worker_wake_up() calls the lpfc_work_done() routine, which takes the hbalock. Thus, lpfc_worker_wake_up() should not be called while holding the hbalock to avoid potential deadlock. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-7-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit d11272be497e48a8e8f980470eb6b70e92eed0ce Author: Justin Tee Date: Tue Mar 5 12:04:56 2024 -0800 scsi: lpfc: Replace hbalock with ndlp lock in lpfc_nvme_unregister_port() The ndlp object update in lpfc_nvme_unregister_port() should be protected by the ndlp lock rather than hbalock. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-6-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit bb011631435c705cdeddca68d5c85fd40a4320f9 Author: Justin Tee Date: Tue Mar 5 12:04:55 2024 -0800 scsi: lpfc: Update lpfc_ramp_down_queue_handler() logic Typically when an out of resource CQE status is detected, the lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by reducing an sdev's queue_depth. However, the current lpfc_rampdown_queue_depth() logic does not help reduce queue_depth. num_cmd_success is never updated and is always zero, which means new_queue_depth will always be set to sdev->queue_depth. So, new_queue_depth = sdev->queue_depth - new_queue_depth always sets new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is essentially a no-op. Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth equal to sdev->queue_depth subtracted from number of times num_rsrc_err was incremented. If num_rsrc_err is >= sdev->queue_depth, then set new_queue_depth equal to 1. Eventually, the frequency of Good_Status frames will signal SCSI upper layer to auto increase the queue_depth back to the driver default of 64 via scsi_handle_queue_ramp_up(). Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-5-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 4623713e7ade46bfc63a3eade836f566ccbcd771 Author: Justin Tee Date: Tue Mar 5 12:04:54 2024 -0800 scsi: lpfc: Remove IRQF_ONESHOT flag from threaded IRQ handling IRQF_ONESHOT is found to mask HBA generated interrupts when thread_fn is running. As a result, some EQEs/CQEs miss timely processing resulting in SCSI layer attempts to abort commands due to io_timeout. Abort CQEs are also not processed leading to the observations of hangs and spam of "0748 abort handler timed out waiting for aborting I/O" log messages. Remove the IRQF_ONESHOT flag. The cmpxchg and xchg atomic operations on lpfc_queue->queue_claimed already protect potential parallel access to an EQ/CQ should the thread_fn get interrupted by the primary irq handler. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-4-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 4ddf01f2f1504fa08b766e8cfeec558e9f8eef6c Author: Justin Tee Date: Tue Mar 5 12:04:53 2024 -0800 scsi: lpfc: Move NPIV's transport unregistration to after resource clean up There are cases after NPIV deletion where the fabric switch still believes the NPIV is logged into the fabric. This occurs when a vport is unregistered before the Remove All DA_ID CT and LOGO ELS are sent to the fabric. Currently fc_remove_host(), which calls dev_loss_tmo for all D_IDs including the fabric D_ID, removes the last ndlp reference and frees the ndlp rport object. This sometimes causes the race condition where the final DA_ID and LOGO are skipped from being sent to the fabric switch. Fix by moving the fc_remove_host() and scsi_remove_host() calls after DA_ID and LOGO are sent. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-3-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit 91ddb6d0c3159bcc505bfa564d0573ae500cc2c7 Author: Justin Tee Date: Tue Mar 5 12:04:52 2024 -0800 scsi: lpfc: Remove unnecessary log message in queuecommand path Message 9038 logs when LLDD receives SCSI_PROT_NORMAL when T10 DIF protection is configured. The event is not wrong, but the log message has not proven useful in debugging so it is removed. Signed-off-by: Justin Tee Link: https://lore.kernel.org/r/20240305200503.57317-2-justintee8345@gmail.com Signed-off-by: Martin K. Petersen commit b8260ca37930a4b007f7b662d4b501a030a4935f Author: Nilesh Javali Date: Tue Feb 27 22:11:27 2024 +0530 scsi: qla2xxx: Update version to 10.02.09.200-k Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-12-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 591c1fdf2016d118b8fbde427b796fac13f3f070 Author: Quinn Tran Date: Tue Feb 27 22:11:26 2024 +0530 scsi: qla2xxx: Delay I/O Abort on PCI error Currently when PCI error is detected, I/O is aborted manually through the ABORT IOCB mechanism which is not guaranteed to succeed. Instead, wait for the OS or system to notify driver to wind down I/O through the pci_error_handlers api. Set eeh_busy flag to pause all traffic and wait for I/O to drain. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-11-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit b5a30840727a3e41d12a336d19f6c0716b299161 Author: Saurav Kashyap Date: Tue Feb 27 22:11:25 2024 +0530 scsi: qla2xxx: Change debug message during driver unload Upon driver unload, purge_mbox flag is set and the heartbeat monitor thread detects this flag and does not send the mailbox command down to FW with a debug message "Error detected: purge[1] eeh[0] cmd=0x0, Exiting". This being not a real error, change the debug message. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-10-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 82f522ae0d97119a43da53e0f729275691b9c525 Author: Saurav Kashyap Date: Tue Feb 27 22:11:24 2024 +0530 scsi: qla2xxx: Fix double free of fcport The server was crashing after LOGO because fcport was getting freed twice. -----------[ cut here ]----------- kernel BUG at mm/slub.c:371! invalid opcode: 0000 1 SMP PTI CPU: 35 PID: 4610 Comm: bash Kdump: loaded Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 RIP: 0010:set_freepointer.part.57+0x0/0x10 RSP: 0018:ffffb07107027d90 EFLAGS: 00010246 RAX: ffff9cb7e3150000 RBX: ffff9cb7e332b9c0 RCX: ffff9cb7e3150400 RDX: 0000000000001f37 RSI: 0000000000000000 RDI: ffff9cb7c0005500 RBP: fffff693448c5400 R08: 0000000080000000 R09: 0000000000000009 R10: 0000000000000000 R11: 0000000000132af0 R12: ffff9cb7c0005500 R13: ffff9cb7e3150000 R14: ffffffffc06990e0 R15: ffff9cb7ea85ea58 FS: 00007ff6b79c2740(0000) GS:ffff9cb8f7ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055b426b7d700 CR3: 0000000169c18002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: kfree+0x238/0x250 qla2x00_els_dcmd_sp_free+0x20/0x230 [qla2xxx] ? qla24xx_els_dcmd_iocb+0x607/0x690 [qla2xxx] qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? qla2x00_issue_logo+0x28c/0x2a0 [qla2xxx] ? kernfs_fop_write+0x11e/0x1a0 Remove one of the free calls and add check for valid fcport. Also use function qla2x00_free_fcport() instead of kfree(). Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-9-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit e288285d47784fdcf7c81be56df7d65c6f10c58b Author: Saurav Kashyap Date: Tue Feb 27 22:11:23 2024 +0530 scsi: qla2xxx: Fix double free of the ha->vp_map pointer Coverity scan reported potential risk of double free of the pointer ha->vp_map. ha->vp_map was freed in qla2x00_mem_alloc(), and again freed in function qla2x00_mem_free(ha). Assign NULL to vp_map and kfree take care of NULL. Cc: stable@vger.kernel.org Signed-off-by: Saurav Kashyap Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-8-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit a27d4d0e7de305def8a5098a614053be208d1aa1 Author: Quinn Tran Date: Tue Feb 27 22:11:22 2024 +0530 scsi: qla2xxx: Fix command flush on cable pull System crash due to command failed to flush back to SCSI layer. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 27 PID: 793455 Comm: kworker/u130:6 Kdump: loaded Tainted: G OE --------- - - 4.18.0-372.9.1.el8.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 09/03/2021 Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc] RIP: 0010:__wake_up_common+0x4c/0x190 Code: 24 10 4d 85 c9 74 0a 41 f6 01 04 0f 85 9d 00 00 00 48 8b 43 08 48 83 c3 08 4c 8d 48 e8 49 8d 41 18 48 39 c3 0f 84 f0 00 00 00 <49> 8b 41 18 89 54 24 08 31 ed 4c 8d 70 e8 45 8b 29 41 f6 c5 04 75 RSP: 0018:ffff95f3e0cb7cd0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff8b08d3b26328 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8b08d3b26320 RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffe8 R10: 0000000000000000 R11: ffff95f3e0cb7a60 R12: ffff95f3e0cb7d20 R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8b2fdf6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000002f1e410002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: __wake_up_common_lock+0x7c/0xc0 qla_nvme_ls_req+0x355/0x4c0 [qla2xxx] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae1407ca000 from port 21:32:00:02:ac:07:ee:b8 loop_id 0x02 s_id 01:02:00 logout 1 keep 0 els_logo 0 ? __nvme_fc_send_ls_req+0x260/0x380 [nvme_fc] qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:00:02:ac:07:ee:b8 state transitioned from ONLINE to LOST - portid=010200. ? nvme_fc_send_ls_req.constprop.42+0x1a/0x45 [nvme_fc] qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320002ac07eeb8. rport ffff8ae598122000 roles 1 ? nvme_fc_connect_ctrl_work.cold.63+0x1e3/0xa7d [nvme_fc] qla2xxx [0000:12:00.1]-f084:3: qlt_free_session_done: se_sess 0000000000000000 / sess ffff8ae14801e000 from port 21:32:01:02:ad:f7:ee:b8 loop_id 0x04 s_id 01:02:01 logout 1 keep 0 els_logo 0 ? __switch_to+0x10c/0x450 ? process_one_work+0x1a7/0x360 qla2xxx [0000:12:00.1]-207d:3: FCPort 21:32:01:02:ad:f7:ee:b8 state transitioned from ONLINE to LOST - portid=010201. ? worker_thread+0x1ce/0x390 ? create_worker+0x1a0/0x1a0 qla2xxx [0000:12:00.1]-2109:3: qla2x00_schedule_rport_del 21320102adf7eeb8. rport ffff8ae3b2312800 roles 70 ? kthread+0x10a/0x120 qla2xxx [0000:12:00.1]-2112:3: qla_nvme_unregister_remote_port: unregister remoteport on ffff8ae14801e000 21320102adf7eeb8 ? set_kthread_struct+0x40/0x40 qla2xxx [0000:12:00.1]-2110:3: remoteport_delete of ffff8ae14801e000 21320102adf7eeb8 completed. ? ret_from_fork+0x1f/0x40 qla2xxx [0000:12:00.1]-f086:3: qlt_free_session_done: waiting for sess ffff8ae14801e000 logout The system was under memory stress where driver was not able to allocate an SRB to carry out error recovery of cable pull. The failure to flush causes upper layer to start modifying scsi_cmnd. When the system frees up some memory, the subsequent cable pull trigger another command flush. At this point the driver access a null pointer when attempting to DMA unmap the SGL. Add a check to make sure commands are flush back on session tear down to prevent the null pointer access. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-7-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 69aecdd410106dc3a8f543a4f7ec6379b995b8d0 Author: Quinn Tran Date: Tue Feb 27 22:11:21 2024 +0530 scsi: qla2xxx: NVME|FCP prefer flag not being honored Changing of [FCP|NVME] prefer flag in flash has no effect on driver. For device that supports both FCP + NVMe over the same connection, driver continues to connect to this device using the previous successful login mode. On completion of flash update, adapter will be reset. Driver will reset the prefer flag based on setting from flash. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-6-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 688fa069fda6fce24d243cddfe0c7024428acb74 Author: Bikash Hazarika Date: Tue Feb 27 22:11:20 2024 +0530 scsi: qla2xxx: Update manufacturer detail Update manufacturer detail from "Marvell Semiconductor, Inc." to "Marvell". Cc: stable@vger.kernel.org Signed-off-by: Bikash Hazarika Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-5-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 76a192e1a566e15365704b9f8fb3b70825f85064 Author: Quinn Tran Date: Tue Feb 27 22:11:19 2024 +0530 scsi: qla2xxx: Split FCE|EFT trace control Current code combines the allocation of FCE|EFT trace buffers and enables the features all in 1 step. Split this step into separate steps in preparation for follow-on patch to allow user to have a choice to enable / disable FCE trace feature. Cc: stable@vger.kernel.org Reported-by: kernel test robot Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-4-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 881eb861ca3877300570db10abbf11494e48548d Author: Quinn Tran Date: Tue Feb 27 22:11:18 2024 +0530 scsi: qla2xxx: Fix N2N stuck connection Disk failed to rediscover after chip reset error injection. The chip reset happens at the time when a PLOGI is being sent. This causes a flag to be left on which blocks the retry. Clear the blocking flag. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-3-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 4895009c4bb72f71f2e682f1e7d2c2d96e482087 Author: Quinn Tran Date: Tue Feb 27 22:11:17 2024 +0530 scsi: qla2xxx: Prevent command send on chip reset Currently IOCBs are allowed to push through while chip reset could be in progress. During chip reset the outstanding_cmds array is cleared twice. Once when any command on this array is returned as failed and secondly when the array is initialize to zero. If a command is inserted on to the array between these intervals, then the command will be lost. Check for chip reset before sending IOCB. Cc: stable@vger.kernel.org Signed-off-by: Quinn Tran Signed-off-by: Nilesh Javali Link: https://lore.kernel.org/r/20240227164127.36465-2-njavali@marvell.com Reviewed-by: Himanshu Madhani Signed-off-by: Martin K. Petersen commit 16cc2ba71b9f6440805aef7f92ba0f031f79b765 Author: Muhammad Usama Anjum Date: Mon Mar 4 14:11:19 2024 +0500 scsi: lpfc: Correct size for cmdwqe/rspwqe for memset() The cmdwqe and rspwqe are of type lpfc_wqe128. They should be memset() with the same type. Fixes: 61910d6a5243 ("scsi: lpfc: SLI path split: Refactor CT paths") Signed-off-by: Muhammad Usama Anjum Link: https://lore.kernel.org/r/20240304091119.847060-1-usama.anjum@collabora.com Reviewed-by: Justin Tee Signed-off-by: Martin K. Petersen commit 28d41991182c210ec1654f8af2e140ef4cc73f20 Author: Muhammad Usama Anjum Date: Mon Mar 4 14:06:48 2024 +0500 scsi: lpfc: Correct size for wqe for memset() The wqe is of type lpfc_wqe128. It should be memset with the same type. Fixes: 6c621a2229b0 ("scsi: lpfc: Separate NVMET RQ buffer posting from IO resources SGL/iocbq/context") Signed-off-by: Muhammad Usama Anjum Link: https://lore.kernel.org/r/20240304090649.833953-1-usama.anjum@collabora.com Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Justin Tee Signed-off-by: Martin K. Petersen commit 0822853d658bbfa93bb16716ab10819788ad0550 Author: Ricardo B. Marliere Date: Sat Mar 2 13:47:26 2024 -0300 scsi: st: Make st_sysfs_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the st_sysfs_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman Suggested-by: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-5-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen commit ac9f3ac5b86c41196bfd0c114d97db359b3bc44a Author: Ricardo B. Marliere Date: Sat Mar 2 13:47:25 2024 -0300 scsi: ch: Make ch_sysfs_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the ch_sysfs_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman Suggested-by: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-4-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen commit a08f0eb02981ebeccc6c62833e673cc06a29393b Author: Ricardo B. Marliere Date: Sat Mar 2 13:47:24 2024 -0300 scsi: cxlflash: Make cxlflash_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the cxlflash_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman Suggested-by: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-3-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen commit ee8dda6a7e9d28260e30ecaf8b5f27d176c8ade0 Author: Ricardo B. Marliere Date: Sat Mar 2 13:47:23 2024 -0300 scsi: pmcraid: Make pmcraid_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the pmcraid_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman Suggested-by: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-2-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen commit f1fb41765d0bff77514ffeaef37bbb45608f6c62 Author: Ricardo B. Marliere Date: Sat Mar 2 13:47:22 2024 -0300 scsi: sg: Make sg_sysfs_class constant Since commit 43a7206b0963 ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the sg_sysfs_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Cc: Greg Kroah-Hartman Suggested-by: Greg Kroah-Hartman Signed-off-by: Ricardo B. Marliere Link: https://lore.kernel.org/r/20240302-class_cleanup-scsi-v1-1-b9096b990e27@marliere.net Signed-off-by: Martin K. Petersen commit db06ae7ce9fdd3076d81588e80a2a41c3ce82765 Author: Peter Wang Date: Fri Mar 1 11:46:10 2024 +0800 scsi: ufs: core: Add config_scsi_dev vops comment Add config_scsi_dev vops comment. Signed-off-by: Peter Wang Link: https://lore.kernel.org/r/20240301034610.24928-1-peter.wang@mediatek.com Reviewed-by: Bart Van Assche Signed-off-by: Martin K. Petersen commit 81e2c1a0f8d3f62f4c9e80b20270aa3481c40524 Author: Dmitry Baryshkov Date: Sun Feb 18 15:56:34 2024 +0200 scsi: ufs: qcom: Provide default cycles_in_1us value The MSM8996 DT doesn't provide frequency limits for the core_clk_unipro clock, which results in miscalculation of the cycles_in_1us value. Provide the backwards-compatible default to support existing MSM8996 DT files. Fixes: b4e13e1ae95e ("scsi: ufs: qcom: Add multiple frequency support for MAX_CORE_CLK_1US_CYCLES") Cc: Nitin Rawat Cc: stable@vger.kernel.org # 6.7.x Reviewed-by: Konrad Dybcio Reviewed-by: Manivannan Sadhasivam Signed-off-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20240218-msm8996-fix-ufs-v3-1-40aab49899a3@linaro.org Signed-off-by: Martin K. Petersen commit 8e68a458bcf5b5cb9c3624598bae28f08251601f Author: Xingui Yang Date: Thu Mar 7 14:14:13 2024 +0000 scsi: libsas: Fix disk not being scanned in after being removed As of commit d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to update PHY info"), do discovery will send a new SMP_DISCOVER and update phy->phy_change_count. We found that if the disk is reconnected and phy change_count changes at this time, the disk scanning process will not be triggered. Therefore, call sas_set_ex_phy() to update the PHY info with the results of the last query. And because the previous phy info will be used when calling sas_unregister_devs_sas_addr(), sas_unregister_devs_sas_addr() should be called before sas_set_ex_phy(). Fixes: d8649fc1c5e4 ("scsi: libsas: Do discovery on empty PHY to update PHY info") Signed-off-by: Xingui Yang Link: https://lore.kernel.org/r/20240307141413.48049-3-yangxingui@huawei.com Reviewed-by: John Garry Signed-off-by: Martin K. Petersen commit a57345279fd311ba679b8083feb0eec5272c7729 Author: Xingui Yang Date: Thu Mar 7 14:14:12 2024 +0000 scsi: libsas: Add a helper sas_get_sas_addr_and_dev_type() Add a helper to get attached_sas_addr and device type from disc_resp. Suggested-by: John Garry Signed-off-by: Xingui Yang Link: https://lore.kernel.org/r/20240307141413.48049-2-yangxingui@huawei.com Reviewed-by: John Garry Signed-off-by: Martin K. Petersen commit 99cfb212ef4d04515efcd88fd05cd9cdff4f9542 Author: Colin Ian King Date: Thu Mar 7 10:45:53 2024 +0000 scsi: target: iscsi: Remove unused variable xfer_len The variable 'xfer_len' is being initialized and incremented but it is never actually referenced in any other way. The variable is redundant and can be removed. Cleans up clang scan build warning: drivers/target/iscsi/iscsi_target_erl1.c:586:45: warning: variable 'xfer_len' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King Link: https://lore.kernel.org/r/20240307104553.1980860-1-colin.i.king@gmail.com Signed-off-by: Martin K. Petersen commit 767712f91de76abd22a45184e6e3440120b8bfce Author: Rohit Ner Date: Tue Feb 20 01:56:37 2024 -0800 scsi: ufs: core: Fix MCQ MAC configuration As per JEDEC Standard No. 223E Section 5.9.2, the max # active commands value programmed by the host sw in MCQConfig.MAC should be one less than the actual value. Signed-off-by: Rohit Ner Link: https://lore.kernel.org/r/20240220095637.2900067-1-rohitner@google.com Reviewed-by: Peter Wang Reviewed-by: Can Guo Signed-off-by: Martin K. Petersen