commit c3203bb03db01f205a909d5010782358bc92bc0a Author: Greg Kroah-Hartman Date: Wed Nov 18 18:26:32 2020 +0100 Linux 4.9.244 Tested-by: Jon Hunter Tested-by: Shuah Khan Tested-by: Linux Kernel Functional Testing Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20201117122109.116890262@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman commit b5050c0486b7478078be5ea7a746e7c726025b61 Author: Boris Protopopov Date: Thu Sep 24 00:36:38 2020 +0000 Convert trailing spaces and periods in path components commit 57c176074057531b249cf522d90c22313fa74b0b upstream. When converting trailing spaces and periods in paths, do so for every component of the path, not just the last component. If the conversion is not done for every path component, then subsequent operations in directories with trailing spaces or periods (e.g. create(), mkdir()) will fail with ENOENT. This is because on the server, the directory will have a special symbol in its name, and the client needs to provide the same. Signed-off-by: Boris Protopopov Acked-by: Ronnie Sahlberg Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 09424dab92413d8f10e880107aef5faaa5e547ef Author: Eric Biggers Date: Tue Sep 22 09:24:56 2020 -0700 ext4: fix leaking sysfs kobject after failed mount commit cb8d53d2c97369029cc638c9274ac7be0a316c75 upstream. ext4_unregister_sysfs() only deletes the kobject. The reference to it needs to be put separately, like ext4_put_super() does. This addresses the syzbot report "memory leak in kobject_set_name_vargs (3)" (https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1). Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable quotas on mount") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.org Reviewed-by: Jan Kara Signed-off-by: Theodore Ts'o [sudip: adjust context] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 41ac66d1d68f9c48fdc21c5f7669c087b2a0fd8a Author: Matteo Croce Date: Fri Nov 13 22:52:07 2020 -0800 reboot: fix overflow parsing reboot cpu number commit df5b0ab3e08a156701b537809914b339b0daa526 upstream. Limit the CPU number to num_possible_cpus(), because setting it to a value lower than INT_MAX but higher than NR_CPUS produces the following error on reboot and shutdown: BUG: unable to handle page fault for address: ffffffff90ab1bb0 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 1c09067 P4D 1c09067 PUD 1c0a063 PMD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 5.9.0-rc8-kvm #110 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:migrate_to_reboot_cpu+0xe/0x60 Code: ea ea 00 48 89 fa 48 c7 c7 30 57 f1 81 e9 fa ef ff ff 66 2e 0f 1f 84 00 00 00 00 00 53 8b 1d d5 ea ea 00 e8 14 33 fe ff 89 da <48> 0f a3 15 ea fc bd 00 48 89 d0 73 29 89 c2 c1 e8 06 65 48 8b 3c RSP: 0018:ffffc90000013e08 EFLAGS: 00010246 RAX: ffff88801f0a0000 RBX: 0000000077359400 RCX: 0000000000000000 RDX: 0000000077359400 RSI: 0000000000000002 RDI: ffffffff81c199e0 RBP: ffffffff81c1e3c0 R08: ffff88801f41f000 R09: ffffffff81c1e348 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 00007f32bedf8830 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f32bedf8980(0000) GS:ffff88801f480000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff90ab1bb0 CR3: 000000001d057000 CR4: 00000000000006a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __do_sys_reboot.cold+0x34/0x5b do_syscall_64+0x2d/0x40 Fixes: 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") Signed-off-by: Matteo Croce Signed-off-by: Andrew Morton Cc: Arnd Bergmann Cc: Fabian Frederick Cc: Greg Kroah-Hartman Cc: Guenter Roeck Cc: Kees Cook Cc: Mike Rapoport Cc: Pavel Tatashin Cc: Petr Mladek Cc: Robin Holt Cc: Link: https://lkml.kernel.org/r/20201103214025.116799-3-mcroce@linux.microsoft.com Signed-off-by: Linus Torvalds [sudip: use reboot_mode instead of mode] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 3a4304ca26d70ef4aa5a46035776c4e3ac4acc47 Author: Matteo Croce Date: Fri Nov 13 22:52:02 2020 -0800 Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint" commit 8b92c4ff4423aa9900cf838d3294fcade4dbda35 upstream. Patch series "fix parsing of reboot= cmdline", v3. The parsing of the reboot= cmdline has two major errors: - a missing bound check can crash the system on reboot - parsing of the cpu number only works if specified last Fix both. This patch (of 2): This reverts commit 616feab753972b97. kstrtoint() and simple_strtoul() have a subtle difference which makes them non interchangeable: if a non digit character is found amid the parsing, the former will return an error, while the latter will just stop parsing, e.g. simple_strtoul("123xyx") = 123. The kernel cmdline reboot= argument allows to specify the CPU used for rebooting, with the syntax `s####` among the other flags, e.g. "reboot=warm,s31,force", so if this flag is not the last given, it's silently ignored as well as the subsequent ones. Fixes: 616feab75397 ("kernel/reboot.c: convert simple_strtoul to kstrtoint") Signed-off-by: Matteo Croce Signed-off-by: Andrew Morton Cc: Guenter Roeck Cc: Petr Mladek Cc: Arnd Bergmann Cc: Mike Rapoport Cc: Kees Cook Cc: Pavel Tatashin Cc: Robin Holt Cc: Fabian Frederick Cc: Greg Kroah-Hartman Cc: Link: https://lkml.kernel.org/r/20201103214025.116799-2-mcroce@linux.microsoft.com Signed-off-by: Linus Torvalds [sudip: use reboot_mode instead of mode] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 5a097d643717160d859f5bd4a29e2088f48a5fd3 Author: Jiri Olsa Date: Wed Sep 16 13:53:11 2020 +0200 perf/core: Fix race in the perf_mmap_close() function commit f91072ed1b7283b13ca57fcfbece5a3b92726143 upstream. There's a possible race in perf_mmap_close() when checking ring buffer's mmap_count refcount value. The problem is that the mmap_count check is not atomic because we call atomic_dec() and atomic_read() separately. perf_mmap_close: ... atomic_dec(&rb->mmap_count); ... if (atomic_read(&rb->mmap_count)) goto out_put; free_uid out_put: ring_buffer_put(rb); /* could be last */ The race can happen when we have two (or more) events sharing same ring buffer and they go through atomic_dec() and then they both see 0 as refcount value later in atomic_read(). Then both will go on and execute code which is meant to be run just once. The code that detaches ring buffer is probably fine to be executed more than once, but the problem is in calling free_uid(), which will later on demonstrate in related crashes and refcount warnings, like: refcount_t: addition on 0; use-after-free. ... RIP: 0010:refcount_warn_saturate+0x6d/0xf ... Call Trace: prepare_creds+0x190/0x1e0 copy_creds+0x35/0x172 copy_process+0x471/0x1a80 _do_fork+0x83/0x3a0 __do_sys_wait4+0x83/0x90 __do_sys_clone+0x85/0xa0 do_syscall_64+0x5b/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Using atomic decrease and check instead of separated calls. Tested-by: Michael Petlan Signed-off-by: Jiri Olsa Signed-off-by: Ingo Molnar Acked-by: Peter Zijlstra Acked-by: Namhyung Kim Acked-by: Wade Mealing Fixes: 9bb5d40cd93c ("perf: Fix mmap() accounting hole"); Link: https://lore.kernel.org/r/20200916115311.GE2301783@krava [sudip: backport to v4.9.y by using ring_buffer] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 35b6c796aa3861d740a333540c78110c2bf47dab Author: Juergen Gross Date: Tue Nov 3 15:35:28 2020 +0100 xen/events: block rogue events for some time commit 5f7f77400ab5b357b5fdb7122c3442239672186c upstream. In order to avoid high dom0 load due to rogue guests sending events at high frequency, block those events in case there was no action needed in dom0 to handle the events. This is done by adding a per-event counter, which set to zero in case an EOI without the XEN_EOI_FLAG_SPURIOUS is received from a backend driver, and incremented when this flag has been set. In case the counter is 2 or higher delay the EOI by 1 << (cnt - 2) jiffies, but not more than 1 second. In order not to waste memory shorten the per-event refcnt to two bytes (it should normally never exceed a value of 2). Add an overflow check to evtchn_get() to make sure the 2 bytes really won't overflow. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Stefano Stabellini Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit 0c56aa8589d7dc4342e0d425b21232a08badfd7d Author: Juergen Gross Date: Tue Nov 3 15:35:27 2020 +0100 xen/events: defer eoi in case of excessive number of events commit e99502f76271d6bc4e374fe368c50c67a1fd3070 upstream. In case rogue guests are sending events at high frequency it might happen that xen_evtchn_do_upcall() won't stop processing events in dom0. As this is done in irq handling a crash might be the result. In order to avoid that, delay further inter-domain events after some time in xen_evtchn_do_upcall() by forcing eoi processing into a worker on the same cpu, thus inhibiting new events coming in. The time after which eoi processing is to be delayed is configurable via a new module parameter "event_loop_timeout" which specifies the maximum event loop time in jiffies (default: 2, the value was chosen after some tests showing that a value of 2 was the lowest with an only slight drop of dom0 network throughput while multiple guests performed an event storm). How long eoi processing will be delayed can be specified via another parameter "event_eoi_delay" (again in jiffies, default 10, again the value was chosen after testing with different delay values). This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Stefano Stabellini Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit cb3c705cfaec87b0ce197444b113ad3e4cab51a4 Author: Juergen Gross Date: Tue Nov 3 15:35:26 2020 +0100 xen/events: use a common cpu hotplug hook for event channels commit 7beb290caa2adb0a399e735a1e175db9aae0523a upstream. Today only fifo event channels have a cpu hotplug callback. In order to prepare for more percpu (de)init work move that callback into events_base.c and add percpu_init() and percpu_deinit() hooks to struct evtchn_ops. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit d949b512adaef2d8857609c950895da8af4ad4c9 Author: Juergen Gross Date: Tue Nov 3 15:35:25 2020 +0100 xen/events: switch user event channels to lateeoi model commit c44b849cee8c3ac587da3b0980e01f77500d158c upstream. Instead of disabling the irq when an event is received and enabling it again when handled by the user process use the lateeoi model. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Tested-by: Stefano Stabellini Reviewed-by: Stefano Stabellini Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit ff215b74d5726561fe1a33584f2d58cbe6e6909d Author: Juergen Gross Date: Tue Nov 3 15:35:24 2020 +0100 xen/pciback: use lateeoi irq binding commit c2711441bc961b37bba0615dd7135857d189035f upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving pcifront use the lateeoi irq binding for pciback and unmask the event channel only just before leaving the event handling function. Restructure the handling to support that scheme. Basically an event can come in for two reasons: either a normal request for a pciback action, which is handled in a worker, or in case the guest has finished an AER request which was requested by pciback. When an AER request is issued to the guest and a normal pciback action is currently active issue an EOI early in order to be able to receive another event when the AER request has been finished by the guest. Let the worker processing the normal requests run until no further request is pending, instead of starting a new worker ion that case. Issue the EOI only just before leaving the worker. This scheme allows to drop calling the generic function xen_pcibk_test_and_schedule_op() after processing of any request as the handling of both request types is now separated more cleanly. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit 4daf5efd46cda0e91846164303a9da5559559758 Author: Juergen Gross Date: Tue Nov 3 15:35:23 2020 +0100 xen/scsiback: use lateeoi irq binding commit 86991b6e7ea6c613b7692f65106076943449b6b7 upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving scsifront use the lateeoi irq binding for scsiback and unmask the event channel only just before leaving the event handling function. In case of a ring protocol error don't issue an EOI in order to avoid the possibility to use that for producing an event storm. This at once will result in no further call of scsiback_irq_fn(), so the ring_error struct member can be dropped and scsiback_do_cmd_fn() can signal the protocol error via a negative return value. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit 7d720061cdf40bb7cceadd337c4c698d7222e977 Author: Juergen Gross Date: Tue Nov 3 15:35:22 2020 +0100 xen/netback: use lateeoi irq binding commit 23025393dbeb3b8b3b60ebfa724cdae384992e27 upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving netfront use the lateeoi irq binding for netback and unmask the event channel only just before going to sleep waiting for new events. Make sure not to issue an EOI when none is pending by introducing an eoi_pending element to struct xenvif_queue. When no request has been consumed set the spurious flag when sending the EOI for an interrupt. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit e8972e9615c312e37916f57e224a692dd80c38dd Author: Juergen Gross Date: Tue Nov 3 15:35:21 2020 +0100 xen/blkback: use lateeoi irq binding commit 01263a1fabe30b4d542f34c7e2364a22587ddaf2 upstream. In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving blkfront use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is used to do purging work in regular intervals an EOI may be sent only after having received an event. If there was no pending I/O request flag the EOI as spurious. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit e068ed2c1b7ab47e623b8c2eef2537eb5b27defb Author: Juergen Gross Date: Tue Nov 3 15:35:20 2020 +0100 xen/events: add a new "late EOI" evtchn framework commit 54c9de89895e0a36047fcc4ae754ea5b8655fb9d upstream. In order to avoid tight event channel related IRQ loops add a new framework of "late EOI" handling: the IRQ the event channel is bound to will be masked until the event has been handled and the related driver is capable to handle another event. The driver is responsible for unmasking the event channel via the new function xen_irq_lateeoi(). This is similar to binding an event channel to a threaded IRQ, but without having to structure the driver accordingly. In order to support a future special handling in case a rogue guest is sending lots of unsolicited events, add a flag to xen_irq_lateeoi() which can be set by the caller to indicate the event was a spurious one. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Stefano Stabellini Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit 5b166acf638fef2c0bac2b18b2314d92c65701ba Author: Juergen Gross Date: Tue Nov 3 15:35:19 2020 +0100 xen/events: fix race in evtchn_fifo_unmask() commit f01337197419b7e8a492e83089552b77d3b5fb90 upstream. Unmasking a fifo event channel can result in unmasking it twice, once directly in the kernel and once via a hypercall in case the event was pending. Fix that by doing the local unmask only if the event is not pending. This is part of XSA-332. Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Signed-off-by: Greg Kroah-Hartman commit d7b048485f6f71e55f32ce904ead727b187b3671 Author: Juergen Gross Date: Tue Nov 3 15:35:18 2020 +0100 xen/events: add a proper barrier to 2-level uevent unmasking commit 4d3fe31bd993ef504350989786858aefdb877daa upstream. A follow-up patch will require certain write to happen before an event channel is unmasked. While the memory barrier is not strictly necessary for all the callers, the main one will need it. In order to avoid an extra memory barrier when using fifo event channels, mandate evtchn_unmask() to provide write ordering. The 2-level event handling unmask operation is missing an appropriate barrier, so add it. Fifo event channels are fine in this regard due to using sync_cmpxchg(). This is part of XSA-332. Cc: stable@vger.kernel.org Suggested-by: Julien Grall Signed-off-by: Juergen Gross Reviewed-by: Julien Grall Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit e4ccd4b1a6e586659005a231e793af325e575e53 Author: Juergen Gross Date: Tue Nov 3 15:35:17 2020 +0100 xen/events: avoid removing an event channel while handling it commit 073d0552ead5bfc7a3a9c01de590e924f11b5dd2 upstream. Today it can happen that an event channel is being removed from the system while the event handling loop is active. This can lead to a race resulting in crashes or WARN() splats when trying to access the irq_info structure related to the event channel. Fix this problem by using a rwlock taken as reader in the event handling loop and as writer when deallocating the irq_info structure. As the observed problem was a NULL dereference in evtchn_from_irq() make this function more robust against races by testing the irq_info pointer to be not NULL before dereferencing it. And finally make all accesses to evtchn_to_irq[row][col] atomic ones in order to avoid seeing partial updates of an array element in irq handling. Note that irq handling can be entered only for event channels which have been valid before, so any not populated row isn't a problem in this regard, as rows are only ever added and never removed. This is XSA-331. Cc: stable@vger.kernel.org Reported-by: Marek Marczykowski-Górecki Reported-by: Jinoh Kang Signed-off-by: Juergen Gross Reviewed-by: Stefano Stabellini Reviewed-by: Wei Liu Signed-off-by: Greg Kroah-Hartman commit d59f7d676bfe2149662361fc3a1c0de9d011066d Author: kiyin(尹亮) Date: Wed Nov 4 08:23:22 2020 +0300 perf/core: Fix a memory leak in perf_event_parse_addr_filter() commit 7bdb157cdebbf95a1cd94ed2e01b338714075d00 upstream As shown through runtime testing, the "filename" allocation is not always freed in perf_event_parse_addr_filter(). There are three possible ways that this could happen: - It could be allocated twice on subsequent iterations through the loop, - or leaked on the success path, - or on the failure path. Clean up the code flow to make it obvious that 'filename' is always freed in the reallocation path and in the two return paths as well. We rely on the fact that kfree(NULL) is NOP and filename is initialized with NULL. This fixes the leak. No other side effects expected. [ Dan Carpenter: cleaned up the code flow & added a changelog. ] [ Ingo Molnar: updated the changelog some more. ] Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Signed-off-by: "kiyin(尹亮)" Signed-off-by: Dan Carpenter Signed-off-by: Ingo Molnar Cc: "Srivatsa S. Bhat" Cc: Anthony Liguori [sudip: Backported to 4.9: adjust context] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 857302055f1e4701138270db88076417da8f31c8 Author: Mathieu Poirier Date: Mon Jul 16 17:13:51 2018 -0600 perf/core: Fix crash when using HW tracing kernel filters commit 7f635ff187ab6be0b350b3ec06791e376af238ab upstream In function perf_event_parse_addr_filter(), the path::dentry of each struct perf_addr_filter is left unassigned (as it should be) when the pattern being parsed is related to kernel space. But in function perf_addr_filter_match() the same dentries are given to d_inode() where the value is not expected to be NULL, resulting in the following splat: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 pc : perf_event_mmap+0x2fc/0x5a0 lr : perf_event_mmap+0x2c8/0x5a0 Process uname (pid: 2860, stack limit = 0x000000001cbcca37) Call trace: perf_event_mmap+0x2fc/0x5a0 mmap_region+0x124/0x570 do_mmap+0x344/0x4f8 vm_mmap_pgoff+0xe4/0x110 vm_mmap+0x2c/0x40 elf_map+0x60/0x108 load_elf_binary+0x450/0x12c4 search_binary_handler+0x90/0x290 __do_execve_file.isra.13+0x6e4/0x858 sys_execve+0x3c/0x50 el0_svc_naked+0x30/0x34 This patch is fixing the problem by introducing a new check in function perf_addr_filter_match() to see if the filter's dentry is NULL. Signed-off-by: Mathieu Poirier Signed-off-by: Peter Zijlstra (Intel) Acked-by: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Cc: acme@kernel.org Cc: miklos@szeredi.hu Cc: namhyung@kernel.org Cc: songliubraving@fb.com Fixes: 9511bce9fe8e ("perf/core: Fix bad use of igrab()") Link: http://lkml.kernel.org/r/1531782831-1186-1-git-send-email-mathieu.poirier@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 51f0471b12817e576e42ecea027177798a968db4 Author: Song Liu Date: Tue Apr 17 23:29:07 2018 -0700 perf/core: Fix bad use of igrab() commit 9511bce9fe8e5e6c0f923c09243a713eba560141 upstream As Miklos reported and suggested: "This pattern repeats two times in trace_uprobe.c and in kernel/events/core.c as well: ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) goto fail_address_parse; inode = igrab(d_inode(path.dentry)); path_put(&path); And it's wrong. You can only hold a reference to the inode if you have an active ref to the superblock as well (which is normally through path.mnt) or holding s_umount. This way unmounting the containing filesystem while the tracepoint is active will give you the "VFS: Busy inodes after unmount..." message and a crash when the inode is finally put. Solution: store path instead of inode." This patch fixes the issue in kernel/event/core.c. Reviewed-and-tested-by: Alexander Shishkin Reported-by: Miklos Szeredi Signed-off-by: Song Liu Signed-off-by: Peter Zijlstra (Intel) Cc: Cc: Alexander Shishkin Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Stephane Eranian Cc: Thomas Gleixner Cc: Vince Weaver Fixes: 375637bc5249 ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20180418062907.3210386-2-songliubraving@fb.com Signed-off-by: Ingo Molnar [sudip: Backported to 4.9: use file_inode()] Signed-off-by: Sudip Mukherjee Signed-off-by: Greg Kroah-Hartman commit 27979f60871ae5755c14c4425fe976f055e1fde8 Author: Anand K Mistry Date: Thu Nov 5 16:33:04 2020 +1100 x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP commit 1978b3a53a74e3230cd46932b149c6e62e832e9a upstream. On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON, STIBP is set to on and spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED At the same time, IBPB can be set to conditional. However, this leads to the case where it's impossible to turn on IBPB for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED condition leads to a return before the task flag is set. Similarly, ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to conditional. More generally, the following cases are possible: 1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb 2. STIBP = on && IBPB = conditional for AMD CPUs with X86_FEATURE_AMD_STIBP_ALWAYS_ON The first case functions correctly today, but only because spectre_v2_user_ibpb isn't updated to reflect the IBPB mode. At a high level, this change does one thing. If either STIBP or IBPB is set to conditional, allow the prctl to change the task flag. Also, reflect that capability when querying the state. This isn't perfect since it doesn't take into account if only STIBP or IBPB is unconditionally on. But it allows the conditional feature to work as expected, without affecting the unconditional one. [ bp: Massage commit message and comment; space out statements for better readability. ] Fixes: 21998a351512 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.") Signed-off-by: Anand K Mistry Signed-off-by: Borislav Petkov Acked-by: Thomas Gleixner Acked-by: Tom Lendacky Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid Signed-off-by: Greg Kroah-Hartman commit 29da3bb1a848abdcadc2cbec68787c4c81671341 Author: George Spelvin Date: Sun Aug 9 06:57:44 2020 +0000 random32: make prandom_u32() output unpredictable commit c51f8f88d705e06bd696d7510aff22b33eb8e638 upstream. Non-cryptographic PRNGs may have great statistical properties, but are usually trivially predictable to someone who knows the algorithm, given a small sample of their output. An LFSR like prandom_u32() is particularly simple, even if the sample is widely scattered bits. It turns out the network stack uses prandom_u32() for some things like random port numbers which it would prefer are *not* trivially predictable. Predictability led to a practical DNS spoofing attack. Oops. This patch replaces the LFSR with a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key. (The authors of SipHash have *not* been consulted about this abuse of their algorithm.) Speed is prioritized over security; attacks are rare, while performance is always wanted. Replacing all callers of prandom_u32() is the quick fix. Whether to reinstate a weaker PRNG for uses which can tolerate it is an open question. Commit f227e3ec3b5c ("random32: update the net random state on interrupt and activity") was an earlier attempt at a solution. This patch replaces it. Reported-by: Amit Klein Cc: Willy Tarreau Cc: Eric Dumazet Cc: "Jason A. Donenfeld" Cc: Andy Lutomirski Cc: Kees Cook Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Linus Torvalds Cc: tytso@mit.edu Cc: Florian Westphal Cc: Marc Plumb Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Signed-off-by: George Spelvin Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/ [ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions to prandom.h for later use; merged George's prandom_seed() proposal; inlined siprand_u32(); replaced the net_rand_state[] array with 4 members to fix a build issue; cosmetic cleanups to make checkpatch happy; fixed RANDOM32_SELFTEST build ] [wt: backported to 4.9 -- various context adjustments; timer API change] Signed-off-by: Willy Tarreau Signed-off-by: Greg Kroah-Hartman commit 44d2ea6ef69fb12187aaa327967fc7b323139ba6 Author: Mao Wenan Date: Tue Nov 10 08:16:31 2020 +0800 net: Update window_clamp if SOCK_RCVBUF is set [ Upstream commit 909172a149749242990a6e64cb55d55460d4e417 ] When net.ipv4.tcp_syncookies=1 and syn flood is happened, cookie_v4_check or cookie_v6_check tries to redo what tcp_v4_send_synack or tcp_v6_send_synack did, rsk_window_clamp will be changed if SOCK_RCVBUF is set, which will make rcv_wscale is different, the client still operates with initial window scale and can overshot granted window, the client use the initial scale but local server use new scale to advertise window value, and session work abnormally. Fixes: e88c64f0a425 ("tcp: allow effective reduction of TCP's rcv-buffer via setsockopt") Signed-off-by: Mao Wenan Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/1604967391-123737-1-git-send-email-wenan.mao@linux.alibaba.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 1bbd12d33defae71b39abc4231b4f8a02c047836 Author: Martin Schiller Date: Mon Nov 9 07:54:49 2020 +0100 net/x25: Fix null-ptr-deref in x25_connect [ Upstream commit 361182308766a265b6c521879b34302617a8c209 ] This fixes a regression for blocking connects introduced by commit 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect"). The x25->neighbour is already set to "NULL" by x25_disconnect() now, while a blocking connect is waiting in x25_wait_for_connection_establishment(). Therefore x25->neighbour must not be accessed here again and x25->state is also already set to X25_STATE_0 by x25_disconnect(). Fixes: 4becb7ee5b3d ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect") Signed-off-by: Martin Schiller Reviewed-by: Xie He Link: https://lore.kernel.org/r/20201109065449.9014-1-ms@dev.tdt.de Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 09aeee1252ea11c7186215d60a72454eb6db8cfb Author: Ursula Braun Date: Mon Nov 9 08:57:05 2020 +0100 net/af_iucv: fix null pointer dereference on shutdown [ Upstream commit 4031eeafa71eaf22ae40a15606a134ae86345daf ] syzbot reported the following KASAN finding: BUG: KASAN: nullptr-dereference in iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 Read of size 2 at addr 000000000000021e by task syz-executor907/519 CPU: 0 PID: 519 Comm: syz-executor907 Not tainted 5.9.0-syzkaller-07043-gbcf9877ad213 #0 Hardware name: IBM 3906 M04 701 (KVM/Linux) Call Trace: [<00000000c576af60>] unwind_start arch/s390/include/asm/unwind.h:65 [inline] [<00000000c576af60>] show_stack+0x180/0x228 arch/s390/kernel/dumpstack.c:135 [<00000000c9dcd1f8>] __dump_stack lib/dump_stack.c:77 [inline] [<00000000c9dcd1f8>] dump_stack+0x268/0x2f0 lib/dump_stack.c:118 [<00000000c5fed016>] print_address_description.constprop.0+0x5e/0x218 mm/kasan/report.c:383 [<00000000c5fec82a>] __kasan_report mm/kasan/report.c:517 [inline] [<00000000c5fec82a>] kasan_report+0x11a/0x168 mm/kasan/report.c:534 [<00000000c98b5b60>] iucv_send_ctrl+0x390/0x3f0 net/iucv/af_iucv.c:385 [<00000000c98b6262>] iucv_sock_shutdown+0x44a/0x4c0 net/iucv/af_iucv.c:1457 [<00000000c89d3a54>] __sys_shutdown+0x12c/0x1c8 net/socket.c:2204 [<00000000c89d3b70>] __do_sys_shutdown net/socket.c:2212 [inline] [<00000000c89d3b70>] __s390x_sys_shutdown+0x38/0x48 net/socket.c:2210 [<00000000c9e36eac>] system_call+0xe0/0x28c arch/s390/kernel/entry.S:415 There is nothing to shutdown if a connection has never been established. Besides that iucv->hs_dev is not yet initialized if a socket is in IUCV_OPEN state and iucv->path is not yet initialized if socket is in IUCV_BOUND state. So, just skip the shutdown calls for a socket in these states. Fixes: eac3731bd04c ("[S390]: Add AF_IUCV socket support") Fixes: 82492a355fac ("af_iucv: add shutdown for HS transport") Reviewed-by: Vasily Gorbik Signed-off-by: Ursula Braun [jwi: correct one Fixes tag] Signed-off-by: Julian Wiedmann Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit a870fc0270a8241c3fb0eb9a2e6744e5e0f94415 Author: Oliver Herms Date: Tue Nov 3 11:41:33 2020 +0100 IPv6: Set SIT tunnel hard_header_len to zero [ Upstream commit 8ef9ba4d666614497a057d09b0a6eafc1e34eadf ] Due to the legacy usage of hard_header_len for SIT tunnels while already using infrastructure from net/ipv4/ip_tunnel.c the calculation of the path MTU in tnl_update_pmtu is incorrect. This leads to unnecessary creation of MTU exceptions for any flow going over a SIT tunnel. As SIT tunnels do not have a header themsevles other than their transport (L3, L2) headers we're leaving hard_header_len set to zero as tnl_update_pmtu is already taking care of the transport headers sizes. This will also help avoiding unnecessary IPv6 GC runs and spinlock contention seen when using SIT tunnels and for more than net.ipv6.route.gc_thresh flows. Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Signed-off-by: Oliver Herms Acked-by: Willem de Bruijn Link: https://lore.kernel.org/r/20201103104133.GA1573211@tws Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit 41cdb4aca426daf7793e40408a8e0b3419eff277 Author: Stefano Stabellini Date: Mon Oct 26 17:02:14 2020 -0700 swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb" commit e9696d259d0fb5d239e8c28ca41089838ea76d13 upstream. kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to allocate a buffer for the swiotlb. It does so by calling memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE); If the allocation must fail, no_iotlb_memory is set. Later during initialization swiotlb-xen comes in (drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start is != 0, it thinks the memory is ready to use when actually it is not. When the swiotlb is actually needed, swiotlb_tbl_map_single gets called and since no_iotlb_memory is set the kernel panics. Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been initialized, it would do the initialization itself, which might still succeed. Fix the panic by setting io_tlb_start to 0 on swiotlb initialization failure, and also by setting no_iotlb_memory to false on swiotlb initialization success. Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb") Reported-by: Elliott Mitchell Tested-by: Elliott Mitchell Signed-off-by: Stefano Stabellini Reviewed-by: Christoph Hellwig Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk Signed-off-by: Greg Kroah-Hartman commit 98699510d42f2ba586d9e72e738fc7c8c08ec21c Author: Coiby Xu Date: Fri Nov 6 07:19:09 2020 +0800 pinctrl: amd: fix incorrect way to disable debounce filter commit 06abe8291bc31839950f7d0362d9979edc88a666 upstream. The correct way to disable debounce filter is to clear bit 5 and 6 of the register. Cc: stable@vger.kerne.org Signed-off-by: Coiby Xu Reviewed-by: Hans de Goede Cc: Hans de Goede Link: https://lore.kernel.org/linux-gpio/df2c008b-e7b5-4fdd-42ea-4d1c62b52139@redhat.com/ Link: https://lore.kernel.org/r/20201105231912.69527-2-coiby.xu@gmail.com Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit b71d2fc983a1dbe3c35b3262a730bb5011dcd0f3 Author: Coiby Xu Date: Fri Nov 6 07:19:10 2020 +0800 pinctrl: amd: use higher precision for 512 RtcClk commit c64a6a0d4a928c63e5bc3b485552a8903a506c36 upstream. RTC is 32.768kHz thus 512 RtcClk equals 15625 usec. The documentation likely has dropped precision and that's why the driver mistakenly took the slightly deviated value. Cc: stable@vger.kernel.org Reported-by: Andy Shevchenko Suggested-by: Andy Shevchenko Suggested-by: Hans de Goede Signed-off-by: Coiby Xu Reviewed-by: Andy Shevchenko Reviewed-by: Hans de Goede Link: https://lore.kernel.org/linux-gpio/2f4706a1-502f-75f0-9596-cc25b4933b6c@redhat.com/ Link: https://lore.kernel.org/r/20201105231912.69527-3-coiby.xu@gmail.com Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman commit b959cd196963ccdf16b59d228829036b6b8e0913 Author: Thomas Zimmermann Date: Thu Nov 5 20:02:56 2020 +0100 drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[] commit 06ad8d339524bf94b89859047822c31df6ace239 upstream. The gma500 driver expects 3 pipelines in several it's IRQ functions. Accessing struct drm_device.vblank[], this fails with devices that only have 2 pipelines. An example KASAN report is shown below. [ 62.267688] ================================================================== [ 62.268856] BUG: KASAN: slab-out-of-bounds in psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.269450] Read of size 1 at addr ffff8880012bc6d0 by task systemd-udevd/285 [ 62.269949] [ 62.270192] CPU: 0 PID: 285 Comm: systemd-udevd Tainted: G E 5.10.0-rc1-1-default+ #572 [ 62.270807] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012 [ 62.271366] Call Trace: [ 62.271705] dump_stack+0xae/0xe5 [ 62.272180] print_address_description.constprop.0+0x17/0xf0 [ 62.272987] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.273474] __kasan_report.cold+0x20/0x38 [ 62.273989] ? psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.274460] kasan_report+0x3a/0x50 [ 62.274891] psb_irq_postinstall+0x250/0x3c0 [gma500_gfx] [ 62.275380] drm_irq_install+0x131/0x1f0 <...> [ 62.300751] Allocated by task 285: [ 62.301223] kasan_save_stack+0x1b/0x40 [ 62.301731] __kasan_kmalloc.constprop.0+0xbf/0xd0 [ 62.302293] drmm_kmalloc+0x55/0x100 [ 62.302773] drm_vblank_init+0x77/0x210 Resolve the issue by only handling vblank entries up to the number of CRTCs. I'm adding a Fixes tag for reference, although the bug has been present since the driver's initial commit. Signed-off-by: Thomas Zimmermann Reviewed-by: Daniel Vetter Fixes: 5c49fd3aa0ab ("gma500: Add the core DRM files and headers") Cc: Alan Cox Cc: Dave Airlie Cc: Patrik Jakobsson Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org#v3.3+ Link: https://patchwork.freedesktop.org/patch/msgid/20201105190256.3893-1-tzimmermann@suse.de Signed-off-by: Greg Kroah-Hartman commit 951cb4f2319e1cb7e8d660e7f17b6baa7c608437 Author: Al Viro Date: Wed Oct 28 16:39:49 2020 -0400 don't dump the threads that had been already exiting when zapped. commit 77f6ab8b7768cf5e6bdd0e72499270a0671506ee upstream. Coredump logics needs to report not only the registers of the dumping thread, but (since 2.5.43) those of other threads getting killed. Doing that might require extra state saved on the stack in asm glue at kernel entry; signal delivery logics does that (we need to be able to save sigcontext there, at the very least) and so does seccomp. That covers all callers of do_coredump(). Secondary threads get hit with SIGKILL and caught as soon as they reach exit_mm(), which normally happens in signal delivery, so those are also fine most of the time. Unfortunately, it is possible to end up with secondary zapped when it has already entered exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm() when mm->core_state is already set, but the stack contents is not what we would have in signal delivery. At least on two architectures (alpha and m68k) it leads to infoleaks - we end up with a chunk of kernel stack written into coredump, with the contents consisting of normal C stack frames of the call chain leading to exit_mm() instead of the expected copy of userland registers. In case of alpha we leak 312 bytes of stack. Other architectures (including the regset-using ones) might have similar problems - the normal user of regsets is ptrace and the state of tracee at the time of such calls is special in the same way signal delivery is. Note that had the zapper gotten to the exiting thread slightly later, it wouldn't have been included into coredump anyway - we skip the threads that have already cleared their ->mm. So let's pretend that zapper always loses the race. IOW, have exit_mm() only insert into the dumper list if we'd gotten there from handling a fatal signal[*] As the result, the callers of do_exit() that have *not* gone through get_signal() are not seen by coredump logics as secondary threads. Which excludes voluntary exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that, so seccomp is fine. [*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed out that PF_SIGNALED is already doing just what we need. Cc: stable@vger.kernel.org Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Acked-by: "Eric W. Biederman" Signed-off-by: Al Viro Signed-off-by: Greg Kroah-Hartman commit cd567cc3b309a5459d7d66180ec37862f32714e9 Author: Wengang Wang Date: Fri Nov 13 22:52:23 2020 -0800 ocfs2: initialize ip_next_orphan commit f5785283dd64867a711ca1fb1f5bb172f252ecdf upstream. Though problem if found on a lower 4.1.12 kernel, I think upstream has same issue. In one node in the cluster, there is the following callback trace: # cat /proc/21473/stack __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2] ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2] ocfs2_evict_inode+0x152/0x820 [ocfs2] evict+0xae/0x1a0 iput+0x1c6/0x230 ocfs2_orphan_filldir+0x5d/0x100 [ocfs2] ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2] ocfs2_dir_foreach+0x29/0x30 [ocfs2] ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2] ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2] process_one_work+0x169/0x4a0 worker_thread+0x5b/0x560 kthread+0xcb/0xf0 ret_from_fork+0x61/0x90 The above stack is not reasonable, the final iput shouldn't happen in ocfs2_orphan_filldir() function. Looking at the code, 2067 /* Skip inodes which are already added to recover list, since dio may 2068 * happen concurrently with unlink/rename */ 2069 if (OCFS2_I(iter)->ip_next_orphan) { 2070 iput(iter); 2071 return 0; 2072 } 2073 The logic thinks the inode is already in recover list on seeing ip_next_orphan is non-NULL, so it skip this inode after dropping a reference which incremented in ocfs2_iget(). While, if the inode is already in recover list, it should have another reference and the iput() at line 2070 should not be the final iput (dropping the last reference). So I don't think the inode is really in the recover list (no vmcore to confirm). Note that ocfs2_queue_orphans(), though not shown up in the call back trace, is holding cluster lock on the orphan directory when looking up for unlinked inodes. The on disk inode eviction could involve a lot of IOs which may need long time to finish. That means this node could hold the cluster lock for very long time, that can lead to the lock requests (from other nodes) to the orhpan directory hang for long time. Looking at more on ip_next_orphan, I found it's not initialized when allocating a new ocfs2_inode_info structure. This causes te reflink operations from some nodes hang for very long time waiting for the cluster lock on the orphan directory. Fix: initialize ip_next_orphan as NULL. Signed-off-by: Wengang Wang Signed-off-by: Andrew Morton Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 98d9619d5094cff004ad26dc9bf1c31073e4564b Author: Alexander Usyskin Date: Thu Oct 29 11:54:42 2020 +0200 mei: protect mei_cl_mtu from null dereference commit bcbc0b2e275f0a797de11a10eff495b4571863fc upstream. A receive callback is queued while the client is still connected but can still be called after the client was disconnected. Upon disconnect cl->me_cl is set to NULL, hence we need to check that ME client is not-NULL in mei_cl_mtu to avoid null dereference. Cc: Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Link: https://lore.kernel.org/r/20201029095444.957924-2-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman commit c6718a66229c6f22e7e8c64670e916ac57da9900 Author: Chris Brandt Date: Wed Nov 11 08:12:09 2020 -0500 usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode commit 6d853c9e4104b4fc8d55dc9cd3b99712aa347174 upstream. Renesas R-Car and RZ/G SoCs have a firmware download mode over USB. However, on reset a banner string is transmitted out which is not expected to be echoed back and will corrupt the protocol. Cc: stable Acked-by: Oliver Neukum Signed-off-by: Chris Brandt Link: https://lore.kernel.org/r/20201111131209.3977903-1-chris.brandt@renesas.com Signed-off-by: Greg Kroah-Hartman commit fe375b3260cae7e00ebfda0576921a5eabf3c677 Author: Joseph Qi Date: Tue Nov 3 10:29:02 2020 +0800 ext4: unlock xattr_sem properly in ext4_inline_data_truncate() commit 7067b2619017d51e71686ca9756b454de0e5826a upstream. It takes xattr_sem to check inline data again but without unlock it in case not have. So unlock it before return. Fixes: aef1c8513c1f ("ext4: let ext4_truncate handle inline data correctly") Reported-by: Dan Carpenter Cc: Tao Ma Signed-off-by: Joseph Qi Reviewed-by: Andreas Dilger Link: https://lore.kernel.org/r/1604370542-124630-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit ca0357a529f88545795e6adbe7e2ecf436a78134 Author: Kaixu Xia Date: Thu Oct 29 23:46:36 2020 +0800 ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA commit 174fe5ba2d1ea0d6c5ab2a7d4aa058d6d497ae4d upstream. The macro MOPT_Q is used to indicates the mount option is related to quota stuff and is defined to be MOPT_NOSUPPORT when CONFIG_QUOTA is disabled. Normally the quota options are handled explicitly, so it didn't matter that the MOPT_STRING flag was missing, even though the usrjquota and grpjquota mount options take a string argument. It's important that's present in the !CONFIG_QUOTA case, since without MOPT_STRING, the mount option matcher will match usrjquota= followed by an integer, and will otherwise skip the table entry, and so "mount option not supported" error message is never reported. [ Fixed up the commit description to better explain why the fix works. --TYT ] Fixes: 26092bf52478 ("ext4: use a table-driven handler for mount options") Signed-off-by: Kaixu Xia Link: https://lore.kernel.org/r/1603986396-28917-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman commit e07eab5c2991052a17d4e1a8ada6fd52973764f7 Author: Peter Zijlstra Date: Fri Oct 30 12:49:45 2020 +0100 perf: Fix get_recursion_context() [ Upstream commit ce0f17fc93f63ee91428af10b7b2ddef38cd19e5 ] One should use in_serving_softirq() to detect SoftIRQ context. Fixes: 96f6d4444302 ("perf_counter: avoid recursion") Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20201030151955.120572175@infradead.org Signed-off-by: Sasha Levin commit f44b8662a581d1e09b0af47fba1eac33c1a655ab Author: Wang Hai Date: Tue Nov 10 22:46:14 2020 +0800 cosa: Add missing kfree in error path of cosa_write [ Upstream commit 52755b66ddcef2e897778fac5656df18817b59ab ] If memory allocation for 'kbuf' succeed, cosa_write() doesn't have a corresponding kfree() in exception handling. Thus add kfree() for this function implementation. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot Signed-off-by: Wang Hai Acked-by: Jan "Yenya" Kasprzak Link: https://lore.kernel.org/r/20201110144614.43194-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski Signed-off-by: Sasha Levin commit 02317044a80418b4bf4513fa710e030c655a2a6f Author: Evan Nimmo Date: Tue Nov 10 15:28:25 2020 +1300 of/address: Fix of_node memory leak in of_dma_is_coherent [ Upstream commit a5bea04fcc0b3c0aec71ee1fd58fd4ff7ee36177 ] Commit dabf6b36b83a ("of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc") added a check to of_dma_is_coherent which returns early if OF_DMA_DEFAULT_COHERENT is enabled. This results in the of_node_put() being skipped causing a memory leak. Moved the of_node_get() below this check so we now we only get the node if OF_DMA_DEFAULT_COHERENT is not enabled. Fixes: dabf6b36b83a ("of: Add OF_DMA_DEFAULT_COHERENT & select it on powerpc") Signed-off-by: Evan Nimmo Link: https://lore.kernel.org/r/20201110022825.30895-1-evan.nimmo@alliedtelesis.co.nz Signed-off-by: Rob Herring Signed-off-by: Sasha Levin commit 8acbfc86951b9141ccced70183cd42f86e3ed7de Author: Christoph Hellwig Date: Wed Nov 11 08:07:37 2020 -0800 xfs: fix a missing unlock on error in xfs_fs_map_blocks [ Upstream commit 2bd3fa793aaa7e98b74e3653fdcc72fa753913b5 ] We also need to drop the iolock when invalidate_inode_pages2 fails, not only on all other error or successful cases. Fixes: 527851124d10 ("xfs: implement pNFS export operations") Signed-off-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin commit 21ff2ad79339e5dfebfc84e764e33e454e369fe4 Author: Darrick J. Wong Date: Sun Nov 8 16:32:44 2020 -0800 xfs: fix rmap key and record comparison functions [ Upstream commit 6ff646b2ceb0eec916101877f38da0b73e3a5b7f ] Keys for extent interval records in the reverse mapping btree are supposed to be computed as follows: (physical block, owner, fork, is_btree, is_unwritten, offset) This provides users the ability to look up a reverse mapping from a bmbt record -- start with the physical block; then if there are multiple records for the same block, move on to the owner; then the inode fork type; and so on to the file offset. However, the key comparison functions incorrectly remove the fork/btree/unwritten information that's encoded in the on-disk offset. This means that lookup comparisons are only done with: (physical block, owner, offset) This means that queries can return incorrect results. On consistent filesystems this hasn't been an issue because blocks are never shared between forks or with bmbt blocks; and are never unwritten. However, this bug means that online repair cannot always detect corruption in the key information in internal rmapbt nodes. Found by fuzzing keys[1].attrfork = ones on xfs/371. Fixes: 4b8ed67794fe ("xfs: add rmap btree operations") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin commit fa98ca4c5d013519225361ce96cc2740f9052f82 Author: Darrick J. Wong Date: Sun Nov 8 16:32:43 2020 -0800 xfs: fix flags argument to rmap lookup when converting shared file rmaps [ Upstream commit ea8439899c0b15a176664df62aff928010fad276 ] Pass the same oldext argument (which contains the existing rmapping's unwritten state) to xfs_rmap_lookup_le_range at the start of xfs_rmap_convert_shared. At this point in the code, flags is zero, which means that we perform lookups using the wrong key. Fixes: 3f165b334e51 ("xfs: convert unwritten status of reverse mappings for shared files") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Sasha Levin commit e4da271ee972704dbfc4d7a2778e48162542bb67 Author: Billy Tsai Date: Fri Oct 30 13:54:50 2020 +0800 pinctrl: aspeed: Fix GPI only function problem. [ Upstream commit 9b92f5c51e9a41352d665f6f956bd95085a56a83 ] Some gpio pin at aspeed soc is input only and the prefix name of these pin is "GPI" only. This patch fine-tune the condition of GPIO check from "GPIO" to "GPI" and it will fix the usage error of banks D and E in the AST2400/AST2500 and banks T and U in the AST2600. Fixes: 4d3d0e4272d8 ("pinctrl: Add core support for Aspeed SoCs") Signed-off-by: Billy Tsai Reviewed-by: Andrew Jeffery Link: https://lore.kernel.org/r/20201030055450.29613-1-billy_tsai@aspeedtech.com Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin commit 59768f22aa817d32c5dde35df7ae31548149fcf2 Author: Suravee Suthikulpanit Date: Thu Oct 15 02:50:02 2020 +0000 iommu/amd: Increase interrupt remapping table limit to 512 entries [ Upstream commit 73db2fc595f358460ce32bcaa3be1f0cce4a2db1 ] Certain device drivers allocate IO queues on a per-cpu basis. On AMD EPYC platform, which can support up-to 256 cpu threads, this can exceed the current MAX_IRQ_PER_TABLE limit of 256, and result in the error message: AMD-Vi: Failed to allocate IRTE This has been observed with certain NVME devices. AMD IOMMU hardware can actually support upto 512 interrupt remapping table entries. Therefore, update the driver to match the hardware limit. Please note that this also increases the size of interrupt remapping table to 8KB per device when using the 128-bit IRTE format. Signed-off-by: Suravee Suthikulpanit Link: https://lore.kernel.org/r/20201015025002.87997-1-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel Signed-off-by: Sasha Levin commit 3c891e37b74209dd6fc364e34b2ac074151e4d5d Author: Hannes Reinecke Date: Thu Sep 24 12:45:59 2020 +0200 scsi: scsi_dh_alua: Avoid crash during alua_bus_detach() [ Upstream commit 5faf50e9e9fdc2117c61ff7e20da49cd6a29e0ca ] alua_bus_detach() might be running concurrently with alua_rtpg_work(), so we might trip over h->sdev == NULL and call BUG_ON(). The correct way of handling it is to not set h->sdev to NULL in alua_bus_detach(), and call rcu_synchronize() before the final delete to ensure that all concurrent threads have left the critical section. Then we can get rid of the BUG_ON() and replace it with a simple if condition. Link: https://lore.kernel.org/r/1600167537-12509-1-git-send-email-jitendra.khasdev@oracle.com Link: https://lore.kernel.org/r/20200924104559.26753-1-hare@suse.de Cc: Brian Bunker Acked-by: Brian Bunker Tested-by: Jitendra Khasdev Reviewed-by: Jitendra Khasdev Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit 09a227a1071968e5cdeffb269de6f7c8fdddebd5 Author: Ye Bin Date: Fri Oct 9 15:02:15 2020 +0800 cfg80211: regulatory: Fix inconsistent format argument [ Upstream commit db18d20d1cb0fde16d518fb5ccd38679f174bc04 ] Fix follow warning: [net/wireless/reg.c:3619]: (warning) %d in format string (no. 2) requires 'int' but the argument type is 'unsigned int'. Reported-by: Hulk Robot Signed-off-by: Ye Bin Link: https://lore.kernel.org/r/20201009070215.63695-1-yebin10@huawei.com Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 74b8bf4d2c737c30186df407e8dac33524c42752 Author: Johannes Berg Date: Fri Oct 9 13:25:41 2020 +0200 mac80211: fix use of skb payload instead of header [ Upstream commit 14f46c1e5108696ec1e5a129e838ecedf108c7bf ] When ieee80211_skb_resize() is called from ieee80211_build_hdr() the skb has no 802.11 header yet, in fact it consist only of the payload as the ethernet frame is removed. As such, we're using the payload data for ieee80211_is_mgmt(), which is of course completely wrong. This didn't really hurt us because these are always data frames, so we could only have added more tailroom than we needed if we determined it was a management frame and sdata->crypto_tx_tailroom_needed_cnt was false. However, syzbot found that of course there need not be any payload, so we're using at best uninitialized memory for the check. Fix this to pass explicitly the kind of frame that we have instead of checking there, by replacing the "bool may_encrypt" argument with an argument that can carry the three possible states - it's not going to be encrypted, it's a management frame, or it's a data frame (and then we check sdata->crypto_tx_tailroom_needed_cnt). Reported-by: syzbot+32fd1a1bfe355e93f1e2@syzkaller.appspotmail.com Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20201009132538.e1fd7f802947.I799b288466ea2815f9d4c84349fae697dca2f189@changeid Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin commit 5b4c0ef711b25d32809cf6b264388ca313d487df Author: Evan Quan Date: Wed Oct 28 15:29:59 2020 +0800 drm/amdgpu: perform srbm soft reset always on SDMA resume [ Upstream commit 253475c455eb5f8da34faa1af92709e7bb414624 ] This can address the random SDMA hang after pci config reset seen on Hawaii. Signed-off-by: Evan Quan Tested-by: Sandeep Raghuraman Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin commit 4c21211820af03440a4d73e606b3a25bbad7f407 Author: Keita Suzuki Date: Tue Oct 27 07:31:24 2020 +0000 scsi: hpsa: Fix memory leak in hpsa_init_one() [ Upstream commit af61bc1e33d2c0ec22612b46050f5b58ac56a962 ] When hpsa_scsi_add_host() fails, h->lastlogicals is leaked since it is missing a free() in the error handler. Fix this by adding free() when hpsa_scsi_add_host() fails. Link: https://lore.kernel.org/r/20201027073125.14229-1-keitasuzuki.park@sslab.ics.keio.ac.jp Tested-by: Don Brace Acked-by: Don Brace Signed-off-by: Keita Suzuki Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin commit ada3d70c0f6f1f7d8360c18bb00a756562fed236 Author: Bob Peterson Date: Wed Oct 28 13:42:18 2020 -0500 gfs2: check for live vs. read-only file system in gfs2_fitrim [ Upstream commit c5c68724696e7d2f8db58a5fce3673208d35c485 ] Before this patch, gfs2_fitrim was not properly checking for a "live" file system. If the file system had something to trim and the file system was read-only (or spectator) it would start the trim, but when it starts the transaction, gfs2_trans_begin returns -EROFS (read-only file system) and it errors out. However, if the file system was already trimmed so there's no work to do, it never called gfs2_trans_begin. That code is bypassed so it never returns the error. Instead, it returns a good return code with 0 work. All this makes for inconsistent behavior: The same fstrim command can return -EROFS in one case and 0 in another. This tripped up xfstests generic/537 which reports the error as: +fstrim with unrecovered metadata just ate your filesystem This patch adds a check for a "live" (iow, active journal, iow, RW) file system, and if not, returns the error properly. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin commit 88e168efcd813ffb3c7740994aed5e382d8171fa Author: Bob Peterson Date: Tue Oct 27 10:10:01 2020 -0500 gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free [ Upstream commit d0f17d3883f1e3f085d38572c2ea8edbd5150172 ] Function gfs2_clear_rgrpd calls kfree(rgd->rd_bits) before calling return_all_reservations, but return_all_reservations still dereferences rgd->rd_bits in __rs_deltree. Fix that by moving the call to kfree below the call to return_all_reservations. Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin commit bbd97e2c2854e61e4afc937e80bd5c0ac2fc6a7c Author: Evgeny Novikov Date: Fri Oct 2 18:01:55 2020 +0300 usb: gadget: goku_udc: fix potential crashes in probe [ Upstream commit 0d66e04875c5aae876cf3d4f4be7978fa2b00523 ] goku_probe() goes to error label "err" and invokes goku_remove() in case of failures of pci_enable_device(), pci_resource_start() and ioremap(). goku_remove() gets a device from pci_get_drvdata(pdev) and works with it without any checks, in particular it dereferences a corresponding pointer. But goku_probe() did not set this device yet. So, one can expect various crashes. The patch moves setting the device just after allocation of memory for it. Found by Linux Driver Verification project (linuxtesting.org). Reported-by: Pavel Andrianov Signed-off-by: Evgeny Novikov Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin commit 526eac8b420b73bc7125bda6f0e30d61b809693e Author: Masashi Honma Date: Sun Aug 9 08:32:58 2020 +0900 ath9k_htc: Use appropriate rs_datalen type commit 5024f21c159f8c1668f581fff37140741c0b1ba9 upstream. kernel test robot says: drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: warning: incorrect type in assignment (different base types) drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: expected restricted __be16 [usertype] rs_datalen drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:987:20: sparse: got unsigned short [usertype] drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:988:13: sparse: warning: restricted __be16 degrades to integer drivers/net/wireless/ath/ath9k/htc_drv_txrx.c:1001:13: sparse: warning: restricted __be16 degrades to integer Indeed rs_datalen has host byte order, so modify it's own type. Reported-by: kernel test robot Fixes: cd486e627e67 ("ath9k_htc: Discard undersized packets") Signed-off-by: Masashi Honma Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20200808233258.4596-1-masashi.honma@gmail.com Signed-off-by: Greg Kroah-Hartman commit 1200ebbd06c2f569421dcab4e10649f3e299867c Author: Mark Gray Date: Wed Sep 16 05:19:35 2020 -0400 geneve: add transport ports in route lookup for geneve commit 34beb21594519ce64a55a498c2fe7d567bc1ca20 upstream. This patch adds transport ports information for route lookup so that IPsec can select Geneve tunnel traffic to do encryption. This is needed for OVS/OVN IPsec with encrypted Geneve tunnels. This can be tested by configuring a host-host VPN using an IKE daemon and specifying port numbers. For example, for an Openswan-type configuration, the following parameters should be configured on both hosts and IPsec set up as-per normal: $ cat /etc/ipsec.conf conn in ... left=$IP1 right=$IP2 ... leftprotoport=udp/6081 rightprotoport=udp ... conn out ... left=$IP1 right=$IP2 ... leftprotoport=udp rightprotoport=udp/6081 ... The tunnel can then be setup using "ip" on both hosts (but changing the relevant IP addresses): $ ip link add tun type geneve id 1000 remote $IP2 $ ip addr add 192.168.0.1/24 dev tun $ ip link set tun up This can then be tested by pinging from $IP1: $ ping 192.168.0.2 Without this patch the traffic is unencrypted on the wire. Fixes: 2d07dc79fe04 ("geneve: add initial netdev driver for GENEVE tunnels") Signed-off-by: Qiuyu Xiao Signed-off-by: Mark Gray Reviewed-by: Greg Rose Signed-off-by: David S. Miller [bwh: Backported to 4.9: - Use geneve->dst_port instead of geneve->cfg.info.key.tp_dst - Adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit 5e2f790b857668d0bd3779b22a04df8944b04864 Author: Martyna Szapar Date: Thu Apr 18 13:31:53 2019 -0700 i40e: Memory leak in i40e_config_iwarp_qvlist commit 0b63644602cfcbac849f7ea49272a39e90fa95eb upstream. Added freeing the old allocation of vf->qvlist_info in function i40e_config_iwarp_qvlist before overwriting it with the new allocation. Signed-off-by: Martyna Szapar Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit 7f5e6e098397e4c0dbbecfe5cab86ec135765f0f Author: Martyna Szapar Date: Mon Apr 15 14:43:07 2019 -0700 i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c commit 24474f2709af6729b9b1da1c5e160ab62e25e3a4 upstream. Fixed possible memory leak in i40e_vc_add_cloud_filter function: cfilter is being allocated and in some error conditions the function returns without freeing the memory. Fix of integer truncation from u16 (type of queue_id value) to u8 when calling i40e_vc_isvalid_queue_id function. Signed-off-by: Martyna Szapar Signed-off-by: Jeff Kirsher [bwh: Backported to 4.9: i40e_vc_add_cloud_filter() does not exist but the integer truncation is still possible] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit b7715c9bb71fa4b95fdb9b98a8814d8e18cb7402 Author: Grzegorz Siwik Date: Fri Mar 29 15:08:37 2019 -0700 i40e: Wrong truncation from u16 to u8 commit c004804dceee9ca384d97d9857ea2e2795c2651d upstream. In this patch fixed wrong truncation method from u16 to u8 during validation. It was changed by changing u8 to u32 parameter in method declaration and arguments were changed to u32. Signed-off-by: Grzegorz Siwik Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit f4a3ff4df40053000d56554f0d34aa98d4d917d6 Author: Sergey Nemov Date: Fri Mar 29 15:08:36 2019 -0700 i40e: add num_vectors checker in iwarp handler commit 7015ca3df965378bcef072cca9cd63ed098665b5 upstream. Field num_vectors from struct virtchnl_iwarp_qvlist_info should not be larger than num_msix_vectors_vf in the hw struct. The iwarp uses the same set of vectors as the LAN VF driver. Signed-off-by: Sergey Nemov Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit 8f29881eb732ca087069fcc8f65ffa316fc63185 Author: Christophe JAILLET Date: Sun Aug 6 23:37:01 2017 +0200 i40e: Fix a potential NULL pointer dereference commit 54902349ee95045b67e2f0c39b75f5418540064b upstream. If 'kzalloc()' fails, a NULL pointer will be dereferenced. Return an error code (-ENOMEM) instead. Signed-off-by: Christophe JAILLET Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit 77440c3a37203e3f4667d06e37f76ef3968d2d8c Author: Will Deacon Date: Wed Oct 2 13:42:06 2019 +0100 pinctrl: devicetree: Avoid taking direct reference to device name string commit be4c60b563edee3712d392aaeb0943a768df7023 upstream. When populating the pinctrl mapping table entries for a device, the 'dev_name' field for each entry is initialised to point directly at the string returned by 'dev_name()' for the device and subsequently used by 'create_pinctrl()' when looking up the mappings for the device being probed. This is unreliable in the presence of calls to 'dev_set_name()', which may reallocate the device name string leaving the pinctrl mappings with a dangling reference. This then leads to a use-after-free every time the name is dereferenced by a device probe: | BUG: KASAN: invalid-access in strcmp+0x20/0x64 | Read of size 1 at addr 13ffffc153494b00 by task modprobe/590 | Pointer tag: [13], memory tag: [fe] | | Call trace: | __kasan_report+0x16c/0x1dc | kasan_report+0x10/0x18 | check_memory_region | __hwasan_load1_noabort+0x4c/0x54 | strcmp+0x20/0x64 | create_pinctrl+0x18c/0x7f4 | pinctrl_get+0x90/0x114 | devm_pinctrl_get+0x44/0x98 | pinctrl_bind_pins+0x5c/0x450 | really_probe+0x1c8/0x9a4 | driver_probe_device+0x120/0x1d8 Follow the example of sysfs, and duplicate the device name string before stashing it away in the pinctrl mapping entries. Cc: Linus Walleij Reported-by: Elena Petrova Tested-by: Elena Petrova Signed-off-by: Will Deacon Link: https://lore.kernel.org/r/20191002124206.22928-1-will@kernel.org Signed-off-by: Linus Walleij [bwh: Backported to 4.9: adjust context] Signed-off-by: Ben Hutchings Signed-off-by: Sasha Levin commit b4d4f6be3761fdcbc3aebf0cdf993ac85371a76c Author: Filipe Manana Date: Wed Sep 11 17:42:28 2019 +0100 Btrfs: fix missing error return if writeback for extent buffer never started [ Upstream commit 0607eb1d452d45c5ac4c745a9e9e0d95152ea9d0 ] If lock_extent_buffer_for_io() fails, it returns a negative value, but its caller btree_write_cache_pages() ignores such error. This means that a call to flush_write_bio(), from lock_extent_buffer_for_io(), might have failed. We should make btree_write_cache_pages() notice such error values and stop immediatelly, making sure filemap_fdatawrite_range() returns an error to the transaction commit path. A failure from flush_write_bio() should also result in the endio callback end_bio_extent_buffer_writepage() being invoked, which sets the BTRFS_FS_*_ERR bits appropriately, so that there's no risk a transaction or log commit doesn't catch a writeback failure. Reviewed-by: Josef Bacik Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit 9913074487dea937d24a7239063abd365bf3ba3d Author: Brian Foster Date: Thu Oct 29 14:30:48 2020 -0700 xfs: flush new eof page on truncate to avoid post-eof corruption [ Upstream commit 869ae85dae64b5540e4362d7fe4cd520e10ec05c ] It is possible to expose non-zeroed post-EOF data in XFS if the new EOF page is dirty, backed by an unwritten block and the truncate happens to race with writeback. iomap_truncate_page() will not zero the post-EOF portion of the page if the underlying block is unwritten. The subsequent call to truncate_setsize() will, but doesn't dirty the page. Therefore, if writeback happens to complete after iomap_truncate_page() (so it still sees the unwritten block) but before truncate_setsize(), the cached page becomes inconsistent with the on-disk block. A mapped read after the associated page is reclaimed or invalidated exposes non-zero post-EOF data. For example, consider the following sequence when run on a kernel modified to explicitly flush the new EOF page within the race window: $ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file $ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file ... $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: 00 00 00 00 00 00 00 00 ........ $ umount /mnt/; mount /mnt/ $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file 00000400: cd cd cd cd cd cd cd cd ........ Update xfs_setattr_size() to explicitly flush the new EOF page prior to the page truncate to ensure iomap has the latest state of the underlying block. Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path") Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin commit 7eeef1093f1fef954cdc4152b9f67c978c45163a Author: Stephane Grosjean Date: Wed Oct 14 10:56:31 2020 +0200 can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping [ Upstream commit ecc7b4187dd388549544195fb13a11b4ea8e6a84 ] Fabian Inostroza has discovered a potential problem in the hardware timestamp reporting from the PCAN-USB USB CAN interface (only), related to the fact that a timestamp of an event may precede the timestamp used for synchronization when both records are part of the same USB packet. However, this case was used to detect the wrapping of the time counter. This patch details and fixes the two identified cases where this problem can occur. Reported-by: Fabian Inostroza Signed-off-by: Stephane Grosjean Link: https://lore.kernel.org/r/20201014085631.15128-1-s.grosjean@peak-system.com Fixes: bb4785551f64 ("can: usb: PEAK-System Technik USB adapters driver core") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit 1e5d182a87d547d11138f2cc185bf37125d59e67 Author: Dan Carpenter Date: Thu Aug 13 17:06:04 2020 +0300 can: peak_usb: add range checking in decode operations [ Upstream commit a6921dd524fe31d1f460c161d3526a407533b6db ] These values come from skb->data so Smatch considers them untrusted. I believe Smatch is correct but I don't have a way to test this. The usb_if->dev[] array has 2 elements but the index is in the 0-15 range without checks. The cfd->len can be up to 255 but the maximum valid size is CANFD_MAX_DLEN (64) so that could lead to memory corruption. Fixes: 0a25e1f4f185 ("can: peak_usb: add support for PEAK new CANFD USB adapters") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20200813140604.GA456946@mwanda Acked-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit 919d9b622c896f8fac21faa125c4bcbceca8ddf2 Author: Oleksij Rempel Date: Wed Dec 18 09:39:02 2019 +0100 can: can_create_echo_skb(): fix echo skb generation: always use skb_clone() [ Upstream commit 286228d382ba6320f04fa2e7c6fc8d4d92e428f4 ] All user space generated SKBs are owned by a socket (unless injected into the key via AF_PACKET). If a socket is closed, all associated skbs will be cleaned up. This leads to a problem when a CAN driver calls can_put_echo_skb() on a unshared SKB. If the socket is closed prior to the TX complete handler, can_get_echo_skb() and the subsequent delivering of the echo SKB to all registered callbacks, a SKB with a refcount of 0 is delivered. To avoid the problem, in can_get_echo_skb() the original SKB is now always cloned, regardless of shared SKB or not. If the process exists it can now safely discard its SKBs, without disturbing the delivery of the echo SKB. The problem shows up in the j1939 stack, when it clones the incoming skb, which detects the already 0 refcount. We can easily reproduce this with following example: testj1939 -B -r can0: & cansend can0 1823ff40#0123 WARNING: CPU: 0 PID: 293 at lib/refcount.c:25 refcount_warn_saturate+0x108/0x174 refcount_t: addition on 0; use-after-free. Modules linked in: coda_vpu imx_vdoa videobuf2_vmalloc dw_hdmi_ahb_audio vcan CPU: 0 PID: 293 Comm: cansend Not tainted 5.5.0-rc6-00376-g9e20dcb7040d #1 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: [] (dump_backtrace) from [] (show_stack+0x20/0x24) [] (show_stack) from [] (dump_stack+0x8c/0xa0) [] (dump_stack) from [] (__warn+0xe0/0x108) [] (__warn) from [] (warn_slowpath_fmt+0xa8/0xcc) [] (warn_slowpath_fmt) from [] (refcount_warn_saturate+0x108/0x174) [] (refcount_warn_saturate) from [] (j1939_can_recv+0x20c/0x210) [] (j1939_can_recv) from [] (can_rcv_filter+0xb4/0x268) [] (can_rcv_filter) from [] (can_receive+0xb0/0xe4) [] (can_receive) from [] (can_rcv+0x48/0x98) [] (can_rcv) from [] (__netif_receive_skb_one_core+0x64/0x88) [] (__netif_receive_skb_one_core) from [] (__netif_receive_skb+0x38/0x94) [] (__netif_receive_skb) from [] (netif_receive_skb_internal+0x64/0xf8) [] (netif_receive_skb_internal) from [] (netif_receive_skb+0x34/0x19c) [] (netif_receive_skb) from [] (can_rx_offload_napi_poll+0x58/0xb4) Fixes: 0ae89beb283a ("can: add destructor for self generated skbs") Signed-off-by: Oleksij Rempel Link: http://lore.kernel.org/r/20200124132656.22156-1-o.rempel@pengutronix.de Acked-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit 6968ec257328be676064e2bccf133fd4e9116d93 Author: Oliver Hartkopp Date: Tue Oct 20 08:44:43 2020 +0200 can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames [ Upstream commit ed3320cec279407a86bc4c72edc4a39eb49165ec ] The can_get_echo_skb() function returns the number of received bytes to be used for netdev statistics. In the case of RTR frames we get a valid (potential non-zero) data length value which has to be passed for further operations. But on the wire RTR frames have no payload length. Therefore the value to be used in the statistics has to be zero for RTR frames. Reported-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/r/20201020064443.80164-1-socketcan@hartkopp.net Fixes: cf5046b309b3 ("can: dev: let can_get_echo_skb() return dlc of CAN frame") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit 451187b20431924d13fcfecc500d7cd2d9951bac Author: Vincent Mailhol Date: Sat Oct 3 00:41:45 2020 +0900 can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context [ Upstream commit 2283f79b22684d2812e5c76fc2280aae00390365 ] If a driver calls can_get_echo_skb() during a hardware IRQ (which is often, but not always, the case), the 'WARN_ON(in_irq)' in net/core/skbuff.c#skb_release_head_state() might be triggered, under network congestion circumstances, together with the potential risk of a NULL pointer dereference. The root cause of this issue is the call to kfree_skb() instead of dev_kfree_skb_irq() in net/core/dev.c#enqueue_to_backlog(). This patch prevents the skb to be freed within the call to netif_rx() by incrementing its reference count with skb_get(). The skb is finally freed by one of the in-irq-context safe functions: dev_consume_skb_any() or dev_kfree_skb_any(). The "any" version is used because some drivers might call can_get_echo_skb() in a normal context. The reason for this issue to occur is that initially, in the core network stack, loopback skb were not supposed to be received in hardware IRQ context. The CAN stack is an exeption. This bug was previously reported back in 2017 in [1] but the proposed patch never got accepted. While [1] directly modifies net/core/dev.c, we try to propose here a smoother modification local to CAN network stack (the assumption behind is that only CAN devices are affected by this issue). [1] http://lore.kernel.org/r/57a3ffb6-3309-3ad5-5a34-e93c3fe3614d@cetitec.com Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/r/20201002154219.4887-2-mailhol.vincent@wanadoo.fr Fixes: 39549eef3587 ("can: CAN Network device driver and Netlink interface") Signed-off-by: Marc Kleine-Budde Signed-off-by: Sasha Levin commit b850b9e7f4e9cf14906f3a63d065b54a0392d1da Author: Dan Carpenter Date: Tue Nov 3 13:18:07 2020 +0300 ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link() [ Upstream commit 158e1886b6262c1d1c96a18c85fac5219b8bf804 ] This is harmless, but the "addr" comes from the user and it could lead to a negative shift or to shift wrapping if it's too high. Fixes: 0b00a5615dc4 ("ALSA: hdac_ext: add hdac extended controller") Signed-off-by: Dan Carpenter Link: https://lore.kernel.org/r/20201103101807.GC1127762@mwanda Signed-off-by: Takashi Iwai Signed-off-by: Sasha Levin commit 9aad1a8d7eda6c9fdaf672afa2a2898f965e4d33 Author: Jiri Olsa Date: Mon Nov 2 00:31:03 2020 +0100 perf tools: Add missing swap for ino_generation [ Upstream commit fe01adb72356a4e2f8735e4128af85921ca98fa1 ] We are missing swap for ino_generation field. Fixes: 5c5e854bc760 ("perf tools: Add attr->mmap2 support") Signed-off-by: Jiri Olsa Acked-by: Namhyung Kim Link: https://lore.kernel.org/r/20201101233103.3537427-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin commit cca76199bd6bc58010093bf5f50ba89a6f57bdd8 Author: zhuoliang zhang Date: Fri Oct 23 09:05:35 2020 +0200 net: xfrm: fix a race condition during allocing spi [ Upstream commit a779d91314ca7208b7feb3ad817b62904397c56d ] we found that the following race condition exists in xfrm_alloc_userspi flow: user thread state_hash_work thread ---- ---- xfrm_alloc_userspi() __find_acq_core() /*alloc new xfrm_state:x*/ xfrm_state_alloc() /*schedule state_hash_work thread*/ xfrm_hash_grow_check() xfrm_hash_resize() xfrm_alloc_spi /*hold lock*/ x->id.spi = htonl(spi) spin_lock_bh(&net->xfrm.xfrm_state_lock) /*waiting lock release*/ xfrm_hash_transfer() spin_lock_bh(&net->xfrm.xfrm_state_lock) /*add x into hlist:net->xfrm.state_byspi*/ hlist_add_head_rcu(&x->byspi) spin_unlock_bh(&net->xfrm.xfrm_state_lock) /*add x into hlist:net->xfrm.state_byspi 2 times*/ hlist_add_head_rcu(&x->byspi) 1. a new state x is alloced in xfrm_state_alloc() and added into the bydst hlist in __find_acq_core() on the LHS; 2. on the RHS, state_hash_work thread travels the old bydst and tranfers every xfrm_state (include x) into the new bydst hlist and new byspi hlist; 3. user thread on the LHS gets the lock and adds x into the new byspi hlist again. So the same xfrm_state (x) is added into the same list_hash (net->xfrm.state_byspi) 2 times that makes the list_hash become an inifite loop. To fix the race, x->id.spi = htonl(spi) in the xfrm_alloc_spi() is moved to the back of spin_lock_bh, sothat state_hash_work thread no longer add x which id.spi is zero into the hash_list. Fixes: f034b5d4efdf ("[XFRM]: Dynamic xfrm_state hash table sizing.") Signed-off-by: zhuoliang zhang Acked-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin commit 10c197e259f4951c115a61cb512ae8ab97fbae53 Author: Marc Zyngier Date: Thu Oct 15 21:41:44 2020 +0100 genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY [ Upstream commit 151a535171be6ff824a0a3875553ea38570f4c05 ] kernel/irq/ipi.c otherwise fails to compile if nothing else selects it. Fixes: 379b656446a3 ("genirq: Add GENERIC_IRQ_IPI Kconfig symbol") Reported-by: Pavel Machek Tested-by: Pavel Machek Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20201015101222.GA32747@amd Signed-off-by: Sasha Levin commit 9d0a691a967de9b468c1e24f45d248507cb73132 Author: Johannes Thumshirn Date: Tue Sep 22 17:27:29 2020 +0900 btrfs: reschedule when cloning lots of extents [ Upstream commit 6b613cc97f0ace77f92f7bc112b8f6ad3f52baf8 ] We have several occurrences of a soft lockup from fstest's generic/175 testcase, which look more or less like this one: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [xfs_io:10030] Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 10030 Comm: xfs_io Tainted: G L 5.9.0-rc5+ #768 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Call Trace: dump_stack+0x77/0xa0 panic+0xfa/0x2cb watchdog_timer_fn.cold+0x85/0xa5 ? lockup_detector_update_enable+0x50/0x50 __hrtimer_run_queues+0x99/0x4c0 ? recalibrate_cpu_khz+0x10/0x10 hrtimer_run_queues+0x9f/0xb0 update_process_times+0x28/0x80 tick_handle_periodic+0x1b/0x60 __sysvec_apic_timer_interrupt+0x76/0x210 asm_call_on_stack+0x12/0x20 sysvec_apic_timer_interrupt+0x7f/0x90 asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:btrfs_tree_unlock+0x91/0x1a0 [btrfs] RSP: 0018:ffffc90007123a58 EFLAGS: 00000282 RAX: ffff8881cea2fbe0 RBX: ffff8881cea2fbe0 RCX: 0000000000000000 RDX: ffff8881d23fd200 RSI: ffffffff82045220 RDI: ffff8881cea2fba0 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000032 R10: 0000160000000000 R11: 0000000000001000 R12: 0000000000001000 R13: ffff8882357fd5b0 R14: ffff88816fa76e70 R15: ffff8881cea2fad0 ? btrfs_tree_unlock+0x15b/0x1a0 [btrfs] btrfs_release_path+0x67/0x80 [btrfs] btrfs_insert_replace_extent+0x177/0x2c0 [btrfs] btrfs_replace_file_extents+0x472/0x7c0 [btrfs] btrfs_clone+0x9ba/0xbd0 [btrfs] btrfs_clone_files.isra.0+0xeb/0x140 [btrfs] ? file_update_time+0xcd/0x120 btrfs_remap_file_range+0x322/0x3b0 [btrfs] do_clone_file_range+0xb7/0x1e0 vfs_clone_file_range+0x30/0xa0 ioctl_file_clone+0x8a/0xc0 do_vfs_ioctl+0x5b2/0x6f0 __x64_sys_ioctl+0x37/0xa0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f87977fc247 RSP: 002b:00007ffd51a2f6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f87977fc247 RDX: 00007ffd51a2f710 RSI: 000000004020940d RDI: 0000000000000003 RBP: 0000000000000004 R08: 00007ffd51a79080 R09: 0000000000000000 R10: 00005621f11352f2 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 00005621f128b958 R15: 0000000080000000 Kernel Offset: disabled ---[ end Kernel panic - not syncing: softlockup: hung tasks ]--- All of these lockup reports have the call chain btrfs_clone_files() -> btrfs_clone() in common. btrfs_clone_files() calls btrfs_clone() with both source and destination extents locked and loops over the source extent to create the clones. Conditionally reschedule in the btrfs_clone() loop, to give some time back to other processes. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik Signed-off-by: Johannes Thumshirn Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin commit 6e12733ef63c3f5be2a4f378eddfddfd82b622cf Author: Zeng Tao Date: Tue Sep 1 17:30:13 2020 +0800 time: Prevent undefined behaviour in timespec64_to_ns() [ Upstream commit cb47755725da7b90fecbb2aa82ac3b24a7adb89b ] UBSAN reports: Undefined behaviour in ./include/linux/time64.h:127:27 signed integer overflow: 17179869187 * 1000000000 cannot be represented in type 'long long int' Call Trace: timespec64_to_ns include/linux/time64.h:127 [inline] set_cpu_itimer+0x65c/0x880 kernel/time/itimer.c:180 do_setitimer+0x8e/0x740 kernel/time/itimer.c:245 __x64_sys_setitimer+0x14c/0x2c0 kernel/time/itimer.c:336 do_syscall_64+0xa1/0x540 arch/x86/entry/common.c:295 Commit bd40a175769d ("y2038: itimer: change implementation to timespec64") replaced the original conversion which handled time clamping correctly with timespec64_to_ns() which has no overflow protection. Fix it in timespec64_to_ns() as this is not necessarily limited to the usage in itimers. [ tglx: Added comment and adjusted the fixes tag ] Fixes: 361a3bf00582 ("time64: Add time64.h header and define struct timespec64") Signed-off-by: Zeng Tao Signed-off-by: Thomas Gleixner Reviewed-by: Arnd Bergmann Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1598952616-6416-1-git-send-email-prime.zeng@hisilicon.com Signed-off-by: Sasha Levin commit 5ed0bc2d3e2df994b739bc0c6f536f480ae60c13 Author: Shijie Luo Date: Sun Nov 1 17:07:40 2020 -0800 mm: mempolicy: fix potential pte_unmap_unlock pte error [ Upstream commit 3f08842098e842c51e3b97d0dcdebf810b32558e ] When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to pte_unmap_unlock seems like not a good idea. queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't migrate misplaced pages but returns with EIO when encountering such a page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") and early break on the first pte in the range results in pte_unmap_unlock on an underflow pte. This can lead to lockups later on when somebody tries to lock the pte resp. page_table_lock again.. Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified") Signed-off-by: Shijie Luo Signed-off-by: Miaohe Lin Signed-off-by: Andrew Morton Reviewed-by: Oscar Salvador Acked-by: Michal Hocko Cc: Miaohe Lin Cc: Feilong Lin Cc: Shijie Luo Cc: Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin commit 07c4433dff1449db45a7de14fd9e5db8fbe358e0 Author: Alexander Aring Date: Mon Oct 26 10:52:29 2020 -0400 gfs2: Wake up when sd_glock_disposal becomes zero [ Upstream commit da7d554f7c62d0c17c1ac3cc2586473c2d99f0bd ] Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a sd_glock_disposal accounting bug by adding a missing atomic_dec statement, but it failed to wake up sd_glock_wait when that decrement causes sd_glock_disposal to reach zero. As a consequence, gfs2_gl_hash_clear can now run into a 10-minute timeout instead of being woken up. Add the missing wakeup. Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race") Cc: stable@vger.kernel.org # v2.6.39+ Signed-off-by: Alexander Aring Signed-off-by: Andreas Gruenbacher Signed-off-by: Sasha Levin commit b0db2f09db7930ed3bce3b6115ecf04f0acde2b6 Author: Steven Rostedt (VMware) Date: Mon Nov 2 15:31:27 2020 -0500 ring-buffer: Fix recursion protection transitions between interrupt context [ Upstream commit b02414c8f045ab3b9afc816c3735bc98c5c3d262 ] The recursion protection of the ring buffer depends on preempt_count() to be correct. But it is possible that the ring buffer gets called after an interrupt comes in but before it updates the preempt_count(). This will trigger a false positive in the recursion code. Use the same trick from the ftrace function callback recursion code which uses a "transition" bit that gets set, to allow for a single recursion for to handle transitions between contexts. Cc: stable@vger.kernel.org Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin commit 215ca5235bc5ef8dab3b6e2578eacfea4397abe0 Author: Michał Mirosław Date: Mon Nov 2 22:27:27 2020 +0100 regulator: defer probe when trying to get voltage from unresolved supply [ Upstream commit cf1ad559a20d1930aa7b47a52f54e1f8718de301 ] regulator_get_voltage_rdev() is called in regulator probe() when applying machine constraints. The "fixed" commit exposed the problem that non-bypassed regulators can forward the request to its parent (like bypassed ones) supply. Return -EPROBE_DEFER when the supply is expected but not resolved yet. Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator") Cc: stable@vger.kernel.org Signed-off-by: Michał Mirosław Reported-by: Ondřej Jirman Reported-by: Corentin Labbe Tested-by: Ondřej Jirman Link: https://lore.kernel.org/r/a9041d68b4d35e4a2dd71629c8a6422662acb5ee.1604351936.git.mirq-linux@rere.qmqm.pl Signed-off-by: Mark Brown Signed-off-by: Sasha Levin