evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Tejun Heo	85fbd722ad	libata, freezer: avoid block device removal while system is frozen Freezable kthreads and workqueues are fundamentally problematic in that they effectively introduce a big kernel lock widely used in the kernel and have already been the culprit of several deadlock scenarios. This is the latest occurrence. During resume, libata rescans all the ports and revalidates all pre-existing devices. If it determines that a device has gone missing, the device is removed from the system which involves invalidating block device and flushing bdi while holding driver core layer locks. Unfortunately, this can race with the rest of device resume. Because freezable kthreads and workqueues are thawed after device resume is complete and block device removal depends on freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make progress, this can lead to deadlock - block device removal can't proceed because kthreads are frozen and kthreads can't be thawed because device resume is blocked behind block device removal. `839a8e8660` ("writeback: replace custom worker pool implementation with unbound workqueue") made this particular deadlock scenario more visible but the underlying problem has always been there - the original forker task and jbd2 are freezable too. In fact, this is highly likely just one of many possible deadlock scenarios given that freezer behaves as a big kernel lock and we don't have any debug mechanism around it. I believe the right thing to do is getting rid of freezable kthreads and workqueues. This is something fundamentally broken. For now, implement a funny workaround in libata - just avoid doing block device hot[un]plug while the system is frozen. Kernel engineering at its finest. :( v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built as a module. v3: Comment updated and polling interval changed to 10ms as suggested by Rafael. v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not defined when FREEZER is not configured thus breaking build. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tomaž Šolc <tomaz.solc@tablix.org> Reviewed-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801 Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Len Brown <len.brown@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Cc: kbuild test robot <fengguang.wu@intel.com>	2013-12-19 13:50:32 -05:00
Linus Torvalds	f7556698a3	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "An RT group-scheduling fix and the sched-domains topology setup fix from Mel" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities sched: Assign correct scheduling domain to 'sd_llc'	2013-12-19 09:11:22 -08:00
Linus Torvalds	58cac3faef	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "An ABI documentation fix, and a mixed-PMU perf-info-corruption fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Document the new transaction sample type perf: Disable all pmus on unthrottling and rescheduling	2013-12-19 09:10:46 -08:00
Linus Torvalds	86fbf1617a	Merge branch 'akpm' (incoming from Andrew) Merge patches from Andrew Morton: "23 fixes and a MAINTAINERS update" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits) mm/hugetlb: check for pte NULL pointer in __page_check_address() fix build with make 3.80 mm/mempolicy: fix !vma in new_vma_page() MAINTAINERS: add Davidlohr as GPT maintainer mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully mm/compaction: respect ignore_skip_hint in update_pageblock_skip mm/mempolicy: correct putback method for isolate pages if failed mm: add missing dependency in Kconfig sh: always link in helper functions extracted from libgcc mm: page_alloc: exclude unreclaimable allocations from zone fairness policy mm: numa: defer TLB flush for THP migration as long as possible mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates mm: fix TLB flush race between migration, and change_protection_range mm: numa: avoid unnecessary disruption of NUMA hinting during migration mm: numa: clear numa hinting information on mprotect sched: numa: skip inaccessible VMAs mm: numa: avoid unnecessary work on the failure path mm: numa: ensure anon_vma is locked to prevent parallel THP splits mm: numa: do not clear PTE for pte_numa update mm: numa: do not clear PMD during PTE update scan ...	2013-12-18 19:05:00 -08:00
Rik van Riel	2084140594	mm: fix TLB flush race between migration, and change_protection_range There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [] flush TLB reload TLB from new entry read/write new page lose data [] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-18 19:04:51 -08:00
Mel Gorman	3c67f47455	sched: numa: skip inaccessible VMAs Inaccessible VMA should not be trapping NUMA hint faults. Skip them. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Alex Thorlton <athorlton@sgi.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-18 19:04:51 -08:00
Vivek Goyal	c97102ba96	kexec: migrate to reboot cpu Commit `1b3a5d02ee` ("reboot: move arch/x86 reboot= handling to generic kernel") moved reboot= handling to generic code. In the process it also removed the code in native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0. I guess that thought must have been that all reboot paths are calling migrate_to_reboot_cpu(), so we don't need this special handling. But kexec reboot path (kernel_kexec()) is not calling migrate_to_reboot_cpu() so above change broke kexec. Now reboot can happen on non-boot cpu and when INIT is sent in second kerneo to bring up BP, it brings down the machine. So start calling migrate_to_reboot_cpu() in kexec reboot path to avoid this problem. Bisected by WANG Chao. Reported-by: Matthew Whitehead <mwhitehe@redhat.com> Reported-by: Dave Young <dyoung@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Tested-by: Baoquan He <bhe@redhat.com> Tested-by: WANG Chao <chaowang@redhat.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-18 19:04:50 -08:00
Linus Torvalds	a81bddde96	Merge branch 'keys-devel' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull crypto key patches from David Howells: "There are four items: - A patch to fix X.509 certificate gathering. The problem was that I was coming up with a different path for signing_key.x509 in the build directory if it didn't exist to if it did exist. This meant that the X.509 cert container object file would be rebuilt on the second rebuild in a build directory and the kernel would get relinked. - Unconditionally remove files generated by SYSTEM_TRUSTED_KEYRING=y when doing make mrproper. - Actually initialise the persistent-keyring semaphore for init_user_ns. I have no idea why this works at all for users in the base user namespace unless it's something to do with systemd containerising the system. - Documentation for module signing" * 'keys-devel' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: Add Documentation/module-signing.txt file KEYS: fix uninitialized persistent_keyring_register_sem KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y X.509: Fix certificate gathering	2013-12-18 14:09:08 -08:00
Linus Torvalds	dd0508093b	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes for scheduler crashes, each triggers in relatively rare, hardware environment dependent situations" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Rework sched_fair time accounting math64: Add mul_u64_u32_shr() sched: Remove PREEMPT_NEED_RESCHED from generic code sched: Initialize power_orig for overlapping groups	2013-12-17 12:35:54 -08:00
Chuansheng Liu	91f30a1702	mutexes: Give more informative mutex warning in the !lock->owner case When mutex debugging is enabled and an imbalanced mutex_unlock() is called, we get the following, slightly confusing warning: [ 364.208284] DEBUG_LOCKS_WARN_ON(lock->owner != current) But in that case the warning is due to an imbalanced mutex_unlock() call, and the lock->owner is NULL - so the message is misleading. So improve the message by testing for this case specifically: DEBUG_LOCKS_WARN_ON(!lock->owner) Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1386136693.3650.48.camel@cliu38-desktop-build [ Improved the changelog, changed the patch to use !lock->owner consistently. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:35:10 +01:00
Ingo Molnar	bb799d3b98	Linux 3.13-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9 3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK 4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL +2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0= =lI2r -----END PGP SIGNATURE----- Merge tag 'v3.13-rc4' into core/locking Merge Linux 3.13-rc4, to refresh this rather old tree with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:27:08 +01:00
Wanpeng Li	e777b63bbd	sched/numa: Fix period_slot recalculation The original code is as intended and was meant to scale the difference between the NUMA_PERIOD_THRESHOLD and local/remote ratio when adjusting the scan period. The period_slot recalculation can be dropped. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1386833006-6600-4-git-send-email-liwanp@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:24:41 +01:00
Wanpeng Li	82897b4fd3	sched/numa: Use wrapper function task_faults_idx to calculate index in group_faults Use wrapper function task_faults_idx to calculate index in group_faults. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Link: http://lkml.kernel.org/r/1386833006-6600-3-git-send-email-liwanp@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:24:40 +01:00
Wanpeng Li	de1b301a19	sched/numa: Use wrapper function task_node to get node which task is on Use wrapper function task_node to get node which task is on. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386833006-6600-2-git-send-email-liwanp@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:24:39 +01:00
Wanpeng Li	1bd53a7efd	sched/numa: Drop sysctl_numa_balancing_settle_count sysctl commit `887c290e` (sched/numa: Decide whether to favour task or group weights based on swap candidate relationships) drop the check against sysctl_numa_balancing_settle_count, this patch remove the sysctl. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Link: http://lkml.kernel.org/r/1386833006-6600-1-git-send-email-liwanp@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:24:38 +01:00
Ingo Molnar	ffe732c243	Merge branch 'sched/urgent' into sched/core Merge the latest batch of fixes before applying development patches. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:22:35 +01:00
Peter Zijlstra	bad7192b84	perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period Vince Weaver reports that, on all architectures apart from ARM, PERF_EVENT_IOC_PERIOD doesn't actually update the period until the next event fires. This is counter-intuitive behaviour and is better dealt with in the core code. This patch ensures that the period is forcefully reset when dealing with such a request in the core code. A subsequent patch removes the equivalent hack from the ARM back-end. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1385560479-11014-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:21:33 +01:00
Kirill Tkhai	757dfcaa41	sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities This patch touches the RT group scheduling case. Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's priority, while rt_rq passed to them may be not the top-level rt_rq. This is wrong, because changing of priority on a child level does not guarantee that the priority is the highest all over the rq. So, this leak makes RT balancing unusable. The short example: the task having the highest priority among all rq's RT tasks (no one other task has the same priority) are waking on a throttle rt_rq. The rq's cpupri is set to the task's priority equivalent, but real rq->rt.highest_prio.curr is less. The patch below fixes the problem. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Peter Zijlstra <peterz@infradead.org> CC: Steven Rostedt <rostedt@goodmis.org> CC: stable@vger.kernel.org Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:08:44 +01:00
Mel Gorman	5d4cf996cf	sched: Assign correct scheduling domain to 'sd_llc' Commit `42eb088e` (sched: Avoid NULL dereference on sd_busy) corrected a NULL dereference on sd_busy but the fix also altered what scheduling domain it used for the 'sd_llc' percpu variable. One impact of this is that a task selecting a runqueue may consider idle CPUs that are not cache siblings as candidates for running. Tasks are then running on CPUs that are not cache hot. This was found through bisection where ebizzy threads were not seeing equal performance and it looked like a scheduling fairness issue. This patch mitigates but does not completely fix the problem on all machines tested implying there may be an additional bug or a common root cause. Here are the average range of performance seen by individual ebizzy threads. It was tested on top of candidate patches related to x86 TLB range flushing. 4-core machine 3.13.0-rc3 3.13.0-rc3 vanilla fixsd-v3r3 Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.34 ( 0.00%) 0.10 ( 70.59%) Mean 3 1.29 ( 0.00%) 0.93 ( 27.91%) Mean 4 7.08 ( 0.00%) 0.77 ( 89.12%) Mean 5 193.54 ( 0.00%) 2.14 ( 98.89%) Mean 6 151.12 ( 0.00%) 2.06 ( 98.64%) Mean 7 115.38 ( 0.00%) 2.04 ( 98.23%) Mean 8 108.65 ( 0.00%) 1.92 ( 98.23%) 8-core machine Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.40 ( 0.00%) 0.21 ( 47.50%) Mean 3 23.73 ( 0.00%) 0.89 ( 96.25%) Mean 4 12.79 ( 0.00%) 1.04 ( 91.87%) Mean 5 13.08 ( 0.00%) 2.42 ( 81.50%) Mean 6 23.21 ( 0.00%) 69.46 (-199.27%) Mean 7 15.85 ( 0.00%) 101.72 (-541.77%) Mean 8 109.37 ( 0.00%) 19.13 ( 82.51%) Mean 12 124.84 ( 0.00%) 28.62 ( 77.07%) Mean 16 113.50 ( 0.00%) 24.16 ( 78.71%) It's eliminated for one machine and reduced for another. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: H Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:08:43 +01:00
Alexander Shishkin	443772776c	perf: Disable all pmus on unthrottling and rescheduling Currently, only one PMU in a context gets disabled during unthrottling and event_sched_{out,in}(), however, events in one context may belong to different pmus, which results in PMUs being reprogrammed while they are still enabled. This means that mixed PMU use [which is rare in itself] resulted in potentially completely unreliable results: corrupted events, bogus results, etc. This patch temporarily disables PMUs that correspond to each event in the context while these events are being modified. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: http://lkml.kernel.org/r/1387196256-8030-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-17 15:04:00 +01:00
Li Zefan	c1a71504e9	cgroup: don't recycle cgroup id until all csses' have been destroyed Hugh reported this bug: > CONFIG_MEMCG_SWAP is broken in 3.13-rc. Try something like this: > > mkdir -p /tmp/tmpfs /tmp/memcg > mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs > mount -t cgroup -o memory memcg /tmp/memcg > mkdir /tmp/memcg/old > echo 512M >/tmp/memcg/old/memory.limit_in_bytes > echo $$ >/tmp/memcg/old/tasks > cp /dev/zero /tmp/tmpfs/zero 2>/dev/null > echo $$ >/tmp/memcg/tasks > rmdir /tmp/memcg/old > sleep 1 # let rmdir work complete > mkdir /tmp/memcg/new > umount /tmp/tmpfs > dmesg \| grep WARNING > rmdir /tmp/memcg/new > umount /tmp/memcg > > Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91 > res_counter_uncharge_locked+0x1f/0x2f() > > Breakage comes from `34c00c319c` ("memcg: convert to use cgroup id"). > > The lifetime of a cgroup id is different from the lifetime of the > css id it replaced: memsw's css_get()s do nothing to hold on to the > old cgroup id, it soon gets recycled to a new cgroup, which then > mysteriously inherits the old's swap, without any charge for it. Instead of removing cgroup id right after all the csses have been offlined, we should do that after csses have been destroyed. To make sure an invalid css pointer won't be returned after the css is destroyed, make sure css_from_id() returns NULL in this case. tj: Updated comment to note planned changes for cgrp->id. Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Tejun Heo <tj@kernel.org>	2013-12-17 08:11:52 -05:00
Miao Xie	c4602c1c81	ftrace: Initialize the ftrace profiler for each possible cpu Ftrace currently initializes only the online CPUs. This implementation has two problems: - If we online a CPU after we enable the function profile, and then run the test, we will lose the trace information on that CPU. Steps to reproduce: # echo 0 > /sys/devices/system/cpu/cpu1/online # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # echo 1 > /sys/devices/system/cpu/cpu1/online # run test - If we offline a CPU before we enable the function profile, we will not clear the trace information when we enable the function profile. It will trouble the users. Steps to reproduce: # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # run test # cat trace_stat/function* # echo 0 > /sys/devices/system/cpu/cpu1/online # echo 0 > function_profile_enabled # echo 1 > function_profile_enabled # cat trace_stat/function* # run test # cat trace_stat/function* So it is better that we initialize the ftrace profiler for each possible cpu every time we enable the function profile instead of just the online ones. Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2013-12-16 10:53:46 -05:00
Ingo Molnar	fe361cfcf4	Linux 3.13-rc4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQEcBAABAgAGBQJSrhGrAAoJEHm+PkMAQRiGsNoH/jIK3CsQ2lbW7yRLXmfgtbzz i2Kep6D4SDvmaLpLYOVC8xNYTiE8jtTbSXHomwP5wMZ63MQDhBfnEWsEWqeZ9+D9 3Q46p0QWuoBgYu2VGkoxTfygkT6hhSpwWIi3SeImbY4fg57OHiUil/+YGhORM4Qc K4549OCTY3sIrgmWL77gzqjRUo+pQ4C73NKqZ3+5nlOmYBZC1yugk8mFwEpQkwhK 4NRNU760Fo+XIht/bINqRiPMddzC15p0mxvJy3cDW8bZa1tFSS9SB7AQUULBbcHL +2dFlFOEb5SV1sNiNPrJ0W+h2qUh2e7kPB0F8epaBppgbwVdyQoC2u4uuLV2ZN0= =lI2r -----END PGP SIGNATURE----- Merge tag 'v3.13-rc4' into perf/core Merge Linux 3.13-rc4, to refresh this branch with the latest fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-16 14:51:32 +01:00
Ingo Molnar	73a7ac2808	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull v3.14 RCU updates from Paul E. McKenney. The main changes: * Update RCU documentation. * Miscellaneous fixes. * Add RCU torture scripts. * Static-analysis improvements. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-16 11:43:41 +01:00
Paul E. McKenney	6303b9c87d	rcu: Apply smp_mb__after_unlock_lock() to preserve grace periods RCU must ensure that there is the equivalent of a full memory barrier between any memory access preceding grace period and any memory access following that same grace period, regardless of which CPU(s) happen to execute the two memory accesses. Therefore, downgrading UNLOCK+LOCK to no longer imply a full memory barrier requires some adjustments to RCU. This commit therefore adds smp_mb__after_unlock_lock() invocations as needed after the RCU lock acquisitions that need to be part of a full-memory-barrier UNLOCK+LOCK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-arch@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1386799151-2219-7-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-16 11:36:16 +01:00
Linus Torvalds	9199c4caa1	PCI updates for v3.13: PCI device hotplug - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael J. Wysocki) Host bridge drivers - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn Helgaas) - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin (Jason Gunthorpe) Miscellaneous - Avoid unnecessary CPU switch when calling .probe() (Alexander Duyck) - Revert "workqueue: allow work_on_cpu() to be called recursively" (Bjorn Helgaas) - Disable Bus Master only on kexec reboot (Khalid Aziz) - Omit PCI ID macro strings to shorten quirk names for LTO (Michal Marek) MAINTAINERS \| 33 +++++++++++++++++++++++++++++++++ drivers/pci/host/pci-mvebu.c \| 5 +++++ drivers/pci/pci-driver.c \| 38 ++++++++++++++++++++++++++++++-------- drivers/pci/remove.c \| 4 +++- include/linux/kexec.h \| 3 +++ include/linux/pci.h \| 30 +++++++++++++++--------------- kernel/kexec.c \| 4 ++++ kernel/workqueue.c \| 32 ++++++++++---------------------- 8 files changed, 103 insertions(+), 46 deletions(-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSrLDLAAoJEFmIoMA60/r8JO4P/iakGDezwXcd9YTbdaq/NmEK 31JtBao9rJNSUXpFGaFKGm99A479QNb+Fo1o5NaysyGbcAA2W0HeCkaee13hpw2D l3wvzJRoEwqVySOzMwlGsxxhYuvGiZ2WVALNkwBzwyCiYtypNnUtGNCSp/3J6XKp 2ZUjoagyixdS4oYoR4irZucPBWwzW89Gx4oJ0rBttNVsXjiT1D2OzYDYTuxMyb2E ZjXJqTUYzfwiFxqKPrGOabUPD9GX3EWaFmu01FLlsSznPZkk9yJR3/eq6I6utp/E WSpP+7v3jnPHjZVky1FdHaP5wqurtNRsiWBQVyNYF+M5WofKA1hGA+niJDEB/pVL vE74fJfZzpET1jlZR+7Z5qHPxW2A8Q2ARn74A42DYLQGmJEh7VdARcayV1E4diRH DzEnhf33b9bwIC1Q0cESWZ5RH4r2Xbjg0a1qoVQYi5VEJEMWXID3Ofk64Bd9QOz4 oLZ27V7clurD+gNarch/zgpd9LVIHLFriR2YWPpA0Iwy9EjueH8GsiOS3NqDn2SQ sQg5utE4vdnix4VrCvvbufAr3kJngndtuj/s7I3lZqi7nCyS+jeFFvMUd9h3MUqZ rs1IN1/qeTBBNi2dtmcKDN2ItahCoBhTI37JRaijZwe+B5m6mTe049it+E0SZPI2 IqLOYWL4ggT0se/cMXld =Edvk -----END PGP SIGNATURE----- Merge tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "PCI device hotplug - Move device_del() from pci_stop_dev() to pci_destroy_dev() (Rafael Wysocki) Host bridge drivers - Update maintainers for DesignWare, i.MX6, Armada, R-Car (Bjorn Helgaas) - mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin (Jason Gunthorpe) Miscellaneous - Avoid unnecessary CPU switch when calling .probe() (Alexander Duyck) - Revert "workqueue: allow work_on_cpu() to be called recursively" (Bjorn Helgaas) - Disable Bus Master only on kexec reboot (Khalid Aziz) - Omit PCI ID macro strings to shorten quirk names for LTO (Michal Marek)" * tag 'pci-v3.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: MAINTAINERS: Add DesignWare, i.MX6, Armada, R-Car PCI host maintainers PCI: Disable Bus Master only on kexec reboot PCI: mvebu: Return 'unsupported' for Interrupt Line and Interrupt Pin PCI: Omit PCI ID macro strings to shorten quirk names PCI: Move device_del() from pci_stop_dev() to pci_destroy_dev() Revert "workqueue: allow work_on_cpu() to be called recursively" PCI: Avoid unnecessary CPU switch when calling driver .probe() method	2013-12-15 11:45:27 -08:00
Vladimir Davydov	10bf2f7e7d	cgroup: fix fail path in cgroup_load_subsys() Calling cgroup_unload_subsys() from cgroup_load_subsys() after online_css() failure will result in a NULL ptr dereference on attempt to offline_css(), because online_css() only assigns css to cgroup on success. Let's fix that by skipping calls to offline_css() and css_free() in cgroup_unload_subsys() if there is no css, and freeing css in cgroup_load_subsys() on online_css() failure. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-12-13 15:46:49 -05:00
Xiao Guangrong	6bd364d829	KEYS: fix uninitialized persistent_keyring_register_sem We run into this bug: [ 2736.063245] Unable to handle kernel paging request for data at address 0x00000000 [ 2736.063293] Faulting instruction address: 0xc00000000037efb0 [ 2736.063300] Oops: Kernel access of bad area, sig: 11 [#1] [ 2736.063303] SMP NR_CPUS=2048 NUMA pSeries [ 2736.063310] Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6table_security ip6table_raw ip6t_REJECT iptable_nat nf_nat_ipv4 iptable_mangle iptable_security iptable_raw ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ebtable_filter ebtables ip6table_filter iptable_filter ip_tables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack ip6_tables ibmveth pseries_rng nx_crypto nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc xfs libcrc32c dm_service_time sd_mod crc_t10dif crct10dif_common ibmvfc scsi_transport_fc scsi_tgt dm_mirror dm_region_hash dm_log dm_multipath dm_mod [ 2736.063383] CPU: 1 PID: 7128 Comm: ssh Not tainted 3.10.0-48.el7.ppc64 #1 [ 2736.063389] task: c000000131930120 ti: c0000001319a0000 task.ti: c0000001319a0000 [ 2736.063394] NIP: c00000000037efb0 LR: c0000000006c40f8 CTR: 0000000000000000 [ 2736.063399] REGS: c0000001319a3870 TRAP: 0300 Not tainted (3.10.0-48.el7.ppc64) [ 2736.063403] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 28824242 XER: 20000000 [ 2736.063415] SOFTE: 0 [ 2736.063418] CFAR: c00000000000908c [ 2736.063421] DAR: 0000000000000000, DSISR: 40000000 [ 2736.063425] GPR00: c0000000006c40f8 c0000001319a3af0 c000000001074788 c0000001319a3bf0 GPR04: 0000000000000000 0000000000000000 0000000000000020 000000000000000a GPR08: fffffffe00000002 00000000ffff0000 0000000080000001 c000000000924888 GPR12: 0000000028824248 c000000007e00400 00001fffffa0f998 0000000000000000 GPR16: 0000000000000022 00001fffffa0f998 0000010022e92470 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000f4a828 00003ffffe527108 0000000000000000 GPR28: c000000000f4a730 c000000000f4a828 0000000000000000 c0000001319a3bf0 [ 2736.063498] NIP [c00000000037efb0] .__list_add+0x30/0x110 [ 2736.063504] LR [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264 [ 2736.063508] PACATMSCRATCH [800000000280f032] [ 2736.063511] Call Trace: [ 2736.063516] [c0000001319a3af0] [c0000001319a3b80] 0xc0000001319a3b80 (unreliable) [ 2736.063523] [c0000001319a3b80] [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264 [ 2736.063530] [c0000001319a3c50] [c0000000006c1bb0] .down_write+0x70/0x78 [ 2736.063536] [c0000001319a3cd0] [c0000000002e5ffc] .keyctl_get_persistent+0x20c/0x320 [ 2736.063542] [c0000001319a3dc0] [c0000000002e2388] .SyS_keyctl+0x238/0x260 [ 2736.063548] [c0000001319a3e30] [c000000000009e7c] syscall_exit+0x0/0x7c [ 2736.063553] Instruction dump: [ 2736.063556] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7cbd2b78 7c9e2378 7c7f1b78 f8010010 [ 2736.063566] f821ff71 e8a50008 7fa52040 40de00c0 <e8be0000> 7fbd2840 40de0094 7fbff040 [ 2736.063579] ---[ end trace 2708241785538296 ]--- It's caused by uninitialized persistent_keyring_register_sem. The bug was introduced by commit `f36f8c75`, two typos are in that commit: CONFIG_KEYS_KERBEROS_CACHE should be CONFIG_PERSISTENT_KEYRINGS and krb_cache_register_sem should be persistent_keyring_register_sem. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com>	2013-12-13 15:59:11 +00:00
Kirill Tkhai	f46a3cbbeb	KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y Always remove generated SYSTEM_TRUSTED_KEYRING files while doing make mrproper. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: David Howells <dhowells@redhat.com>	2013-12-13 15:59:11 +00:00
David Howells	d7ec435fdd	X.509: Fix certificate gathering Fix the gathering of certificates from both the source tree and the build tree to correctly calculate the pathnames of all the certificates. The problem was that if the default generated cert, signing_key.x509, didn't exist then it would not have a path attached and if it did, it would have a path attached. This means that the contents of kernel/.x509.list would change between the first compilation in a directory and the second. After the second it would remain stable because the signing_key.x509 file exists. The consequence was that the kernel would get relinked unconditionally on the second recompilation. The second recompilation would also show something like this: X.509 certificate list changed CERTS kernel/x509_certificate_list - Including cert /home/torvalds/v2.6/linux/signing_key.x509 AS kernel/system_certificates.o LD kernel/built-in.o which is why the relink would happen. Unfortunately, it isn't a simple matter of just sticking a path on the front of the filename of the certificate in the build directory as make can't then work out how to build it. So the path has to be prepended to the name for sorting and duplicate elimination and then removed for the make rule if it is in the build tree. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David Howells <dhowells@redhat.com>	2013-12-13 15:28:14 +00:00
Teodora Baluta	bd73a7f5cd	rcu: Remove "extern" from function declarations in kernel/rcu/rcu.h Function prototypes don't need to have the "extern" keyword since this is the default behavior. Its explicit use is redundant. This commit therefore removes them. Signed-off-by: Teodora Baluta <teobaluta@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-12 12:34:17 -08:00
Chen Gang	d100895086	rcu/torture: Dynamically allocate SRCU output buffer to avoid overflow If the rcutorture SRCU output exceeds 4096 bytes, for example, if you have more than about 75 CPUs, it will overflow the current statically allocated buffer. This commit therefore replaces this static buffer with a dynamically buffer whose size is based on the number of CPUs. Benefits: - Avoids both buffer overflow and output truncation. - Handles an arbitrarily large number of CPUs. - Straightforward implementation. Shortcomings: - Some memory is wasted: 1 cpu now comsumes 50 - 60 bytes, and this patch provides 200 bytes. Therefore, for 1K CPUs, roughly 100KB of memory will be wasted. However, the memory is freed immediately after printing, so this wastage should not be a problem in practice. Testing (Fedora16 2 CPUs, 2GB RAM x86_64): - as module, with/without "torture_type=srcu". - build-in not boot runnable, with/without "torture_type=srcu". - build-in let boot runnable, with/without "torture_type=srcu". Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-12 12:34:16 -08:00
Paul E. McKenney	a096932f0c	rcu: Don't activate RCU core on NO_HZ_FULL CPUs Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see if the RCU core needs anything from this CPU. If so, RCU raises RCU_SOFTIRQ to carry out any needed processing. This approach has worked well historically, but it is undesirable on NO_HZ_FULL CPUs. Such CPUs are expected to spend almost all of their time in userspace, so that scheduling-clock interrupts can be disabled while there is only one runnable task on the CPU in question. Unfortunately, raising any softirq has the potential to wake up ksoftirqd, which would provide the second runnable task on that CPU, preventing disabling of scheduling-clock interrupts. What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone, relying on the grace-period kthreads' quiescent-state forcing to do any needed RCU work on behalf of those CPUs. This commit therefore refrains from raising RCU_SOFTIRQ on any NO_HZ_FULL CPUs during any grace periods that have been in effect for less than one second. The one-second limit handles the case where an inappropriate workload is running on a NO_HZ_FULL CPU that features lots of scheduling-clock interrupts, but no idle or userspace time. Reported-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Mike Galbraith <bitbucket@online.de> Toasted-by: Frederic Weisbecker <fweisbec@gmail.com>	2013-12-12 12:34:15 -08:00
Lai Jiangshan	79a62f957e	rcu: Warn on allegedly impossible rcu_read_unlock_special() from irq After commit #10f39bb1b2c1 (rcu: protect __rcu_read_unlock() against scheduler-using irq handlers), it is no longer possible to enter the main body of rcu_read_lock_special() from an NMI, interrupt, or softirq handler. In theory, this implies that the check for "in_irq() \|\| in_serving_softirq()" must always fail, so that in theory this check could be removed entirely. In practice, this commit wraps this condition with a WARN_ON_ONCE(). If this warning never triggers, then the condition will be removed entirely. [ paulmck: And one way of triggering the WARN_ON() is if a scheduling clock interrupt occurs in an RCU read-side critical section, setting RCU_READ_UNLOCK_NEED_QS, which is handled by rcu_read_unlock_special(). Updated this commit to return if only that bit was set. ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-12 12:34:00 -08:00
Linus Torvalds	5dec682c7f	Keyrings fixes 2013-12-10 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) iQIVAwUAUqdgihOxKuMESys7AQIishAAjGG3LnEp12fd7oay5//4SEg31ybPqGj7 Xotk9mblBW7cWPbLqk7xyIhxpIj2zM14XatEV2TfBNZV0Lcrzr1M5s87ZRUX0ui+ FHbtfn/M3Or8AvX7W1HZgG2Se7M0Ba6Y2RRXNsEdgqk6KMQuHhPEZ2FZ5vCx6BAD vXzxWIKVNeqMXR7z6xkyqmpztnVQs+ZgzE9+c96QKyhBdtBy4spDmfgJS90m0pjN HhgONpdfgosknj8yu43rWIQvd3UUO5BVntCeic94Fbgh77ZAEi0dD7ifLz/ebcja pfJfPXxYzkCfIrSdAzQF8iYnQ+5rRomvGsMcvtq6mBooah/YmEkmKpKJDTp47wyN IEUeJaxM2Qp2jcUSjEd7lY9o1AK4sj+90cKeVRUd5kzZP1iolkQHR+dGiAwqbn6w MJBn9fCamvhpsZDhl2G/DICprFDvErd9zAlNubivggJmITnPXNWx+K1RDL4fO4qc zLHTGHkxYyACM/7oJfwbH/NyZ1yu53OlE3R2h6TA6ISc7nAed7qzGAZyrSV92Quc pItGaQ/zNa0sbe+nufCx9FOWu8sA3x7qnazLxhVtlPde9nlxcpGo5cagqcyc4hyp /IGK2JoGkgHvehECY8miJlsu7UqmThIhmy6o4T4X6ErX1M9ifR92WGeXkAi/2mh/ WciVpPvTrqM= =/AdA -----END PGP SIGNATURE----- Merge tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull misc keyrings fixes from David Howells: "These break down into five sets: - A patch to error handling in the big_key type for huge payloads. If the payload is larger than the "low limit" and the backing store allocation fails, then big_key_instantiate() doesn't clear the payload pointers in the key, assuming them to have been previously cleared - but only one of them is. Unfortunately, the garbage collector still calls big_key_destroy() when sees one of the pointers with a weird value in it (and not NULL) which it then tries to clean up. - Three patches to fix the keyring type: * A patch to fix the hash function to correctly divide keyrings off from keys in the topology of the tree inside the associative array. This is only a problem if searching through nested keyrings - and only if the hash function incorrectly puts the a keyring outside of the 0 branch of the root node. * A patch to fix keyrings' use of the associative array. The __key_link_begin() function initially passes a NULL key pointer to assoc_array_insert() on the basis that it's holding a place in the tree whilst it does more allocation and stuff. This is only a problem when a node contains 16 keys that match at that level and we want to add an also matching 17th. This should easily be manufactured with a keyring full of keyrings (without chucking any other sort of key into the mix) - except for (a) above which makes it on average adding the 65th keyring. * A patch to fix searching down through nested keyrings, where any keyring in the set has more than 16 keyrings and none of the first keyrings we look through has a match (before the tree iteration needs to step to a more distal node). Test in keyutils test suite: http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=8b4ae963ed92523aea18dfbb8cab3f4979e13bd1 - A patch to fix the big_key type's use of a shmem file as its backing store causing audit messages and LSM check failures. This is done by setting S_PRIVATE on the file to avoid LSM checks on the file (access to the shmem file goes through the keyctl() interface and so is gated by the LSM that way). This isn't normally a problem if a key is used by the context that generated it - and it's currently only used by libkrb5. Test in keyutils test suite: http://git.kernel.org/cgit/linux/kernel/git/dhowells/keyutils.git/commit/?id=d9a53cbab42c293962f2f78f7190253fc73bd32e - A patch to add a generated file to .gitignore. - A patch to fix the alignment of the system certificate data such that it it works on s390. As I understand it, on the S390 arch, symbols must be 2-byte aligned because loading the address discards the least-significant bit" * tag 'keys-devel-20131210' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: KEYS: correct alignment of system_certificate_list content in assembly file Ignore generated file kernel/x509_certificate_list security: shmem: implement kernel private shmem inodes KEYS: Fix searching of nested keyrings KEYS: Fix multiple key add into associative array KEYS: Fix the keyring hash function KEYS: Pre-clear struct key on allocation	2013-12-12 10:15:24 -08:00
Linus Torvalds	5cdec2d833	futex: move user address verification up to common code When debugging the read-only hugepage case, I was confused by the fact that get_futex_key() did an access_ok() only for the non-shared futex case, since the user address checking really isn't in any way specific to the private key handling. Now, it turns out that the shared key handling does effectively do the equivalent checks inside get_user_pages_fast() (it doesn't actually check the address range on x86, but does check the page protections for being a user page). So it wasn't actually a bug, but the fact that we treat the address differently for private and shared futexes threw me for a loop. Just move the check up, so that it gets done for both cases. Also, use the 'rw' parameter for the type, even if it doesn't actually matter any more (it's a historical artifact of the old racy i386 "page faults from kernel space don't check write protections"). Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-12 09:53:51 -08:00
Linus Torvalds	f12d5bfceb	futex: fix handling of read-only-mapped hugepages The hugepage code had the exact same bug that regular pages had in commit `7485d0d375` ("futexes: Remove rw parameter from get_futex_key()"). The regular page case was fixed by commit `9ea71503a8` ("futex: Fix regression with read only mappings"), but the transparent hugepage case (added in `a5b338f2b0`: "thp: update futex compound knowledge") case remained broken. Found by Dave Jones and his trinity tool. Reported-and-tested-by: Dave Jones <davej@fedoraproject.org> Cc: stable@kernel.org # v2.6.38+ Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-12-12 09:38:42 -08:00
Wei Yongjun	0be8669dd5	cgroup: fix missing unlock on error in cgroup_load_subsys() Add the missing unlock before return from function cgroup_load_subsys() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>	2013-12-12 10:45:36 -05:00
Peter Zijlstra	c7f2e3cd6c	perf: Optimize ring-buffer write by depending on control dependencies Remove a full barrier from the ring-buffer write path by relying on a control dependency to order a LOAD -> STORE scenario. Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnxrjl@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-11 15:53:22 +01:00
Peter Zijlstra	9dbdb15553	sched/fair: Rework sched_fair time accounting Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This results in him occasionally seeing time going backwards - which crashes the scheduler ... Most of our time accounting can actually handle that except the most common one; the tick time update of sched_fair. There is a further problem with that code; previously we assumed that because we get a tick every TICK_NSEC our time delta could never exceed 32bits and math was simpler. However, ever since Frederic managed to get NO_HZ_FULL merged; this is no longer the case since now a task can run for a long time indeed without getting a tick. It only takes about ~4.2 seconds to overflow our u32 in nanoseconds. This means we not only need to better deal with time going backwards; but also means we need to be able to deal with large deltas. This patch reworks the entire code and uses mul_u64_u32_shr() as proposed by Andy a long while ago. We express our virtual time scale factor in a u32 multiplier and shift right and the 32bit mul_u64_u32_shr() implementation reduces to a single 32x32->64 multiply if the time delta is still short (common case). For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128. Reported-and-Tested-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com Cc: Paul Turner <pjt@google.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-11 15:52:35 +01:00
Peter Zijlstra	8e8339a3a1	sched: Initialize power_orig for overlapping groups Yinghai reported that he saw a /0 in sg_capacity on his EX parts. Make sure to always initialize power_orig now that we actually use it. Ideally build_sched_domains() -> init_sched_groups_power() would also initialize this; but for some yet unexplained reason some setups seem to miss updates there. Reported-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-l8ng2m9uml6fhibln8wqpom7@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-12-11 15:47:59 +01:00
Hendrik Brueckner	62226983da	KEYS: correct alignment of system_certificate_list content in assembly file Apart from data-type specific alignment constraints, there are also architecture-specific alignment requirements. For example, on s390 symbols must be on even addresses implying a 2-byte alignment. If the system_certificate_list_end symbol is on an odd address and if this address is loaded, the least-significant bit is ignored. As a result, the load_system_certificate_list() fails to load the certificates because of a wrong certificate length calculation. To be safe, align system_certificate_list on an 8-byte boundary. Also improve the length calculation of the system_certificate_list content. Introduce a system_certificate_list_size (8-byte aligned because of unsigned long) variable that stores the length. Let the linker calculate this size by introducing a start and end label for the certificate content. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: David Howells <dhowells@redhat.com>	2013-12-10 18:25:28 +00:00
Rusty Russell	7cfe5b3310	Ignore generated file kernel/x509_certificate_list $ git status # On branch pending-rebases # Untracked files: # (use "git add <file>..." to include in what will be committed) # # kernel/x509_certificate_list nothing added to commit but untracked files present (use "git add" to track) $ Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David Howells <dhowells@redhat.com>	2013-12-10 18:21:34 +00:00
Paul E. McKenney	24ef659a85	rcu: Provide better diagnostics for blocking in RCU callback functions Currently blocking in an RCU callback function will result in "scheduling while atomic", which could be triggered for any number of reasons. To aid debugging, this patch introduces a rcu_callback_map that is used to tie the inappropriate voluntary context switch back to the fact that the function is being invoked from within a callback. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:39 -08:00
Paul E. McKenney	bc72d962d6	rcu: Improve SRCU's grace-period comments This commit documents the memory-barrier guarantees provided by synchronize_srcu() and call_srcu(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:38 -08:00
Paul E. McKenney	04f34650ca	rcu: Fix CONFIG_RCU_FANOUT_EXACT for odd fanout/leaf values Each element of the rcu_state structure's ->levelspread[] array is intended to contain the per-level fanout, where the zero-th element corresponds to the root of the rcu_node tree, and the last element corresponds to the leaves. In the CONFIG_RCU_FANOUT_EXACT case, this means that the last element should be filled in from CONFIG_RCU_FANOUT_LEAF (or from the rcu_fanout_leaf boot parameter, if provided) and that the remaining elements should be filled in from CONFIG_RCU_FANOUT. Unfortunately, the current code in rcu_init_levelspread() takes the opposite approach, placing CONFIG_RCU_FANOUT_LEAF in the zero-th element and CONFIG_RCU_FANOUT in the remaining elements. For typical power-of-two values, this generates odd but functional rcu_node trees. However, other values, for example CONFIG_RCU_FANOUT=3 and CONFIG_RCU_FANOUT_LEAF=2, generate trees that can leave some CPUs out of the grace-period computation, resulting in too-short grace periods and therefore a broken RCU implementation. This commit therefore fixes rcu_init_levelspread() to set the last ->levelspread[] array element from CONFIG_RCU_FANOUT_LEAF and the remaining elements from CONFIG_RCU_FANOUT, thus generating the intended rcu_node trees. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:38 -08:00
Fengguang Wu	f6f7ee9af7	rcu: Fix coccinelle warnings This commit fixes the following coccinelle warning: kernel/rcu/tree.c:712:9-10: WARNING: return of 0/1 in function 'rcu_lockdep_current_cpu_online' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: coccinelle/misc/boolreturn.cocci Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2013-12-09 15:12:25 -08:00
Frederic Weisbecker	531f64fd6f	posix-timers: Convert abuses of BUG_ON to WARN_ON The posix cpu timers code makes a heavy use of BUG_ON() but none of these concern fatal issues that require to stop the machine. So let's just warn the user when some internal state slips out of our hands. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2013-12-09 16:56:29 +01:00
Frederic Weisbecker	e73d84e33f	posix-timers: Remove remaining uses of tasklist_lock The remaining uses of tasklist_lock were mostly about synchronizing against sighand modifications, getting coherent and safe group samples and also thread/process wide timers list handling. All of this is already safely synchronizable with the target's sighand lock. Let's use it on these places instead. Also update the comments about locking. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2013-12-09 16:56:28 +01:00
Frederic Weisbecker	3d7a1427e4	posix-timers: Use sighand lock instead of tasklist_lock on timer deletion Timer deletion doesn't need the tasklist lock. We need to protect against: * concurrent access to the lists p->cputime_expires and p->sighand->cputime_expires * task reaping that may also delete the timer list entry * timer firing We already hold the timer lock which protects us against concurrent timer firing. The rest only need the targets sighand to be locked. So hold it and drop the use of tasklist_lock there. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2013-12-09 16:53:51 +01:00

... 4 5 6 7 8 ...