Commit graph

410 commits

Author SHA1 Message Date
Srinivasarao P
2d309c994d Merge android-4.4.107 (79f138a) into msm-4.4
* refs/heads/tmp-79f138a
  Linux 4.4.107
  ath9k: fix tx99 potential info leak
  IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
  RDMA/cma: Avoid triggering undefined behavior
  macvlan: Only deliver one copy of the frame to the macvlan interface
  udf: Avoid overflow when session starts at large offset
  scsi: bfa: integer overflow in debugfs
  scsi: sd: change allow_restart to bool in sysfs interface
  scsi: sd: change manage_start_stop to bool in sysfs interface
  vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
  scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
  raid5: Set R5_Expanded on parity devices as well as data.
  pinctrl: adi2: Fix Kconfig build problem
  usb: musb: da8xx: fix babble condition handling
  tty fix oops when rmmod 8250
  powerpc/perf/hv-24x7: Fix incorrect comparison in memord
  scsi: hpsa: destroy sas transport properties before scsi_host
  scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
  PCI: Detach driver before procfs & sysfs teardown on device remove
  xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
  xfs: fix log block underflow during recovery cycle verification
  l2tp: cleanup l2tp_tunnel_delete calls
  bcache: fix wrong cache_misses statistics
  bcache: explicitly destroy mutex while exiting
  GFS2: Take inode off order_write list when setting jdata flag
  thermal/drivers/step_wise: Fix temperature regulation misbehavior
  ppp: Destroy the mutex when cleanup
  clk: tegra: Fix cclk_lp divisor register
  clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
  clk: mediatek: add the option for determining PLL source clock
  mm: Handle 0 flags in _calc_vm_trans() macro
  crypto: tcrypt - fix buffer lengths in test_aead_speed()
  arm-ccn: perf: Prevent module unload while PMU is in use
  target/file: Do not return error for UNMAP if length is zero
  target:fix condition return in core_pr_dump_initiator_port()
  iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
  target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
  powerpc/ipic: Fix status get and status clear
  powerpc/opal: Fix EBUSY bug in acquiring tokens
  netfilter: ipvs: Fix inappropriate output of procfs
  powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
  PCI/PME: Handle invalid data when reading Root Status
  dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
  rtc: pcf8563: fix output clock rate
  video: fbdev: au1200fb: Return an error code if a memory allocation fails
  video: fbdev: au1200fb: Release some resources if a memory allocation fails
  video: udlfb: Fix read EDID timeout
  fbdev: controlfb: Add missing modes to fix out of bounds access
  sfc: don't warn on successful change of MAC
  target: fix race during implicit transition work flushes
  target: fix ALUA transition timeout handling
  target: Use system workqueue for ALUA transitions
  btrfs: add missing memset while reading compressed inline extents
  NFSv4.1 respect server's max size in CREATE_SESSION
  efi/esrt: Cleanup bad memory map log messages
  perf symbols: Fix symbols__fixup_end heuristic for corner cases
  net/mlx4_core: Avoid delays during VF driver device shutdown
  afs: Fix afs_kill_pages()
  afs: Fix page leak in afs_write_begin()
  afs: Populate and use client modification time
  afs: Fix the maths in afs_fs_store_data()
  afs: Prevent callback expiry timer overflow
  afs: Migrate vlocation fields to 64-bit
  afs: Flush outstanding writes when an fd is closed
  afs: Adjust mode bits processing
  afs: Populate group ID from vnode status
  afs: Fix missing put_page()
  drm/radeon: reinstate oland workaround for sclk
  mmc: mediatek: Fixed bug where clock frequency could be set wrong
  sched/deadline: Use deadline instead of period when calculating overflow
  sched/deadline: Throttle a constrained deadline task activated after the deadline
  sched/deadline: Make sure the replenishment timer fires in the next period
  drm/radeon/si: add dpm quirk for Oland
  fjes: Fix wrong netdevice feature flags
  scsi: hpsa: limit outstanding rescans
  scsi: hpsa: update check for logical volume status
  openrisc: fix issue handling 8 byte get_user calls
  intel_th: pci: Add Gemini Lake support
  mlxsw: reg: Fix SPVMLR max record count
  mlxsw: reg: Fix SPVM max record count
  net: Resend IGMP memberships upon peer notification.
  dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
  net: wimax/i2400m: fix NULL-deref at probe
  writeback: fix memory leak in wb_queue_work()
  netfilter: bridge: honor frag_max_size when refragmenting
  drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
  Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
  NFSD: fix nfsd_reset_versions for NFSv4.
  NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
  net: bcmgenet: Power up the internal PHY before probing the MII
  net: bcmgenet: power down internal phy if open or resume fails
  net: bcmgenet: reserved phy revisions must be checked first
  net: bcmgenet: correct MIB access of UniMAC RUNT counters
  net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
  net: initialize msg.msg_flags in recvfrom
  userfaultfd: selftest: vm: allow to build in vm/ directory
  userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
  md-cluster: free md_cluster_info if node leave cluster
  usb: phy: isp1301: Add OF device ID table
  mac80211: Fix addition of mesh configuration element
  KEYS: add missing permission check for request_key() destination
  ext4: fix crash when a directory's i_size is too small
  ext4: fix fdatasync(2) after fallocate(2) operation
  dmaengine: dmatest: move callback wait queue to thread context
  sched/rt: Do not pull from current CPU if only one CPU to pull
  xhci: Don't add a virt_dev to the devs array before it's fully allocated
  Bluetooth: btusb: driver to enable the usb-wakeup feature
  ceph: drop negative child dentries before try pruning inode's alias
  usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
  USB: core: prevent malicious bNumInterfaces overflow
  USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
  tracing: Allocate mask_str buffer dynamically
  autofs: fix careless error in recent commit
  crypto: salsa20 - fix blkcipher_walk API usage
  crypto: hmac - require that the underlying hash algorithm is unkeyed
  UPSTREAM: arm64: setup: introduce kaslr_offset()
  UPSTREAM: kcov: fix comparison callback signature
  UPSTREAM: kcov: support comparison operands collection
  UPSTREAM: kcov: remove pointless current != NULL check
  UPSTREAM: kcov: support compat processes
  UPSTREAM: kcov: simplify interrupt check
  UPSTREAM: kcov: make kcov work properly with KASLR enabled
  UPSTREAM: kcov: add more missing includes
  UPSTREAM: kcov: add missing #include <linux/sched.h>
  UPSTREAM: kcov: properly check if we are in an interrupt
  UPSTREAM: kcov: don't profile branches in kcov
  UPSTREAM: kcov: don't trace the code coverage code
  BACKPORT: kernel: add kcov code coverage

Conflicts:
	Makefile
	mm/kasan/Makefile
	scripts/Makefile.lib

Change-Id: Ic19953706ea2e700621b0ba94d1c90bbffa4f471
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-01-18 12:49:58 +05:30
Dmitry Vyukov
9b83f370dc BACKPORT: kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 5c9a8750a6409c63a0f01d51a9024861022f6593)
Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 09:41:57 -08:00
Blagovest Kolenichev
f9719b203c Merge android-4.4@d6fbbe5 (v4.4.93) into msm-4.4
* refs/heads/tmp-d6fbbe5
  Linux 4.4.93
  x86/alternatives: Fix alt_max_short macro to really be a max()
  USB: serial: console: fix use-after-free after failed setup
  USB: serial: qcserial: add Dell DW5818, DW5819
  USB: serial: option: add support for TP-Link LTE module
  USB: serial: cp210x: add support for ELV TFD500
  USB: serial: ftdi_sio: add id for Cypress WICED dev board
  fix unbalanced page refcounting in bio_map_user_iov
  direct-io: Prevent NULL pointer access in submit_page_section
  usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
  ALSA: line6: Fix leftover URB at error-path during probe
  ALSA: caiaq: Fix stray URB at probe error path
  ALSA: seq: Fix copy_from_user() call inside lock
  ALSA: seq: Fix use-after-free at creating a port
  ALSA: usb-audio: Kill stray URB at exiting
  iommu/amd: Finish TLB flush in amd_iommu_unmap()
  usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
  KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
  crypto: shash - Fix zero-length shash ahash digest crash
  HID: usbhid: fix out-of-bounds bug
  dmaengine: edma: Align the memcpy acnt array size with the transfer
  MIPS: math-emu: Remove pr_err() calls from fpu_emu()
  USB: dummy-hcd: Fix deadlock caused by disconnect detection
  rcu: Allow for page faults in NMI handlers
  iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
  nl80211: Define policy for packet pattern attributes
  CIFS: Reconnect expired SMB sessions
  ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
  brcmfmac: add length check in brcmf_cfg80211_escan_handler()
  ANDROID: HACK: arm64: use -mno-implicit-float instead of -mgeneral-regs-only
  sched: Update task->on_rq when tasks are moving between runqueues
  FROMLIST: f2fs: expose some sectors to user in inline data or dentry case
  crypto: Work around deallocated stack frame reference gcc bug on sparc.
  UPSTREAM: f2fs: fix potential panic during fstrim
  ANDROID: fscrypt: remove unnecessary fscrypto.h
  ANDROID: binder: fix node sched policy calculation
  ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple
  CHROMIUM: arm64: Disable asm-operand-width warning for clang
  CHROMIUM: kbuild: clang: Disable the 'duplicate-decl-specifier' warning
  UPSTREAM: x86/build: Use cc-option to validate stack alignment parameter
  UPSTREAM: x86/build: Fix stack alignment for CLang
  UPSTREAM: efi/libstub/arm64: Set -fpie when building the EFI stub
  BACKPORT: efi/libstub/arm64: Force 'hidden' visibility for section markers
  UPSTREAM: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
  UPSTREAM: x86/boot: #undef memcpy() et al in string.c
  UPSTREAM: crypto: arm64/sha - avoid non-standard inline asm tricks
  UPSTREAM: kbuild: clang: Disable 'address-of-packed-member' warning
  UPSTREAM: x86/build: Specify stack alignment for clang
  UPSTREAM: x86/build: Use __cc-option for boot code compiler options
  BACKPORT: kbuild: Add __cc-option macro
  UPSTREAM: x86/hweight: Don't clobber %rdi
  BACKPORT: x86/hweight: Get rid of the special calling convention
  BACKPORT: x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
  UPSTREAM: crypto, x86: aesni - fix token pasting for clang
  UPSTREAM: x86/kbuild: Use cc-option to enable -falign-{jumps/loops}
  UPSTREAM: compiler, clang: properly override 'inline' for clang
  UPSTREAM: compiler, clang: suppress warning for unused static inline functions
  UPSTREAM: Kbuild: provide a __UNIQUE_ID for clang
  UPSTREAM: modules: mark __inittest/__exittest as __maybe_unused
  BACKPORT: kbuild: Add support to generate LLVM assembly files
  UPSTREAM: kbuild: use -Oz instead of -Os when using clang
  BACKPORT: kbuild, LLVMLinux: Add -Werror to cc-option to support clang
  UPSTREAM: kbuild: drop -Wno-unknown-warning-option from clang options
  UPSTREAM: kbuild: fix asm-offset generation to work with clang
  UPSTREAM: kbuild: consolidate redundant sed script ASM offset generation
  UPSTREAM: kbuild: Consolidate header generation from ASM offset information
  UPSTREAM: kbuild: clang: add -no-integrated-as to KBUILD_[AC]FLAGS
  UPSTREAM: kbuild: Add better clang cross build support

Conflicts:
	arch/x86/lib/Makefile
	net/wireless/nl80211.c

Change-Id: I76032e8d1206903bc948b9ed918e7ddee7e746c7
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-10-20 06:07:34 -07:00
Dmitry Shmidt
d6fbbe5e66 This is the 4.4.93 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnnAMwACgkQONu9yGCS
 aT63GRAAzQRMJMgigfYCskk/3bXyca7lkvt9mi7DXhYjsFhouCz99w/LXw85+mDj
 AEqpFolmNdKysmpHeuNkCOU88Dx1Tg/W05bISproQaoB9DYBmBBnPSY05TSFtm3B
 IHeOHOmObJYh98f38y8idJ6pOA+N/r8xWJRmr/86kxNObB1E+0pBSONIJjfcFzp8
 Ny5GVb/O9dTkU/wqDw+jlpLDo/KMKansve4XSQeAIkZGY7Gp43z+CBXjCxuCYwTg
 /nvjZ5W0yemVlXTZWqvDhkDbjJ9+N0IKWjpTgcPrKWsYqKcOx5xY1vAGN1ynOAE7
 3NF+B6MDZwvyVfcisNXgYHxOy81f+HOgSBZKmfnKGZh+JoFmMG3nXY3osyIl8/sh
 2K9QdUiI7ldI5UkLDQJ16GpJwaiPocEBHETKjtn9H09CpMPHGtwxFNm1rIMDRDZl
 VQeiVkmra1CPcsnbYQNH+H4dXza4qcSlSqVukQQckQHiYGbP8UGqZVM0L6cfOmQs
 oLG69Rd5OGGWDizxgTfId16wOnelGij6LvNB8GBB+Vs3E4W8efsnIqVhcR4GjT1o
 1vRQzsrmdjPCmBCjJ9qRAG84oz6WfbjvmGVN2RizfWW2uM79cUcdBzN+Bzf9WTw/
 F30cJZi6OJFrH4bXdufbe6/XZXqyv4pkBDtv7qfTXA2/UmXS3IA=
 =uPo9
 -----END PGP SIGNATURE-----

Merge 4.4.93 into android-4.4

Changes in 4.4.93
	brcmfmac: add length check in brcmf_cfg80211_escan_handler()
	ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
	CIFS: Reconnect expired SMB sessions
	nl80211: Define policy for packet pattern attributes
	iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
	rcu: Allow for page faults in NMI handlers
	USB: dummy-hcd: Fix deadlock caused by disconnect detection
	MIPS: math-emu: Remove pr_err() calls from fpu_emu()
	dmaengine: edma: Align the memcpy acnt array size with the transfer
	HID: usbhid: fix out-of-bounds bug
	crypto: shash - Fix zero-length shash ahash digest crash
	KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
	usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
	iommu/amd: Finish TLB flush in amd_iommu_unmap()
	ALSA: usb-audio: Kill stray URB at exiting
	ALSA: seq: Fix use-after-free at creating a port
	ALSA: seq: Fix copy_from_user() call inside lock
	ALSA: caiaq: Fix stray URB at probe error path
	ALSA: line6: Fix leftover URB at error-path during probe
	usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
	direct-io: Prevent NULL pointer access in submit_page_section
	fix unbalanced page refcounting in bio_map_user_iov
	USB: serial: ftdi_sio: add id for Cypress WICED dev board
	USB: serial: cp210x: add support for ELV TFD500
	USB: serial: option: add support for TP-Link LTE module
	USB: serial: qcserial: add Dell DW5818, DW5819
	USB: serial: console: fix use-after-free after failed setup
	x86/alternatives: Fix alt_max_short macro to really be a max()
	Linux 4.4.93

Change-Id: I731bf1eef5aca9728dddd23bfbe407f0c6ff2d84
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2017-10-19 10:08:29 -07:00
Paul E. McKenney
5fd4551659 rcu: Allow for page faults in NMI handlers
commit 28585a832602747cbfa88ad8934013177a3aae38 upstream.

A number of architecture invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18 09:20:41 +02:00
Paul E. McKenney
3bc5ee6fd7 rcu: Stop disabling interrupts in scheduler fastpaths
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal.  Especially given that there are regions of the scheduler that
already have interrupts disabled.

This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Change-Id: I8de5c9890b1db126b06d4d8fed717b3c8bfcf866
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
  by Peter Zijlstra. ]
Git-commit: 46a5d164db53ba6066b11889abb7fa6bddbe5cf7
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[prsood@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
2017-10-03 00:05:13 -07:00
Paul E. McKenney
b8bddf51cf rcu: Simplify rcu_sched_qs() control flow
This commit applies an early-exit approach to rcu_sched_qs(), reducing
the nesting level and saving a line of code.

Change-Id: Ib5dff7a0e3a26ce8ed319fd03b15e77b7b6650db
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Git-commit: fecbf6f01fbd83e6419ccb7f61d9a6eb987f1d92
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
2017-10-03 00:04:24 -07:00
Prasad Sodagudi
4f659aa55e rcu: Induce msm watchdog bite for rcu stalls
Every RCU stall need to be debugged, So collect the ram
dumps on every RCU stall to debug further by inducing
non secure watchdog bite whenever rcu stall detected.

Change-Id: I6c1cfddc92f06b48c3f22fe9970b70f2ec670bf6
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2017-03-09 10:07:35 -08:00
Alex Shi
2f0de5192a Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-12-12 22:17:37 +08:00
Dmitry Shmidt
97b84c9650 This is the 4.4.37 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlhI+pgACgkQONu9yGCS
 aT4cfg/9EvPNVXZaO12eBDOXExA9SJKysG7kxVq2Ks6Z4hpKOkbvTv3enbUM7P37
 eN+G2v5zythE7MRR4MfF1pFcUDDLury3yAdZSfghRNpU85LwKLJmPYh74WVl9SSj
 wNoAG27gbAidVg5aYNVa3vkt+kmNkM8zNJ79Sk7mfqyGbC25GBkqQ/e2oFnGtZfN
 KbQ/J0G9hvKhFc8q6xY/Oz/PZbuf1kB7W2YuuQq1eLvNf99PErh1UAthL7C0HKDZ
 p6BMCIoE/Q3X4iXhGIpr0NJLr7/ke+fwjl3+ACwfj/xJ5vew/mj+tT8xxZvUGtaA
 jtP432UtRkoofpsgZ6pFUhRcWxErsd5ikAM7xfrzsO2vPE+XGnMODms2A0G7J8kC
 rXF4QtUR9i92HTroTG32EhIGDH1M0ZqeM5Hy+8iMsMidIuemHIORZdva/ZPSR7Lk
 7rtD2Ye2r54oDib97d8MdMIlsQxS2gUF8Yh2NxquK6UCbBdXYzgMo0XlM6sMYrur
 Nt1u9IYa4hVnSbFPh4pmm7MBwNkkyUKrI/Jc/4tr0f3yGOS/xKq3thTxsWUdxoRi
 Gs7nkJx9vPZdB0gmd4wA4Lvjk4VCtyXQI3apMsSvZpG0IOnjKQRJ+6ExN7ZC1Tgf
 WeWRfMS6SfoRLnbPFumGpjp5OJloCxvMcRQtvfiTy3eMSR++Y0c=
 =rmG9
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.37' into android-4.4.y

This is the 4.4.37 stable release

Change-Id: Ic6753a5a223abc02c4fe5205642d4f904de2e5b8
2016-12-08 15:08:27 -08:00
Ding Tianhong
dfb704f96c rcu: Fix soft lockup for rcu_nocb_kthread
commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[<ffffffff81579e04>]  [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  <IRQ>
[  368.106005]
[  368.106005]  [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
[  368.106005]  [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
[  368.106005]  [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
[  368.106005]  [<ffffffff81537034>] ip_rcv+0x234/0x380
[  368.106005]  [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
[  368.106005]  [<ffffffff814fe4de>] process_backlog+0xae/0x180
[  368.106005]  [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
[  368.106005]  [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[  368.106005]  [<ffffffff8161619c>] call_softirq+0x1c/0x30
[  368.106005]  <EOI>
[  368.106005]
[  368.106005]  [<ffffffff81015d95>] do_softirq+0x65/0xa0
[  368.106005]  [<ffffffff81077174>] local_bh_enable+0x94/0xa0
[  368.106005]  [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
[  368.106005]  [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
[  368.106005]  [<ffffffff8109728f>] kthread+0xcf/0xe0
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [<ffffffff816147d8>] ret_from_fork+0x58/0x90
[  368.106005]  [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not reliquishing the
CPU while doing so.  This commit therefore adds an cond_resched_rcu_qs()
within the loop to allow other tasks to run.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dhaval Giani <dhaval.giani@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-12-08 07:15:24 +01:00
Guenter Roeck
8b94247342 ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
x86_64:allmodconfig fails to build with the following error.

ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!

Introduced by commit 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.

Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes: 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-09-14 14:26:20 +05:30
Peter Zijlstra
a81c69e149 RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.

The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.

Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").

We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.

Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-09-14 14:26:20 +05:30
Guenter Roeck
4547c0f935 ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
x86_64:allmodconfig fails to build with the following error.

ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!

Introduced by commit 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.

Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes: 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-08-31 10:10:44 -07:00
Peter Zijlstra
0c3240a1ef RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.

The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.

Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").

We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.

Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-26 09:37:43 -07:00
Paul E. McKenney
39cd2dd39a Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD
doc.2015.10.06a:  Documentation updates.
percpu-rwsem.2015.10.06a:  Optimization of per-CPU reader-writer semaphores.
torture.2015.10.06a:  Torture-test updates.
2015-10-07 16:06:25 -07:00
Paul E. McKenney
d2856b046d Merge branches 'fixes.2015.10.06a' and 'exp.2015.10.07a' into HEAD
exp.2015.10.07a:  Reduce OS jitter of RCU-sched expedited grace periods.
fixes.2015.10.06a:  Miscellaneous fixes.
2015-10-07 16:05:21 -07:00
Paul E. McKenney
338b0f760e rcu: Better hotplug handling for synchronize_sched_expedited()
Earlier versions of synchronize_sched_expedited() can prematurely end
grace periods due to the fact that a CPU marked as cpu_is_offline()
can still be using RCU read-side critical sections during the time that
CPU makes its last pass through the scheduler and into the idle loop
and during the time that a given CPU is in the process of coming online.
This commit therefore eliminates this window by adding additional
interaction with the CPU-hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
b08517c76d rcu: Enable stall warnings for synchronize_rcu_expedited()
This commit redirects synchronize_rcu_expedited()'s wait to
synchronize_sched_expedited_wait(), thus enabling RCU CPU
stall warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
c58656382e rcu: Add tasks to expedited stall-warning messages
This commit adds task-print ability to the expedited RCU CPU stall
warning messages in preparation for adding stall warnings to
synchornize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
74611ecb0f rcu: Add online/offline info to expedited stall warning message
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after the CPU number.  A "O" indicates global
offline, a "." global online, and a "o" indicates RCU believes that the
CPU is offline for the current grace period and "." otherwise, and an
"N" indicates that RCU believes that the CPU will be offline for the
next grace period, and "." otherwise, all right after the CPU number.
So for CPU 10, you would normally see "10-...:" indicating that everything
believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
dcdb8807ba rcu: Consolidate expedited CPU selection
Now that sync_sched_exp_select_cpus() and sync_rcu_exp_select_cpus()
are identical aside from the the argument to smp_call_function_single(),
this commit consolidates them with a functional argument.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
66fe6cbee4 rcu: Prepare for consolidating expedited CPU selection
This commit brings sync_sched_exp_select_cpus() into alignment with
sync_rcu_exp_select_cpus(), as a first step towards consolidating them
into one function.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:50 -07:00
Paul E. McKenney
807226e2fb rcu: Stop excluding CPU hotplug in synchronize_sched_expedited()
Now that synchronize_sched_expedited() uses IPIs, a hook in
rcu_sched_qs(), and the ->expmask field in the rcu_node combining
tree, it is no longer necessary to exclude CPU hotplug.  Any
races with CPU hotplug will be detected when attempting to send
the IPI.  This commit therefore removes the code excluding
CPU hotplug operations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:02:49 -07:00
Paul E. McKenney
83c2c735e7 rcu: Stop silencing lockdep false positive for expedited grace periods
This reverts commit af859beaab (rcu: Silence lockdep false positive
for expedited grace periods).  Because synchronize_rcu_expedited()
no longer invokes synchronize_sched_expedited(), ->exp_funnel_mutex
acquisition is no longer nested, so the false positive no longer happens.
This commit therefore removes the extra lockdep data structures, as they
are no longer needed.
2015-10-07 16:02:49 -07:00
Paul E. McKenney
6587a23b6b rcu: Switch synchronize_sched_expedited() to IPI
This commit switches synchronize_sched_expedited() from stop_one_cpu_nowait()
to smp_call_function_single(), thus moving from an IPI and a pair of
context switches to an IPI and a single pass through the scheduler.
Of course, if the scheduler actually does decide to switch to a different
task, there will still be a pair of context switches, but there would
likely have been a pair of context switches anyway, just a bit later.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-07 16:01:12 -07:00
Paul E. McKenney
4f441a258f rcutorture: Fix unused-function warning for torturing_tasks()
The torturing_tasks() function is used only in kernels built with
CONFIG_PROVE_RCU=y, so the second definition can result in unused-function
compiler warnings.  This commit adds __maybe_unused to suppress these
warnings.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:09 -07:00
Paul E. McKenney
889d487a26 rcutorture: Fix module unwind when bad torture_type specified
The rcutorture module has a list of torture types, and specifying a
type not on this list is supposed to cleanly fail the module load.
Unfortunately, the "fail" happens without the "cleanly".  This commit
therefore adds the needed clean-up after an incorrect torture_type.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:28:01 -07:00
Oleg Nesterov
4bace7344d rcu_sync: Cleanup the CONFIG_PROVE_RCU checks
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
   change it to use rcu_lockdep_assert().

2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
   unconditonally, this way we can remove the same check from
   rcu_sync_lockdep_assert() and clearly isolate the debugging
   code.

Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same as is done in ->sync(); but
this needs some simple preparations in the core RCU code to avoid the
code duplication.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:45 -07:00
Oleg Nesterov
07899a6e5f rcu_sync: Introduce rcu_sync_dtor()
This commit allows rcu_sync structures to be safely deallocated,
The trick is to add a new ->wait field to the gp_ops array.
This field is a pointer to the rcu_barrier() function corresponding
to the flavor of RCU in question.  This allows a new rcu_sync_dtor()
to wait for any outstanding callbacks before freeing the rcu_sync
structure.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:21 -07:00
Oleg Nesterov
3a518b76af rcu_sync: Add CONFIG_PROVE_RCU checks
This commit validates that the caller of rcu_sync_is_idle() holds the
corresponding type of RCU read-side lock, but only in kernels built
with CONFIG_PROVE_RCU=y.  This validation is carried out via a new
rcu_sync_ops->held() method that is checked within rcu_sync_is_idle().

Note that although this does add code to the fast path, it only does so
in kernels built with CONFIG_PROVE_RCU=y.

Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:16 -07:00
Oleg Nesterov
82e8c565be rcu_sync: Simplify rcu_sync using new rcu_sync_ops structure
This commit adds the new struct rcu_sync_ops which holds sync/call
methods, and turns the function pointers in rcu_sync_struct into an array
of struct rcu_sync_ops.  This simplifies the "init" helpers by collapsing
a switch statement and explicit multiple definitions into a simple
assignment and a helper macro, respectively.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:10 -07:00
Oleg Nesterov
cc44ca848f rcu: Create rcu_sync infrastructure
The rcu_sync infrastructure can be thought of as infrastructure to be
used to implement reader-writer primitives having extremely lightweight
readers during times when there are no writers.  The first use is in
the percpu_rwsem used by the VFS subsystem.

This infrastructure is functionally equivalent to

        struct rcu_sync_struct {
                atomic_t counter;
        };

	/* Check possibility of fast-path read-side operations. */
        static inline bool rcu_sync_is_idle(struct rcu_sync_struct *rss)
        {
                return atomic_read(&rss->counter) == 0;
        }

	/* Tell readers to use slowpaths. */
        static inline void rcu_sync_enter(struct rcu_sync_struct *rss)
        {
                atomic_inc(&rss->counter);
                synchronize_sched();
        }

	/* Allow readers to once again use fastpaths. */
        static inline void rcu_sync_exit(struct rcu_sync_struct *rss)
        {
                synchronize_sched();
                atomic_dec(&rss->counter);
        }

The main difference is that it records the state and only calls
synchronize_sched() if required.  At least some of the calls to
synchronize_sched() will be optimized away when rcu_sync_enter() and
rcu_sync_exit() are invoked repeatedly in quick succession.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:04 -07:00
Paul E. McKenney
3836f5337f torture: Consolidate cond_resched_rcu_qs() into stutter_wait()
This commit moves cond_resched_rcu_qs() into stutter_wait(), saving
a line and also avoiding RCU CPU stall warnings from all torture
loops containing a stutter_wait().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:25:01 -07:00
Paul E. McKenney
c34d2f4184 rcu: Correct comment for values of ->gp_state field
This commit corrects the comment for the values of the ->gp_state field,
which previously incorrectly said that these were for the ->gp_flags
field.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:16:11 -07:00
Petr Mladek
77f81fe08e rcu: Finish folding ->fqs_state into ->gp_state
Commit commit 4cdfc175c2 ("rcu: Move quiescent-state forcing
into kthread") started the process of folding the old ->fqs_state into
->gp_state, but did not complete it.  This situation does not cause
any malfunction, but can result in extremely confusing trace output.
This commit completes this task of eliminating ->fqs_state in favor
of ->gp_state.

The old ->fqs_state was also used to decide when to collect dyntick-idle
snapshots.  For this purpose, we add a boolean variable into the kthread,
which is set on the first call to rcu_gp_fqs() for a given grace period
and clear otherwise.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:59 -07:00
Paul E. McKenney
49f5903b47 rcu: Move preemption disabling out of __srcu_read_lock()
Currently, __srcu_read_lock() cannot be invoked from restricted
environments because it contains calls to preempt_disable() and
preempt_enable(), both of which can invoke lockdep, which is a bad
idea in some restricted execution modes.  This commit therefore moves
the preempt_disable() and preempt_enable() from __srcu_read_lock()
to srcu_read_lock().  It also inserts the preempt_disable() and
preempt_enable() around the call to __srcu_read_lock() in do_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:15:43 -07:00
Paul E. McKenney
7f21aeef72 rcu: Add online/offline info to stall warning message
This commit makes the RCU CPU stall warning message print online/offline
indications immediately after a hyphen following the CPU number.  A "O"
indicates that the global CPU-hotplug system believes that the CPU is
online, a "o" that RCU perceived the CPU to be online at the beginning
of the current expedited grace period, and an "N" that RCU currently
believes that it will perceive the CPU as being online at the beginning
of the next expedited grace period, with "." otherwise for all three
indications.  So for CPU 10, you would normally see "10-OoN:" indicating
that everything believes that the CPU is online.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:10:18 -07:00
Paul E. McKenney
ee968ac61d rcu: Eliminate panic when silly boot-time fanout specified
This commit loosens rcutree.rcu_fanout_leaf range checks
and replaces a panic() with a fallback to compile-time values.
This fallback is accompanied by a WARN_ON(), and both occur when the
rcutree.rcu_fanout_leaf value is too small to accommodate the number of
CPUs.  For example, given the current four-level limit for the rcu_node
tree, a system with more than 16 CPUs built with CONFIG_FANOUT=2 must
have rcutree.rcu_fanout_leaf larger than 2.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:09:41 -07:00
Boqun Feng
bb73c52bad rcu: Don't disable preemption for Tiny and Tree RCU readers
Because preempt_disable() maps to barrier() for non-debug builds,
it forces the compiler to spill and reload registers.  Because Tree
RCU and Tiny RCU now only appear in CONFIG_PREEMPT=n builds, these
barrier() instances generate needless extra code for each instance of
rcu_read_lock() and rcu_read_unlock().  This extra code slows down Tree
RCU and bloats Tiny RCU.

This commit therefore removes the preempt_disable() and preempt_enable()
from the non-preemptible implementations of __rcu_read_lock() and
__rcu_read_unlock(), respectively.  However, for debug purposes,
preempt_disable() and preempt_enable() are still invoked if
CONFIG_PREEMPT_COUNT=y, because this allows detection of sleeping inside
atomic sections in non-preemptible kernels.

However, Tiny and Tree RCU operates by coalescing all RCU read-side
critical sections on a given CPU that lie between successive quiescent
states.  It is therefore necessary to compensate for removing barriers
from __rcu_read_lock() and __rcu_read_unlock() by adding them to a
couple of the RCU functions invoked during quiescent states, namely to
rcu_all_qs() and rcu_note_context_switch().  However, note that the latter
is more paranoia than necessity, at least until link-time optimizations
become more aggressive.

This is based on an earlier patch by Paul E. McKenney, fixing
a bug encountered in kernels built with CONFIG_PREEMPT=n and
CONFIG_PREEMPT_COUNT=y.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-10-06 11:08:23 -07:00
Boqun Feng
db3e8db45e rcu: Use call_rcu_func_t to replace explicit type equivalents
We have had the call_rcu_func_t typedef for a quite awhile, but we still
use explicit function pointer types in some places.  These types can
confuse cscope and can be hard to read.  This patch therefore replaces
these types with the call_rcu_func_t typedef.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:19 -07:00
Boqun Feng
b6a4ae766e rcu: Use rcu_callback_t in call_rcu*() and friends
As we now have rcu_callback_t typedefs as the type of rcu callbacks, we
should use it in call_rcu*() and friends as the type of parameters. This
could save us a few lines of code and make it clear which function
requires an rcu callbacks rather than other callbacks as its argument.

Besides, this can also help cscope to generate a better database for
code reading.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2015-10-06 11:08:05 -07:00
Paul E. McKenney
5b74c45890 rcu: Make ->cpu_no_qs be a union for aggregate OR
This commit converts the rcu_data structure's ->cpu_no_qs field
to a union.  The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.

For now, only .norm is used.  A later commit will introduce the expedited
usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
0d43eb34f9 rcu: Invert passed_quiesce and rename to cpu_no_qs
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs.  This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
97c668b8e9 rcu: Rename qs_pending to core_needs_qs
An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check.  Otherwise, it needs to report the next
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
bce5fa12aa rcu: Move synchronize_sched_expedited() to combining tree
Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on.  This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.

This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus().  A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
8203d6d0ee rcu: Use single-stage IPI algorithm for RCU expedited grace period
The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain.  This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.

This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs.  If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.

The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods.  This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:19 -07:00
Paul E. McKenney
b9585e940a rcu: Consolidate tree setup for synchronize_rcu_expedited()
This commit replaces sync_rcu_preempt_exp_init1(() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
7922cd0e56 rcu: Move rcu_report_exp_rnp() to allow consolidation
This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
that later commits can make synchronize_sched_expedited() use them.
The non-code-movement portion of this commit tags rcu_report_exp_rnp()
as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
f4ecea309d rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq
Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
there is no need for the sync_rcu_preempt_exp_wq global variable.  This
commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
It also initializes ->expedited_wq only once at boot instead of at the
start of each expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:17 -07:00