Commit graph

193 commits

Author SHA1 Message Date
Blagovest Kolenichev
95a027ead7 Merge branch 'android-4.4@e4528dd' into branch 'msm-4.4'
* refs/heads/tmp-e4528dd:
  Linux 4.4.65
  perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
  ping: implement proper locking
  staging/android/ion : fix a race condition in the ion driver
  vfio/pci: Fix integer overflows, bitmask check
  tipc: check minimum bearer MTU
  netfilter: nfnetlink: correctly validate length of batch messages
  xc2028: avoid use after free
  mnt: Add a per mount namespace limit on the number of mounts
  tipc: fix socket timer deadlock
  tipc: fix random link resets while adding a second bearer
  gfs2: avoid uninitialized variable warning
  hostap: avoid uninitialized variable use in hfa384x_get_rid
  tty: nozomi: avoid a harmless gcc warning
  tipc: correct error in node fsm
  tipc: re-enable compensation for socket receive buffer double counting
  tipc: make dist queue pernet
  tipc: make sure IPv6 header fits in skb headroom
  ANDROID: uid_sys_stats: fix access of task_uid(task)
  BACKPORT: f2fs: sanity check log_blocks_per_seg
  Linux 4.4.64
  tipc: fix crash during node removal
  block: fix del_gendisk() vs blkdev_ioctl crash
  x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
  hv: don't reset hv_context.tsc_page on crash
  Drivers: hv: balloon: account for gaps in hot add regions
  Drivers: hv: balloon: keep track of where ha_region starts
  Tools: hv: kvp: ensure kvp device fd is closed on exec
  kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs
  powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
  ubi/upd: Always flush after prepared for an update
  mac80211: reject ToDS broadcast data frames
  mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
  ACPI / power: Avoid maybe-uninitialized warning
  Input: elantech - add Fujitsu Lifebook E547 to force crc_enabled
  VSOCK: Detach QP check should filter out non matching QPs.
  Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()
  Drivers: hv: get rid of timeout in vmbus_open()
  Drivers: hv: don't leak memory in vmbus_establish_gpadl()
  s390/mm: fix CMMA vs KSM vs others
  CIFS: remove bad_network_name flag
  cifs: Do not send echoes before Negotiate is complete
  ring-buffer: Have ring_buffer_iter_empty() return true when empty
  tracing: Allocate the snapshot buffer before enabling probe
  KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
  KEYS: Change the name of the dead type to ".dead" to prevent user access
  KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
  ANDROID: sdcardfs: Call lower fs's revalidate
  ANDROID: sdcardfs: Avoid setting GIDs outside of valid ranges
  ANDROID: sdcardfs: Copy meta-data from lower inode
  Revert "Revert "Android: sdcardfs: Don't do d_add for lower fs""
  ANDROID: sdcardfs: Use filesystem specific hash
  ANDROID: AVB error handler to invalidate vbmeta partition.
  ANDROID: Update init/do_mounts_dm.c to the latest ChromiumOS version.
  Revert "[RFC]cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions"

Conflicts:
	drivers/md/Makefile

Change-Id: I8f5ed53cb8b6cc66914f10c6ac820003b87b8759
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-05-02 06:40:36 -07:00
Greg Kroah-Hartman
e4528dd775 This is the 4.4.65 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlkFXvsACgkQONu9yGCS
 aT6kPg//QqrRCxSUBYahQ1Jp16AVLiEWjJ3umzBhGGSPn7FfsWF8951R1WBHGlFI
 lEUa3Pfi0U1sh0q4v6pTmQ/AYoa67DcKorxQegH9JoaRp0IvWpSaGMSfbmKP5pDl
 PQyRL6DmOFkf/6X0dvby5ybbt2Kp59zTm8RFeFLRo3LTUK30w/tBTVvouk+UW3KA
 KtjeL70OSOHgWoHXhNWDX1JTTBGFFTI2x0jlFeUtq10t2kRxAMDZpB/IY0VJ3ZTe
 iso6+hC8JyzsXUYP82ZfZ7BAv/hSWBV3ErHyrUmhqWfE/Px7PFEeo3OyG3Bqosu6
 aZW78jwFoqZcAhkVTQepWMHonUT+XLHUgCzc2MqFR4HW6JoQhKDdIqlt1Lqp6y1O
 XsYOrPU1WqHhyoO9E3YwmAIjlYBHxYSUiCnqI9WtvvExJUhXXk/wwzgXUFrZPD01
 berofViH2LJAxde0sqpidpNRg98m+MAK47M03I/tZUUykjGDi8NPTvM4FBbNCEty
 3qaVVCUm7o8YzZg54QF61O+ciceoQdnsQJVy94EV3n2pgdN/7pG0v1KikBRKfsPK
 1Wp+l0tdLkms56ElXyt/lHtF5Pre5i4sE6SdnZa3RHTUV168PFVYqJUCqWRwCD50
 QMs+yLvRHwCFst+ix29Xn+c7KYKcMyqPvCrI8oczfokV/tvMVd8=
 =1GiA
 -----END PGP SIGNATURE-----

Merge 4.4.65 into android-4.4

Changes in 4.4.65:
	tipc: make sure IPv6 header fits in skb headroom
	tipc: make dist queue pernet
	tipc: re-enable compensation for socket receive buffer double counting
	tipc: correct error in node fsm
	tty: nozomi: avoid a harmless gcc warning
	hostap: avoid uninitialized variable use in hfa384x_get_rid
	gfs2: avoid uninitialized variable warning
	tipc: fix random link resets while adding a second bearer
	tipc: fix socket timer deadlock
	mnt: Add a per mount namespace limit on the number of mounts
	xc2028: avoid use after free
	netfilter: nfnetlink: correctly validate length of batch messages
	tipc: check minimum bearer MTU
	vfio/pci: Fix integer overflows, bitmask check
	staging/android/ion : fix a race condition in the ion driver
	ping: implement proper locking
	perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
	Linux 4.4.65

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-30 07:30:52 +02:00
Eric W. Biederman
c50fd34e10 mnt: Add a per mount namespace limit on the number of mounts
commit d29216842a85c7970c536108e093963f02714498 upstream.

CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.

    mkdir /tmp/1 /tmp/2
    mount --make-rshared /
    for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done

Will create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.

As such CVE-2016-6213 was assigned.

Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:

> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.

So I am setting the default number of mounts allowed per mount
namespace at 100,000.  This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts.  Which should be perfect to catch misconfigurations and
malfunctioning programs.
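
The arithmetic behind that choice can be sketched as follows. This is a simplified model, not the kernel's accounting: it assumes a single starting mount and that every bind mount exactly doubles the count, as in the reproducer above. The function name `mounts_after` and its parameters are hypothetical.

```python
def mounts_after(iterations, start=1, limit=None):
    """Model the doubling: each bind mount doubles the number of mounts.

    Returns (count, completed_iterations). If `limit` is given, stop before
    any iteration that would push the count above it. This is an assumption
    made for illustration; it is not how the kernel performs the check.
    """
    count = start
    for done in range(iterations):
        if limit is not None and count * 2 > limit:
            return count, done
        count *= 2
    return count, iterations


if __name__ == "__main__":
    # Unbounded: 20 doublings from one mount.
    print(mounts_after(20))
    # With the default limit from the commit message.
    print(mounts_after(20, limit=100_000))
```

Under these assumptions the limit cuts the loop short after 16 doublings, at 65536 mounts, because a 17th doubling would exceed 100,000.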

For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
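
A minimal sketch of inspecting the limit from user space, assuming a Linux system where the sysctl exists. The path `/proc/sys/fs/mount-max` comes from the commit message; using `/proc/self/mountinfo` to count mounts in the current namespace, and the helper names, are choices made for this example. The headroom figure is only a rough estimate, since the kernel's own accounting may differ.

```python
from pathlib import Path

MOUNT_MAX = Path("/proc/sys/fs/mount-max")  # sysctl named in the commit message
MOUNTINFO = Path("/proc/self/mountinfo")    # one line per mount in this namespace


def parse_mount_max(text):
    """Parse the sysctl's contents, a single decimal integer."""
    return int(text.strip())


def count_mounts(mountinfo_text):
    """Count non-empty lines (one per mount) in mountinfo-style text."""
    return sum(1 for line in mountinfo_text.splitlines() if line.strip())


def headroom(limit, in_use):
    """Rough number of mounts left before the limit, never negative."""
    return max(limit - in_use, 0)


if __name__ == "__main__":
    if MOUNT_MAX.exists() and MOUNTINFO.exists():
        limit = parse_mount_max(MOUNT_MAX.read_text())
        in_use = count_mounts(MOUNTINFO.read_text())
        print(f"mount-max={limit} in_use={in_use} headroom={headroom(limit, in_use)}")
    else:
        print("mount-max sysctl not available on this system")
```

Raising the limit means writing a new integer to the same file, which normally requires root privileges.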

Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[bwh: Backported to 4.4: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30 05:49:28 +02:00
Kees Cook
22562e0cec sysctl: enable strict writes
SYSCTL_WRITES_WARN was added in commit f4aacea2f5 ("sysctl: allow for
strict write position handling"), and released in v3.16 in August of
2014.  Since then I can find only 1 instance of non-zero offset
writing[1], and it was fixed immediately in CRIU[2].  As such, it
appears safe to flip this to the strict state now.

[1] https://www.google.com/search?q="when%20file%20position%20was%20not%200"
[2] http://lists.openvz.org/pipermail/criu/2015-April/019819.html
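
A small sketch for checking which write mode a running kernel uses. It assumes the mode is exposed at `/proc/sys/kernel/sysctl_writes_strict` with the values -1, 0 and 1, as the kernel's sysctl documentation describes; the wording of each description and the helper name are this example's own, so verify them against the documentation for your kernel version.

```python
from pathlib import Path

# Assumed location of the setting; check your kernel's sysctl documentation.
WRITES_STRICT = Path("/proc/sys/kernel/sysctl_writes_strict")

# Paraphrased descriptions of each mode (an assumption, not kernel text).
WRITES_STRICT_MODES = {
    -1: "legacy: file position ignored, no warning",
    0: "warn: file position ignored, warning logged for non-zero offsets",
    1: "strict: file position respected; numeric writes must start at offset 0",
}


def describe_writes_strict(text):
    """Map the sysctl's contents to a short description of the mode."""
    value = int(text.strip())
    return WRITES_STRICT_MODES.get(value, f"unknown mode {value}")


if __name__ == "__main__":
    if WRITES_STRICT.exists():
        print(describe_writes_strict(WRITES_STRICT.read_text()))
    else:
        print("sysctl_writes_strict not available on this system")
```

Under strict mode, a program that sets a numeric sysctl should send the whole value in one write at file position 0, which is what opening the file and issuing a single write does.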

Change-Id: Ibf8d46fa34fa9fd4df3527dc4dfc3e3d31b2f7e0
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 41662f5cc55335807d39404371cfcbb1909304c4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2017-01-12 16:01:51 -08:00
Runmin Wang
9cc5c789d9 Merge remote-tracking branch 'msm4.4/tmp-da9a92f' into msm-4.4
* origin/tmp-da9a92f:
  arm64: kaslr: increase randomization granularity
  arm64: relocatable: deal with physically misaligned kernel images
  arm64: don't map TEXT_OFFSET bytes below the kernel if we can avoid it
  arm64: kernel: replace early 64-bit literal loads with move-immediates
  arm64: introduce mov_q macro to move a constant into a 64-bit register
  arm64: kernel: perform relocation processing from ID map
  arm64: kernel: use literal for relocated address of __secondary_switched
  arm64: kernel: don't export local symbols from head.S
  arm64: simplify kernel segment mapping granularity
  arm64: cover the .head.text section in the .text segment mapping
  arm64: move early boot code to the .init segment
  arm64: use 'segment' rather than 'chunk' to describe mapped kernel regions
  arm64: mm: Mark .rodata as RO
  Linux 4.4.16
  ovl: verify upper dentry before unlink and rename
  drm/i915: Revert DisplayPort fast link training feature
  tmpfs: fix regression hang in fallocate undo
  tmpfs: don't undo fallocate past its last page
  crypto: qat - make qat_asym_algs.o depend on asn1 headers
  xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
  File names with trailing period or space need special case conversion
  cifs: dynamic allocation of ntlmssp blob
  Fix reconnect to not defer smb3 session reconnect long after socket reconnect
  53c700: fix BUG on untagged commands
  s390: fix test_fp_ctl inline assembly contraints
  scsi: fix race between simultaneous decrements of ->host_failed
  ovl: verify upper dentry in ovl_remove_and_whiteout()
  ovl: Copy up underlying inode's ->i_mode to overlay inode
  ARM: mvebu: fix HW I/O coherency related deadlocks
  ARM: dts: armada-38x: fix MBUS_ID for crypto SRAM on Armada 385 Linksys
  ARM: sunxi/dt: make the CHIP inherit from allwinner,sun5i-a13
  ALSA: hda: add AMD Stoney PCI ID with proper driver caps
  ALSA: hda - fix use-after-free after module unload
  ALSA: ctl: Stop notification after disconnection
  ALSA: pcm: Free chmap at PCM free callback, too
  ALSA: hda/realtek - add new pin definition in alc225 pin quirk table
  ALSA: hda - fix read before array start
  ALSA: hda - Add PCI ID for Kabylake-H
  ALSA: hda/realtek: Add Lenovo L460 to docking unit fixup
  ALSA: timer: Fix negative queue usage by racy accesses
  ALSA: echoaudio: Fix memory allocation
  ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift()
  ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
  ALSA: hda - Fix the headset mic jack detection on Dell machine
  ALSA: dummy: Fix a use-after-free at closing
  hwmon: (dell-smm) Cache fan_type() calls and change fan detection
  hwmon: (dell-smm) Disallow fan_type() calls on broken machines
  hwmon: (dell-smm) Restrict fan control and serial number to CAP_SYS_ADMIN by default
  tty/vt/keyboard: fix OOB access in do_compute_shiftstate()
  tty: vt: Fix soft lockup in fbcon cursor blink timer.
  iio:ad7266: Fix probe deferral for vref
  iio:ad7266: Fix support for optional regulators
  iio:ad7266: Fix broken regulator error handling
  iio: accel: kxsd9: fix the usage of spi_w8r8()
  staging: iio: accel: fix error check
  iio: hudmidity: hdc100x: fix incorrect shifting and scaling
  iio: humidity: hdc100x: fix IIO_TEMP channel reporting
  iio: humidity: hdc100x: correct humidity integration time mask
  iio: proximity: as3935: fix buffer stack trashing
  iio: proximity: as3935: remove triggered buffer processing
  iio: proximity: as3935: correct IIO_CHAN_INFO_RAW output
  iio: light apds9960: Add the missing dev.parent
  iio:st_pressure: fix sampling gains (bring inline with ABI)
  iio: Fix error handling in iio_trigger_attach_poll_func
  xen/balloon: Fix declared-but-not-defined warning
  perf/x86: Fix undefined shift on 32-bit kernels
  memory: omap-gpmc: Fix omap gpmc EXTRADELAY timing
  drm/vmwgfx: Fix error paths when mapping framebuffer
  drm/vmwgfx: Delay pinning fbdev framebuffer until after mode set
  drm/vmwgfx: Check pin count before attempting to move a buffer
  drm/vmwgfx: Work around mode set failure in 2D VMs
  drm/vmwgfx: Add an option to change assumed FB bpp
  drm/ttm: Make ttm_bo_mem_compat available
  drm: atmel-hlcdc: actually disable scaling when no scaling is required
  drm: make drm_atomic_set_mode_prop_for_crtc() more reliable
  drm: add missing drm_mode_set_crtcinfo call
  drm/i915: Update CDCLK_FREQ register on BDW after changing cdclk frequency
  drm/i915: Update ifdeffery for mutex->owner
  drm/i915: Refresh cached DP port register value on resume
  drm/i915/ilk: Don't disable SSC source if it's in use
  drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern
  drm/nouveau: fix for disabled fbdev emulation
  drm/nouveau/fbcon: fix out-of-bounds memory accesses
  drm/nouveau/gr/gf100-: update sm error decoding from gk20a nvgpu headers
  drm/nouveau/disp/sor/gf119: both links use the same training register
  virtio_balloon: fix PFN format for virtio-1
  drm/dp/mst: Always clear proposed vcpi table for port.
  drm/amdkfd: destroy dbgmgr in notifier release
  drm/amdkfd: unbind only existing processes
  ubi: Make recover_peb power cut aware
  drm/amdgpu/gfx7: fix broken condition check
  drm/radeon: fix asic initialization for virtualized environments
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  percpu: fix synchronization between synchronous map extension and chunk destruction
  percpu: fix synchronization between chunk->map_extend_work and chunk destruction
  af_unix: fix hard linked sockets on overlay
  vfs: add d_real_inode() helper
  arm64: Rework valid_user_regs
  ipmi: Remove smi_msg from waiting_rcv_msgs list before handle_one_recv_msg()
  drm/mgag200: Black screen fix for G200e rev 4
  iommu/amd: Fix unity mapping initialization race
  iommu/vt-d: Enable QI on all IOMMUs before setting root entry
  iommu/arm-smmu: Wire up map_sg for arm-smmu-v3
  base: make module_create_drivers_dir race-free
  tracing: Handle NULL formats in hold_module_trace_bprintk_format()
  HID: multitouch: enable palm rejection for Windows Precision Touchpad
  HID: hiddev: validate num_values for HIDIOCGUSAGES, HIDIOCSUSAGES commands
  HID: elo: kill not flush the work
  KVM: nVMX: VMX instructions: fix segment checks when L1 is in long mode.
  kvm: Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
  KEYS: potential uninitialized variable
  ARCv2: LLSC: software backoff is NOT needed starting HS2.1c
  ARCv2: Check for LL-SC livelock only if LLSC is enabled
  ipv6: Fix mem leak in rt6i_pcpu
  cdc_ncm: workaround for EM7455 "silent" data interface
  net_sched: fix mirrored packets checksum
  packet: Use symmetric hash for PACKET_FANOUT_HASH.
  sched/fair: Fix cfs_rq avg tracking underflow
  UBIFS: Implement ->migratepage()
  mm: Export migrate_page_move_mapping and migrate_page_copy
  MIPS: KVM: Fix modular KVM under QEMU
  ARM: 8579/1: mm: Fix definition of pmd_mknotpresent
  ARM: 8578/1: mm: ensure pmd_present only checks the valid bit
  ARM: imx6ul: Fix Micrel PHY mask
  NFS: Fix another OPEN_DOWNGRADE bug
  make nfs_atomic_open() call d_drop() on all ->open_context() errors.
  nfsd: check permissions when setting ACLs
  posix_acl: Add set_posix_acl
  nfsd: Extend the mutex holding region around in nfsd4_process_open2()
  nfsd: Always lock state exclusively.
  nfsd4/rpc: move backchannel create logic into rpc code
  writeback: use higher precision calculation in domain_dirty_limits()
  thermal: cpu_cooling: fix improper order during initialization
  uvc: Forward compat ioctls to their handlers directly
  Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"
  x86/amd_nb: Fix boot crash on non-AMD systems
  kprobes/x86: Clear TF bit in fault on single-stepping
  x86, build: copy ldlinux.c32 to image.iso
  locking/static_key: Fix concurrent static_key_slow_inc()
  locking/qspinlock: Fix spin_unlock_wait() some more
  locking/ww_mutex: Report recursive ww_mutex locking early
  of: irq: fix of_irq_get[_byname]() kernel-doc
  of: fix autoloading due to broken modalias with no 'compatible'
  mnt: If fs_fully_visible fails call put_filesystem.
  mnt: Account for MS_RDONLY in fs_fully_visible
  mnt: fs_fully_visible test the proper mount for MNT_LOCKED
  usb: common: otg-fsm: add license to usb-otg-fsm
  USB: EHCI: declare hostpc register as zero-length array
  usb: dwc2: fix regression on big-endian PowerPC/ARM systems
  powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
  powerpc/pseries: Fix IBM_ARCH_VEC_NRCORES_OFFSET since POWER8NVL was added
  powerpc/pseries: Fix PCI config address for DDW
  powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
  IB/mlx4: Properly initialize GRH TClass and FlowLabel in AHs
  IB/cm: Fix a recently introduced locking bug
  EDAC, sb_edac: Fix rank lookup on Broadwell
  mac80211: Fix mesh estab_plinks counting in STA removal case
  mac80211_hwsim: Add missing check for HWSIM_ATTR_SIGNAL
  mac80211: mesh: flush mesh paths unconditionally
  mac80211: fix fast_tx header alignment
  Linux 4.4.15
  usb: dwc3: exynos: Fix deferred probing storm.
  usb: host: ehci-tegra: Grab the correct UTMI pads reset
  usb: gadget: fix spinlock dead lock in gadgetfs
  USB: mos7720: delete parport
  xhci: Fix handling timeouted commands on hosts in weird states.
  USB: xhci: Add broken streams quirk for Frescologic device id 1009
  usb: xhci-plat: properly handle probe deferral for devm_clk_get()
  xhci: Cleanup only when releasing primary hcd
  usb: musb: host: correct cppi dma channel for isoch transfer
  usb: musb: Ensure rx reinit occurs for shared_fifo endpoints
  usb: musb: Stop bulk endpoint while queue is rotated
  usb: musb: only restore devctl when session was set in backup
  usb: quirks: Add no-lpm quirk for Acer C120 LED Projector
  usb: quirks: Fix sorting
  USB: uas: Fix slave queue_depth not being set
  crypto: user - re-add size check for CRYPTO_MSG_GETALG
  crypto: ux500 - memmove the right size
  crypto: vmx - Increase priority of aes-cbc cipher
  AX.25: Close socket connection on session completion
  bpf: try harder on clones when writing into skb
  net: alx: Work around the DMA RX overflow issue
  net: macb: fix default configuration for GMAC on AT91
  neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()
  bpf, perf: delay release of BPF prog after grace period
  sock_diag: do not broadcast raw socket destruction
  Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address
  ipmr/ip6mr: Initialize the last assert time of mfc entries.
  netem: fix a use after free
  esp: Fix ESN generation under UDP encapsulation
  sit: correct IP protocol used in ipip6_err
  net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG
  net_sched: fix pfifo_head_drop behavior vs backlog
  sdcardfs: Truncate packages_gid.list on overflow
  UPSTREAM: cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
  BACKPORT: proc: add /proc/<pid>/timerslack_ns interface
  BACKPORT: timer: convert timer_slack_ns from unsigned long to u64
  netfilter: xt_quota2: make quota2_log work well
  Revert "usb: gadget: prevent change of Host MAC address of 'usb0' interface"
  BACKPORT: PM / sleep: Go direct_complete if driver has no callbacks
  ANDROID: base-cfg: enable UID_CPUTIME
  UPSTREAM: USB: usbfs: fix potential infoleak in devio
  UPSTREAM: ALSA: timer: Fix leak in events via snd_timer_user_ccallback
  UPSTREAM: ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
  UPSTREAM: ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
  ANDROID: configs: remove unused configs
  ANDROID: cpu: send KOBJ_ONLINE event when enabling cpus
  ANDROID: dm verity fec: initialize recursion level
  ANDROID: dm verity fec: fix RS block calculation
  Linux 4.4.14
  netfilter: x_tables: introduce and use xt_copy_counters_from_user
  netfilter: x_tables: do compat validation via translate_table
  netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
  netfilter: ip6_tables: simplify translate_compat_table args
  netfilter: ip_tables: simplify translate_compat_table args
  netfilter: arp_tables: simplify translate_compat_table args
  netfilter: x_tables: don't reject valid target size on some architectures
  netfilter: x_tables: validate all offsets and sizes in a rule
  netfilter: x_tables: check for bogus target offset
  netfilter: x_tables: check standard target size too
  netfilter: x_tables: add compat version of xt_check_entry_offsets
  netfilter: x_tables: assert minimum target size
  netfilter: x_tables: kill check_entry helper
  netfilter: x_tables: add and use xt_check_entry_offsets
  netfilter: x_tables: validate targets of jumps
  netfilter: x_tables: don't move to non-existent next rule
  drm/core: Do not preserve framebuffer on rmfb, v4.
  crypto: qat - fix adf_ctl_drv.c:undefined reference to adf_init_pf_wq
  netfilter: x_tables: fix unconditional helper
  netfilter: x_tables: make sure e->next_offset covers remaining blob size
  netfilter: x_tables: validate e->target_offset early
  MIPS: Fix 64k page support for 32 bit kernels.
  sparc64: Fix return from trap window fill crashes.
  sparc: Harden signal return frame checks.
  sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
  sparc64: Reduce TLB flushes during hugepte changes
  sparc/PCI: Fix for panic while enabling SR-IOV
  sparc64: Fix sparc64_set_context stack handling.
  sparc64: Fix numa node distance initialization
  sparc64: Fix bootup regressions on some Kconfig combinations.
  sparc: Fix system call tracing register handling.
  fix d_walk()/non-delayed __d_free() race
  sched: panic on corrupted stack end
  proc: prevent stacking filesystems on top
  x86/entry/traps: Don't force in_interrupt() to return true in IST handlers
  wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
  ecryptfs: forbid opening files without mmap handler
  memcg: add RCU locking around css_for_each_descendant_pre() in memcg_offline_kmem()
  parisc: Fix pagefault crash in unaligned __get_user() call
  pinctrl: mediatek: fix dual-edge code defect
  powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call
  powerpc: Use privileged SPR number for MMCR2
  powerpc: Fix definition of SIAR and SDAR registers
  powerpc/pseries/eeh: Handle RTAS delay requests in configure_bridge
  arm64: mm: always take dirty state from new pte in ptep_set_access_flags
  arm64: Provide "model name" in /proc/cpuinfo for PER_LINUX32 tasks
  crypto: ccp - Fix AES XTS error for request sizes above 4096
  crypto: public_key: select CRYPTO_AKCIPHER
  irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
  s390/bpf: reduce maximum program size to 64 KB
  s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
  gpio: bcm-kona: fix bcm_kona_gpio_reset() warnings
  ARM: fix PTRACE_SETVFPREGS on SMP systems
  ALSA: hda/realtek: Add T560 docking unit fixup
  ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
  ALSA: hda/realtek - ALC256 speaker noise issue
  ALSA: hda - Fix headset mic detection problem for Dell machine
  ALSA: hda - Add PCI ID for Kabylake
  KVM: irqfd: fix NULL pointer dereference in kvm_irq_map_gsi
  KVM: x86: fix OOPS after invalid KVM_SET_DEBUGREGS
  vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
  geneve: Relax MTU constraints
  vxlan: Relax MTU constraints
  ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
  l2tp: fix configuration passed to setup_udp_tunnel_sock()
  bridge: Don't insert unnecessary local fdb entry on changing mac address
  tcp: record TLP and ER timer stats in v6 stats
  vxlan: Accept user specified MTU value when create new vxlan link
  team: don't call netdev_change_features under team->lock
  sfc: on MC reset, clear PIO buffer linkage in TXQs
  bpf, inode: disallow userns mounts
  uapi glibc compat: fix compilation when !__USE_MISC in glibc
  udp: prevent skbs lingering in tunnel socket queues
  bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
  tuntap: correctly wake up process during uninit
  switchdev: pass pointer to fib_info instead of copy
  tipc: fix nametable publication field in nl compat
  netlink: Fix dump skb leak/double free
  tipc: check nl sock before parsing nested attributes
  scsi: Add QEMU CD-ROM to VPD Inquiry Blacklist
  scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
  cs-etm: associating output packet with CPU they executed on
  cs-etm: removing unecessary structure field
  cs-etm: account for each trace buffer in the queue
  cs-etm: avoid casting variable
  perf tools: fixing Makefile problems
  perf tools: new naming convention for openCSD
  perf scripts: Add python scripts for CoreSight traces
  perf tools: decoding capailitity for CoreSight traces
  perf symbols: Check before overwriting build_id
  perf tools: pushing driver configuration down to the kernel
  perf tools: add infrastructure for PMU specific configuration
  coresight: etm-perf: incorporating sink definition from the cmd line
  coresight: adding sink parameter to function coresight_build_path()
  perf: passing struct perf_event to function setup_aux()
  perf/core: adding PMU driver specific configuration
  perf tools: adding coresight etm PMU record capabilities
  perf tools: making coresight PMU listable
  coresight: tmc: implementing TMC-ETR AUX space API
  coresight: Add support for Juno platform
  coresight: Handle build path error
  coresight: Fix erroneous memset in tmc_read_unprepare_etr
  coresight: Fix tmc_read_unprepare_etr
  coresight: Fix NULL pointer dereference in _coresight_build_path
  ANDROID: dm verity fec: add missing release from fec_ktype
  ANDROID: dm verity fec: limit error correction recursion
  ANDROID: restrict access to perf events
  FROMLIST: security,perf: Allow further restriction of perf_event_open
  BACKPORT: perf tools: Document the perf sysctls
  Revert "armv6 dcc tty driver"
  Revert "arm: dcc_tty: fix armv6 dcc tty build failure"
  ARM64: Ignore Image-dtb from git point of view
  arm64: add option to build Image-dtb
  ANDROID: usb: gadget: f_midi: set fi->f to NULL when free f_midi function
  Linux 4.4.13
  xfs: handle dquot buffer readahead in log recovery correctly
  xfs: print name of verifier if it fails
  xfs: skip stale inodes in xfs_iflush_cluster
  xfs: fix inode validity check in xfs_iflush_cluster
  xfs: xfs_iflush_cluster fails to abort on error
  xfs: Don't wrap growfs AGFL indexes
  xfs: disallow rw remount on fs with unknown ro-compat features
  gcov: disable tree-loop-im to reduce stack usage
  scripts/package/Makefile: rpmbuild add support of RPMOPTS
  dma-debug: avoid spinlock recursion when disabling dma-debug
  PM / sleep: Handle failures in device_suspend_late() consistently
  ext4: silence UBSAN in ext4_mb_init()
  ext4: address UBSAN warning in mb_find_order_for_block()
  ext4: fix oops on corrupted filesystem
  ext4: clean up error handling when orphan list is corrupted
  ext4: fix hang when processing corrupted orphaned inode list
  drm/imx: Match imx-ipuv3-crtc components using device node in platform data
  drm/i915: Don't leave old junk in ilk active watermarks on readout
  drm/atomic: Verify connector->funcs != NULL when clearing states
  drm/fb_helper: Fix references to dev->mode_config.num_connector
  drm/i915/fbdev: Fix num_connector references in intel_fb_initial_config()
  drm/amdgpu: Fix hdmi deep color support.
  drm/amdgpu: use drm_mode_vrefresh() rather than mode->vrefresh
  drm/vmwgfx: Fix order of operation
  drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
  drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
  drm/gma500: Fix possible out of bounds read
  sunrpc: fix stripping of padded MIC tokens
  xen: use same main loop for counting and remapping pages
  xen/events: Don't move disabled irqs
  powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
  Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
  powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
  powerpc/book3s64: Fix branching to OOL handlers in relocatable kernel
  pipe: limit the per-user amount of pages allocated in pipes
  QE-UART: add "fsl,t1040-ucc-uart" to of_device_id
  wait/ptrace: assume __WALL if the child is traced
  mm: use phys_addr_t for reserve_bootmem_region() arguments
  media: v4l2-compat-ioctl32: fix missing reserved field copy in put_v4l2_create32
  PCI: Disable all BAR sizing for devices with non-compliant BARs
  pinctrl: exynos5440: Use off-stack memory for pinctrl_gpio_range
  clk: bcm2835: divider value has to be 1 or more
  clk: bcm2835: pll_off should only update CM_PLL_ANARST
  clk: at91: fix check of clk_register() returned value
  clk: bcm2835: Fix PLL poweron
  cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
  cpuidle: Indicate when a device has been unregistered
  PM / Runtime: Fix error path in pm_runtime_force_resume()
  mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table correctly
  mfd: intel-lpss: Save register context on suspend
  hwmon: (ads7828) Enable internal reference
  aacraid: Fix for KDUMP driver hang
  aacraid: Fix for aac_command_thread hang
  aacraid: Relinquish CPU during timeout wait
  rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in rtl_pci_reset_trx_ring
  rtlwifi: Fix logic error in enter/exit power-save mode
  rtlwifi: btcoexist: Implement antenna selection
  rtlwifi: rtl8723be: Add antenna select module parameter
  hwrng: exynos - Fix unbalanced PM runtime put on timeout error path
  ath5k: Change led pin configuration for compaq c700 laptop
  ath10k: fix kernel panic, move arvifs list head init before htt init
  ath10k: fix rx_channel during hw reconfigure
  ath10k: fix firmware assert in monitor mode
  ath10k: fix debugfs pktlog_filter write
  ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.
  ath9k: Add a module parameter to invert LED polarity.
  ARM: dts: imx35: restore existing used clock enumeration
  ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
  ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description
  ARM: mvebu: fix GPIO config on the Linksys boards
  Input: uinput - handle compat ioctl for UI_SET_PHYS
  ASoC: ak4642: Enable cache usage to fix crashes on resume
  affs: fix remount failure when there are no options changed
  MIPS: VDSO: Build with `-fno-strict-aliasing'
  MIPS: lib: Mark intrinsics notrace
  MIPS: Build microMIPS VDSO for microMIPS kernels
  MIPS: Fix sigreturn via VDSO on microMIPS kernel
  MIPS: ptrace: Prevent writes to read-only FCSR bits
  MIPS: ptrace: Fix FP context restoration FCSR regression
  MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)
  MIPS: Prevent "restoration" of MSA context in non-MSA kernels
  MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
  MIPS: Use copy_s.fmt rather than copy_u.fmt
  MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
  MIPS: Reserve nosave data for hibernation
  MIPS: ath79: make bootconsole wait for both THRE and TEMT
  MIPS: Sync icache & dcache in set_pte_at
  MIPS: Handle highmem pages in __update_cache
  MIPS: Flush highmem pages in __flush_dcache_page
  MIPS: Fix watchpoint restoration
  MIPS: Fix uapi include in exported asm/siginfo.h
  MIPS: Fix siginfo.h to use strict posix types
  MIPS: Avoid using unwind_stack() with usermode
  MIPS: Don't unwind to user mode with EVA
  MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
  MIPS: math-emu: Fix jalr emulation when rd == $0
  MIPS64: R6: R2 emulation bugfix
  coresight: etb10: adjust read pointer only when needed
  coresight: configuring ETF in FIFO mode when acting as link
  coresight: tmc: implementing TMC-ETF AUX space API
  coresight: moving struct cs_buffers to header file
  coresight: tmc: keep track of memory width
  coresight: tmc: make sysFS and Perf mode mutually exclusive
  coresight: tmc: dump system memory content only when needed
  coresight: tmc: adding mode of operation for link/sinks
  coresight: tmc: getting rid of multiple read access
  coresight: tmc: allocating memory when needed
  coresight: tmc: making prepare/unprepare functions generic
  coresight: tmc: splitting driver in ETB/ETF and ETR components
  coresight: tmc: cleaning up header file
  coresight: tmc: introducing new header file
  coresight: tmc: clearly define number of transfers per burst
  coresight: tmc: re-implementing tmc_read_prepare/unprepare() functions
  coresight: tmc: waiting for TMCReady bit before programming
  coresight: tmc: modifying naming convention
  coresight: tmc: adding sysFS management entries
  coresight: etm4x: add tracer ID for A72 Maia processor.
  coresight: etb10: fixing the right amount of words to read
  coresight: stm: adding driver for CoreSight STM component
  coresight: adding path for STM device
  coresight: etm4x: modify q_support type
  coresight: no need to do the forced type conversion
  coresight: removing gratuitous boot time log messages
  coresight: etb10: splitting sysFS "status" entry
  coresight: moving coresight_simple_func() to header file
  coresight: etm4x: implementing the perf PMU API
  coresight: etm4x: implementing user/kernel mode tracing
  coresight: etm4x: moving etm_drvdata::enable to atomic field
  coresight: etm4x: unlocking tracers in default arch init
  coresight: etm4x: splitting etmv4 default configuration
  coresight: etm4x: splitting struct etmv4_drvdata
  coresight: etm4x: adding config and traceid registers
  coresight: etm4x: moving sysFS entries to a dedicated file
  stm class: Support devices that override software assigned masters
  stm class: Remove unnecessary pointer increment
  stm class: Fix stm device initialization order
  stm class: Do not leak the chrdev in error path
  stm class: Remove a pointless line
  stm class: stm_heartbeat: Make nr_devs parameter read-only
  stm class: dummy_stm: Make nr_dummies parameter read-only
  MAINTAINERS: Add a git tree for the stm class
  perf/ring_buffer: Document AUX API usage
  perf/core: Free AUX pages in unmap path
  perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops
  perf auxtrace: Add perf_evlist pointer to *info_priv_size()
  perf session: Simplify tool stubs
  perf inject: Hit all DSOs for AUX data in JIT and other cases
  perf tools: tracepoint_error() can receive e=NULL, robustify it
  perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
  perf evsel: Introduce disable() method
  perf cpumap: Auto initialize cpu__max_{node,cpu}
  drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
  drivers/hwtracing: make coresight-* explicitly non-modular
  coresight: introducing a global trace ID function
  coresight: etm-perf: new PMU driver for ETM tracers
  coresight: etb10: implementing AUX API
  coresight: etb10: adding operation mode for sink->enable()
  coresight: etb10: moving to local atomic operations
  coresight: etm3x: implementing perf_enable/disable() API
  coresight: etm3x: implementing user/kernel mode tracing
  coresight: etm3x: consolidating initial config
  coresight: etm3x: changing default trace configuration
  coresight: etm3x: set progbit to stop trace collection
  coresight: etm3x: adding operation mode for etm_enable()
  coresight: etm3x: splitting struct etm_drvdata
  coresight: etm3x: unlocking tracers in default arch init
  coresight: etm3x: moving sysFS entries to dedicated file
  coresight: etm3x: moving etm_readl/writel to header file
  coresight: moving PM runtime operations to core framework
  coresight: add API to get sink from path
  coresight: associating path with session rather than tracer
  coresight: etm4x: Check every parameter used by dma_xx_coherent.
  coresight: "DEVICE_ATTR_RO" should defined as static.
  coresight: implementing 'cpu_id()' API
  coresight: removing bind/unbind options from sysfs
  coresight: remove csdev's link from topology
  coresight: release reference taken by 'bus_find_device()'
  coresight: coresight_unregister() function cleanup
  coresight: fixing lockdep error
  coresight: fixing indentation problem
  coresight: Fix a typo in Kconfig
  coresight: checking for NULL string in coresight_name_match()
  perf/core: Disable the event on a truncated AUX record
  perf/core: Don't leak event in the syscall error path
  perf/core: Fix perf_sched_count derailment
  stm class: dummy_stm: Add link callback for fault injection
  stm class: Plug stm device's unlink callback
  stm class: Fix a race in unlinking
  stm class: Fix unbalanced module/device refcounting
  stm class: Guard output assignment against concurrency
  stm class: Fix unlocking braino in the error path
  stm class: Add heartbeat stm source device
  stm class: dummy_stm: Create multiple devices
  stm class: Support devices with multiple instances
  stm class: Use driver's packet callback return value
  stm class: Prevent user-controllable allocations
  stm class: Fix link list locking
  stm class: Fix locking in unbinding policy path
  stm class: Select CONFIG_SRCU
  stm class: Hide STM-specific options if STM is disabled
  perf: Synchronously free aux pages in case of allocation failure
  Linux 4.4.12
  kbuild: move -Wunused-const-variable to W=1 warning level
  Revert "scsi: fix soft lockup in scsi_remove_target() on module removal"
  scsi: Add intermediate STARGET_REMOVE state to scsi_target_state
  hpfs: implement the show_options method
  hpfs: fix remount failure when there are no options changed
  UBI: Fix static volume checks when Fastmap is used
  SIGNAL: Move generic copy_siginfo() to signal.h
  thunderbolt: Fix double free of drom buffer
  IB/srp: Fix a debug kernel crash
  ALSA: hda - Fix headset mic detection problem for one Dell machine
  ALSA: hda/realtek - Add support for ALC295/ALC3254
  ALSA: hda - Fix headphone noise on Dell XPS 13 9360
  ALSA: hda/realtek - New codecs support for ALC234/ALC274/ALC294
  mcb: Fixed bar number assignment for the gdd
  clk: bcm2835: add locking to pll*_on/off methods
  locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()
  serial: samsung: Reorder the sequence of clock control when call s3c24xx_serial_set_termios()
  serial: 8250_mid: recognize interrupt source in handler
  serial: 8250_mid: use proper bar for DNV platform
  serial: 8250_pci: fix divide error bug if baud rate is 0
  Fix OpenSSH pty regression on close
  tty/serial: atmel: fix hardware handshake selection
  TTY: n_gsm, fix false positive WARN_ON
  tty: vt, return error when con_startup fails
  xen/x86: actually allocate legacy interrupts on PV guests
  KVM: x86: mask CPUID(0xD,0x1).EAX against host value
  MIPS: KVM: Fix timer IRQ race when writing CP0_Compare
  MIPS: KVM: Fix timer IRQ race when freezing timer
  KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset
  KVM: MTRR: remove MSR 0x2f8
  staging: comedi: das1800: fix possible NULL dereference
  usb: gadget: udc: core: Fix argument of dev_err() in usb_gadget_map_request()
  USB: leave LPM alone if possible when binding/unbinding interface drivers
  usb: misc: usbtest: fix pattern tests for scatterlists.
  usb: f_mass_storage: test whether thread is running before starting another
  usb: gadget: f_fs: Fix EFAULT generation for async read operations
  USB: serial: option: add even more ZTE device ids
  USB: serial: option: add more ZTE device ids
  USB: serial: option: add support for Cinterion PH8 and AHxx
  USB: serial: io_edgeport: fix memory leaks in probe error path
  USB: serial: io_edgeport: fix memory leaks in attach error path
  USB: serial: quatech2: fix use-after-free in probe error path
  USB: serial: keyspan: fix use-after-free in probe error path
  USB: serial: mxuport: fix use-after-free in probe error path
  mei: bus: call mei_cl_read_start under device lock
  mei: amthif: discard not read messages
  mei: fix NULL dereferencing during FW initiated disconnection
  Bluetooth: vhci: Fix race at creating hci device
  Bluetooth: vhci: purge unhandled skbs
  Bluetooth: vhci: fix open_timeout vs. hdev race
  mmc: sdhci-pci: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
  mmc: longer timeout for long read time quirk
  dell-rbtn: Ignore ACPI notifications if device is suspended
  ACPI / osi: Fix an issue that acpi_osi=!* cannot disable ACPICA internal strings
  mmc: sdhci-acpi: Remove MMC_CAP_BUS_WIDTH_TEST for Intel controllers
  mmc: mmc: Fix partition switch timeout for some eMMCs
  can: fix handling of unmodifiable configuration options
  irqchip/gic-v3: Configure all interrupts as non-secure Group-1
  irqchip/gic: Ensure ordering between read of INTACK and shared data
  Input: pwm-beeper - fix - scheduling while atomic
  mfd: omap-usb-tll: Fix scheduling while atomic BUG
  sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
  clk: qcom: msm8916: Fix crypto clock flags
  crypto: sun4i-ss - Replace spinlock_bh by spin_lock_irq{save|restore}
  crypto: talitos - fix ahash algorithms registration
  crypto: caam - fix caam_jr_alloc() ret code
  ring-buffer: Prevent overflow of size in ring_buffer_resize()
  ring-buffer: Use long for nr_pages to avoid overflow failures
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  fs/cifs: correctly to anonymous authentication for the NTLM(v2) authentication
  fs/cifs: correctly to anonymous authentication for the NTLM(v1) authentication
  fs/cifs: correctly to anonymous authentication for the LANMAN authentication
  fs/cifs: correctly to anonymous authentication via NTLMSSP
  remove directory incorrectly tries to set delete on close on non-empty directories
  kvm: arm64: Fix EC field in inject_abt64
  arm/arm64: KVM: Enforce Break-Before-Make on Stage-2 page tables
  arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
  arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
  arm64: Implement ptep_set_access_flags() for hardware AF/DBM
  arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
  arm64: Fix typo in the pmdp_huge_get_and_clear() definition
  ext4: iterate over buffer heads correctly in move_extent_per_page()
  perf test: Fix build of BPF and LLVM on older glibc libraries
  perf/core: Fix perf_event_open() vs. execve() race
  perf/x86/intel/pt: Generate PMI in the STOP region as well
  Btrfs: don't use src fd for printk
  UPSTREAM: mac80211: fix "warning: ‘target_metric’ may be used uninitialized"
  Revert "drivers: power: use 'current' instead of 'get_current()'"
  cpufreq: interactive: drop cpufreq_{get,put}_global_kobject func calls
  Revert "cpufreq: interactive: build fixes for 4.4"
  xt_qtaguid: Fix panic caused by processing non-full socket.
  fiq_debugger: Add fiq_debugger.disable option
  UPSTREAM: procfs: fixes pthread cross-thread naming if !PR_DUMPABLE
  FROMLIST: wlcore: Disable filtering in AP role
  Revert "drivers: power: Add watchdog timer to catch drivers which lockup during suspend."
  fiq_debugger: Add option to apply uart overlay by FIQ_DEBUGGER_UART_OVERLAY
  Revert "Recreate asm/mach/mmc.h include file"
  Revert "ARM: Add 'card_present' state to mmc_platfrom_data"
  usb: dual-role: make stub functions inline
  Revert "mmc: Add status IRQ and status callback function to mmc platform data"
  quick selinux support for tracefs
  Revert "hid-multitouch: Filter collections by application usage."
  Revert "HID: steelseries: validate output report details"
  xt_qtaguid: Fix panic caused by synack processing
  Revert "mm: vmscan: Add a debug file for shrinkers"
  Revert "SELinux: Enable setting security contexts on rootfs inodes."
  Revert "SELinux: build fix for 4.1"
  fuse: Add support for d_canonical_path
  vfs: change d_canonical_path to take two paths
  android: recommended.cfg: remove CONFIG_UID_STAT
  netfilter: xt_qtaguid: seq_printf fixes
  Revert "misc: uidstat: Adding uid stat driver to collect network statistics."
  Revert "net: activity_stats: Add statistics for network transmission activity"
  Revert "net: activity_stats: Stop using obsolete create_proc_read_entry api"
  Revert "misc: uidstat: avoid create_stat() race and blockage."
  Revert "misc: uidstat: Remove use of obsolete create_proc_read_entry api"
  Revert "misc seq_printf fixes for 4.4"
  Revert "misc: uid_stat: Include linux/atomic.h instead of asm/atomic.h"
  Revert "net: socket ioctl to reset connections matching local address"
  Revert "net: fix iterating over hashtable in tcp_nuke_addr()"
  Revert "net: fix crash in tcp_nuke_addr()"
  Revert "Don't kill IPv4 sockets when killing IPv6 sockets was requested."
  Revert "tcp: Fix IPV6 module build errors"
  android: base-cfg: remove CONFIG_SWITCH
  Revert "switch: switch class and GPIO drivers."
  Revert "drivers: switch: remove S_IWUSR from dev_attr"
  ANDROID: base-cfg: enable CONFIG_IP_NF_NAT
  BACKPORT: selinux: restrict kernel module loading
  android: base-cfg: enable CONFIG_QUOTA

Conflicts:
	Documentation/sysctl/kernel.txt
	drivers/cpufreq/cpufreq_interactive.c
	drivers/hwtracing/coresight/Kconfig
	drivers/hwtracing/coresight/Makefile
	drivers/hwtracing/coresight/coresight-etm4x.c
	drivers/hwtracing/coresight/coresight-etm4x.h
	drivers/hwtracing/coresight/coresight-priv.h
	drivers/hwtracing/coresight/coresight-stm.c
	drivers/hwtracing/coresight/coresight-tmc.c
	drivers/mmc/core/core.c
	include/linux/coresight-stm.h
	include/linux/coresight.h
	include/linux/msm_mdp.h
	include/uapi/linux/coresight-stm.h
	kernel/events/core.c
	kernel/sched/fair.c
	net/Makefile
	net/ipv4/netfilter/arp_tables.c
	net/ipv4/netfilter/ip_tables.c
	net/ipv4/tcp.c
	net/ipv6/netfilter/ip6_tables.c
	net/netfilter/xt_quota2.c
	sound/core/pcm.c

Change-Id: I17aa0002815014e9bddc47e67769a53c15768a99
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2016-10-28 10:48:35 -07:00
Jeff Vander Stoep
1a565f59cb FROMLIST: security,perf: Allow further restriction of perf_event_open
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.

This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN).  This version doesn't include making
the variable read-only.  It also allows enabling further restriction
at run-time regardless of whether the default is changed.
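
A minimal sketch of the access rule described above, not the kernel's implementation. The function name and the boolean capability flag are hypothetical; only the level-3 rule is taken from this commit message, and the narrower restrictions at lower levels are not modelled.

```python
# Hypothetical model of the level-3 check; names are illustrative only.
RESTRICT_ALL_LEVEL = 3  # level at which all unprivileged access is refused


def perf_event_open_allowed(paranoid_level: int, has_cap_sys_admin: bool) -> bool:
    """Return False when the level-3 rule forbids opening any perf event.

    Lower levels impose narrower restrictions that this sketch does not model.
    """
    if paranoid_level >= RESTRICT_ALL_LEVEL and not has_cap_sys_admin:
        return False
    return True
```

Because the check is "3 or greater", a value written at run-time takes effect whether or not the Kconfig default was changed.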

https://lkml.org/lkml/2016/1/11/587

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Git-repo: https://android.googlesource.com/kernel/common.git
Git-commit: 012b0adcf7299f6509d4984cf46ee11e6eaed4e4
[d-cagle@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
2016-09-13 12:23:33 -07:00
Ben Hutchings
a647f40d2b BACKPORT: perf tools: Document the perf sysctls
perf_event_paranoid was only documented in source code and a perf error
message.  Copy the documentation from the error message to
Documentation/sysctl/kernel.txt.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160119213515.GG2637@decadent.org.uk
[ Remove reference to external Documentation file, provide info inline, as before ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 29054680
Change-Id: I13e73cfb2ad761c94762d0c8196df7725abdf5c5
Git-repo: https://android.googlesource.com/kernel/common.git
Git-commit: b79154b8f7702f6e8a56ce9f1355f841cec16c37
[d-cagle@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
2016-09-13 12:19:28 -07:00
Dmitry Shmidt
b558f17a13 This is the 4.4.16 stable release

Merge tag 'v4.4.16' into android-4.4.y

This is the 4.4.16 stable release

Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-08-01 15:57:55 -07:00
Jeff Vander Stoep
934f4983c7 FROMLIST: security,perf: Allow further restriction of perf_event_open
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.

This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN).  This version doesn't include making
the variable read-only.  It also allows enabling further restriction
at run-time regardless of whether the default is changed.

https://lkml.org/lkml/2016/1/11/587

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
2016-06-16 13:44:10 +05:30
Ben Hutchings
690829a7ad BACKPORT: perf tools: Document the perf sysctls
perf_event_paranoid was only documented in source code and a perf error
message.  Copy the documentation from the error message to
Documentation/sysctl/kernel.txt.

perf_cpu_time_max_percent was already documented but missing from the
list at the top, so add it there.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160119213515.GG2637@decadent.org.uk
[ Remove reference to external Documentation file, provide info inline, as before ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 29054680
Change-Id: I13e73cfb2ad761c94762d0c8196df7725abdf5c5
2016-06-16 13:44:10 +05:30
Alex Shi
9ad8208bd7 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
2016-06-14 17:08:03 +08:00
Willy Tarreau
fa6d0ba12a pipe: limit the per-user amount of pages allocated in pipes
commit 759c01142a5d0f364a462346168a56de28a80f52 upstream.

On not-so-small systems, it is possible for a single process to cause an
OOM condition by filling large pipes with data that are never read. A
typical process filling 4000 pipes with 1 MB of data will use 4 GB of
memory. On small systems it may be tricky to set the pipe max size to
prevent this from happening.

This patch makes it possible to enforce a per-user soft limit above
which new pipes will be limited to a single page, effectively limiting
them to 4 kB each, as well as a hard limit above which no new pipes may
be created for this user. This has the effect of protecting the system
against memory abuse without hurting other users, and still allowing
pipes to work correctly though with less data at once.

The limits are controlled by two new sysctls: pipe-user-pages-soft and
pipe-user-pages-hard. Both may be disabled by setting them to zero. The
default soft limit allows the default number of FDs per process (1024)
to create pipes of the default size (64kB), thus reaching a limit of 64MB
before starting to create only smaller pipes. With 256 processes limited
to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
1084 MB of memory allocated for a user. The hard limit is disabled by
default to avoid breaking existing applications that make intensive use
of pipes (e.g. for splicing).
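
The arithmetic above can be reproduced with a short sketch. This is an illustration under stated assumptions, not the kernel code: the helper names are hypothetical, limits are counted in 4 kB pages, a limit counts as exceeded once usage reaches it (`>=`), and, following the message's own arithmetic, each FD is counted as one pipe.

```python
PAGE_KB = 4             # page size assumed by the commit message
DEFAULT_PIPE_KB = 64    # default pipe size
FDS_PER_PROCESS = 1024  # default number of FDs per process


def new_pipe_kb(user_pages: int, soft_pages: int, hard_pages: int):
    """Size in kB of the next pipe for a user, or None if creation is refused.

    A limit of zero is disabled. Treating "reached" as "exceeded" (>=) is an
    assumption of this sketch.
    """
    if hard_pages and user_pages >= hard_pages:
        return None
    if soft_pages and user_pages >= soft_pages:
        return PAGE_KB
    return DEFAULT_PIPE_KB


def worst_case_mb(processes: int) -> int:
    """Total memory for one user when every FD of every process is a pipe."""
    # Default soft limit: one process's FDs, each at the default pipe size.
    soft_pages = FDS_PER_PROCESS * DEFAULT_PIPE_KB // PAGE_KB
    used_pages = 0
    total_kb = 0
    for _ in range(processes * FDS_PER_PROCESS):
        size_kb = new_pipe_kb(used_pages, soft_pages, hard_pages=0)
        total_kb += size_kb
        used_pages += size_kb // PAGE_KB
    return total_kb // 1024


print(worst_case_mb(256))  # 1084
```

The first 1024 pipes get the full 64 kB; every later pipe is held to a single page, which gives the 1084 MB figure quoted above.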

Reported-by: socketpair@gmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Mitigates: CVE-2013-4312 (Linux 2.0+)
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Moritz Muehlenhoff <moritz@wikimedia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-07 18:14:35 -07:00
Jeff Vander Stoep
9d5f5d9346 FROMLIST: security,perf: Allow further restriction of perf_event_open
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.

This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN).  This version doesn't include making
the variable read-only.  It also allows enabling further restriction
at run-time regardless of whether the default is changed.

https://lkml.org/lkml/2016/1/11/587

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
2016-05-31 22:22:16 -07:00
Ben Hutchings
ebac2a3dcd BACKPORT: perf tools: Document the perf sysctls
perf_event_paranoid was only documented in source code and a perf error
message.  Copy the documentation from the error message to
Documentation/sysctl/kernel.txt.

perf_cpu_time_max_percent was already documented but missing from the
list at the top, so add it there.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/20160119213515.GG2637@decadent.org.uk
[ Remove reference to external Documentation file, provide info inline, as before ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Bug: 29054680
Change-Id: I13e73cfb2ad761c94762d0c8196df7725abdf5c5
2016-05-31 22:22:01 -07:00
David Collins
d4b065ff47 sysctl: add boot_reason and cold_boot sysctl entries for arm64
Define boot_reason and cold_boot variables in the arm64 version
of setup.c so that arm64 targets can export the boot_reason and
cold_boot sysctl entries.

This feature is required by the qpnp-power-on driver.

Change-Id: Id2d4ff5b8caa2e6a35d4ac61e338963d602c8b84
Signed-off-by: David Collins <collinsd@codeaurora.org>
[osvaldob: resolved trivial merge conflicts]
Signed-off-by: Osvaldo Banuelos <osvaldob@codeaurora.org>
2016-03-01 12:22:13 -08:00
dcashman
d49d88766b FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR.
(cherry picked from commit https://lkml.org/lkml/2015/12/21/337)

ASLR uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures. This value was chosen to
prevent a poorly chosen value from dividing the address space in such
a way as to prevent large allocations. This may not be an issue on all
platforms. Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.
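
A rough sketch of how the number of random bits bounds the mmap base offset. It assumes 4 kB pages and that the random value is a page count limited to the configured number of bits; the helper names are hypothetical and this is not the kernel's code.

```python
import secrets

PAGE_SHIFT = 12  # assumes 4 kB pages


def mmap_rnd_offset(rnd_bits: int) -> int:
    """Pick a random, page-aligned offset using rnd_bits bits of entropy."""
    return secrets.randbits(rnd_bits) << PAGE_SHIFT


def offset_span_bytes(rnd_bits: int) -> int:
    """Size of the range the random offset can fall in."""
    return (1 << rnd_bits) << PAGE_SHIFT


print(offset_span_bytes(8) // 2**20)   # 1   (MiB, 256 possible positions)
print(offset_span_bytes(16) // 2**20)  # 256 (MiB, 65536 possible positions)
```

More bits make the base harder to guess but reserve a wider slice of the address space, which is the trade-off the tunable exposes.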

Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: Ibf9ed3d4390e9686f5cc34f605d509a20d40e6c2
2016-02-16 13:54:14 -08:00
Rik van Riel
f8ade3666c add extra free kbytes tunable
Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.

This is useful for realtime applications that make system
calls and have a bound on the number of allocations that happen
in any short time period.  In this application, extra_free_kbytes
would be left at an amount equal to or larger than the
maximum number of allocations that happen in any burst.

It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.

[ccross]
Revived for use on old kernels where no other solution exists.
The tunable will be removed on kernels that do better at avoiding
direct reclaim.

Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
2016-02-16 13:54:12 -08:00
Chun Chen
c56050c700 Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
The original document referenced cap_vm_enough_memory because
cap_vm_enough_memory used to invoke __vm_enough_memory; it no longer
does.

Signed-off-by: Chun Chen <ramichen@tencent.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Jiri Kosina
55537871ef kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup
In many cases of hardlockup reports, it's actually not possible to know
why it triggered, because the CPU that got stuck is usually waiting on a
resource (with IRQs disabled) that some other CPU is holding.

IOW, we are often looking at the stacktrace of the victim and not the
actual offender.

Introduce a sysctl / cmdline parameter that makes it possible to have the
hardlockup detector perform an all-CPU backtrace.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Phil Sutter
2e64126bb0 net: qdisc: enhance default_qdisc documentation
Aside from some language cleanup, point out which interfaces are not
covered, or only partly covered, by this setting.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 16:09:22 -07:00
Yaowei Bai
013110a73d mm/page_alloc.c: fix a misleading comment
The comment says that the per-cpu batchsize and zone watermarks are
determined by present_pages, which is wrong: they are both
calculated from managed_pages.  Fix it.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-08 15:35:28 -07:00
Rabin Vincent
a10726bb54 Documentation: mm: fix location of extfrag_index
/proc/extfrag_index does not exist.  This file is in debugfs.  Fix the
description of extfrag_threshold to reflect this.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-07-24 15:05:56 +02:00
Nicolas Iooss
5202efe544 coredump: use from_kuid/kgid when formatting corename
When adding __printf attribute to cn_printf, gcc reports some issues:

  fs/coredump.c:213:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kuid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->uid);
       ^
  fs/coredump.c:217:5: warning: format '%d' expects argument of type
  'int', but argument 3 has type 'kgid_t' [-Wformat=]
       err = cn_printf(cn, "%d", cred->gid);
       ^

These warnings come from the fact that the value of uid/gid needs to be
extracted from the kuid_t/kgid_t structure before being used as an
integer.  More precisely, cred->uid and cred->gid need to be converted to
either user-namespace uid/gid or to init_user_ns uid/gid.

Use init_user_ns in order not to break existing ABI, and document this in
Documentation/sysctl/kernel.txt.

While at it, format uid and gid values with %u instead of %d because
uid_t/__kernel_uid32_t and gid_t/__kernel_gid32_t are unsigned int.
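
A small illustration of why the signedness of the format matters, written in Python rather than C. The helper name and the example ID are hypothetical; the sketch only shows how the same 32-bit value reads when interpreted as signed versus unsigned.

```python
import struct


def as_signed_32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


uid = 4294967294  # example of a large unsigned 32-bit ID
print("%d-style:", as_signed_32(uid))  # -2
print("%u-style:", uid)                # 4294967294
```

IDs below 2^31 look the same either way, so the difference only shows up for large values.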

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:43 -07:00
Chris Metcalf
fe4ba3c343 watchdog: add watchdog_cpumask sysctl to assist nohz
Change the default behavior of watchdog so it only runs on the
housekeeping cores when nohz_full is enabled at build and boot time.
Allow modifying the set of cores the watchdog is currently running on
with a new kernel.watchdog_cpumask sysctl.

In the current system, the watchdog subsystem runs a periodic timer that
schedules the watchdog kthread to run.  However, nohz_full cores are
designed to allow userspace application code running on those cores to
have 100% access to the CPU.  So the watchdog system prevents the
nohz_full application code from being able to run the way it wants to,
thus the motivation to suppress the watchdog on nohz_full cores, which
this patchset provides by default.

However, if we disable the watchdog globally, then the housekeeping
cores can't benefit from the watchdog functionality.  So we allow
disabling it only on some cores.  See Documentation/lockup-watchdogs.txt
for more information.
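
The default described above amounts to a set difference, sketched here under stated assumptions. The function names are hypothetical, and the compact list output format is a choice made for this sketch, not something the commit message specifies.

```python
def default_watchdog_cpus(online_cpus, nohz_full_cpus):
    """CPUs the watchdog runs on by default: the housekeeping cores.

    With no nohz_full cores configured, every online CPU is a housekeeping core.
    """
    return sorted(set(online_cpus) - set(nohz_full_cpus))


def to_cpulist(cpus):
    """Format CPU numbers as a compact list such as "0-1,4"."""
    ranges = []
    for cpu in sorted(set(cpus)):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu  # extend the current run
        else:
            ranges.append([cpu, cpu])  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


cpus = default_watchdog_cpus(range(8), nohz_full_cpus=[2, 3, 4, 5, 6, 7])
print(to_cpulist(cpus))  # 0-1
```

Writing a different set to kernel.watchdog_cpumask then overrides this default at run-time.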

[jhubbard@nvidia.com: fix a watchdog crash in some configurations]
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:40 -07:00
Heinrich Schuchardt
0ec62afeb1 Doc/sysctl/kernel.txt: document threads-max
File /proc/sys/kernel/threads-max controls the maximum number of threads
that can be created using fork().

[akpm@linux-foundation.org: fix typo, per Guenter]
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-17 09:04:07 -04:00
Eric B Munson
5bbe3547aa mm: allow compaction of unevictable pages
Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru.  Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory.  When the value is set to 1, allocations
succeed.

Signed-off-by: Eric B Munson <emunson@akamai.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 16:35:17 -07:00
Ulrich Obergfell
195daf665a watchdog: enable the new user interface of the watchdog mechanism
With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually.  With this
series applied, the user interface is as follows.

- parameters in /proc/sys/kernel

  . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

  . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

  . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

  . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

- kernel command line parameters

  . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

  . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

  . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

Also, remove the proc_dowatchdog() function which is no longer needed.
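
The relationship between the three run-state parameters can be modelled in a
few lines. This is a user-space sketch of the semantics described above, not
the kernel implementation; the struct and function names are made up, and it
ignores whether the hardware supports the hard lockup detector.

```c
#include <stdbool.h>

/* Model of the two per-detector knobs in /proc/sys/kernel. */
struct watchdog_state {
	bool soft_watchdog;	/* soft lockup detector */
	bool nmi_watchdog;	/* hard lockup detector */
};

/* Reading 'watchdog' shows the logical OR of the two detectors. */
bool watchdog_read(const struct watchdog_state *s)
{
	return s->soft_watchdog || s->nmi_watchdog;
}

/* Writing 'watchdog' controls both detectors at the same time. */
void watchdog_write(struct watchdog_state *s, bool enable)
{
	s->soft_watchdog = enable;
	s->nmi_watchdog = enable;
}
```

With only nmi_watchdog enabled, watchdog_read() in this model still reports
true, which is why 'watchdog' alone cannot tell which detector is running.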

[dzickus@redhat.com: wrote changelog]
[dzickus@redhat.com: update documentation for kernel params and sysctl]
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:48:59 -07:00
Linus Torvalds
59d53737a8 Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
2015-02-11 18:23:28 -08:00
Kirill A. Shutemov
dc6c9a35b6 mm: account pmd page tables to the process
Dave noticed that an unprivileged process can allocate a significant amount
of memory -- >500 MiB on x86_64 -- and stay unnoticed by the oom-killer and
the memory cgroup.  The trick is to allocate a lot of PMD page tables.  The
Linux kernel doesn't account PMD tables to the process, only PTE tables.

The use-case below uses a few tricks to allocate a lot of PMD page tables
while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.

	#include <errno.h>
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/prctl.h>

	#define PUD_SIZE (1UL << 30)
	#define PMD_SIZE (1UL << 21)

	#define NR_PUD 130000

	int main(void)
	{
		char *addr = NULL;
		unsigned long i;

		/* arg2 != 0 sets the flag; arg3..arg5 must be zero */
		prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
		for (i = 0; i < NR_PUD ; i++) {
			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
			if (addr == MAP_FAILED) {
				perror("mmap");
				break;
			}
			*addr = 'x';
			munmap(addr, PMD_SIZE);
			/* keep the result so the check below tests this call */
			addr = mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
			if (addr == MAP_FAILED)
				perror("re-mmap"), exit(1);
		}
		printf("PID %d consumed %lu KiB in PMD page tables\n",
				getpid(), i * 4096 >> 10);
		return pause();
	}
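
The figure the program prints can be checked by hand. A rough sketch, assuming
4 KiB pages and one PMD table page per 1 GiB (PUD_SIZE) region, as the
program's "i * 4096 >> 10" expression implies. The function name
pmd_tables_kib() is made up for illustration.

```c
/* Assumption: each PMD table occupies one 4 KiB page. */
#define PMD_TABLE_BYTES 4096UL

/* KiB consumed by nr_pud PMD tables, mirroring "i * 4096 >> 10". */
unsigned long pmd_tables_kib(unsigned long nr_pud)
{
	/* '*' binds tighter than '>>', so this is (nr_pud * 4096) / 1024 */
	return nr_pud * PMD_TABLE_BYTES >> 10;
}
```

With NR_PUD = 130000 this gives 520000 KiB, roughly 508 MiB, which is
consistent with the ">500 MiB" figure quoted in the description.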

The patch addresses the issue by accounting PMD tables to the process the
same way we account PTE tables.

The main places where PMD tables are accounted are __pmd_alloc() and
free_pmd_range(). But there are a few corner cases:

 - HugeTLB can share PMD page tables. The patch handles this by accounting
   the table to all processes that share it.

 - x86 PAE pre-allocates a few PMD tables on fork.

 - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
   check on exit(2).

Accounting only happens on configurations where the PMD page table level is
present (PMD is not folded).  As with nr_ptes, we use a per-mm counter.  The
counter value is used to calculate the baseline for the badness score by
the oom-killer.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:04 -08:00
Linus Torvalds
73b4f63aeb Documentation changes for 3.20
Highlights this time around include:
 
  - A thrashing of SubmittingPatches to bring it out of the "send everything
    to Linus" era of kernel development.
 
  - A new document on completions from Nicholas McGuire
 
  - Lots of typo fixes, formatting improvements, corrections, build fixes,
    and more.

Merge tag 'docs-for-linus' of git://git.lwn.net/linux-2.6

Pull documentation updates from Jonathan Corbet:
 "Highlights this time around include:

   - A thrashing of SubmittingPatches to bring it out of the "send
     everything to Linus" era of kernel development.

   - A new document on completions from Nicholas McGuire

   - Lots of typo fixes, formatting improvements, corrections, build
     fixes, and more"

* tag 'docs-for-linus' of git://git.lwn.net/linux-2.6: (35 commits)
  Documentation: Fix the wrong command `echo -1 > set_ftrace_pid` for cleaning the filter.
  can-doc: Fixed a wrong filepath in can.txt
  Documentation: Fix trivial typo in comment.
  kgdb,docs: Fix typo and minor style issues
  Documentation: add description for FTRACE probe status
  doc: brief user documentation for completion
  Documentation/misc-devices/mei: Fix indentation of embedded code.
  Documentation/misc-devices/mei: Fix indentation of enumeration.
  Documentation/misc-devices/mei: Fix spacing around parentheses.
  Documentation/misc-devices/mei: Fix formatting of headings.
  Documentation: devicetree: Fix double words in Doumentation/devicetree
  Documentation: mm: Fix typo in vm.txt
  lockstat: Add documentation on contention and contenting points
  Documentation: fix blackfin gptimers-example build errors
  Fixes column alignment in table of contents entry 1.9 in Documentation/filesystems/proc.txt
  CodingStyle: enable emacs display of trailing whitespace
  DocBook: Do not exceed argument list limit
  gpio: board.txt: Fix the gpio name example
  Documentation/SubmittingPatches: unify whitespace/tabs for the DCO
  MAINTAINERS: Add the docs-next git tree to the maintainer entry
  ...
2015-02-11 13:03:11 -08:00
Linus Torvalds
c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from Christoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for unique flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Willem de Bruijn
b245be1f4d net-timestamp: no-payload only sysctl
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
option SOF_TIMESTAMPING_OPT_TSONLY is set.

Add a sysctl that optionally drops looped timestamps with data. This
only affects processes without CAP_NET_RAW.

The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.
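
The policy described above reduces to a three-input decision. A sketch with
hypothetical names, written as plain user-space C rather than kernel code; the
allow_data flag stands in for the new sysctl, whose name is not given in this
message.

```c
#include <stdbool.h>

/*
 * Sketch of the policy check. All names are hypothetical.
 *   allow_data:      sysctl value; false means "no-payload only"
 *   tsonly:          the socket set SOF_TIMESTAMPING_OPT_TSONLY
 *   has_cap_net_raw: the socket was created with CAP_NET_RAW
 */
bool may_loop_tx_timestamp(bool allow_data, bool tsonly, bool has_cap_net_raw)
{
	if (tsonly)
		return true;	/* no packet data is looped, nothing leaks */
	if (allow_data)
		return true;	/* the administrator has not locked this down */
	return has_cap_net_raw;	/* privileged sockets are not affected */
}
```

Only the combination "sysctl disallows data, socket wants data, no
CAP_NET_RAW" results in the timestamp being dropped.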

No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.

Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  (v1 -> v2)
  - test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
  (rfc -> v1)
  - document the sysctl in Documentation/sysctl/net.txt
  - fix access control race: read .._OPT_TSONLY only once,
        use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 18:46:51 -08:00
Masanari Iida
633708a4a4 Documentation: mm: Fix typo in vm.txt
This patch fixes a spelling typo in Documentation/sysctl/vm.txt

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-01-28 15:13:11 -07:00
Seth Jennings
c5f4546593 livepatch: kernel: add TAINT_LIVEPATCH
This adds a new taint flag to indicate when the kernel or a kernel
module has been live patched.  This will provide a clean indication in
bug reports that live patching was used.

Additionally, if the crash occurs in a live patched function, the live
patch module will appear beside the patched function in the backtrace.

Signed-off-by: Seth Jennings <sjenning@redhat.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-12-22 15:40:48 +01:00
Manfred Spraul
0050ee059f ipc/msg: increase MSGMNI, remove scaling
SysV can be abused to allocate locked kernel memory.  For most systems, a
small limit doesn't make sense, see the discussion with regards to SHMMAX.

Therefore: increase MSGMNI to the maximum supported.

And: If we ignore the risk of locking too much memory, then an automatic
scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.

The code preserves auto_msgmni to avoid breaking any user space applications
that expect that the value exists.

Notes:
1) If an administrator must limit the memory allocations, then he can set
MSGMNI as necessary.

Or he can disable sysv entirely (as e.g. done by Android).

2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
to control latency vs. throughput:
If MSGMNB is large, then msgsnd() just returns and more messages can be queued
before a task switch to a task that calls msgrcv() is forced.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU on which a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Prarit Bhargava
9e3961a097 kernel: add panic_on_warn
There have been several times where I have had to rebuild a kernel to
cause a panic when hitting a WARN() in the code in order to get a crash
dump from a system.  Sometimes this is easy to do, other times (such as
in the case of a remote admin) it is not trivial to send new images to
the user.

A much easier method would be a switch to change the WARN() over to a
panic.  This makes debugging easier in that I can now test the actual
image the WARN() was seen on and I do not have to engage in remote
debugging.

This patch adds a panic_on_warn kernel parameter and a
/proc/sys/kernel/panic_on_warn sysctl.  When either is set, panic() is
called in the warn_slowpath_common() path.  The function will still print
out the location of the warning.

An example of the panic_on_warn output:

The first line below is from the WARN_ON() to output the WARN_ON()'s
location.  After that the panic() output is displayed.

    WARNING: CPU: 30 PID: 11698 at /home/prarit/dummy_module/dummy-module.c:25 init_dummy+0x1f/0x30 [dummy_module]()
    Kernel panic - not syncing: panic_on_warn set ...

    CPU: 30 PID: 11698 Comm: insmod Tainted: G        W  OE  3.17.0+ #57
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
     0000000000000000 000000008e3f87df ffff88080f093c38 ffffffff81665190
     0000000000000000 ffffffff818aea3d ffff88080f093cb8 ffffffff8165e2ec
     ffffffff00000008 ffff88080f093cc8 ffff88080f093c68 000000008e3f87df
    Call Trace:
     [<ffffffff81665190>] dump_stack+0x46/0x58
     [<ffffffff8165e2ec>] panic+0xd0/0x204
     [<ffffffffa038e05f>] ? init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81076b90>] warn_slowpath_common+0xd0/0xd0
     [<ffffffffa038e040>] ? dummy_greetings+0x40/0x40 [dummy_module]
     [<ffffffff81076c8a>] warn_slowpath_null+0x1a/0x20
     [<ffffffffa038e05f>] init_dummy+0x1f/0x30 [dummy_module]
     [<ffffffff81002144>] do_one_initcall+0xd4/0x210
     [<ffffffff811b52c2>] ? __vunmap+0xc2/0x110
     [<ffffffff810f8889>] load_module+0x16a9/0x1b30
     [<ffffffff810f3d30>] ? store_uevent+0x70/0x70
     [<ffffffff810f49b9>] ? copy_module_from_fd.isra.44+0x129/0x180
     [<ffffffff810f8ec6>] SyS_finit_module+0xa6/0xd0
     [<ffffffff8166cf29>] system_call_fastpath+0x12/0x17

Successfully tested by me.

hpa said: There is another very valid use for this: many operators would
rather have a machine shut down than be potentially compromised either
functionally or security-wise.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-10 17:41:10 -08:00
Eric Dumazet
960fb622f8 net: provide a per host RSS key generic infrastructure
RSS (Receive Side Scaling) typically uses a Toeplitz hash and a 40- or
52-byte RSS key.

Some drivers use a constant (and well-known) key, some drivers use a random
key per port, making bonding setups hard to tune. Well-known keys increase
the attack surface, considering that the number of queues is usually a power
of two.

This patch provides infrastructure to help drivers do the right thing.

netdev_rss_key_fill() should be used by drivers to initialize their RSS key,
even if they provide ethtool -X support to let the user redefine the key later.

A new /proc/sys/net/core/netdev_rss_key file can be used to get the host
RSS key even for drivers not providing ethtool -x support, in case some
applications want to precisely set up flows to match some RX queues.

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41:36:40:74:b6:15:ca:27:44:aa:b3:4d:72

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
11:63:99:bb:79:fb:a5:a7:07:45:b2:20:bf:02:42:2d:08:1a:dd:19:2b:6b:23:ac:56:28:9d:70:c3:ac:e8:16:4b:b7:c1:10:53:a4:78:41
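
An application that wants to use the host key has to parse the
colon-separated hex format shown above. A minimal sketch, assuming that format
only; parse_rss_key() is a hypothetical helper and not part of the patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "aa:bb:cc..." (optionally newline-terminated)
 * into bytes. Returns the number of bytes written, or -1 if the input is
 * malformed or does not fit in 'max' bytes.
 */
int parse_rss_key(const char *s, unsigned char *key, size_t max)
{
	size_t n = 0;

	while (*s && *s != '\n') {
		char pair[3];

		if (n == max || !isxdigit((unsigned char)s[0]) ||
		    !isxdigit((unsigned char)s[1]))
			return -1;
		pair[0] = s[0];
		pair[1] = s[1];
		pair[2] = '\0';
		key[n++] = (unsigned char)strtoul(pair, NULL, 16);
		s += 2;
		if (*s == ':') {
			s++;
			if (!isxdigit((unsigned char)*s))
				return -1;	/* trailing or doubled ':' */
		} else if (*s && *s != '\n') {
			return -1;		/* unexpected separator */
		}
	}
	return (int)n;
}
```

In the test output above, the 40-byte key reported by ethtool -x matches the
first 40 bytes of the 52-byte host key, so a caller would size its buffer for
52 bytes and use as many as the driver needs.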

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 15:59:11 -05:00
Joe Perches
ba7a46f16d net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited
Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are now not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 14:10:31 -05:00
Oleg Nesterov
b03023ecbd coredump: add %i/%I in core_pattern to report the tid of the crashed thread
format_corename() can only pass the leader's pid to the core handler,
but there is no simple way to figure out which thread originated the
coredump.

As Jan explains, this also means that there is no simple way to create
the backtrace of the crashed process:

As programs are mostly compiled with implicit gcc -fomit-frame-pointer,
one needs the program's .eh_frame section (equivalently, the
PT_GNU_EH_FRAME segment) or its .debug_frame section.  .debug_frame is
usually present only in separate debug info files, which are often not
even installed on the system.
While .eh_frame is a part of the executable/library (and it is even
always mapped for C++ exceptions unwinding) it no longer has to be
present anywhere on the disk as the program could be upgraded in the
meantime and the running instance has its executable file already
unlinked from disk.

One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all
the file-backed memory including the executable's .eh_frame section.
But that can create huge core files, for example even due to mmapped
data files.

Another possibility would be to read .eh_frame from /proc/PID/mem at the
core_pattern handler time of the core dump.  For the backtrace one needs
to read the register state first, which can be done from the core_pattern
handler:

    ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT)
    close(0);    // close pipe fd to resume the sleeping dumper
    waitpid();   // should report EXIT
    PTRACE_GETREGS or other requests

The remaining problem is how to get the 'tid' value of the crashed
thread.  It could be read from the first NT_PRSTATUS note of the core
file but that makes the core_pattern handler complicated.

Unfortunately %t is already used so this patch uses %i/%I.
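
A core_pattern pipe handler receives the expanded specifiers as command-line
arguments, so the tid arrives as a decimal string. A minimal sketch of
validating that argument before handing it to ptrace(); parse_tid() and the
"|/path/to/handler %I" registration are hypothetical examples, not taken from
the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper for a handler registered as "|/path/to/handler %I":
 * convert the tid argument to a pid_t.
 * Returns -1 if the string is not a positive decimal number that fits.
 */
pid_t parse_tid(const char *arg)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno || end == arg || *end != '\0' || v <= 0 || v > INT_MAX)
		return -1;
	return (pid_t)v;
}
```

The handler's main() would call parse_tid(argv[1]) and only then issue the
PTRACE_SEIZE request from the sequence above.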

Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview)
is experimenting with this.  It is using the elfutils
(https://fedorahosted.org/elfutils/) unwinder for generating the
backtraces.  Apart from not needing matching executables as mentioned
above, another advantage is that we can get the backtrace without saving
the core (which might be quite large) to disk.

[mmilata@redhat.com: final paragraph of changelog]
Signed-off-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Mark Wielaard <mjw@redhat.com>
Cc: Martin Milata <mmilata@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:21 +02:00
Erik Hugne
a5325ae5b8 tipc: add name distributor resiliency queue
TIPC name table updates are distributed asynchronously in a cluster,
entailing a risk of certain race conditions. E.g., if two nodes
simultaneously issue conflicting (overlapping) publications, this may
not be detected until both publications have reached a third node, in
which case one of the publications will be silently dropped on that
node. Hence, we end up with an inconsistent name table.

In most cases this conflict is just a temporary race, e.g., one
node is issuing a publication under the assumption that a previous,
conflicting, publication has already been withdrawn by the other node.
However, because of the (rtt related) distributed update delay, this
may not yet hold true on all nodes. The symptom of this failure is a
syslog message: "tipc: Cannot publish {%u,%u,%u}, overlap error".

In this commit we add a resiliency queue at the receiving end of
the name table distributor. When insertion of an arriving publication
fails, we retain it in this queue for a short amount of time, assuming
that another update will arrive very soon and clear the conflict. If that
happens, we insert the publication; otherwise we drop it.

The (configurable) retention value defaults to 2000 ms. Knowing from
experience that the situation described above is extremely rare, there
is no risk that the queue will accumulate any large number of items.
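
The retention rule can be illustrated with a small expiry check. This is a
sketch under assumptions, not the TIPC code: the struct, the function name and
the millisecond clock are hypothetical; only the 2000 ms default comes from
the text.

```c
#include <stdbool.h>

#define RETENTION_MS 2000UL	/* default retention value from the text */

/* Hypothetical record for a publication whose insertion failed. */
struct deferred_publ {
	unsigned long queued_at_ms;	/* time the entry was queued */
};

/*
 * True once the entry has waited longer than the retention time and
 * should be dropped. Unsigned subtraction keeps the comparison valid
 * even if the millisecond counter wraps around.
 */
bool deferred_publ_expired(const struct deferred_publ *p,
			   unsigned long now_ms, unsigned long retention_ms)
{
	return now_ms - p->queued_at_ms > retention_ms;
}
```

A receiver would retry insertion of each queued entry whenever a new update
arrives and drop the entries for which this check returns true.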

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 17:51:48 -07:00
Josh Hunt
69361eef90 panic: add TAINT_SOFTLOCKUP
This taint flag will be set if the system has ever entered a softlockup
state.  Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.

[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:24 -07:00
Aaron Tomlin
ed235875e2 kernel/watchdog.c: print traces for all cpus on lockup detection
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period of time, without giving
other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.

On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e.  the owner/holder of the contended resource) is
reported to the user.  Often this information has proven to be
insufficient to assist debugging efforts.

To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting.  This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.

If NMI is not supported, we just revert to the old method.  A sysctl
and a boot-time parameter are available to toggle this feature.

[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 16:47:44 -07:00
David Rientjes
7cd2b0a34a mm, pcp: allow restoring percpu_pagelist_fraction default
Oleg reports a division by zero error on zero-length write() to the
percpu_pagelist_fraction sysctl:

    divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
    RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
    RSP: 0018:ffff8800d87a3e78  EFLAGS: 00010246
    RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
    RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
    R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
    R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
    FS:  00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
    Call Trace:
      proc_sys_call_handler+0xb3/0xc0
      proc_sys_write+0x14/0x20
      vfs_write+0xba/0x1e0
      SyS_write+0x46/0xb0
      tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity.  If a value in the range [1, 7]
is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.
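
The accepted values described above can be sketched in Python.  This is
only a model of the rule (0 restores the default, 1 to 7 give EINVAL, 8 or
more is used as the divisor); the function names, the `default_high`
argument and the sample numbers are illustrative assumptions, not taken
from the patch.

```python
import errno

# Smallest non-zero fraction accepted, as stated in the message above.
MIN_PERCPU_PAGELIST_FRACTION = 8


def validate_pagelist_fraction(new_value: int) -> int:
    """Return the value to store, or raise OSError(EINVAL) for 1..7.

    Hypothetical helper: 0 means "restore the kernel default".
    """
    if new_value < 0:
        raise OSError(errno.EINVAL, "fraction must not be negative")
    if 0 < new_value < MIN_PERCPU_PAGELIST_FRACTION:
        raise OSError(errno.EINVAL, "fraction must be 0 or >= 8")
    return new_value


def pcp_high_mark(managed_pages: int, fraction: int, default_high: int) -> int:
    """Pick a per-cpu high mark without ever dividing by zero.

    `default_high` stands in for whatever the kernel default would be
    (an assumption of this sketch).
    """
    if fraction == 0:
        return default_high
    return managed_pages // fraction
```

With these helpers, `pcp_high_mark(8000, 0, 186)` falls back to the
default instead of dividing by zero, and `validate_pagelist_fraction(7)`
raises `OSError` with `errno.EINVAL`.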

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Drokin <green@linuxhacker.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-23 16:47:43 -07:00
Kees Cook
f4aacea2f5 sysctl: allow for strict write position handling
When writing to a sysctl string, each write, regardless of VFS position,
begins writing the string from the start.  This means the contents of
the last write to the sysctl controls the string contents instead of the
first:

  open("/proc/sys/kernel/modprobe", O_WRONLY)   = 1
  write(1, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 4096) = 4096
  write(1, "/bin/true", 9)                = 9
  close(1)                                = 0

  $ cat /proc/sys/kernel/modprobe
  /bin/true

Expected behaviour would be to have the sysctl be "AAAA..." capped at
maxlen (in this case KMOD_PATH_LEN: 256), instead of truncating to the
contents of the second write.  Similarly, multiple short writes would
not append to the sysctl.

The old behavior is unlike regular POSIX files enough that audits of
software that interacts with sysctls can miss unexpected or
dangerous situations.  For example, "as long as the input starts with a
trusted path" turns out to be an insufficient filter, as what must also
happen is for the input to be entirely contained in a single write
syscall -- not a common consideration, especially for high level tools.

This provides kernel.sysctl_writes_strict as a way to make this behavior
act in a less surprising manner for strings, and disallows non-zero file
position when writing numeric sysctls (similar to what is already done
when reading from non-zero file positions).  For now, the default (0) is
to warn about non-zero file position use, but retain the legacy
behavior.  Setting this to -1 disables the warning, and setting this to
1 enables the file position respecting behavior.
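
The difference between the legacy and the position-respecting behavior
for string sysctls can be modelled in a few lines of Python.  This is a
simplified sketch, not the kernel code: the function names are
hypothetical, newline and NUL handling is left out, and it assumes that
one byte of maxlen is reserved for the string terminator and that a
strict-mode write starting past the end of the current string is ignored.

```python
def write_string_sysctl(current: str, data: str, pos: int,
                        maxlen: int, strict: bool) -> str:
    """Model one write() to a string sysctl and return the new contents."""
    if strict:
        if pos > len(current):
            return current      # write starts past the end: ignored in this model
        start = pos             # honor the file position
    else:
        start = 0               # legacy: every write restarts at offset 0
    # Assumption: one byte of maxlen is kept for the terminator.
    room = max(maxlen - 1 - start, 0)
    return current[:start] + data[:room]


def replay(writes, maxlen: int, strict: bool) -> str:
    """Apply a sequence of writes on one file descriptor, advancing the position."""
    value, pos = "", 0
    for data in writes:
        value = write_string_sysctl(value, data, pos, maxlen, strict)
        pos += len(data)
    return value
```

Replaying the trace from the message with
`replay(["A" * 4096, "/bin/true"], 256, strict=False)` gives `"/bin/true"`,
while `strict=True` keeps the capped run of "A" characters.  In this model
the values -1 and 0 of the sysctl both correspond to `strict=False`; they
differ only in whether a warning is logged.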

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: move misplaced hunk, per Randy]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:13 -07:00
Denys Vlasenko
4a0da71b96 Documentation/sysctl/vm.txt: clarify vfs_cache_pressure description
Existing description is worded in a way which almost encourages setting of
vfs_cache_pressure above 100, possibly way above it.

Users are left in the dark about what this numeric value is - an int?  a
percentage?  what the scale is?

As a result, we are getting reports about noticeable performance
degradation from users who have set vfs_cache_pressure to ridiculously
high values - because they thought there is no downside to it.

Via code inspection it's obvious that this value is treated as a
percentage.  This patch changes text to reflect this fact, and adds a
cautionary paragraph advising against setting vfs_cache_pressure sky high.
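
To make "treated as a percentage" concrete, here is a minimal Python
sketch of the scaling.  The function name, parameter names and sample
counts are illustrative assumptions; the only thing taken from the text
above is that the value scales the reclaimable object count relative to
100.

```python
def vfs_pressure_ratio(freeable_objects: int, vfs_cache_pressure: int = 100) -> int:
    """Scale the number of freeable cache objects by a percentage.

    100 leaves the count unchanged, 50 halves it, and 1000 makes the
    reclaim code see ten times as many candidates as really exist.
    """
    return freeable_objects * vfs_cache_pressure // 100


if __name__ == "__main__":
    for pressure in (0, 50, 100, 1000):
        print(pressure, vfs_pressure_ratio(1000, pressure))
```

Seen this way, a value far above 100 does not make reclaim "better"; it
only inflates the count handed to the reclaim code, which is consistent
with the performance reports mentioned above.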

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:54:13 -07:00
Mel Gorman
4f9b16a647 mm: disable zone_reclaim_mode by default
When it was introduced, zone_reclaim_mode made sense as NUMA distances
were punishing and workloads were generally partitioned to fit into a NUMA
node.  NUMA machines are now common but few of the workloads are
NUMA-aware and it's routine to see major performance degradation due to
zone_reclaim_mode being enabled but relatively few can identify the
problem.

Those that require zone_reclaim_mode are likely to be able to detect
when it needs to be enabled and tune appropriately so let's have a
sensible default for the bulk of users.

This patch (of 2):

zone_reclaim_mode causes processes to prefer reclaiming memory from
local node instead of spilling over to other nodes.  This made sense
initially when NUMA machines were almost exclusively HPC and the
workload was partitioned into nodes.  The NUMA penalties were
sufficiently high to justify reclaiming the memory.  On current machines
and workloads it is often the case that zone_reclaim_mode destroys
performance but not all users know how to detect this.  Favour the
common case and disable it by default.  Users that are sophisticated
enough to know they need zone_reclaim_mode will detect it.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:59 -07:00
Liu Hua
80df284765 hung_task: check the value of "sysctl_hung_task_timeout_sec"
As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger than LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleeping and print:

  schedule_timeout: wrong timeout value ffffffffffffff83

and then the function watchdog will call schedule_timeout_interruptible
again and again.  The screen will be filled with

	"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch adds a check and correction in sysctl, so that the function
schedule_timeout_interruptible always gets a valid parameter.
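
The overflow can be reproduced with a small Python model.  It assumes a
64-bit long and an illustrative HZ of 250; the helper names are
hypothetical, and the sketch only reports whether a value is in range,
since the message does not say whether the handler rejects or corrects an
out-of-range value.

```python
LONG_MAX = 2**63 - 1   # assumption: 64-bit long
HZ = 250               # assumption: illustrative tick rate

# Largest timeout, in seconds, whose jiffies value still fits in a long.
HUNG_TASK_TIMEOUT_MAX = LONG_MAX // HZ


def to_signed_long(value: int) -> int:
    """Wrap an integer the way a 64-bit two's-complement long would."""
    value &= 2**64 - 1
    return value - 2**64 if value > LONG_MAX else value


def timeout_in_jiffies(timeout_secs: int) -> int:
    """Seconds to jiffies, as seen by a function taking a signed long."""
    return to_signed_long(timeout_secs * HZ)


def is_valid_timeout(timeout_secs: int) -> bool:
    """True if the timeout converts to a non-negative jiffies value."""
    return 0 <= timeout_secs <= HUNG_TASK_TIMEOUT_MAX
```

In this model `timeout_in_jiffies(HUNG_TASK_TIMEOUT_MAX + 1)` is negative,
which corresponds to the "wrong timeout value" case above, while any value
for which `is_valid_timeout` returns True converts cleanly.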

Signed-off-by: Liu Hua <sdu.liu@huawei.com>
Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Cc: <stable@vger.kernel.org>	[3.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-07 16:36:07 -07:00
Linus Torvalds
6f4c98e1c2 Nothing major: the stricter permissions checking for sysfs broke
a staging driver; fix included.  Greg KH said he'd take the patch
 but hadn't as the merge window opened, so it's included here
 to avoid breaking build.
 
 Cheers,
 Rusty.

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Nothing major: the stricter permissions checking for sysfs broke a
  staging driver; fix included.  Greg KH said he'd take the patch but
  hadn't as the merge window opened, so it's included here to avoid
  breaking build"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  staging: fix up speakup kobject mode
  Use 'E' instead of 'X' for unsigned module taint flag.
  VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
  kallsyms: fix percpu vars on x86-64 with relocation.
  kallsyms: generalize address range checking
  module: LLVMLinux: Remove unused function warning from __param_check macro
  Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
  module: remove MODULE_GENERIC_TABLE
  module: allow multiple calls to MODULE_DEVICE_TABLE() per module
  module: use pr_cont
2014-04-06 09:38:07 -07:00
Dave Hansen
5509a5d27b drop_caches: add some documentation and info message
There is plenty of anecdotal evidence and a load of blog posts
suggesting that using "drop_caches" periodically keeps your system
running in "tip top shape".  Perhaps adding some kernel documentation
will increase the amount of accurate data on its use.

If we are not shrinking caches effectively, then we have real bugs.
Using drop_caches will simply mask the bugs and make them harder to
find, but certainly does not fix them, nor is it an appropriate
"workaround" to limit the size of the caches.  On the contrary, there
have been bug reports on issues that turned out to be misguided use of
cache dropping.

Dropping caches is a very drastic and disruptive operation that is good
for debugging and running tests, but if it creates bug reports from
production use, kernel developers should be aware of its use.

Add a bit more documentation about it, a syslog message to track down
abusers, and vmstat drop counters to help analyze problem reports.

[akpm@linux-foundation.org: checkpatch fixes]
[hannes@cmpxchg.org: add runtime suppression control]
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:21:04 -07:00