Commit graph

1508 commits

Author SHA1 Message Date
Will Drewry
5d4ac07c90 CHROMIUM: dm: boot time specification of dm=
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.

1. dm: allow a dm-fs-style device to be shared via dm-ioctl

Integrates feedback from Alisdair, Mike, and Kiyoshi.

Two main changes occur here:

- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table.  This binds
the device to a name and, optional, uuid which is needed by udev and
allows for userspace management of the mapped device.

- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.

2. init: boot to device-mapper targets without an initr*

Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md.  It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise).  It also replaces /dev/XXX calls with major:minor opportunistically.

The format is dm="name uuid ro,table line 1,table line 2,...".  The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space.  Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).

A mapped device created during boot will be assigned a minor of 0 and
may be access via /dev/dm-0.

An example dm-linear root with no uuid may look like:

root=/dev/dm-0  dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0"

Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.

Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2
http://marc.info/?l=dm-devel&m=127429499422096&w=2
http://marc.info/?l=dm-devel&m=127429493922000&w=2

Latest upstream threads:
https://patchwork.kernel.org/patch/104859/
https://patchwork.kernel.org/patch/104860/
https://patchwork.kernel.org/patch/104861/

Bug: 27175947

Signed-off-by: Will Drewry <wad@chromium.org>

Review URL: http://codereview.chromium.org/2020011

Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
2016-08-18 18:56:03 +05:30
Srivatsa Vaddagiri
efb86bd08a sched: Introduce Window Assisted Load Tracking (WALT)
use a window based view of time in order to track task
demand and CPU utilization in the scheduler.

Window Assisted Load Tracking (WALT) implementation credits:
 Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
 Pavan Kumar Kondeti, Olav Haugan

2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
            and Todd Kjos

Change-Id: I21408236836625d4e7d7de1843d20ed5ff36c708

Includes fixes for issues:

eas/walt: Use walt_ktime_clock() instead of ktime_get_ns() to avoid a
race resulting in watchdog resets
BUG: 29353986
Change-Id: Ic1820e22a136f7c7ebd6f42e15f14d470f6bbbdb

Handle walt accounting anomoly during resume

During resume, there is a corner case where on wakeup, a task's
prev_runnable_sum can go negative. This is a workaround that
fixes the condition and warns (instead of crashing).

BUG: 29464099
Change-Id: I173e7874324b31a3584435530281708145773508

Signed-off-by: Todd Kjos <tkjos@google.com>
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
[jstultz: fwdported to 4.4]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-08-11 14:26:43 -07:00
Patrick Bellasi
765c2ab363 FIXUP: sched: fix build for non-SMP target
Currently the build for a single-core (e.g. user-mode) Linux is broken
and this configuration is required (at least) to run some network tests.

The main issues for the current code support on single-core systems are:
1. {se,rq}::sched_avg is not available nor maintained for !SMP systems
   This means that load and utilisation signals are NOT available in single
   core systems. All the EAS code depends on these signals.
2. sched_group_energy is also SMP dependant. Again this means that all the
   EAS setup and preparation code (energyn model initialization) has to be
   properly guarded/disabled for !SMP systems.
3. SchedFreq depends on utilization signal, which is not available on
   !SMP systems.
4. SchedTune is useless on unicore systems if SchedFreq is not available.
5. WALT machinery is not required on single-core systems.

This patch addresses all these issues by enforcing some constraints for
single-core systems:
a) WALT, SchedTune and SchedTune are now dependant on SMP
b) The default governor for !SMP systems is INTERACTIVE
c) The energy model initialisation/build functions are
d) Other minor code re-arrangements and CONFIG_SMP guarding to enable
   single core builds.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-08-10 15:08:01 -07:00
Jeremy Compostella
26cab56d9a ANDROID: dm: fix dm_substitute_devices()
When candidate is the last parameter, candidate_end points to the '\0'
character and not the DM_FIELD_SEP character.  In such a situation, we
should not move the candidate_end pointer one character backward.

Signed-off-by: Jeremy Compostella <jeremy.compostella@intel.com>
2016-08-10 13:24:12 -07:00
Badhri Jagan Sridharan
5a77db7839 ANDROID: dm: Rebase on top of 4.1
1. "dm: optimize use SRCU and RCU" removes the use of dm_table_put.
2. "dm: remove request-based logic from make_request_fn wrapper" necessitates
    calling dm_setup_md_queue or else the request_queue's make_request_fn
    pointer ends being unset.

[    7.711600] Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
[    7.717519] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W       4.1.15-02273-gb057d16-dirty #33
[    7.726559] Hardware name: HiKey Development Board (DT)
[    7.731779] task: ffffffc005f8acc0 ti: ffffffc005f8c000 task.ti: ffffffc005f8c000
[    7.739257] PC is at 0x0
[    7.741787] LR is at generic_make_request+0x8c/0x108
....
[    9.082931] Call trace:
[    9.085372] [<          (null)>]           (null)
[    9.090074] [<ffffffc0003f4ac0>] submit_bio+0x98/0x1e0
[    9.095212] [<ffffffc0001e2618>] _submit_bh+0x120/0x1f0
[    9.096165] cfg80211: Calling CRDA to update world regulatory domain
[    9.106781] [<ffffffc0001e5450>] __bread_gfp+0x94/0x114
[    9.112004] [<ffffffc00024a748>] ext4_fill_super+0x18c/0x2d64
[    9.117750] [<ffffffc0001b275c>] mount_bdev+0x194/0x1c0
[    9.122973] [<ffffffc0002450dc>] ext4_mount+0x14/0x1c
[    9.128021] [<ffffffc0001b29a0>] mount_fs+0x3c/0x194
[    9.132985] [<ffffffc0001d059c>] vfs_kern_mount+0x4c/0x134
[    9.138467] [<ffffffc0001d2168>] do_mount+0x204/0xbbc
[    9.143514] [<ffffffc0001d2ec4>] SyS_mount+0x94/0xe8
[    9.148479] [<ffffffc000c54074>] mount_block_root+0x120/0x24c
[    9.154222] [<ffffffc000c543e8>] mount_root+0x110/0x12c
[    9.159443] [<ffffffc000c54574>] prepare_namespace+0x170/0x1b8
[    9.165273] [<ffffffc000c53d98>] kernel_init_freeable+0x23c/0x260
[    9.171365] [<ffffffc0009b1748>] kernel_init+0x10/0x118
[    9.176589] Code: bad PC value
[    9.179807] ---[ end trace 75e1bc52ba364d13 ]---

Bug: 27175947

Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I952d86fd1475f0825f9be1386e3497b36127abd0
2016-08-10 13:23:56 -07:00
Will Drewry
96b0434c25 CHROMIUM: dm: boot time specification of dm=
This is a wrap-up of three patches pending upstream approval.
I'm bundling them because they are interdependent, and it'll be
easier to drop it on rebase later.

1. dm: allow a dm-fs-style device to be shared via dm-ioctl

Integrates feedback from Alisdair, Mike, and Kiyoshi.

Two main changes occur here:

- One function is added which allows for a programmatically created
mapped device to be inserted into the dm-ioctl hash table.  This binds
the device to a name and, optional, uuid which is needed by udev and
allows for userspace management of the mapped device.

- dm_table_complete() was extended to handle all of the final
functional changes required for the table to be operational once
called.

2. init: boot to device-mapper targets without an initr*

Add a dm= kernel parameter modeled after the md= parameter from
do_mounts_md.  It allows for device-mapper targets to be configured at
boot time for use early in the boot process (as the root device or
otherwise).  It also replaces /dev/XXX calls with major:minor opportunistically.

The format is dm="name uuid ro,table line 1,table line 2,...".  The
parser expects the comma to be safe to use as a newline substitute but,
otherwise, uses the normal separator of space.  Some attempt has been
made to make it forgiving of additional spaces (using skip_spaces()).

A mapped device created during boot will be assigned a minor of 0 and
may be access via /dev/dm-0.

An example dm-linear root with no uuid may look like:

root=/dev/dm-0  dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0"

Once udev is started, /dev/dm-0 will become /dev/mapper/lroot.

Older upstream threads:
http://marc.info/?l=dm-devel&m=127429492521964&w=2
http://marc.info/?l=dm-devel&m=127429499422096&w=2
http://marc.info/?l=dm-devel&m=127429493922000&w=2

Latest upstream threads:
https://patchwork.kernel.org/patch/104859/
https://patchwork.kernel.org/patch/104860/
https://patchwork.kernel.org/patch/104861/

Bug: 27175947

Signed-off-by: Will Drewry <wad@chromium.org>

Review URL: http://codereview.chromium.org/2020011

Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
2016-08-10 13:23:42 -07:00
Runmin Wang
750075feff Merge remote-tracking branch 'origin/tmp-917a9a9133a6' into lsk
* tmp-917a9:
  ARM/vdso: Mark the vDSO code read-only after init
  x86/vdso: Mark the vDSO code read-only after init
  lkdtm: Verify that '__ro_after_init' works correctly
  arch: Introduce post-init read-only memory
  x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
  mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
  asm-generic: Consolidate mark_rodata_ro()
  Linux 4.4.6
  ld-version: Fix awk regex compile failure
  target: Drop incorrect ABORT_TASK put for completed commands
  block: don't optimize for non-cloned bio in bio_get_last_bvec()
  MIPS: smp.c: Fix uninitialised temp_foreign_map
  MIPS: Fix build error when SMP is used without GIC
  ovl: fix getcwd() failure after unsuccessful rmdir
  ovl: copy new uid/gid into overlayfs runtime inode
  userfaultfd: don't block on the last VM updates at exit time
  powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
  powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
  powerpc: Fix dedotify for binutils >= 2.26
  Revert "drm/radeon/pm: adjust display configuration after powerstate"
  drm/radeon: Fix error handling in radeon_flip_work_func.
  drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
  Revert "drm/radeon: call hpd_irq_event on resume"
  x86/mm: Fix slow_virt_to_phys() for X86_PAE again
  gpu: ipu-v3: Do not bail out on missing optional port nodes
  mac80211: Fix Public Action frame RX in AP mode
  mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs
  mac80211: minstrel_ht: fix a logic error in RTS/CTS handling
  mac80211: minstrel_ht: set default tx aggregation timeout to 0
  mac80211: fix use of uninitialised values in RX aggregation
  mac80211: minstrel: Change expected throughput unit back to Kbps
  iwlwifi: mvm: inc pending frames counter also when txing non-sta
  can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()
  cfg80211/wext: fix message ordering
  wext: fix message delay/ordering
  ovl: fix working on distributed fs as lower layer
  ovl: ignore lower entries when checking purity of non-directory entries
  ASoC: wm8958: Fix enum ctl accesses in a wrong type
  ASoC: wm8994: Fix enum ctl accesses in a wrong type
  ASoC: samsung: Use IRQ safe spin lock calls
  ASoC: dapm: Fix ctl value accesses in a wrong type
  ncpfs: fix a braino in OOM handling in ncp_fill_cache()
  jffs2: reduce the breakage on recovery from halfway failed rename()
  dmaengine: at_xdmac: fix residue computation
  tracing: Fix check for cpu online when event is disabled
  s390/dasd: fix diag 0x250 inline assembly
  s390/mm: four page table levels vs. fork
  KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
  KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
  KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
  KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
  KVM: VMX: disable PEBS before a guest entry
  kvm: cap halt polling at exactly halt_poll_ns
  PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr()
  ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property
  ARM: dts: dra7: do not gate cpsw clock due to errata i877
  ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
  arm64: account for sparsemem section alignment when choosing vmemmap offset
  Linux 4.4.5
  drm/amdgpu: fix topaz/tonga gmc assignment in 4.4 stable
  modules: fix longstanding /proc/kallsyms vs module insertion race.
  drm/i915: refine qemu south bridge detection
  drm/i915: more virtual south bridge detection
  block: get the 1st and last bvec via helpers
  block: check virt boundary in bio_will_gap()
  drm/amdgpu: Use drm_calloc_large for VM page_tables array
  thermal: cpu_cooling: fix out of bounds access in time_in_idle
  i2c: brcmstb: allocate correct amount of memory for regmap
  ubi: Fix out of bounds write in volume update code
  cxl: Fix PSL timebase synchronization detection
  MIPS: traps: Fix SIGFPE information leak from `do_ov' and `do_trap_or_bp'
  MIPS: scache: Fix scache init with invalid line size.
  USB: serial: option: add support for Quectel UC20
  USB: serial: option: add support for Telit LE922 PID 0x1045
  USB: qcserial: add Sierra Wireless EM74xx device ID
  USB: qcserial: add Dell Wireless 5809e Gobi 4G HSPA+ (rev3)
  USB: cp210x: Add ID for Parrot NMEA GPS Flight Recorder
  usb: chipidea: otg: change workqueue ci_otg as freezable
  ALSA: timer: Fix broken compat timer user status ioctl
  ALSA: hdspm: Fix zero-division
  ALSA: hdsp: Fix wrong boolean ctl value accesses
  ALSA: hdspm: Fix wrong boolean ctl value accesses
  ALSA: seq: oss: Don't drain at closing a client
  ALSA: pcm: Fix ioctls for X32 ABI
  ALSA: timer: Fix ioctls for X32 ABI
  ALSA: rawmidi: Fix ioctls X32 ABI
  ALSA: hda - Fix mic issues on Acer Aspire E1-472
  ALSA: ctl: Fix ioctls for X32 ABI
  ALSA: usb-audio: Add a quirk for Plantronics DA45
  adv7604: fix tx 5v detect regression
  dmaengine: pxa_dma: fix cyclic transfers
  Fix directory hardlinks from deleted directories
  jffs2: Fix page lock / f->sem deadlock
  Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
  Btrfs: fix loading of orphan roots leading to BUG_ON
  pata-rb532-cf: get rid of the irq_to_gpio() call
  tracing: Do not have 'comm' filter override event 'comm' field
  ata: ahci: don't mark HotPlugCapable Ports as external/removable
  PM / sleep / x86: Fix crash on graph trace through x86 suspend
  arm64: vmemmap: use virtual projection of linear region
  Adding Intel Lewisburg device IDs for SATA
  writeback: flush inode cgroup wb switches instead of pinning super_block
  block: bio: introduce helpers to get the 1st and last bvec
  libata: Align ata_device's id on a cacheline
  libata: fix HDIO_GET_32BIT ioctl
  drm/amdgpu: return from atombios_dp_get_dpcd only when error
  drm/amdgpu/gfx8: specify which engine to wait before vm flush
  drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
  drm/amdgpu/pm: update current crtc info after setting the powerstate
  drm/radeon/pm: update current crtc info after setting the powerstate
  drm/ast: Fix incorrect register check for DRAM width
  target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
  iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
  iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
  iommu/amd: Apply workaround for ATS write permission check
  arm/arm64: KVM: Fix ioctl error handling
  KVM: x86: fix root cause for missed hardware breakpoints
  vfio: fix ioctl error handling
  Fix cifs_uniqueid_to_ino_t() function for s390x
  CIFS: Fix SMB2+ interim response processing for read requests
  cifs: fix out-of-bounds access in lease parsing
  fbcon: set a default value to blink interval
  kvm: x86: Update tsc multiplier on change.
  mips/kvm: fix ioctl error handling
  parisc: Fix ptrace syscall number and return value modification
  PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer
  block: Initialize max_dev_sectors to 0
  drm/amdgpu: mask out WC from BO on unsupported arches
  btrfs: async-thread: Fix a use-after-free error for trace
  btrfs: Fix no_space in write and rm loop
  Btrfs: fix deadlock running delayed iputs at transaction commit time
  drivers: sh: Restore legacy clock domain on SuperH platforms
  use ->d_seq to get coherency between ->d_inode and ->d_flags
  Linux 4.4.4
  iwlwifi: mvm: don't allow sched scans without matches to be started
  iwlwifi: update and fix 7265 series PCI IDs
  iwlwifi: pcie: properly configure the debug buffer size for 8000
  iwlwifi: dvm: fix WoWLAN
  security: let security modules use PTRACE_MODE_* with bitmasks
  IB/cma: Fix RDMA port validation for iWarp
  x86/irq: Plug vector cleanup race
  x86/irq: Call irq_force_move_complete with irq descriptor
  x86/irq: Remove outgoing CPU from vector cleanup mask
  x86/irq: Remove the cpumask allocation from send_cleanup_vector()
  x86/irq: Clear move_in_progress before sending cleanup IPI
  x86/irq: Remove offline cpus from vector cleanup
  x86/irq: Get rid of code duplication
  x86/irq: Copy vectormask instead of an AND operation
  x86/irq: Check vector allocation early
  x86/irq: Reorganize the search in assign_irq_vector
  x86/irq: Reorganize the return path in assign_irq_vector
  x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
  x86/irq: Validate that irq descriptor is still active
  x86/irq: Fix a race in x86_vector_free_irqs()
  x86/irq: Call chip->irq_set_affinity in proper context
  x86/entry/compat: Add missing CLAC to entry_INT80_32
  x86/mpx: Fix off-by-one comparison with nr_registers
  hpfs: don't truncate the file when delete fails
  do_last(): ELOOP failure exit should be done after leaving RCU mode
  should_follow_link(): validate ->d_seq after having decided to follow
  xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
  xen/pciback: Save the number of MSI-X entries to be copied later.
  xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
  xen/scsiback: correct frontend counting
  xen/arm: correctly handle DMA mapping of compound pages
  ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
  ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
  do_last(): don't let a bogus return value from ->open() et.al. to confuse us
  kernel/resource.c: fix muxed resource handling in __request_region()
  sunrpc/cache: fix off-by-one in qword_get()
  tracing: Fix showing function event in available_events
  powerpc/eeh: Fix partial hotplug criterion
  KVM: x86: MMU: fix ubsan index-out-of-range warning
  KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
  KVM: x86: fix missed hardware breakpoints
  KVM: arm/arm64: vgic: Ensure bitmaps are long enough
  KVM: async_pf: do not warn on page allocation failures
  of/irq: Fix msi-map calculation for nonzero rid-base
  NFSv4: Fix a dentry leak on alias use
  nfs: fix nfs_size_to_loff_t
  block: fix use-after-free in dio_bio_complete
  bio: return EINTR if copying to user space got interrupted
  i2c: i801: Adding Intel Lewisburg support for iTCO
  phy: core: fix wrong err handle for phy_power_on
  writeback: keep superblock pinned during cgroup writeback association switches
  cgroup: make sure a parent css isn't offlined before its children
  cpuset: make mm migration asynchronous
  PCI/AER: Flush workqueue on device remove to avoid use-after-free
  ARCv2: SMP: Emulate IPI to self using software triggered interrupt
  ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
  libata: fix sff host state machine locking while polling
  qla2xxx: Fix stale pointer access.
  spi: atmel: fix gpio chip-select in case of non-DT platform
  target: Fix race with SCF_SEND_DELAYED_TAS handling
  target: Fix remote-port TMR ABORT + se_cmd fabric stop
  target: Fix TAS handling for multi-session se_node_acls
  target: Fix LUN_RESET active TMR descriptor handling
  target: Fix LUN_RESET active I/O handling for ACK_KREF
  ALSA: hda - Fixing background noise on Dell Inspiron 3162
  ALSA: hda - Apply clock gate workaround to Skylake, too
  Revert "workqueue: make sure delayed work run in local cpu"
  workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
  mac80211: Requeue work after scan complete for all VIF types.
  rfkill: fix rfkill_fop_read wait_event usage
  tick/nohz: Set the correct expiry when switching to nohz/lowres mode
  perf stat: Do not clean event's private stats
  cdc-acm:exclude Samsung phone 04e8:685d
  Revert "Staging: panel: usleep_range is preferred over udelay"
  Staging: speakup: Fix getting port information
  sd: Optimal I/O size is in bytes, not sectors
  libceph: don't spam dmesg with stray reply warnings
  libceph: use the right footer size when skipping a message
  libceph: don't bail early from try_read() when skipping a message
  libceph: fix ceph_msg_revoke()
  seccomp: always propagate NO_NEW_PRIVS on tsync
  cpufreq: Fix NULL reference crash while accessing policy->governor_data
  cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
  hwmon: (ads1015) Handle negative conversion values correctly
  hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
  hwmon: (dell-smm) Blacklist Dell Studio XPS 8000
  Thermal: do thermal zone update after a cooling device registered
  Thermal: handle thermal zone device properly during system sleep
  Thermal: initialize thermal zone device correctly
  IB/mlx5: Expose correct maximum number of CQE capacity
  IB/qib: Support creating qps with GFP_NOIO flag
  IB/qib: fix mcast detach when qp not attached
  IB/cm: Fix a recently introduced deadlock
  dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
  dmaengine: at_xdmac: fix resume for cyclic transfers
  dmaengine: dw: fix cyclic transfer callbacks
  dmaengine: dw: fix cyclic transfer setup
  nfit: fix multi-interface dimm handling, acpi6.1 compatibility
  ACPI / PCI / hotplug: unlock in error path in acpiphp_enable_slot()
  ACPI: Revert "ACPI / video: Add Dell Inspiron 5737 to the blacklist"
  ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
  ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700
  lib: sw842: select crc32
  uapi: update install list after nvme.h rename
  ideapad-laptop: Add Lenovo Yoga 700 to no_hw_rfkill dmi list
  ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list
  toshiba_acpi: Fix blank screen at boot if transflective backlight is supported
  make sure that freeing shmem fast symlinks is RCU-delayed
  drm/radeon/pm: adjust display configuration after powerstate
  drm/radeon: Don't hang in radeon_flip_work_func on disabled crtc. (v2)
  drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2)
  drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
  drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2)
  drm: No-Op redundant calls to drm_vblank_off() (v2)
  drm/radeon: use post-decrement in error handling
  drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command
  drm/i915: fix error path in intel_setup_gmbus()
  drm/i915/dsi: don't pass arbitrary data to sideband
  drm/i915/dsi: defend gpio table against out of bounds access
  drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
  drm/i915: Don't reject primary plane windowing with color keying enabled on SKL+
  drm/i915/dp: fall back to 18 bpp when sink capability is unknown
  drm/i915: Make sure DC writes are coherent on flush.
  drm/i915: Init power domains early in driver load
  drm/i915: intel_hpd_init(): Fix suspend/resume reprobing
  drm/i915: Restore inhibiting the load of the default context
  drm: fix missing reference counting decrease
  drm/radeon: hold reference to fences in radeon_sa_bo_new
  drm/radeon: mask out WC from BO on unsupported arches
  drm: add helper to check for wc memory support
  drm/radeon: fix DP audio support for APU with DCE4.1 display engine
  drm/radeon: Add a common function for DFS handling
  drm/radeon: cleaned up VCO output settings for DP audio
  drm/radeon: properly byte swap vce firmware setup
  drm/radeon: clean up fujitsu quirks
  drm/radeon: Fix "slow" audio over DP on DCE8+
  drm/radeon: call hpd_irq_event on resume
  drm/radeon: Fix off-by-one errors in radeon_vm_bo_set_addr
  drm/dp/mst: deallocate payload on port destruction
  drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
  drm/dp/mst: move GUID storage from mgr, port to only mst branch
  drm/dp/mst: Calculate MST PBN with 31.32 fixed point
  drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
  drm/dp/mst: fix in RAD element access
  drm/dp/mst: fix in MSTB RAD initialization
  drm/dp/mst: always send reply for UP request
  drm/dp/mst: process broadcast messages correctly
  drm/nouveau: platform: Fix deferred probe
  drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
  drm/nouveau/display: Enable vblank irqs after display engine is on again.
  drm/nouveau/kms: take mode_config mutex in connector hotplug path
  drm/amdgpu/pm: adjust display configuration after powerstate
  drm/amdgpu: Don't hang in amdgpu_flip_work_func on disabled crtc.
  drm/amdgpu: use post-decrement in error handling
  drm/amdgpu: fix issue with overlapping userptrs
  drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
  drm/amdgpu: remove unnecessary forward declaration
  drm/amdgpu: fix s4 resume
  drm/amdgpu: remove exp hardware support from iceland
  drm/amdgpu: don't load MEC2 on topaz
  drm/amdgpu: drop topaz support from gmc8 module
  drm/amdgpu: pull topaz gmc bits into gmc_v7
  drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
  drm/amdgpu: iceland use CI based MC IP
  drm/amdgpu: move gmc7 support out of CIK dependency
  drm/amdgpu: no need to load MC firmware on fiji
  drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2
  drm/amdgpu: fix tonga smu resume
  drm/amdgpu: fix lost sync_to if scheduler is enabled.
  drm/amdgpu: call hpd_irq_event on resume
  drm/amdgpu: Fix off-by-one errors in amdgpu_vm_bo_map
  drm/vmwgfx: respect 'nomodeset'
  drm/vmwgfx: Fix a width / pitch mismatch on framebuffer updates
  drm/vmwgfx: Fix an incorrect lock check
  virtio_pci: fix use after free on release
  virtio_balloon: fix race between migration and ballooning
  virtio_balloon: fix race by fill and leak
  regulator: mt6311: MT6311_REGULATOR needs to select REGMAP_I2C
  regulator: axp20x: Fix GPIO LDO enable value for AXP22x
  clk: exynos: use irqsave version of spin_lock to avoid deadlock with irqs
  cxl: use correct operator when writing pcie config space values
  sparc64: fix incorrect sign extension in sys_sparc64_personality
  EDAC, mc_sysfs: Fix freeing bus' name
  EDAC: Robustify workqueues destruction
  MIPS: Fix buffer overflow in syscall_get_arguments()
  MIPS: Fix some missing CONFIG_CPU_MIPSR6 #ifdefs
  MIPS: hpet: Choose a safe value for the ETIME check
  MIPS: Loongson-3: Fix SMP_ASK_C0COUNT IPI handler
  Revert "MIPS: Fix PAGE_MASK definition"
  cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
  time: Avoid signed overflow in timekeeping_get_ns()
  Bluetooth: 6lowpan: Fix handling of uncompressed IPv6 packets
  Bluetooth: 6lowpan: Fix kernel NULL pointer dereferences
  Bluetooth: Fix incorrect removing of IRKs
  Bluetooth: Add support of Toshiba Broadcom based devices
  Bluetooth: Use continuous scanning when creating LE connections
  Drivers: hv: vmbus: Fix a Host signaling bug
  tools: hv: vss: fix the write()'s argument: error -> vss_msg
  mmc: sdhci: Allow override of get_cd() called from sdhci_request()
  mmc: sdhci: Allow override of mmc host operations
  mmc: sdhci-pci: Fix card detect race for Intel BXT/APL
  mmc: pxamci: fix again read-only gpio detection polarity
  mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL
  mmc: mmci: fix an ages old detection error
  mmc: core: Enable tuning according to the actual timing
  mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
  mmc: mmc: Fix incorrect use of driver strength switching HS200 and HS400
  mmc: sdio: Fix invalid vdd in voltage switch power cycle
  mmc: sdhci: Fix DMA descriptor with zero data length
  mmc: sdhci-pci: Do not default to 33 Ohm driver strength for Intel SPT
  mmc: usdhi6rol0: handle NULL data in timeout
  clockevents/tcb_clksrc: Prevent disabling an already disabled clock
  posix-clock: Fix return code on the poll method's error path
  irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
  irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
  irqchip/mxs: Add missing set_handle_irq()
  irqchip/omap-intc: Add support for spurious irq handling
  coresight: checking for NULL string in coresight_name_match()
  dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths
  dm snapshot: fix hung bios when copy error occurs
  dm space map metadata: remove unused variable in brb_pop()
  tda1004x: only update the frontend properties if locked
  vb2: fix a regression in poll() behavior for output,streams
  gspca: ov534/topro: prevent a division by 0
  si2157: return -EINVAL if firmware blob is too big
  media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
  rc: sunxi-cir: Initialize the spinlock properly
  namei: ->d_inode of a pinned dentry is stable only for positives
  mei: validate request value in client notify request ioctl
  mei: fix fasync return value on error
  rtlwifi: rtl8723be: Fix module parameter initialization
  rtlwifi: rtl8188ee: Fix module parameter initialization
  rtlwifi: rtl8192se: Fix module parameter initialization
  rtlwifi: rtl8723ae: Fix initialization of module parameters
  rtlwifi: rtl8192de: Fix incorrect module parameter descriptions
  rtlwifi: rtl8192ce: Fix handling of module parameters
  rtlwifi: rtl8192cu: Add missing parameter setup
  rtlwifi: rtl_pci: Fix kernel panic
  locks: fix unlock when fcntl_setlk races with a close
  um: link with -lpthread
  uml: fix hostfs mknod()
  uml: flush stdout before forking
  s390/fpu: signals vs. floating point control register
  s390/compat: correct restore of high gprs on signal return
  s390/dasd: fix performance drop
  s390/dasd: fix refcount for PAV reassignment
  s390/dasd: prevent incorrect length error under z/VM after PAV changes
  s390: fix normalization bug in exception table sorting
  btrfs: initialize the seq counter in struct btrfs_device
  Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
  Btrfs: fix transaction handle leak on failure to create hard link
  Btrfs: fix number of transaction units required to create symlink
  Btrfs: send, don't BUG_ON() when an empty symlink is found
  btrfs: statfs: report zero available if metadata are exhausted
  Btrfs: igrab inode in writepage
  Btrfs: add missing brelse when superblock checksum fails
  KVM: s390: fix memory overwrites when vx is disabled
  s390/kvm: remove dependency on struct save_area definition
  clocksource/drivers/vt8500: Increase the minimum delta
  genirq: Validate action before dereferencing it in handle_irq_event_percpu()
  mm: numa: quickly fail allocations for NUMA balancing on full nodes
  mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
  ocfs2: unlock inode if deleting inode from orphan fails
  drm/i915: shut up gen8+ SDE irq dmesg noise
  iw_cxgb3: Fix incorrectly returning error on success
  spi: omap2-mcspi: Prevent duplicate gpio_request
  drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
  USB: option: add "4G LTE usb-modem U901"
  USB: option: add support for SIM7100E
  USB: cp210x: add IDs for GE B650V3 and B850V3 boards
  usb: dwc3: Fix assignment of EP transfer resources
  can: ems_usb: Fix possible tx overflow
  dm thin: fix race condition when destroying thin pool workqueue
  bcache: Change refill_dirty() to always scan entire disk if necessary
  bcache: prevent crash on changing writeback_running
  bcache: allows use of register in udev to avoid "device_busy" error.
  bcache: unregister reboot notifier if bcache fails to unregister device
  bcache: fix a leak in bch_cached_dev_run()
  bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
  bcache: Add a cond_resched() call to gc
  bcache: fix a livelock when we cause a huge number of cache misses
  lib/ucs2_string: Correct ucs2 -> utf8 conversion
  efi: Add pstore variables to the deletion whitelist
  efi: Make efivarfs entries immutable by default
  efi: Make our variable validation list include the guid
  efi: Do variable name validation tests in utf8
  efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
  lib/ucs2_string: Add ucs2 -> utf8 helper functions
  ARM: 8457/1: psci-smp is built only for SMP
  drm/gma500: Use correct unref in the gem bo create function
  devm_memremap: Fix error value when memremap failed
  KVM: s390: fix guest fprs memory leak
  arm64: errata: Add -mpc-relative-literal-loads to build flags
  ARM: debug-ll: fix BCM63xx entry for multiplatform
  ext4: fix bh->b_state corruption
  sctp: Fix port hash table size computation
  unix_diag: fix incorrect sign extension in unix_lookup_by_ino
  tipc: unlock in error path
  rtnl: RTM_GETNETCONF: fix wrong return value
  IFF_NO_QUEUE: Fix for drivers not calling ether_setup()
  tcp/dccp: fix another race at listener dismantle
  route: check and remove route cache when we get route
  net_sched fix: reclassification needs to consider ether protocol changes
  pppoe: fix reference counting in PPPoE proxy
  l2tp: Fix error creating L2TP tunnels
  net/mlx4_en: Avoid changing dev->features directly in run-time
  net/mlx4_en: Choose time-stamping shift value according to HW frequency
  net/mlx4_en: Count HW buffer overrun only once
  qmi_wwan: add "4G LTE usb-modem U901"
  tcp: md5: release request socket instead of listener
  tipc: fix premature addition of node to lookup table
  af_unix: Guard against other == sk in unix_dgram_sendmsg
  af_unix: Don't set err in unix_stream_read_generic unless there was an error
  ipv4: fix memory leaks in ip_cmsg_send() callers
  bonding: Fix ARP monitor validation
  bpf: fix branch offset adjustment on backjumps after patching ctx expansion
  flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
  net: Copy inner L3 and L4 headers as unaligned on GRE TEB
  sctp: translate network order to host order when users get a hmacid
  enic: increment devcmd2 result ring in case of timeout
  tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
  net:Add sysctl_max_skb_frags
  tcp: do not drop syn_recv on all icmp reports
  unix: correctly track in-flight fds in sending process user_struct
  ipv6: fix a lockdep splat
  ipv6: addrconf: Fix recursive spin lock call
  ipv6/udp: use sticky pktinfo egress ifindex on connect()
  ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
  tcp: beware of alignments in tcp_get_info()
  switchdev: Require RTNL mutex to be held when sending FDB notifications
  inet: frag: Always orphan skbs inside ip_defrag()
  tipc: fix connection abort during subscription cancel
  net: dsa: fix mv88e6xxx switches
  sctp: allow setting SCTP_SACK_IMMEDIATELY by the application
  pptp: fix illegal memory access caused by multiple bind()s
  af_unix: fix struct pid memory leak
  tcp: fix NULL deref in tcp_v4_send_ack()
  lwt: fix rx checksum setting for lwt devices tunneling over ipv6
  tunnels: Allow IPv6 UDP checksums to be correctly controlled.
  net: dp83640: Fix tx timestamp overflow handling.
  gro: Make GRO aware of lightweight tunnels.
  af_iucv: Validate socket address length in iucv_sock_bind()

Conflicts:
	arch/arm64/Makefile
	arch/arm64/include/asm/cacheflush.h
	drivers/mmc/host/sdhci.c
	drivers/usb/dwc3/ep0.c
	drivers/usb/dwc3/gadget.c
	kernel/module.c
	sound/core/pcm_compat.c

CRs-Fixed: 1010239
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: I41a28636fc9ad91f9d979b191784609476294cdf
2016-07-12 11:40:49 -07:00
Patrick Bellasi
13001f47c9 sched/tune: add initial support for CGroups based boosting
To support task performance boosting, the usage of a single knob has the
advantage to be a simple solution, both from the implementation and the
usability standpoint.  However, on a real system it can be difficult to
identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services should be better managed the "standard" way while we
still want to be able to boost the performance of specific workloads.

In order to improve the flexibility of the task boosting mechanism this
patch is the first of a small series which extends the previous
implementation to introduce a "per task group" support.
This first patch introduces just the basic CGroups support, a new
"schedtune" CGroups controller is added which allows to configure
different boost value for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
  1. allows only a two layer hierarchy
  2. supports only a limited number of boost groups

A two layer hierarchy allows to place each task either:
  a) in the root control group
     thus being subject to a system-wide boosting value
  b) in a child of the root group
     thus being subject to the specific boost value defined by that
     "boost group"

The limited number of "boost groups" supported is mainly motivated by
the observation that in a real system it could be useful to have only
few classes of tasks which deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups allows also
to have a simpler implementation especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.

cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Patrick Bellasi
344f4ec429 sched/tune: add sysctl interface to define a boost value
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.

To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance.

This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
  /proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
  - 0%   boost requires to operate in "standard" mode by scheduling
         tasks at the minimum capacities required by the workload demand
  - 100% boost requires to push at maximum the task performances,
         "regardless" of the incurred energy consumption

A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
2016-05-10 16:53:24 +08:00
Joonwoo Park
2e0ebb0155 sched: add option whether CPU C-state is used to guide task placement
There are CPUs that don't have an obvious low power mode exit latency
penalty.  Add a new Kconfig CONFIG_SCHED_HMP_CSTATE_AWARE which controls
whether CPU C-state is used to guide task placement.

CRs-fixed: 1006303
Change-Id: Ie8dbab8e173c3a1842d922f4d1fbd8cc4221789c
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-04-22 15:05:24 -07:00
Srivatsa Vaddagiri
551f83f5d6 sched: Add CONFIG_SCHED_HMP Kconfig option
Add a compile-time flag to enable or disable scheduler features for
HMP (heterogenous multi-processor) systems. Main feature deals with
optimizing task placement for best power/performance tradeoff.

Also extend features currently dependent on CONFIG_SCHED_FREQ_INPUT to
be enabled for CONFIG_HMP as well.

Change-Id: I03b3942709a80cc19f7b934a8089e1d84c14d72d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor ifdefry conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:00 -07:00
Mark Brown
0f221533ba Merge remote-tracking branch 'lsk/linux-linaro-lsk-v4.4-android' into linux-linaro-lsk-v4.4-android 2016-03-18 09:50:49 +00:00
Kees Cook
97db5772c0 mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
commit d2aa1acad22f1bdd0cfa67b3861800e392254454 upstream.

It may be useful to debug writes to the readonly sections of memory,
so provide a cmdline "rodata=off" to allow for this. This can be
expanded in the future to support "log" and "write" modes, but that
will need to be architecture-specific.

This also makes KDB software breakpoints more usable, as read-only
mappings can now be disabled on any kernel.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Brown <david.brown@linaro.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
2016-03-17 18:51:51 +00:00
Rom Lemarchand
05f5cf60a4 initramfs: Add skip_initramfs command line option
Add a skip_initramfs option to allow choosing whether to boot using
the initramfs or not at runtime.

Change-Id: If30428fa748c1d4d3d7b9d97c1f781de5e4558c3
Signed-off-by: Rom Lemarchand <romlem@google.com>
2016-02-16 13:54:10 -08:00
Chris Wilson
86fffe4a61 kernel: remove stop_machine() Kconfig dependency
Currently the full stop_machine() routine is only enabled on SMP if
module unloading is enabled, or if the CPUs are hotpluggable.  This
leads to configurations where stop_machine() is broken as it will then
only run the callback on the local CPU with irqs disabled, and not stop
the other CPUs or run the callback on them.

For example, this breaks MTRR setup on x86 in certain configs since
ea8596bb2d ("kprobes/x86: Remove unused text_poke_smp() and
text_poke_smp_batch() functions") as the MTRR is only established on the
boot CPU.

This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
and HOTPLUG_CPU config options to compile the correct stop_machine() for
the architecture, removing the false dependency on MODULE_UNLOAD in the
process.

Link: https://lkml.org/lkml/2014/10/8/124
References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Iulia Manda <iulia.manda21@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-12 10:15:34 -08:00
Mathieu Desnoyers
5b25b13ab0 sys_membarrier(): system-wide memory barrier (generic, x86)
Here is an implementation of a new system call, sys_membarrier(), which
executes a memory barrier on all threads running on the system.  It is
implemented by calling synchronize_sched().  It can be used to
distribute the cost of user-space memory barriers asymmetrically by
transforming pairs of memory barriers into pairs consisting of
sys_membarrier() and a compiler barrier.  For synchronization primitives
that distinguish between read-side and write-side (e.g.  userspace RCU
[1], rwlocks), the read-side can be accelerated significantly by moving
the bulk of the memory barrier overhead to the write-side.

The existing applications of which I am aware that would be improved by
this system call are as follows:

* Through Userspace RCU library (http://urcu.so)
  - DNS server (Knot DNS) https://www.knot-dns.cz/
  - Network sniffer (http://netsniff-ng.org/)
  - Distributed object storage (https://sheepdog.github.io/sheepdog/)
  - User-space tracing (http://lttng.org)
  - Network storage system (https://www.gluster.org/)
  - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf)
  - Financial software (https://lkml.org/lkml/2015/3/23/189)

Those projects use RCU in userspace to increase read-side speed and
scalability compared to locking.  Especially in the case of RCU used by
libraries, sys_membarrier can speed up the read-side by moving the bulk of
the memory barrier cost to synchronize_rcu().

* Direct users of sys_membarrier
  - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198)

Microsoft core dotnet GC developers are planning to use the mprotect()
side-effect of issuing memory barriers through IPIs as a way to implement
Windows FlushProcessWriteBuffers() on Linux.  They are referring to
sys_membarrier in their github thread, specifically stating that
sys_membarrier() is what they are looking for.

To explain the benefit of this scheme, let's introduce two example threads:

Thread A (non-frequent, e.g. executing liburcu synchronize_rcu())
Thread B (frequent, e.g. executing liburcu
rcu_read_lock()/rcu_read_unlock())

In a scheme where all smp_mb() in thread A are ordering memory accesses
with respect to smp_mb() present in Thread B, we can change each
smp_mb() within Thread A into calls to sys_membarrier() and each
smp_mb() within Thread B into compiler barriers "barrier()".

Before the change, we had, for each smp_mb() pairs:

Thread A                    Thread B
previous mem accesses       previous mem accesses
smp_mb()                    smp_mb()
following mem accesses      following mem accesses

After the change, these pairs become:

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

As we can see, there are two possible scenarios: either Thread B memory
accesses do not happen concurrently with Thread A accesses (1), or they
do (2).

1) Non-concurrent Thread A vs Thread B accesses:

Thread A                    Thread B
prev mem accesses
sys_membarrier()
follow mem accesses
                            prev mem accesses
                            barrier()
                            follow mem accesses

In this case, thread B accesses will be weakly ordered. This is OK,
because at that point, thread A is not particularly interested in
ordering them with respect to its own accesses.

2) Concurrent Thread A vs Thread B accesses

Thread A                    Thread B
prev mem accesses           prev mem accesses
sys_membarrier()            barrier()
follow mem accesses         follow mem accesses

In this case, thread B accesses, which are ensured to be in program
order thanks to the compiler barrier, will be "upgraded" to full
smp_mb() by synchronize_sched().

* Benchmarks

On Intel Xeon E5405 (8 cores)
(one thread is calling sys_membarrier, the other 7 threads are busy
looping)

1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call.

* User-space user of this system call: Userspace RCU library

Both the signal-based and the sys_membarrier userspace RCU schemes
permit us to remove the memory barrier from the userspace RCU
rcu_read_lock() and rcu_read_unlock() primitives, thus significantly
accelerating them. These memory barriers are replaced by compiler
barriers on the read-side, and all matching memory barriers on the
write-side are turned into an invocation of a memory barrier on all
active threads in the process. By letting the kernel perform this
synchronization rather than dumbly sending a signal to every process
threads (as we currently do), we diminish the number of unnecessary wake
ups and only issue the memory barriers on active threads. Non-running
threads do not need to execute such barrier anyway, because these are
implied by the scheduler context switches.

Results in liburcu:

Operations in 10s, 6 readers, 2 writers:

memory barriers in reader:    1701557485 reads, 2202847 writes
signal-based scheme:          9830061167 reads,    6700 writes
sys_membarrier:               9952759104 reads,     425 writes
sys_membarrier (dyn. check):  7970328887 reads,     425 writes

The dynamic sys_membarrier availability check adds some overhead to
the read-side compared to the signal-based scheme, but besides that,
sys_membarrier slightly outperforms the signal-based scheme. However,
this non-expedited sys_membarrier implementation has a much slower grace
period than signal and memory barrier schemes.

Besides diminishing the number of wake-ups, one major advantage of the
membarrier system call over the signal-based scheme is that it does not
need to reserve a signal. This plays much more nicely with libraries,
and with processes injected into for tracing purposes, for which we
cannot expect that signals will be unused by the application.

An expedited version of this system call can be added later on to speed
up the grace period. Its implementation will likely depend on reading
the cpu_curr()->mm without holding each CPU's rq lock.

This patch adds the system call to x86 and to asm-generic.

[1] http://urcu.so

membarrier(2) man page:

MEMBARRIER(2)              Linux Programmer's Manual             MEMBARRIER(2)

NAME
       membarrier - issue memory barriers on a set of threads

SYNOPSIS
       #include <linux/membarrier.h>

       int membarrier(int cmd, int flags);

DESCRIPTION
       The cmd argument is one of the following:

       MEMBARRIER_CMD_QUERY
              Query  the  set  of  supported commands. It returns a bitmask of
              supported commands.

       MEMBARRIER_CMD_SHARED
              Execute a memory barrier on all threads running on  the  system.
              Upon  return from system call, the caller thread is ensured that
              all running threads have passed through a state where all memory
              accesses  to  user-space  addresses  match program order between
              entry to and return from the system  call  (non-running  threads
              are de facto in such a state). This covers threads from all pro=E2=80=90
              cesses running on the system.  This command returns 0.

       The flags argument needs to be 0. For future extensions.

       All memory accesses performed  in  program  order  from  each  targeted
       thread is guaranteed to be ordered with respect to sys_membarrier(). If
       we use the semantic "barrier()" to represent a compiler barrier forcing
       memory  accesses  to  be performed in program order across the barrier,
       and smp_mb() to represent explicit memory barriers forcing full  memory
       ordering  across  the barrier, we have the following ordering table for
       each pair of barrier(), sys_membarrier() and smp_mb():

       The pair ordering is detailed as (O: ordered, X: not ordered):

                              barrier()   smp_mb() sys_membarrier()
              barrier()          X           X            O
              smp_mb()           X           O            O
              sys_membarrier()   O           O            O

RETURN VALUE
       On success, these system calls return zero.  On error, -1 is  returned,
       and errno is set appropriately. For a given command, with flags
       argument set to 0, this system call is guaranteed to always return the
       same value until reboot.

ERRORS
       ENOSYS System call is not implemented.

       EINVAL Invalid arguments.

Linux                             2015-04-15                     MEMBARRIER(2)

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Nicholas Miell <nmiell@comcast.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Pranith Kumar <bobby.prani@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-11 15:21:34 -07:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Frederic Weisbecker
90f023030e kmod: use system_unbound_wq instead of khelper
We need to launch the usermodehelper kernel threads with the widest
affinity and this is partly why we use khelper.  This workqueue has
unbound properties and thus a wide affinity inherited by all its children.

Now khelper also has special properties that we aren't much interested in:
ordered and singlethread.  There is really no need about ordering as all
we do is creating kernel threads.  This can be done concurrently.  And
singlethread is a useless limitation as well.

The workqueue engine already proposes generic unbound workqueues that
don't share these useless properties and handle well parallel jobs.

The only worrysome specific is their affinity to the node of the current
CPU.  It's fine for creating the usermodehelper kernel threads but those
inherit this affinity for longer jobs such as requesting modules.

This patch proposes to use these node affine unbound workqueues assuming
that a node is sufficient to handle several parallel usermodehelper
requests.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds
b793c005ce Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - PKCS#7 support added to support signed kexec, also utilized for
     module signing.  See comments in 3f1e1bea.

     ** NOTE: this requires linking against the OpenSSL library, which
        must be installed, e.g.  the openssl-devel on Fedora **

   - Smack
      - add IPv6 host labeling; ignore labels on kernel threads
      - support smack labeling mounts which use binary mount data

   - SELinux:
      - add ioctl whitelisting (see
        http://kernsec.org/files/lss2015/vanderstoep.pdf)
      - fix mprotect PROT_EXEC regression caused by mm change

   - Seccomp:
      - add ptrace options for suspend/resume"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
  PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
  Documentation/Changes: Now need OpenSSL devel packages for module signing
  scripts: add extract-cert and sign-file to .gitignore
  modsign: Handle signing key in source tree
  modsign: Use if_changed rule for extracting cert from module signing key
  Move certificate handling to its own directory
  sign-file: Fix warning about BIO_reset() return value
  PKCS#7: Add MODULE_LICENSE() to test module
  Smack - Fix build error with bringup unconfigured
  sign-file: Document dependency on OpenSSL devel libraries
  PKCS#7: Appropriately restrict authenticated attributes and content type
  KEYS: Add a name for PKEY_ID_PKCS7
  PKCS#7: Improve and export the X.509 ASN.1 time object decoder
  modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
  extract-cert: Cope with multiple X.509 certificates in a single file
  sign-file: Generate CMS message as signature instead of PKCS#7
  PKCS#7: Support CMS messages also [RFC5652]
  X.509: Change recorded SKID & AKID to not include Subject or Issuer
  PKCS#7: Check content type and versions
  MAINTAINERS: The keyrings mailing list has moved
  ...
2015-09-08 12:41:25 -07:00
Linus Torvalds
7d9071a095 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
 "In this one:

   - d_move fixes (Eric Biederman)

   - UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
     handling ought to be fixed)

   - switch of sb_writers to percpu rwsem (Oleg Nesterov)

   - superblock scalability (Josef Bacik and Dave Chinner)

   - swapon(2) race fix (Hugh Dickins)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
  vfs: Test for and handle paths that are unreachable from their mnt_root
  dcache: Reduce the scope of i_lock in d_splice_alias
  dcache: Handle escaped paths in prepend_path
  mm: fix potential data race in SyS_swapon
  inode: don't softlockup when evicting inodes
  inode: rename i_wb_list to i_io_list
  sync: serialise per-superblock sync operations
  inode: convert inode_sb_list_lock to per-sb
  inode: add hlist_fake to avoid the inode hash lock in evict
  writeback: plug writeback at a high level
  change sb_writers to use percpu_rw_semaphore
  shift percpu_counter_destroy() into destroy_super_work()
  percpu-rwsem: kill CONFIG_PERCPU_RWSEM
  percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
  percpu-rwsem: introduce percpu_down_read_trylock()
  document rwsem_release() in sb_wait_write()
  fix the broken lockdep logic in __sb_start_write()
  introduce __sb_writers_{acquired,release}() helpers
  ufs_inode_get{frag,block}(): get rid of 'phys' argument
  ufs_getfrag_block(): tidy up a bit
  ...
2015-09-05 20:34:28 -07:00
Linus Torvalds
6c0f568e84 Merge branch 'akpm' (patches from Andrew)
Merge patch-bomb from Andrew Morton:

 - a few misc things

 - Andy's "ambient capabilities"

 - fs/nofity updates

 - the ocfs2 queue

 - kernel/watchdog.c updates and feature work.

 - some of MM.  Includes Andrea's userfaultfd feature.

[ Hadn't noticed that userfaultfd was 'default y' when applying the
  patches, so that got fixed in this merge instead.  We do _not_ mark
  new features that nobody uses yet 'default y'   - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  mm/hugetlb.c: make vma_has_reserves() return bool
  mm/madvise.c: make madvise_behaviour_valid() return bool
  mm/memory.c: make tlb_next_batch() return bool
  mm/dmapool.c: change is_page_busy() return from int to bool
  mm: remove struct node_active_region
  mremap: simplify the "overlap" check in mremap_to()
  mremap: don't do uneccesary checks if new_len == old_len
  mremap: don't do mm_populate(new_addr) on failure
  mm: move ->mremap() from file_operations to vm_operations_struct
  mremap: don't leak new_vma if f_op->mremap() fails
  mm/hugetlb.c: make vma_shareable() return bool
  mm: make GUP handle pfn mapping unless FOLL_GET is requested
  mm: fix status code which move_pages() returns for zero page
  mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
  genalloc: add support of multiple gen_pools per device
  genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
  mm/memblock: WARN_ON when nid differs from overlap region
  Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
  mm: defer flush of writable TLB entries
  mm: send one IPI per CPU to TLB flush all entries after unmapping pages
  ...
2015-09-05 14:27:38 -07:00
Mel Gorman
72b252aed5 mm: send one IPI per CPU to TLB flush all entries after unmapping pages
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andrea Arcangeli
a14c151e56 userfaultfd: buildsystem activation
This allows to select the userfaultfd during configuration to build it.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
8bdc69b764 Merge branch 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - a new PIDs controller is added.  It turns out that PIDs are actually
   an independent resource from kmem due to the limited PID space.

 - more core preparations for the v2 interface.  Once cpu side interface
   is settled, it should be ready for lifting the devel mask.
   for-4.3-unified-base was temporarily branched so that other trees
   (block) can pull cgroup core changes that blkcg changes depend on.

 - a non-critical idr_preload usage bug fix.

* 'for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: pids: fix invalid get/put usage
  cgroup: introduce cgroup_subsys->legacy_name
  cgroup: don't print subsystems for the default hierarchy
  cgroup: make cftype->private a unsigned long
  cgroup: export cgrp_dfl_root
  cgroup: define controller file conventions
  cgroup: fix idr_preload usage
  cgroup: add documentation for the PIDs controller
  cgroup: implement the PIDs subsystem
  cgroup: allow a cgroup subsystem to reject a fork
2015-09-02 08:04:23 -07:00
Oleg Nesterov
bf3eac84c4 percpu-rwsem: kill CONFIG_PERCPU_RWSEM
Remove CONFIG_PERCPU_RWSEM, the next patch adds the unconditional
user of percpu_rw_semaphore.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2015-08-15 13:52:11 +02:00
David Howells
cfc411e7ff Move certificate handling to its own directory
Move certificate handling out of the kernel/ directory and into a certs/
directory to get all the weird stuff in one place and move the generated
signing keys into this directory.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
2015-08-14 16:06:13 +01:00
David Howells
228c37ff98 sign-file: Document dependency on OpenSSL devel libraries
The revised sign-file program is no longer a script that wraps the openssl
program, but now rather a program that makes use of OpenSSL's crypto
library.  This means that to build the sign-file program, the kernel build
process now has a dependency on the OpenSSL development packages in
addition to OpenSSL itself.

Document this in Kconfig and in module-signing.txt.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: David Woodhouse <David.Woodhouse@intel.com>
2015-08-12 17:01:01 +01:00
Ingo Molnar
9b9412dc70 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

  - The combination of tree geometry-initialization simplifications
    and OS-jitter-reduction changes to expedited grace periods.
    These two are stacked due to the large number of conflicts
    that would otherwise result.

    [ With one addition, a temporary commit to silence a lockdep false
      positive. Additional changes to the expedited grace-period
      primitives (queued for 4.4) remove the cause of this false
      positive, and therefore include a revert of this temporary commit. ]

  - Documentation updates.

  - Torture-test updates.

  - Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-12 12:12:12 +02:00
David Woodhouse
99d27b1b52 modsign: Add explicit CONFIG_SYSTEM_TRUSTED_KEYS option
Let the user explicitly provide a file containing trusted keys, instead of
just automatically finding files matching *.x509 in the build tree and
trusting whatever we find. This really ought to be an *explicit*
configuration, and the build rules for dealing with the files were
fairly painful too.

Fix applied from James Morris that removes an '=' from a macro definition
in kernel/Makefile as this is a feature that only exists from GNU make 3.82
onwards.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07 16:26:14 +01:00
David Woodhouse
fb11794991 modsign: Use single PEM file for autogenerated key
The current rule for generating signing_key.priv and signing_key.x509 is
a classic example of a bad rule which has a tendency to break parallel
make. When invoked to create *either* target, it generates the other
target as a side-effect that make didn't predict.

So let's switch to using a single file signing_key.pem which contains
both key and certificate. That matches what we do in the case of an
external key specified by CONFIG_MODULE_SIG_KEY anyway, so it's also
slightly cleaner.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07 16:26:14 +01:00
David Woodhouse
1329e8cc69 modsign: Extract signing cert from CONFIG_MODULE_SIG_KEY if needed
Where an external PEM file or PKCS#11 URI is given, we can get the cert
from it for ourselves instead of making the user drop signing_key.x509
in place for us.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07 16:26:14 +01:00
David Woodhouse
19e91b69d7 modsign: Allow external signing key to be specified
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2015-08-07 16:26:14 +01:00
David Howells
091f6e26eb MODSIGN: Extract the blob PKCS#7 signature verifier from module signing
Extract the function that drives the PKCS#7 signature verification given a
data blob and a PKCS#7 blob out from the module signing code and lump it with
the system keyring code as it's generic.  This makes it independent of module
config options and opens it to use by the firmware loader.

Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Kyle McMartin <kyle@kernel.org>
2015-08-07 16:26:13 +01:00
David Howells
3f1e1bea34 MODSIGN: Use PKCS#7 messages as module signatures
Move to using PKCS#7 messages as module signatures because:

 (1) We have to be able to support the use of X.509 certificates that don't
     have a subjKeyId set.  We're currently relying on this to look up the
     X.509 certificate in the trusted keyring list.

 (2) PKCS#7 message signed information blocks have a field that supplies the
     data required to match with the X.509 certificate that signed it.

 (3) The PKCS#7 certificate carries fields that specify the digest algorithm
     used to generate the signature in a standardised way and the X.509
     certificates specify the public key algorithm in a standardised way - so
     we don't need our own methods of specifying these.

 (4) We now have PKCS#7 message support in the kernel for signed kexec purposes
     and we can make use of this.

To make this work, the old sign-file script has been replaced with a program
that needs compiling in a previous patch.  The rules to build it are added
here.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
2015-08-07 16:26:13 +01:00
Mel Gorman
4248b0da46 fs, file table: reinit files_stat.max_files after deferred memory initialisation
Dave Hansen reported the following;

	My laptop has been behaving strangely with 4.2-rc2.  Once I log
	in to my X session, I start getting all kinds of strange errors
	from applications and see this in my dmesg:

        	VFS: file-max limit 8192 reached

The problem is that the file-max is calculated before memory is fully
initialised and miscalculates how much memory the kernel is using.  This
patch recalculates file-max after deferred memory initialisation.  Note
that using memory hotplug infrastructure would not have avoided this
problem as the value is not recalculated after memory hot-add.

4.1:             files_stat.max_files = 6582781
4.2-rc2:         files_stat.max_files = 8192
4.2-rc2 patched: files_stat.max_files = 6562467

Small differences with the patch applied and 4.1 but not enough to matter.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alex Ng <alexng@microsoft.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Paul E. McKenney
be55fa2ad2 rcu: Hide RCU_NOCB_CPU behind RCU_EXPERT
This commit prevents Kconfig from asking the user about RCU_NOCB_CPU
unless the user really wants to be asked.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
2015-07-22 15:27:25 -07:00
Aleksa Sarai
49b786ea14 cgroup: implement the PIDs subsystem
Adds a new single-purpose PIDs subsystem to limit the number of
tasks that can be forked inside a cgroup. Essentially this is an
implementation of RLIMIT_NPROC that applies to a cgroup rather than a
process tree.

However, it should be noted that organisational operations (adding and
removing tasks from a PIDs hierarchy) will *not* be prevented. Rather,
the number of tasks in the hierarchy cannot exceed the limit through
forking. This is due to the fact that, in the unified hierarchy, attach
cannot fail (and it is not possible for a task to overcome its PIDs
cgroup policy limit by attaching to a child cgroup -- even if migrating
mid-fork it must be able to fork in the parent first).

PIDs are fundamentally a global resource, and it is possible to reach
PID exhaustion inside a cgroup without hitting any reasonable kmemcg
policy. Once you've hit PID exhaustion, you're only in a marginally
better state than OOM. This subsystem allows PID exhaustion inside a
cgroup to be prevented.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2015-07-14 17:29:23 -04:00
Paul E. McKenney
d1ec4c34c7 rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL
The RCU_USER_QS Kconfig parameter is now just a synonym for NO_HZ_FULL,
so this commit eliminates RCU_USER_QS, replacing all uses with NO_HZ_FULL.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
2015-07-06 13:52:18 -07:00
Linus Torvalds
22a093b2fb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Debug info and other statistics fixes and related enhancements"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/numa: Fix numa balancing stats in /proc/pid/sched
  sched/numa: Show numa_group ID in /proc/sched_debug task listings
  sched/debug: Move print_cfs_rq() declaration to kernel/sched/sched.h
  sched/stat: Expose /proc/pid/schedstat if CONFIG_SCHED_INFO=y
  sched/stat: Simplify the sched_info accounting dependency
2015-07-04 08:56:53 -07:00
Linus Torvalds
91cca0f0ff Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull max log buf size increase from Ingo Molnar:
 "Ran into this limit recently, so increase it by an order of magnitude"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25
2015-07-04 08:16:41 -07:00
Naveen N. Rao
f6db834799 sched/stat: Simplify the sched_info accounting dependency
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
sched_info, which results in ugly #if clauses.

Simplify the code by introducing a synthethic CONFIG_SCHED_INFO
switch, selected by both.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: a.p.zijlstra@chello.nl
Cc: ricklind@us.ibm.com
Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-04 10:04:30 +02:00
Linus Torvalds
2d01eedf1d Merge branch 'akpm' (patches from Andrew)
Merge third patchbomb from Andrew Morton:

 - the rest of MM

 - scripts/gdb updates

 - ipc/ updates

 - lib/ updates

 - MAINTAINERS updates

 - various other misc things

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (67 commits)
  genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
  genalloc: rename dev_get_gen_pool() to gen_pool_get()
  x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
  MAINTAINERS: add zpool
  MAINTAINERS: BCACHE: Kent Overstreet has changed email address
  MAINTAINERS: move Jens Osterkamp to CREDITS
  MAINTAINERS: remove unused nbd.h pattern
  MAINTAINERS: update brcm gpio filename pattern
  MAINTAINERS: update brcm dts pattern
  MAINTAINERS: update sound soc intel patterns
  MAINTAINERS: remove website for paride
  MAINTAINERS: update Emulex ocrdma email addresses
  bcache: use kvfree() in various places
  libcxgbi: use kvfree() in cxgbi_free_big_mem()
  target: use kvfree() in session alloc and free
  IB/ehca: use kvfree() in ipz_queue_{cd}tor()
  drm/nouveau/gem: use kvfree() in u_free()
  drm: use kvfree() in drm_free_large()
  cxgb4: use kvfree() in t4_free_mem()
  cxgb3: use kvfree() in cxgb_free_mem()
  ...
2015-07-01 17:47:51 -07:00
Linus Torvalds
02201e3f1b Minor merge needed, due to function move.
Main excitement here is Peter Zijlstra's lockless rbtree optimization to
 speed module address lookup.  He found some abusers of the module lock
 doing that too.
 
 A little bit of parameter work here too; including Dan Streetman's breaking
 up the big param mutex so writing a parameter can load another module (yeah,
 really).  Unfortunately that broke the usual suspects, !CONFIG_MODULES and
 !CONFIG_SYSFS, so those fixes were appended too.
 
 Cheers,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH
 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7
 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z
 rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K
 wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t
 GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB
 PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4
 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR
 HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5
 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh
 dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek
 tLdh/a9GiCitqS0bT7GE
 =tWPQ
 -----END PGP SIGNATURE-----

Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull module updates from Rusty Russell:
 "Main excitement here is Peter Zijlstra's lockless rbtree optimization
  to speed module address lookup.  He found some abusers of the module
  lock doing that too.

  A little bit of parameter work here too; including Dan Streetman's
  breaking up the big param mutex so writing a parameter can load
  another module (yeah, really).  Unfortunately that broke the usual
  suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
  appended too"

* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
  modules: only use mod->param_lock if CONFIG_MODULES
  param: fix module param locks when !CONFIG_SYSFS.
  rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
  module: add per-module param_lock
  module: make perm const
  params: suppress unused variable error, warn once just in case code changes.
  modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
  kernel/module.c: avoid ifdefs for sig_enforce declaration
  kernel/workqueue.c: remove ifdefs over wq_power_efficient
  kernel/params.c: export param_ops_bool_enable_only
  kernel/params.c: generalize bool_enable_only
  kernel/module.c: use generic module param operaters for sig_enforce
  kernel/params: constify struct kernel_param_ops uses
  sysfs: tightened sysfs permission checks
  module: Rework module_addr_{min,max}
  module: Use __module_address() for module_address_lookup()
  module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
  module: Optimize __module_address() using a latched RB-tree
  rbtree: Implement generic latch_tree
  seqlock: Introduce raw_read_seqcount_latch()
  ...
2015-07-01 10:49:25 -07:00
Ingo Molnar
fb39f98d14 printk: Increase maximum CONFIG_LOG_BUF_SHIFT from 21 to 25
So I tried to some kernel debugging that produced a ton of kernel messages
on a big box, and wanted to save them all: but CONFIG_LOG_BUF_SHIFT maxes
out at 21 (2 MB).

Increase it to 25 (32 MB).

This does not affect any existing config or defaults.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-07-01 10:19:11 +02:00
Mel Gorman
0e1cc95b4c mm: meminit: finish initialisation of struct pages before basic setup
Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred.  One approach is to initialise
memory on demand but it interferes with page allocator paths.  This patch
creates dedicated threads to initialise memory before basic setup.  It
then blocks on a rw_semaphore until completion as a wait_queue and counter
is overkill.  This may be slower to boot but it's simplier overall and
also gets rid of a section mangling which existed so kswapd could do the
initialisation.

[akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Waiman Long <waiman.long@hp.com
Cc: Nathan Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Scott Norton <scott.norton@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:56 -07:00
Linus Torvalds
bbe179f88d Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - threadgroup_lock got reorganized so that its users can pick the
   actual locking mechanism to use.  Its only user - cgroups - is
   updated to use a percpu_rwsem instead of per-process rwsem.

   This makes things a bit lighter on hot paths and allows cgroups to
   perform and fail multi-task (a process) migrations atomically.
   Multi-task migrations are used in several places including the
   unified hierarchy.

 - Delegation rule and documentation added to unified hierarchy.  This
   will likely be the last interface update from the cgroup core side
   for unified hierarchy before lifting the devel mask.

 - Some groundwork for the pids controller which is scheduled to be
   merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: add delegation section to unified hierarchy documentation
  cgroup: require write perm on common ancestor when moving processes on the default hierarchy
  cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
  kernfs: make kernfs_get_inode() public
  MAINTAINERS: add a cgroup core co-maintainer
  cgroup: fix uninitialised iterator in for_each_subsys_which
  cgroup: replace explicit ss_mask checking with for_each_subsys_which
  cgroup: use bitmask to filter for_each_subsys
  cgroup: add seq_file forward declaration for struct cftype
  cgroup: simplify threadgroup locking
  sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
  sched, cgroup: reorganize threadgroup locking
  cgroup: switch to unsigned long for bitmasks
  cgroup: reorganize include/linux/cgroup.h
  cgroup: separate out include/linux/cgroup-defs.h
  cgroup: fix some comment typos
2015-06-26 19:50:04 -07:00
Linus Torvalds
8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Linus Torvalds
47a469421d Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:

 - most of the rest of MM

 - lots of misc things

 - procfs updates

 - printk feature work

 - updates to get_maintainer, MAINTAINERS, checkpatch

 - lib/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (96 commits)
  exit,stats: /* obey this comment */
  coredump: add __printf attribute to cn_*printf functions
  coredump: use from_kuid/kgid when formatting corename
  fs/reiserfs: remove unneeded cast
  NILFS2: support NFSv2 export
  fs/befs/btree.c: remove unneeded initializations
  fs/minix: remove unneeded cast
  init/do_mounts.c: add create_dev() failure log
  kasan: remove duplicate definition of the macro KASAN_FREE_PAGE
  fs/efs: femove unneeded cast
  checkpatch: emit "NOTE: <types>" message only once after multiple files
  checkpatch: emit an error when there's a diff in a changelog
  checkpatch: validate MODULE_LICENSE content
  checkpatch: add multi-line handling for PREFER_ETHER_ADDR_COPY
  checkpatch: suggest using eth_zero_addr() and eth_broadcast_addr()
  checkpatch: fix processing of MEMSET issues
  checkpatch: suggest using ether_addr_equal*()
  checkpatch: avoid NOT_UNIFIED_DIFF errors on cover-letter.patch files
  checkpatch: remove local from codespell path
  checkpatch: add --showfile to allow input via pipe to show filenames
  ...
2015-06-26 09:52:05 -07:00
Vishnu Pratap Singh
c69e3c3a0c init/do_mounts.c: add create_dev() failure log
If create_dev() function fails to create the root mount device
(/dev/root), then it goes to panic as root device not found but there is
no printk in this case.  So I have added the log in case it fails to
create the root device.  It will help in debugging.

[akpm@linux-foundation.org: simplify printk(), use pr_emerg(), display errno]
Signed-off-by: Vishnu Pratap Singh <vishnu.ps@samsung.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Ehrenberg <dehrenberg@chromium.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:42 -07:00
Iago López Galeiras
2e13ba54a2 fs, proc: introduce CONFIG_PROC_CHILDREN
Commit 818411616b ("fs, proc: introduce /proc/<pid>/task/<tid>/children
entry") introduced the children entry for checkpoint restore and the
file is only available on kernels configured with CONFIG_EXPERT and
CONFIG_CHECKPOINT_RESTORE.

This is available in most distributions (Fedora, Debian, Ubuntu, CoreOS)
because they usually enable CONFIG_EXPERT and CONFIG_CHECKPOINT_RESTORE.
But Arch does not enable CONFIG_EXPERT or CONFIG_CHECKPOINT_RESTORE.

However, the children proc file is useful outside of checkpoint restore.
I would like to use it in rkt.  The rkt process exec() another program
it does not control, and that other program will fork()+exec() a child
process.  I would like to find the pid of the child process from an
external tool without iterating in /proc over all processes to find
which one has a parent pid equal to rkt.

This commit introduces CONFIG_PROC_CHILDREN and makes
CONFIG_CHECKPOINT_RESTORE select it.  This allows enabling
/proc/<pid>/task/<tid>/children without needing to enable
CONFIG_CHECKPOINT_RESTORE and CONFIG_EXPERT.

Alban tested that /proc/<pid>/task/<tid>/children is present when the
kernel is configured with CONFIG_PROC_CHILDREN=y but without
CONFIG_CHECKPOINT_RESTORE

Signed-off-by: Iago López Galeiras <iago@endocode.com>
Tested-by: Alban Crequy <alban@endocode.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Djalal Harouni <djalal@endocode.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 17:00:37 -07:00