Commit graph

80171 commits

Author SHA1 Message Date
Russell King
5e4ba617c1 ARM: VFP: fix emulation of second VFP instruction
Martin Storsjö reports that the sequence:

        ee312ac1        vsub.f32        s4, s3, s2
        ee702ac0        vsub.f32        s5, s1, s0
        e59f0028        ldr             r0, [pc, #40]
        ee111a90        vmov            r1, s3

on Raspberry Pi (implementor 41 architecture 1 part 20 variant b rev 5)
where s3 is a denormal and s2 is zero results in incorrect behaviour -
the instruction "vsub.f32 s5, s1, s0" is not executed:

        VFP: bounce: trigger ee111a90 fpexc d0000780
        VFP: emulate: INST=0xee312ac1 SCR=0x00000000
        ...

As we can see, the instruction triggering the exception is the "vmov"
instruction, and we emulate the "vsub.f32 s4, s3, s2" but fail to
properly take account of the FPEXC_FP2V flag in FPEXC.  This is because
the test for the second instruction register being valid is bogus, and
will always skip emulation of the second instruction.

Cc: <stable@vger.kernel.org>
Reported-by: Martin Storsjö <martin@martin.st>
Tested-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-02-25 16:09:12 +00:00
Marek Szyprowski
d589829107 ARM: DMA-mapping: fix memory leak in IOMMU dma-mapping implementation
This patch removes page_address() usage in IOMMU-aware dma-mapping
implementation and replaced it with direct use of the cpu virtual address
provided by the caller. page_address() returned incorrect address for
pages remapped in atomic pool, what caused memory leak.

Reported-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Hiroshi Doyu <hdoyu@nvidia.com>
2013-02-25 15:30:44 +01:00
Seung-Woo Kim
60460abffc ARM: dma-mapping: Add maximum alignment order for dma iommu buffers
Alignment order for a dma iommu buffer is set by buffer size. For
large buffer, it is a waste of iommu address space. So configurable
parameter to limit maximum alignment order can reduce the waste.

Signed-off-by: Seung-Woo Kim <sw0312.kim@samsung.com>
Signed-off-by: Kyungmin.park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:43 +01:00
Marek Szyprowski
f8669bef11 ARM: dma-mapping: use himem for DMA buffers for IOMMU-mapped devices
IOMMU can provide access to any memory page, so there is no point in
limiting the allocated pages only to lowmem, once other parts of
dma-mapping subsystem correctly supports himem pages.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:43 +01:00
Marek Szyprowski
9848e48f4c ARM: dma-mapping: add support for CMA regions placed in highmem zone
This patch adds missing pieces to correctly support memory pages served
from CMA regions placed in high memory zones. Please note that the default
global CMA area is still put into lowmem and is limited by optional
architecture specific DMA zone. One can however put device specific CMA
regions in high memory zone to reduce lowmem usage.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
2013-02-25 15:30:42 +01:00
Prathyush K
18177d12c0 arm: dma mapping: export arm iommu functions
This patch adds EXPORT_SYMBOL_GPL calls to the three arm iommu
functions - arm_iommu_create_mapping, arm_iommu_free_mapping
and arm_iommu_attach_device. These three functions are arm specific
wrapper functions for creating/freeing/using an iommu mapping and
they are called by various drivers. If any of these drivers need
to be built as dynamic modules, these functions need to be exported.

Changelog v2: using EXPORT_SYMBOL_GPL as suggested by Marek.

Signed-off-by: Prathyush K <prathyush.k@samsung.com>
[m.szyprowski: extended with recently introduced
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device)]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:42 +01:00
Hiroshi Doyu
6fe3675803 ARM: dma-mapping: Add arm_iommu_detach_device()
A counter part of arm_iommu_attach_device().

Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:41 +01:00
Hiroshi Doyu
fab112a394 ARM: dma-mapping: Add macro to_dma_iommu_mapping()
This can be built without CONFIG_ARM_DMA_USE_IOMMU.

Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:41 +01:00
Hiroshi Doyu
d09e1333ec ARM: dma-mapping: Set arm_dma_set_mask() for iommu->set_dma_mask()
struct dma_map_ops iommu_ops doesn't have ->set_dma_mask, which causes
crash when dma_set_mask() is called from some driver.

Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:41 +01:00
Laurent Pinchart
3e3a182328 ARM: iommu: Include linux/kref.h in asm/dma-iommu.h
The dma_iommu_mapping structure defined in asm/dma-iommu.h embeds a
struct kref, include the appropriate header file.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2013-02-25 15:30:40 +01:00
Al Viro
3f6d078d4a fix compat truncate/ftruncate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25 09:24:55 -05:00
Marc Zyngier
3b8cd8a015 ARM: KVM: fix compilation after removal of user_alloc from struct kvm_memory_slot
Commit 7a905b1 (KVM: Remove user_alloc from struct kvm_memory_slot)
broke KVM/ARM by removing the user_alloc field from a public structure.

As we only used this field to alert the user that we didn't support
this operation mode, there is no harm in discarding this bit of code
without any remorse.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-25 11:48:41 +02:00
Marc Zyngier
2b5e1e474f ARM: KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS
Commit bbacc0c (KVM: Rename KVM_MEMORY_SLOTS -> KVM_USER_MEM_SLOTS)
broke KVM/ARM by changing a global #define.

Apply the same change to fix the compilation breakage.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-25 11:47:59 +02:00
Marc Zyngier
bef103aa7d ARM: KVM: fix kvm_arch_{prepare,commit}_memory_region
Commit f82a8cfe9 (KVM: struct kvm_memory_slot.user_alloc -> bool)
broke the ARM KVM port by changing the prototype of two global
functions.

Apply the same change to fix the compilation breakage.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
2013-02-25 11:47:28 +02:00
Linus Torvalds
ab7826595e This is the MFD pull request for the 3.9 merge window.
No new drivers this time, but a bunch of fairly big cleanups:
 
 - Roger Quadros worked on a OMAP USBHS and TLL platform data consolidation,
   OMAP5 support and clock management code cleanup.
 
 - The first step of a major sync for the ab8500 driver from Lee Jones. In
   particular, the debugfs and the sysct interfaces got extended and improved.
 
 - Peter Ujfalusi sent a nice patchset for cleaning and fixing the twl-core
   driver, with a much needed module id lookup code improvement.
 
 - The regular wm5102 and arizona cleanups and fixes from Mark Brown.
 
 - Laxman Dewangan extended the palmas APIs in order to implement the palmas
   GPIO and rt drivers.
 
 - Laxman also added DT support for the tps65090 driver.
 
 - The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra and
   Darren Hart.
 
 - Linus Walleij patchset for the ab8500 driver allowed ab8500 and ab9540 based
   devices to switch to the new abx500 pin-ctrl driver.
 
 - The max8925 now has device tree and irqdomain support thanks to Qing Xu.
 
 - The recently added rtsx driver got a few cleanups and fixes for a better
   card detection code path and now also supports the RTS5227 chipset, thanks
   to Wei Wang and Roger Tseng.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJRJWtoAAoJEIqAPN1PVmxKHvEP/2B0uUJQpRjTjc9lgrzaUdFO
 QdSyj8Vpz1fZPF3cDuik/Be91U1nICyJWZZ7Yvngih3kbqiGw8nbp9AATwdF4Anl
 +QEtPS9YIpu1w5NS4r01q+cx3NxFsaR5yfNPKLWWnLtN8zHq4qQS8SF8jjdrsgkz
 zUJrjn1GfPPeuyR6Fuko9Mxk6oQ6TUazWJwmxzSCcrKKQ7GlAnbRWcwv2Qnv3GTg
 xi9S85xmLtVhLmMIQ2OR4JP56Vc49OL2/+hv1uShtFVDR4JLeZ5l8j0IPjgXJTrv
 ZbcrzEfJkV+zkY5RmDQrZzNLIWQmH9lj0bzitrS8Ii622J/mfnvQEgnC5fObphjg
 sYKJuBtO9o7XtFJxH7/oXdwM2gC5jDy43FmRv9J2mIDVoX8kNZMzY/sqmL8jm7KD
 MZU0D7liOyF3/1VbZHGSplKp+HKd5XYXCDQEesG/urkDr2edYHdb52iUVYGC32yV
 MaTqXnRMGURdjhCorAZPV8hoQOSWV8pj6oskk+E3upCmWjQWdhKMw3/HzasDLO9B
 eoQSgwZXJLTV8hJjAdgPUjZbxyOyW+9jKA12nyqwX3noq0NseSSJaeZUWmLecmbF
 vXcg4LAwSUycNJ0DxtWC2oEUV9jv4B+dOE7avVT6BDSvth4nzDlO9gOTtxBLcgTz
 JKrAPxY1nS4AUXJbsYTq
 =BhC1
 -----END PGP SIGNATURE-----

Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

Pull MFS updates from Samuel Ortiz:
 "This is the MFD pull request for the 3.9 merge window.

  No new drivers this time, but a bunch of fairly big cleanups:

   - Roger Quadros worked on a OMAP USBHS and TLL platform data
     consolidation, OMAP5 support and clock management code cleanup.

   - The first step of a major sync for the ab8500 driver from Lee
     Jones.  In particular, the debugfs and the sysct interfaces got
     extended and improved.

   - Peter Ujfalusi sent a nice patchset for cleaning and fixing the
     twl-core driver, with a much needed module id lookup code
     improvement.

   - The regular wm5102 and arizona cleanups and fixes from Mark Brown.

   - Laxman Dewangan extended the palmas APIs in order to implement the
     palmas GPIO and rt drivers.

   - Laxman also added DT support for the tps65090 driver.

   - The Intel SCH and ICH drivers got a couple fixes from Aaron Sierra
     and Darren Hart.

   - Linus Walleij patchset for the ab8500 driver allowed ab8500 and
     ab9540 based devices to switch to the new abx500 pin-ctrl driver.

   - The max8925 now has device tree and irqdomain support thanks to
     Qing Xu.

   - The recently added rtsx driver got a few cleanups and fixes for a
     better card detection code path and now also supports the RTS5227
     chipset, thanks to Wei Wang and Roger Tseng."

* tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (109 commits)
  mfd: lpc_ich: Use devres API to allocate private data
  mfd: lpc_ich: Add Device IDs for Intel Wellsburg PCH
  mfd: lpc_sch: Accomodate partial population of the MFD devices
  mfd: da9052-i2c: Staticize da9052_i2c_fix()
  mfd: syscon: Fix sparse warning
  mfd: twl-core: Fix kernel panic on boot
  mfd: rtsx: Fix issue that booting OS with SD card inserted
  mfd: ab8500: Fix compile error
  mfd: Add missing GENERIC_HARDIRQS dependecies
  Documentation: Add docs for max8925 dt
  mfd: max8925: Add dts
  mfd: max8925: Support dt for backlight
  mfd: max8925: Fix onkey driver irq base
  mfd: max8925: Fix mfd device register failure
  mfd: max8925: Add irqdomain for dt
  mfd: vexpress: Allow vexpress-sysreg to self-initialise
  mfd: rtsx: Support RTS5227
  mfd: rtsx: Implement driving adjustment to device-dependent callbacks
  mfd: vexpress: Add pseudo-GPIO based LEDs
  mfd: ab8500: Rename ab8500 to abx500 for hwmon driver
  ...
2013-02-24 20:00:58 -08:00
Linus Torvalds
21fbd5809a Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:

 - Some cleanups at V4L2 documentation

 - new drivers: ts2020 frontend, ov9650 sensor, s5c73m3 sensor,
   sh-mobile veu mem2mem driver, radio-ma901, davinci_vpfe staging
   driver

 - Lots of missing MAINTAINERS entries added

 - several em28xx driver improvements, including its conversion to
   videobuf2

 - several fixups on drivers to make them to better comply with the API

 - DVB core: add support for DVBv5 stats, allowing the implementation of
   statistics for new standards like ISDB

 - mb86a20s: add statistics to the driver

 - lots of new board additions, cleanups, and driver improvements.

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (596 commits)
  [media] media: Add 0x3009 USB PID to ttusb2 driver (fixed diff)
  [media] rtl28xxu: Add USB IDs for Compro VideoMate U620F
  [media] em28xx: add usb id for terratec h5 rev. 3
  [media] media: rc: gpio-ir-recv: add support for device tree parsing
  [media] mceusb: move check earlier to make smatch happy
  [media] radio-si470x doc: add info about v4l2-ctl and sox+alsa
  [media] staging: media: Remove unnecessary OOM messages
  [media] sh_vou: Use vou_dev instead of vou_file wherever possible
  [media] sh_vou: Use video_drvdata()
  [media] drivers/media/platform/soc_camera/pxa_camera.c: use devm_ functions
  [media] mt9t112: mt9t111 format set up differs from mt9t112
  [media] sh-mobile-ceu-camera: fix SHARPNESS control default
  Revert "[media] fc0011: Return early, if the frequency is already tuned"
  [media] cx18/ivtv: fix regression: remove __init from a non-init function
  [media] em28xx: fix analog streaming with USB bulk transfers
  [media] stv0900: remove unnecessary null pointer check
  [media] fc0011: Return early, if the frequency is already tuned
  [media] fc0011: Add some sanity checks and cleanups
  [media] fc0011: Fix xin value clamping
  Revert "[media] [PATH,1/2] mxl5007 move reset to attach"
  ...
2013-02-24 17:35:10 -08:00
Linus Torvalds
77be36de8b Features:
- Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor
    to be aware of new CPU and new DIMMs
  - Cleanups
 Bug-fixes:
  - Fixes a long-standing bug in the PV spinlock wherein we did not
    kick VCPUs that were in a tight loop.
  - Fixes in the error paths for the event channel machinery.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJRJS1kAAoJEFjIrFwIi8fJj2YIAMO3/LVUZyojX/d8U9pqrCly
 lFfEF2UVjcxHJSj0ZFNXt1o3fnYP1SLRlT9u7ZLDjXf6Lmxmw6/C3Haw2wp3DfGq
 yUR0G/X9CPTBEgMYDdX7bjeTjyURvZcUaFwr+qodaaeL3uXx2pW6621Sc6jRKuia
 yAFVZMAKeaRrvUUIXjKHtlpRp9LKFdSztShMtYqmFvxEwrJPq2b37caKruoUCa6o
 X/YO0fvE9QtYD/pG0jsghFmLh/mcr+n9IFMCUXo1Yc9FdQBExtKzABDS5jdpuFND
 4aMDE3dqUmHmpbaQhRE7SdblvpyrGdQXL6FSTjvwBgISfLo847CrnRKRgPp0YeA=
 =LQeU
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen update from Konrad Rzeszutek Wilk:
 "This has two new ACPI drivers for Xen - a physical CPU offline/online
  and a memory hotplug.  The way this works is that ACPI kicks the
  drivers and they make the appropiate hypercall to the hypervisor to
  tell it that there is a new CPU or memory.  There also some changes to
  the Xen ARM ABIs and couple of fixes.  One particularly nasty bug in
  the Xen PV spinlock code was fixed by Stefan Bader - and has been
  there since the 2.6.32!

  Features:
   - Xen ACPI memory and CPU hotplug drivers - allowing Xen hypervisor
     to be aware of new CPU and new DIMMs
   - Cleanups
  Bug-fixes:
   - Fixes a long-standing bug in the PV spinlock wherein we did not
     kick VCPUs that were in a tight loop.
   - Fixes in the error paths for the event channel machinery"

Fix up a few semantic conflicts with the ACPI interface changes in
drivers/xen/xen-acpi-{cpu,mem}hotplug.c.

* tag 'stable/for-linus-3.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: event channel arrays are xen_ulong_t and not unsigned long
  xen: Send spinlock IPI to all waiters
  xen: introduce xen_remap, use it instead of ioremap
  xen: close evtchn port if binding to irq fails
  xen-evtchn: correct comment and error output
  xen/tmem: Add missing %s in the printk statement.
  xen/acpi: move xen_acpi_get_pxm under CONFIG_XEN_DOM0
  xen/acpi: ACPI cpu hotplug
  xen/acpi: Move xen_acpi_get_pxm to Xen's acpi.h
  xen/stub: driver for CPU hotplug
  xen/acpi: ACPI memory hotplug
  xen/stub: driver for memory hotplug
  xen: implement updated XENMEM_add_to_physmap_range ABI
  xen/smp: Move the common CPU init code a bit to prep for PVH patch.
2013-02-24 16:18:31 -08:00
Linus Torvalds
89f883372f Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
 "KVM updates for the 3.9 merge window, including x86 real mode
  emulation fixes, stronger memory slot interface restrictions, mmu_lock
  spinlock hold time reduction, improved handling of large page faults
  on shadow, initial APICv HW acceleration support, s390 channel IO
  based virtio, amongst others"

* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
  Revert "KVM: MMU: lazily drop large spte"
  x86: pvclock kvm: align allocation size to page size
  KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
  x86 emulator: fix parity calculation for AAD instruction
  KVM: PPC: BookE: Handle alignment interrupts
  booke: Added DBCR4 SPR number
  KVM: PPC: booke: Allow multiple exception types
  KVM: PPC: booke: use vcpu reference from thread_struct
  KVM: Remove user_alloc from struct kvm_memory_slot
  KVM: VMX: disable apicv by default
  KVM: s390: Fix handling of iscs.
  KVM: MMU: cleanup __direct_map
  KVM: MMU: remove pt_access in mmu_set_spte
  KVM: MMU: cleanup mapping-level
  KVM: MMU: lazily drop large spte
  KVM: VMX: cleanup vmx_set_cr0().
  KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
  KVM: VMX: disable SMEP feature when guest is in non-paging mode
  KVM: Remove duplicate text in api.txt
  Revert "KVM: MMU: split kvm_mmu_free_page"
  ...
2013-02-24 13:07:18 -08:00
Doug Anderson
488755b5cc ARM: dts: Add disable-wp for sd card slot on smdk5250
The next change will remove the code from the dw_mmc-exynos that added
the DW_MCI_QUIRK_NO_WRITE_PROTECT.  Keep existing functionality of
having no write protect pin on smdk5250 by adding the disable-wp
property.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Chris Ball <cjb@laptop.org>
2013-02-24 14:36:53 -05:00
Al Viro
561c673197 switch lseek to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-24 10:52:26 -05:00
Andrea Arcangeli
a8aed3e075 x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge
Without this patch any kernel code that reads kernel memory in
non present kernel pte/pmds (as set by pageattr.c) will crash.

With this kernel code:

static struct page *crash_page;
static unsigned long *crash_address;
[..]
	crash_page = alloc_pages(GFP_KERNEL, 9);
	crash_address = page_address(crash_page);
	if (set_memory_np((unsigned long)crash_address, 1))
		printk("set_memory_np failure\n");
[..]

The kernel will crash if inside the "crash tool" one would try
to read the memory at the not present address.

crash> p crash_address
crash_address = $8 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
[ *lockup* ]

The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE
shares the same bit, and pageattr leaves _PAGE_GLOBAL set on a
kernel pte which is then mistaken as _PAGE_PROTNONE (so
pte_present returns true by mistake and the kernel fault then
gets confused and loops).

With THP the same can happen after we taught pmd_present to
check _PAGE_PROTNONE and _PAGE_PSE in commit
027ef6c878 ("mm: thp: fix pmd_present for
split_huge_page and PROT_NONE with THP").  THP has the same
problem with _PAGE_GLOBAL as the 4k pages, but it also has a
problem with _PAGE_PSE, which must be cleared too.

After the patch is applied copy_user correctly returns -EFAULT
and doesn't lockup anymore.

crash> p crash_address
crash_address = $9 = (long unsigned int *) 0xffff88023c000000
crash> rd 0xffff88023c000000
rd: read error: kernel virtual address: ffff88023c000000  type:
"64-bit KVADDR"

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-24 13:01:45 +01:00
Andrea Arcangeli
954f857187 Revert "x86, mm: Make spurious_fault check explicitly check explicitly check the PRESENT bit"
I got a report for a minor regression introduced by commit
027ef6c878 ("mm: thp: fix pmd_present for split_huge_page and
PROT_NONE with THP").

So the problem is, pageattr creates kernel pagetables (pte and
pmds) that breaks pte_present/pmd_present and the patch above
exposed this invariant breakage for pmd_present.

The same problem already existed for the pte and pte_present and
it was fixed by commit 660a293ea9 ("x86, mm: Make
spurious_fault check explicitly check the PRESENT bit") (if it
wasn't for that commit, it wouldn't even be a regression).  That
fix avoids the pagefault to use pte_present.  I could follow
through by stopping using pmd_present/pmd_huge too.

However I think it's more robust to fix pageattr and to clear
the PSE/GLOBAL bitflags too in addition to the present bitflag.
So the kernel page fault can keep using the regular
pte_present/pmd_present/pmd_huge.

The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are
sharing the same bit, and in the pmd case we pretend _PAGE_PSE
to be set only in present pmds (to facilitate split_huge_page
final tlb flush).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-24 13:01:44 +01:00
Wen Congyang
942670d0dc x86/mm/numa: Don't check if node is NUMA_NO_NODE
If we aren't debugging per_cpu maps, the cpu's node is stored in
per_cpu variable numa_node.  If `node' is NUMA_NO_NODE, it means
the caller wants to clear the cpu's node.  So we should also
call set_cpu_numa_node() in this case.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-24 13:01:43 +01:00
Al Viro
aee41fe2b2 lseek() and truncate() on sparc really need sign extension
ftruncate() doesn't - it's declared with size as unsigned long,
but truncate() and lseek() have that argument as signed long.
IOW, these two really need sign extension + branch to native
syscall; argument validation in sys_... does *not* suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-24 03:02:41 -05:00
Chris Zankel
c50842df47 xtensa: add support for TLS
The Xtensa architecture provides a global register called THREADPTR
for the purpose of Thread Local Storage (TLS) support. This allows us
to use a fairly simple implementation, keeping the thread pointer in
the regset and simply saving and restoring it upon entering/exiting
the from user space.

Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:35:57 -08:00
Max Filippov
b0c438e642 xtensa: add missing include asm/uaccess.h to checksum.h
This fixes the following build errors seen in the linux-next:

arch/xtensa/include/asm/checksum.h:247:2: error: implicit declaration of
	function 'access_ok' [-Werror=implicit-function-declaration]
arch/xtensa/include/asm/checksum.h:247:16: error: 'VERIFY_WRITE' undeclared
	(first use in this function)

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:23:13 -08:00
Max Filippov
e98c5b5b52 xtensa: do not enable GENERIC_GPIO by default
Now that drivers/gpio/devres.c build does not depend on GPIOLIB do not
enable GENERIC_GPIO by default to fix the following build errors seen
in the linux-next:

include/asm-generic/gpio.h:270:2: error: implicit declaration of function
	'__gpio_get_value' [-Werror=implicit-function-declaration]
include/asm-generic/gpio.h:276:2: error: implicit declaration of function
	'__gpio_set_value' [-Werror=implicit-function-declaration]
include/linux/gpio.h:60:19: error: redefinition of 'gpio_cansleep'
	include/linux/gpio.h:62:2: error: implicit declaration of function
       	'__gpio_cansleep' [-Werror=implicit-function-declaration]
include/linux/gpio.h:67:2: error: implicit declaration of function
	'__gpio_to_irq' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:26:2: error: implicit declaration of function
	'gpio_free' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:60:2: error: implicit declaration of function
	'gpio_request' [-Werror=implicit-function-declaration]
drivers/gpio/devres.c:90:2: error: implicit declaration of function
	'gpio_request_one' [-Werror=implicit-function-declaration]

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:23:06 -08:00
Max Filippov
4b2bb03f10 xtensa: complete ptrace handling of register windows
Compute WindowBase and WindowMask registers correctly on ptrace calls.
Work done earlier by Maxim, Christian and Marc.

Signed-off-by: Marc Gauthier <marc@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:23:00 -08:00
dann
e6ffe17ec4 xtensa: add support for oprofile
Support call graph profiling.
Keep upper two bits of PC unchanged through backtrace rather than take
them from sp (a1). The stack pointer is usually in the same GB (same
upper 2 bits) as PC, but technically doesn't always have to be (and
might not in the future, when taking full advantage of MMU v3).

Signed-off-by: Dan Nicolaescu <dann@xtensa-linux.org>
Signed-off-by: Pete Delaney <piet@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:22:54 -08:00
Max Filippov
2d6f82fee4 xtensa: move spill_registers to traps.h
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:22:48 -08:00
Victor Prupis
b6c7e873da xtensa: ISS: add host file-based simulated disk
Simdisk is a block device that maps to a file in the host file system.
It is usable for testing in the simulated environment, like xt-sim or
QEMU. Device binding to host file may be changed at runtime via proc
interface provided the device is not in use. Number of block devices
and initial binding to host files is controlled via kernel/module
parameters, with defaults specified in the kernel configuration.

Signed-off-by: Victor Prupis <vnp@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:22:41 -08:00
Max Filippov
c5a285bb1b xtensa: fix str[n]cmp return value
str[n]cmp functions return negative value if the first string is less
than the second, positive value if the first string is greater than the
second and zero if they are equal. This is important when these
functions are used for sorting/binary search.

With incorrect strcmp return value bsearch was always failing in the
find_symbol_in_section making it impossible to load any module.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:22:31 -08:00
Max Filippov
de73b6b1bd xtensa: avoid mmap cache aliasing
Provide arch_get_unmapped_area function aligning shared memory mapping
addresses to the biggest of the page size or the cache way size. That
guarantees that corresponding virtual addresses of shared mappings are
cached by the same cache sets.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:53 -08:00
Max Filippov
475c32d0a1 xtensa: add finit_module syscall
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Max Filippov
5d9f36b94d xtensa: pull signal definitions from signal-defs.h
This fixes the following build error in the current linux-next:

include/linux/signal.h:261:2: error: unknown type name '__sigrestore_t'
make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
make[1]: *** [prepare0] Error 2
make: *** [sub-make] Error 2

that appeared after 32dae82 'consolidate kernel-side struct sigaction declarations'

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Max Filippov
e969161b66 xtensa: fix ipc_parse_version selection
shmctl may be called with IPC_64 flag, select function version of
ipc_parse_version to correctly handle that.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Marc Gauthier
2d1c645cc5 xtensa: dispatch medium-priority interrupts
Add support for dispatching medium-priority interrupts, that is,
interrupts of priority levels 2 to EXCM_LEVEL. IRQ handling may be
preempted by higher priority IRQ.

Signed-off-by: Marc Gauthier <marc@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Pete Delaney
d0b73b488c xtensa: Add config files for Diamond 233L - Rev C processor variant
The Diamond 233L processor is a pre-configured Xtensa processor tailored
for Linux application.

Signed-off-by: Pete Delaney <piet@tensilica.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Stephen Warren
2a02bc166d xtensa: use new common dtc rule
The current rules have the .dtb files build in a different directory
from the .dts files. This patch changes xtensa to use the generic dtb
rule which builds .dtb files in the same directory as the source .dts.

This requires moving parts of arch/xtensa/boot/Makefile into newly
created arch/xtensa/boot/dts/Makefile, and updating arch/xtensa/Makefile
to call the new Makefile.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:52 -08:00
Max Filippov
127bc79e30 xtensa: rename prom_update_property to of_update_property
This rename happened in 79d1c71 powerpc+of: Rename the drivers/of prom_*
functions to of_*.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
2013-02-23 19:12:51 -08:00
Linus Torvalds
9e2d59ad58 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "This is the first pile; another one will come a bit later and will
  contain SYSCALL_DEFINE-related patches.

   - a bunch of signal-related syscalls (both native and compat)
     unified.

   - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
     (fixing several potential problems with missing argument
     validation, while we are at it)

   - a lot of now-pointless wrappers killed

   - a couple of architectures (cris and hexagon) forgot to save
     altstack settings into sigframe, even though they used the
     (uninitialized) values in sigreturn; fixed.

   - microblaze fixes for delivery of multiple signals arriving at once

   - saner set of helpers for signal delivery introduced, several
     architectures switched to using those."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
  x86: convert to ksignal
  sparc: convert to ksignal
  arm: switch to struct ksignal * passing
  alpha: pass k_sigaction and siginfo_t using ksignal pointer
  burying unused conditionals
  make do_sigaltstack() static
  arm64: switch to generic old sigaction() (compat-only)
  arm64: switch to generic compat rt_sigaction()
  arm64: switch compat to generic old sigsuspend
  arm64: switch to generic compat rt_sigqueueinfo()
  arm64: switch to generic compat rt_sigpending()
  arm64: switch to generic compat rt_sigprocmask()
  arm64: switch to generic sigaltstack
  sparc: switch to generic old sigsuspend
  sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
  sparc: kill sign-extending wrappers for native syscalls
  kill sparc32_open()
  sparc: switch to use of generic old sigaction
  sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
  mips: switch to generic sys_fork() and sys_clone()
  ...
2013-02-23 18:50:11 -08:00
Linus Torvalds
5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Zhang Yanfei
6434b94a16 ia64: use %ld to print pages calculated in nr_free_buffer_pages
Now the function nr_free_buffer_pages returns unsigned long, so use %ld
to print its return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Shaohua Li
ec8acf20af swap: add per-partition lock for swapfile
swap_lock is heavily contended when I test swap to 3 fast SSD (even
slightly slower than swap to 2 such SSD).  The main contention comes
from swap_info_get().  This patch tries to fix the gap with adding a new
per-partition lock.

Global data like nr_swapfiles, total_swap_pages, least_priority and
swap_list are still protected by swap_lock.

nr_swap_pages is an atomic now, it can be changed without swap_lock.  In
theory, it's possible get_swap_page() finds no swap pages but actually
there are free swap pages.  But sounds not a big problem.

Accessing partition specific data (like scan_swap_map and so on) is only
protected by swap_info_struct.lock.

Changing swap_info_struct.flags need hold swap_lock and
swap_info_struct.lock, because scan_scan_map() will check it.  read the
flags is ok with either the locks hold.

If both swap_lock and swap_info_struct.lock must be hold, we always hold
the former first to avoid deadlock.

swap_entry_free() can change swap_list.  To delete that code, we add a
new highest_priority_index.  Whenever get_swap_page() is called, we
check it.  If it's valid, we use it.

It's a pity get_swap_page() still holds swap_lock().  But in practice,
swap_lock() isn't heavily contended in my test with this patch (or I can
say there are other much more heavier bottlenecks like TLB flush).  And
BTW, looks get_swap_page() doesn't really need the lock.  We never free
swap_info[] and we check SWAP_WRITEOK flag.  The only risk without the
lock is we could swapout to some low priority swap, but we can quickly
recover after several rounds of swap, so sounds not a big deal to me.
But I'd prefer to fix this if it's a real problem.

"swap: make each swap partition have one address_space" improved the
swapout speed from 1.7G/s to 2G/s.  This patch further improves the
speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
so TLB flush isn't the biggest bottleneck before the patches.

[arnd@arndb.de: fix it for nommu]
[hughd@google.com: add missing unlock]
[minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:17 -08:00
Tang Chen
01a178a94e acpi, memory-hotplug: support getting hotplug info from SRAT
We now provide an option for users who don't want to specify physical
memory address in kernel commandline.

         /*
          * For movablemem_map=acpi:
          *
          * SRAT:                |_____| |_____| |_________| |_________| ......
          * node id:                0       1         1           2
          * hotpluggable:           n       y         y           n
          * movablemem_map:              |_____| |_________|
          *
          * Using movablemem_map, we can prevent memblock from allocating memory
          * on ZONE_MOVABLE at boot time.
          */

So user just specify movablemem_map=acpi, and the kernel will use
hotpluggable info in SRAT to determine which memory ranges should be set
as ZONE_MOVABLE.

If all the memory ranges in SRAT is hotpluggable, then no memory can be
used by kernel.  But before parsing SRAT, memblock has already reserve
some memory ranges for other purposes, such as for kernel image, and so
on.  We cannot prevent kernel from using these memory.  So we need to
exclude these ranges even if these memory is hotpluggable.

Furthermore, there could be several memory ranges in the single node
which the kernel resides in.  We may skip one range that have memory
reserved by memblock, but if the rest of memory is too small, then the
kernel will fail to boot.  So, make the whole node which the kernel
resides in un-hotpluggable.  Then the kernel has enough memory to use.

NOTE: Using this way will cause NUMA performance down because the
      whole node will be set as ZONE_MOVABLE, and kernel cannot use memory
      on it.  If users don't want to lose NUMA performance, just don't use
      it.

[akpm@linux-foundation.org: fix warning]
[akpm@linux-foundation.org: use strcmp()]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:14 -08:00
Tang Chen
27168d38fa acpi, memory-hotplug: extend movablemem_map ranges to the end of node
When implementing movablemem_map boot option, we introduced an array
movablemem_map.map[] to store the memory ranges to be set as
ZONE_MOVABLE.

Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify
the whole node memory range, we need to extend it to the node end so
that we can use it to prevent memblock from allocating memory in the
ranges user didn't specify.

We now implement movablemem_map boot option like this:

        /*
         * For movablemem_map=nn[KMG]@ss[KMG]:
         *
         * SRAT:                |_____| |_____| |_________| |_________| ......
         * node id:                0       1         1           2
         * user specified:                |__|                 |___|
         * movablemem_map:                |___| |_________|    |______| ......
         *
         * Using movablemem_map, we can prevent memblock from allocating memory
         * on ZONE_MOVABLE at boot time.
         *
         * NOTE: In this case, SRAT info will be ingored.
         */

[akpm@linux-foundation.org: clean up code, fix build warning]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:14 -08:00
Tang Chen
e8d1955258 acpi, memory-hotplug: parse SRAT before memblock is ready
On linux, the pages used by kernel could not be migrated.  As a result,
if a memory range is used by kernel, it cannot be hot-removed.  So if we
want to hot-remove memory, we should prevent kernel from using it.

The way now used to prevent this is specify a memory range by
movablemem_map boot option and set it as ZONE_MOVABLE.

But when the system is booting, memblock will allocate memory, and
reserve the memory for kernel.  And before we parse SRAT, and know the
node memory ranges, memblock is working.  And it may allocate memory in
ranges to be set as ZONE_MOVABLE.  This memory can be used by kernel,
and never be freed.

So, let's parse SRAT before memblock is called first.  And it is early
enough.

The first call of memblock_find_in_range_node() is in:

  setup_arch()
    |-->setup_real_mode()

so, this patch add a function early_parse_srat() to parse SRAT, and call
it before setup_real_mode() is called.

NOTE:

1) early_parse_srat() is called before numa_init(), and has initialized
   numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
   NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
   numa info.

2) I don't know why using count of memory affinities parsed from SRAT
   as a return value in original acpi_numa_init().  So I add a static
   variable srat_mem_cnt to remember this count and use it as the return
   value of the new acpi_numa_init()

[mhocko@suse.cz: parse SRAT before memblock is ready fix]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:14 -08:00
Yasuaki Ishimatsu
4d59a75125 x86: get pg_data_t's memory from other node
During the implementation of SRAT support, we met a problem.  In
setup_arch(), we have the following call series:

 1) memblock is ready;
 2) some functions use memblock to allocate memory;
 3) parse ACPI tables, such as SRAT.

Before 3), we don't know which memory is hotpluggable, and as a result,
we cannot prevent memblock from allocating hotpluggable memory.  So, in
2), there could be some hotpluggable memory allocated by memblock.

Now, we are trying to parse SRAT earlier, before memblock is ready.  But
I think we need more investigation on this topic.  So in this v5, I
dropped all the SRAT support, and v5 is just the same as v3, and it is
based on 3.8-rc3.

As we planned, we will support getting info from SRAT without users'
participation at last.  And we will post another patch-set to do so.

And also, I think for now, we can add this boot option as the first step
of supporting movable node.  Since Linux cannot migrate the direct
mapped pages, the only way for now is to limit the whole node containing
only movable memory.

Using SRAT is one way.  But even if we can use SRAT, users still need an
interface to enable/disable this functionality if they don't want to
loose their NUMA performance.  So I think, a user interface is always
needed.

For now, users can disable this functionality by not specifying the boot
option.  Later, we will post SRAT support, and add another option value
"movablecore_map=acpi" to using SRAT.

This patch:

If system can create movable node which all memory of the node is
allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for
the node's pg_data_t.  So, use memblock_alloc_try_nid() instead of
memblock_alloc_nid() to retry when the first allocation fails.

Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:14 -08:00
Wen Congyang
e13fe8695c cpu-hotplug,memory-hotplug: clear cpu_to_node() when offlining the node
When the node is offlined, there is no memory/cpu on the node.  If a
sleep task runs on a cpu of this node, it will be migrated to the cpu on
the other node.  So we can clear cpu-to-node mapping.

[akpm@linux-foundation.org: numa_clear_node() and numa_set_node() can no longer be __cpuinit]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:13 -08:00
Wen Congyang
c4c6052464 cpu_hotplug: clear apicid to node when the cpu is hotremoved
When a cpu is hotpluged, we call acpi_map_cpu2node() in
_acpi_map_lsapic() to store the cpu's node and apicid's node.  But we
don't clear the cpu's node in acpi_unmap_lsapic() when this cpu is
hotremoved.  If the node is also hotremoved, we will get the following
messages:

  kernel BUG at include/linux/gfp.h:329!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat xt_CHECKSUM iptable_mangle bridge stp llc sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode pcspkr i2c_i801 i2c_core lpc_ich mfd_core ioatdma e1000e i7core_edac edac_core sg acpi_memhotplug igb dca sd_mod crc_t10dif megaraid_sas mptsas mptscsih mptbase scsi_transport_sas scsi_mod
  Pid: 3126, comm: init Not tainted 3.6.0-rc3-tangchen-hostbridge+ #13 FUJITSU-SV PRIMEQUEST 1800E/SB
  RIP: 0010:[<ffffffff811bc3fd>]  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
  RSP: 0018:ffff88078a049cf8  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000246
  RBP: ffff88078a049d38 R08: 00000000000040d0 R09: 0000000000000001
  R10: 0000000000000000 R11: 0000000000000b5f R12: 00000000000052d0
  R13: ffff8807c1417300 R14: 0000000000030038 R15: 0000000000000003
  FS:  00007fa9b1b44700(0000) GS:ffff8807c3800000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007fa9b09acca0 CR3: 000000078b855000 CR4: 00000000000007e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
  Process init (pid: 3126, threadinfo ffff88078a048000, task ffff8807bb6f2650)
  Call Trace:
    new_slab+0x30/0x1b0
    __slab_alloc+0x358/0x4c0
    kmem_cache_alloc_node_trace+0xb4/0x1e0
    alloc_fair_sched_group+0xd0/0x1b0
    sched_create_group+0x3e/0x110
    sched_autogroup_create_attach+0x4d/0x180
    sys_setsid+0xd4/0xf0
    system_call_fastpath+0x16/0x1b
  Code: 89 c4 e9 73 fe ff ff 31 c0 89 de 48 c7 c7 45 de 9e 81 44 89 45 c8 e8 22 05 4b 00 85 db 44 8b 45 c8 0f 89 4f ff ff ff 0f 0b eb fe <0f> 0b 90 eb fd 0f 0b eb fe 89 de 48 c7 c7 45 de 9e 81 31 c0 44
  RIP  [<ffffffff811bc3fd>] allocate_slab+0x28d/0x300
   RSP <ffff88078a049cf8>
  ---[ end trace adf84c90f3fea3e5 ]---

The reason is that the cpu's node is not NUMA_NO_NODE, we will call
alloc_pages_exact_node() to alloc memory on the node, but the node is
offlined.

If the node is onlined, we still need cpu's node.  For example: a task
on the cpu is sleeped when the cpu is hotremoved.  We will choose
another cpu to run this task when it is waked up.  If we know the cpu's
node, we will choose the cpu on the same node first.  So we should clear
cpu-to-node mapping when the node is offlined.

This patch only clears apicid-to-node mapping when the cpu is
hotremoved.

[akpm@linux-foundation.org: fix section error]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:13 -08:00