Commit graph

62550 commits

Author SHA1 Message Date
Andrea Arcangeli
3526741f09 powerpc: gup_hugepte() support THP based tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
8596468487 powerpc: gup_hugepte() avoid freeing the head page too many times
We only taken "refs" pins on the head page not "*nr" pins.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
405e44f2e3 powerpc: get_hugepte() don't put_page() the wrong page
"page" may have changed to point to the next hugepage after the loop
completed, The references have been taken on the head page, so the
put_page must happen there too.

This is a longstanding issue pre-thp inclusion.

It's totally unclear how these page_cache_add_speculative and
pte_val(pte) != pte_val(*ptep) checks are necessary across all the
powerpc gup_fast code, when x86 doesn't need any of that: there's no way
the page can be freed with irq disabled so we're guaranteed the
atomic_inc will happen on a page with page_count > 0 (so not needing the
speculative check).

The pte check is also meaningless on x86: no need to rollback on x86 if
the pte changed, because the pte can still change a CPU tick after the
check succeeded and it won't be rolled back in that case.  The important
thing is we got a reference on a valid page that was mapped there a CPU
tick ago.  So not knowing the soft tlb refill code of ppc64 in great
detail I'm not removing the "speculative" page_count increase and the
pte checks across all the code, but unless there's a strong reason for
it they should be later cleaned up too.

If a pte can change from huge to non-huge (like it could happen with
THP) passing a pte_t *ptep to gup_hugepte() would also require to repeat
the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs
only so I'm not altering that.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
2839bdc1bf powerpc: remove superfluous PageTail checks on the pte gup_fast
This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
those should be handled by gup_hugepd only, so these checks are
superfluous.

Plus if this wasn't a noop, it would have oopsed because, the insistence
of using the speculative refcounting would trigger a VM_BUG_ON if a tail
page was encountered in the page_cache_get_speculative().

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Andrea Arcangeli
70b50f94f1 mm: thp: tail page refcounting fix
Michel while working on the working set estimation code, noticed that
calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
wasn't safe, if the pfn ended up being a tail page of a transparent
hugepage under splitting by __split_huge_page_refcount().

He then found the problem could also theoretically materialize with
page_cache_get_speculative() during the speculative radix tree lookups
that uses get_page_unless_zero() in SMP if the radix tree page is freed
and reallocated and get_user_pages is called on it before
page_cache_get_speculative has a chance to call get_page_unless_zero().

So the best way to fix the problem is to keep page_tail->_count zero at
all times.  This will guarantee that get_page_unless_zero() can never
succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
is unused for all tail pages of a compound page, so we can simply
account the tail page references there and transfer them to
tail_page->_count in __split_huge_page_refcount() (in addition to the
head_page->_mapcount).

While debugging this s/_count/_mapcount/ change I also noticed get_page is
called by direct-io.c on pages returned by get_user_pages.  That wasn't
entirely safe because the two atomic_inc in get_page weren't atomic.  As
opposed to other get_user_page users like secondary-MMU page fault to
establish the shadow pagetables would never call any superflous get_page
after get_user_page returns.  It's safer to make get_page universally safe
for tail pages and to use get_page_foll() within follow_page (inside
get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
pages without taking any locks because it is run within PT lock protected
critical sections (PT lock for pte and page_table_lock for
pmd_trans_huge).

The standard get_page() as invoked by direct-io instead will now take
the compound_lock but still only for tail pages.  The direct-io paths
are usually I/O bound and the compound_lock is per THP so very
finegrined, so there's no risk of scalability issues with it.  A simple
direct-io benchmarks with all lockdep prove locking and spinlock
debugging infrastructure enabled shows identical performance and no
overhead.  So it's worth it.  Ideally direct-io should stop calling
get_page() on pages returned by get_user_pages().  The spinlock in
get_page() is already optimized away for no-THP builds but doing
get_page() on tail pages returned by GUP is generally a rare operation
and usually only run in I/O paths.

This new refcounting on page_tail->_mapcount in addition to avoiding new
RCU critical sections will also allow the working set estimation code to
work without any further complexity associated to the tail page
refcounting with THP.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: <stable@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:57 -07:00
Tony Luck
5569459cd3 [IA64] Wire up cross memory attach syscalls
Add sys_process_vm_readv and sys_process_vm_writev to ia64
syscall table. Passes tests at http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

Signed-off-by: Tony Luck <tony.luck@intel.com>
2011-11-02 14:00:26 -07:00
Linus Torvalds
d211858837 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
  vfs: add d_prune dentry operation
  vfs: protect i_nlink
  filesystems: add set_nlink()
  filesystems: add missing nlink wrappers
  logfs: remove unnecessary nlink setting
  ocfs2: remove unnecessary nlink setting
  jfs: remove unnecessary nlink setting
  hypfs: remove unnecessary nlink setting
  vfs: ignore error on forced remount
  readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
  vfs: fix dentry leak in simple_fill_super()
2011-11-02 11:41:01 -07:00
Linus Torvalds
de0a5345a5 Merge branch 'for-linus' of git://github.com/richardweinberger/linux
* 'for-linus' of git://github.com/richardweinberger/linux: (90 commits)
  um: fix ubd cow size
  um: Fix kmalloc argument order in um/vdso/vma.c
  um: switch to use of drivers/Kconfig
  UserModeLinux-HOWTO.txt: fix a typo
  UserModeLinux-HOWTO.txt: remove ^H characters
  um: we need sys/user.h only on i386
  um: merge delay_{32,64}.c
  um: distribute exports to where exported stuff is defined
  um: kill system-um.h
  um: generic ftrace.h will do...
  um: segment.h is x86-only and needed only there
  um: asm/pda.h is not needed anymore
  um: hw_irq.h can go generic as well
  um: switch to generic-y
  um: clean Kconfig up a bit
  um: a couple of missing dependencies...
  um: kill useless argument of free_chan() and free_one_chan()
  um: unify ptrace_user.h
  um: unify KSTK_...
  um: fix gcov build breakage
  ...
2011-11-02 09:45:39 -07:00
Richard Weinberger
8535639810 um: fix ubd cow size
ubd_file_size() cannot use ubd_dev->cow.file because at this time
ubd_dev->cow.file is not initialized.
Therefore, ubd_file_size() will always report a wrong disk size when
COW files are used.
Reading from /dev/ubd* would crash the kernel.

We have to read the correct disk size from the COW file's backing
file.

Signed-off-by: Richard Weinberger <richard@nod.at>
CC: stable@kernel.org
2011-11-02 14:15:42 +01:00
Dave Jones
0d65ede0a6 um: Fix kmalloc argument order in um/vdso/vma.c
kmalloc size is 1st arg, not second.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

Cc: <stable@kernel.org> # 3.0.x
[richard@nod.at: on 3.0 the to be patched file is
arch/um/sys-x86_64/vdso/vma.c]
2011-11-02 14:15:42 +01:00
Al Viro
3369465ed1 um: switch to use of drivers/Kconfig
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:41 +01:00
Richard Weinberger
38b64aed78 um: we need sys/user.h only on i386
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:38 +01:00
Richard Weinberger
d0af6cbfa2 um: merge delay_{32,64}.c
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:37 +01:00
Al Viro
73395a0002 um: distribute exports to where exported stuff is defined
ksyms.c is down to the stuff defined in various USER_OBJS

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:36 +01:00
Al Viro
a34978cbd9 um: kill system-um.h
most of it belonged in irqflags.h, actually

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:34 +01:00
Al Viro
b8c655d727 um: generic ftrace.h will do...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:34 +01:00
Al Viro
46ecca8ae1 um: segment.h is x86-only and needed only there
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:33 +01:00
Al Viro
c2ad3ad009 um: asm/pda.h is not needed anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:32 +01:00
Al Viro
8e66cda4af um: hw_irq.h can go generic as well
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:31 +01:00
Al Viro
f5e900770f um: switch to generic-y
kill wrapper headers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:30 +01:00
Al Viro
d805a78603 um: clean Kconfig up a bit
* kill duplicates with drivers/char/Kconfig
* take watchdog one into drivers/watchdog/Kconfig
* take mmapper to arch/um/Kconfig.um
* rename Kconfig.char menu to "UML Character Devices"

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:29 +01:00
Al Viro
772bd0a5a5 um: kill useless argument of free_chan() and free_one_chan()
delay_free_irq is always 0 for those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:28 +01:00
Al Viro
966e803ab1 um: unify ptrace_user.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:27 +01:00
Al Viro
a10c95d84c um: unify KSTK_...
... and switch get_thread_register() to HOST_... for register numbers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro
4d211093e8 um: fix gcov build breakage
a) exports in gmon_syms.c duplicate kernel/gcov/* ones
b) excluding -pg in vdso compile is not enough - -fprofile-arcs
and -ftest-coverage also needs to be excluded

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:26 +01:00
Al Viro
c32324e312 um: page_offset.h is never used
... and neither is the only define in it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:25 +01:00
Al Viro
3fb77d7256 um: irq_vectors.h just shadows x86 one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:24 +01:00
Al Viro
ff9586e98f um: required-features.h is there only to shadow x86 one...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:23 +01:00
Al Viro
8807c1d561 um: asm/apic.h is there only to shadow the x86 one...
... so take it to arch/um/x86/asm.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:22 +01:00
Al Viro
c506c0e4a7 um: take ubd_user.h to its users...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:22 +01:00
Al Viro
b3ee571e58 um: take ldt.h to arch/x86/um/asm/mm_context.h
it's x86-only and we have no business playing with it in asm/mmu.h; make
the latter have
	struct uml_arch_mm_context arch;
instead of
	struct uml_ldt ldt;
and let arch/<subarch>/um/asm/mm_context.h decide what'll be in there.
While we are at it, kill host_ldt.h - it's not needed in part of places
that include it (we want asm/ldt.h in those) and it can be trivially
expanded into the single remaining one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:21 +01:00
Al Viro
f67aa2ffb7 um: merge signal_{32,64}.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:20 +01:00
Al Viro
fbe9868693 um: no need to play with save_sp in signal frame setup anymore
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro
c7ea591c91 um: increase stack growth cushion in pagefault
analog of [PATCH] i386: let usermode execute the "enter" instruction from
circa 2006.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:19 +01:00
Al Viro
3579a38973 um: merge HOST_... of registers common on i386 and amd64
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro
8edc4147be um: sanitize paths in sys_call_table* includes
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:18 +01:00
Al Viro
1bbd5f21f4 um: merge os-Linux/tls.c into arch/x86/um/os-Linux/tls.c
it's i386-specific; moreover, analogs on other targets have
incompatible interface - PTRACE_GET_THREAD_AREA does exist
elsewhere, but struct user_desc does *not*

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:17 +01:00
Al Viro
c5cc32fe14 um: move asm/desc.h into arch/x86/um/asm
its only purpose is to shadow the x86 asm/desc.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:16 +01:00
Al Viro
2014d01878 um: merge host_ldt_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro
09e129a603 um: merge tls_{32,64}.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:15 +01:00
Al Viro
4dc706c2f2 um: take um_mmu.h to asm/mmu.h, clean asm/mmu_context.h a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:14 +01:00
Al Viro
fced95cacf um: kill um_uaccess.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:13 +01:00
Al Viro
ece67c8697 um: take mconsole*.h to arch/um/drivers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:12 +01:00
Al Viro
510c72a3cf um: take chan_*.h and line.h to arch/um/drivers
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:12 +01:00
Al Viro
17e052093b um: take register_winch_irq() into the caller of is_skas_winch()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:11 +01:00
Al Viro
0a9e70b1cd um: kill shared/mem_kern.h
... nothing declared there exists

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:10 +01:00
Al Viro
445c5786c9 um: kill shared/tlb.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:10 +01:00
Al Viro
c75d053b70 um: make flush_tlb_kernel_range_common() static
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:09 +01:00
Al Viro
5ade8878e0 um: kill shared/task.h and HOST_TASK_REGS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:08 +01:00
Al Viro
1cf5e62ab9 um: shared/syscall.h is not even included
... and functions declared in it do not exist

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2011-11-02 14:15:08 +01:00