evie/android_kernel_oneplus_msm8998 - Gay Catgirls Forgejo: gay catgirls having sex

evie/android_kernel_oneplus_msm8998

Author	SHA1	Message	Date
Julius Volz	bc4768eb08	ipvs: Move userspace definitions to include/linux/ip_vs.h Current versions of ipvsadm include "/usr/src/linux/include/net/ip_vs.h" directly. This file also contains kernel-only definitions. Normally, public definitions should live in include/linux, so this patch moves the definitions shared with userspace to a new file, "include/linux/ip_vs.h". This also removes the unused NFC_IPVS_PROPERTY bitmask, which was once used to point into skb->nfcache. To make old ipvsadms still compile with this, the old header file includes the new one. Thanks to Dave Miller and Horms for noting/adding the missing Kbuild entry for the new header file. Signed-off-by: Julius Volz <juliusv@google.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 20:45:24 -07:00
David S. Miller	c3f26a269c	netdev: Fix lockdep warnings in multiqueue configurations. When support for multiple TX queues were added, the netif_tx_lock() routines we converted to iterate over all TX queues and grab each queue's spinlock. This causes heartburn for lockdep and it's not a healthy thing to do with lots of TX queues anyways. So modify this to use a top-level lock and a "frozen" state for the individual TX queues. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 16:58:50 -07:00
Joel Becker	ecb3d28c7e	[PATCH] configfs: Convenience macros for attribute definition. Sysfs has the _ATTR() and _ATTR_RO() macros to make defining extended form attributes easier. configfs should have something similiar. - _CONFIGFS_ATTR() and _CONFIGFS_ATTR_RO() are the counterparts to the sysfs macros. - CONFIGFS_ATTR_STRUCT() creates the extended form attribute structure. - CONFIGFS_ATTR_OPS() defines the show_attribute()/store_attribute() operations that call the show()/store() operations of the extended form configfs_attributes. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:13 -07:00
Joel Becker	dacdd0e047	[PATCH] configfs: Include linux/err.h in linux/configfs.h We now use PTR_ERR() in the ->make_item() and ->make_group() operations. Folks including configfs.h need err.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>	2008-07-31 16:21:12 -07:00
David Miller	419ca3f135	lockdep: fix combinatorial explosion in lock subgraph traversal When we traverse the graph, either forwards or backwards, we are interested in whether a certain property exists somewhere in a node reachable in the graph. Therefore it is never necessary to traverse through a node more than once to get a correct answer to the given query. Take advantage of this property using a global ID counter so that we need not clear all the markers in all the lock_class entries before doing a traversal. A new ID is choosen when we start to traverse, and we continue through a lock_class only if it's ID hasn't been marked with the new value yet. This short-circuiting is essential especially for high CPU count systems. The scheduler has a runqueue per cpu, and needs to take two runqueue locks at a time, which leads to long chains of backwards and forwards subgraphs from these runqueue lock nodes. Without the short-circuit implemented here, a graph traversal on a runqueue lock can take up to (1 << (N - 1)) checks on a system with N cpus. For anything more than 16 cpus or so, lockdep will eventually bring the machine to a complete standstill. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-31 18:38:28 +02:00
Ingo Molnar	e4e4e534fa	sched clock: revert various sched_clock() changes Found an interactivity problem on a quad core test-system - simple CPU loops would occasionally delay the system un an unacceptable way. After much debugging with Peter Zijlstra it turned out that the problem is caused by the string of sched_clock() changes - they caused the CPU clock to jump backwards a bit - which confuses the scheduler arithmetics. (which is unsigned for performance reasons) So revert: # `c300ba2`: sched_clock: and multiplier for TSC to gtod drift # `c0c8773`: sched_clock: only update deltas with local reads. # `af52a90`: sched_clock: stop maximum check on NO HZ # `f7cce27`: sched_clock: widen the max and min time This solves the interactivity problems. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Mike Galbraith <efault@gmx.de>	2008-07-31 17:20:29 +02:00
Patrick McHardy	ae375044d3	netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged In order to time out dead connections quicker, keep track of outstanding data and cap the timeout. Suggested by Herbert Xu. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-31 00:38:01 -07:00
Alan Cox	963e4975c6	pata_it821x: Driver updates and reworking - Add support for the RDC 1010 variant - Rework the core library to have a read_id method. This allows the hacky bits of it821x to go and prepares us for pata_hd - Switch from WARN to BUG in ata_id_string as it will reboot if you get it wrong so WARN won't be seen - Allow the issue of command 0xFC on the 821x. This is needed to query rebuild status. - Tidy up printk formatting - Do more ident rewriting on RAID volumes to handle firmware provided ident data which is rather wonky - Report the firmware revision and device layout in RAID mode - Don't try and disable raid on the 8211 or RDC - they don't have the relevant bits Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-31 02:04:50 -04:00
Alexander Beregalov	1f938d060a	libata.h: replace __FUNCTION__ with __func__ Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2008-07-31 01:48:16 -04:00
Linus Torvalds	660fc1f4d8	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3) powerpc: Don't use the wrong thread_struct for ptrace get/set VSX regs powerpc: Fix ptrace buffer size for VSX powerpc: Correctly hookup PTRACE_GET/SETVSRREGS for 32 bit processes ide/powermac: Fix use of uninitialized pointer on media-bay powerpc: Allow non-hcall return values for lparcfg writes ipmi/powerpc: Use linux/of_{device,platform}.h instead of asm powerpc/fsl: proliferate simple-bus compatibility to soc nodes Documentation: remove old sbc8260 board specific information cpm2: Rework baud rate generators configuration to support external clocks. powerpc: rtc_cmos_setup: assign interrupts only if there is i8259 PIC cpm_uart: Add generic clock API support to set baudrates cpm_uart: Modem control lines support powerpc: implement GPIO LIB API on CPM1 Freescale SoC. cpm2: Implement GPIO LIB API on CPM2 Freescale SoC. powerpc: Fix 8xx build failure powerpc: clean up the Book-E HW watchpoint support	2008-07-30 10:43:56 -07:00
Stephen Rothwell	3dd730f2b4	cpumask: statement expressions confuse some versions of gcc when you take the address of the result. Noticed on a sparc64 compile using a version 3.4.5 cross compiler. kernel/time/tick-common.c: In function `tick_check_new_device': kernel/time/tick-common.c:210: error: invalid lvalue in unary `&' ... Just make it a regular expression. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 10:35:49 -07:00
Linus Torvalds	a4319d9fa0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) net: Make "networking" one-click deselectable. ipv6: Fix useless proc net sockstat6 removal tcp: MD5: Use MIB counter instead of warning for MD5 mismatch. pkt_sched: Fix OOPS on ingress qdisc add. niu: Fix error checking in niu_ethflow_to_class. IPv6: datagram_send_ctl() should exit immediately when an error occured mac80211: fix mesh beaconing PS3: gelic: use unsigned long for irqflags mac80211: fix cfg80211 hooks for master interface nl80211: fix dump callbacks mac80211: partially fix skb->cb use rtl8187: Improve wireless statistics for RTL8187B rtl8187: Fix for TX sequence number problem mac80211: append CONFIG_ to MAC80211_VERBOSE_PS_DEBUG in net/mac80211/tx.c. mac80211: fix sparse integer as NULL pointer warning drivers/net/wireless/iwlwifi/iwl-led.c: printk fix mac80211: return correct error return from ieee80211_wep_init mac80211: tx, use dev_kfree_skb_any for beacon_get rt2x00: Clear queue entry flags during initialization rt2x00: Force full register config after start() ...	2008-07-30 10:13:37 -07:00
Jack Steiner	c627f9cc04	mm: add zap_vma_ptes(): a library function to unmap driver ptes zap_vma_ptes() is intended to be used by drivers to unmap ptes assigned to the driver private vmas. This interface is similar to zap_page_range() but is less general & less likely to be abused. Needed by the GRU driver. Signed-off-by: Jack Steiner <steiner@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:47 -07:00
Joerg Roedel	204b885e73	introduce lower_32_bits() macro The file kernel.h contains the upper_32_bits macro. This patch adds the other part, the lower_32_bits macro. Its first use will be in the driver for AMD IOMMU. Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:46 -07:00
Jerome Arbez-Gindre	2c203003f6	connector: add a BlackBoard user to connector Add a BlackBoard user to connector. BlackBoard is part of the TSP GPL sampling framework (http://savannah.nongnu.org/p/tsp) [akpm@linux-foundation.org: add comment] Signed-off-by: Jerome Arbez-Gindre <jeromearbezgindre@gmail.com> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:45 -07:00
Vegard Nossum	3f1712bac5	print_ip_sym(): use %pS Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:45 -07:00
Yinghai Lu	1d1958f050	mm: remove find_max_pfn_with_active_regions It has no user now Also print out info about adding/removing active regions. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:44 -07:00
Thomas Renninger	a1531acd43	cpufreq acpi: only call _PPC after cpufreq ACPI init funcs got called already Ingo Molnar provided a fix to not call _PPC at processor driver initialization time in "[PATCH] ACPI: fix cpufreq regression" (git commit `e4233dec74`) But it can still happen that _PPC is called at processor driver initialization time. This patch should make sure that this is not possible anymore. Signed-off-by: Thomas Renninger <trenn@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Len Brown <lenb@kernel.org> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:43 -07:00
Magnus Damm	1a4e564b7d	resource: add resource_size() Avoid one-off errors by introducing a resource_size() function. Signed-off-by: Magnus Damm <damm@igel.co.jp> Cc: Ben Dooks <ben-linux@fluff.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-30 09:41:43 -07:00
David Brownell	95b1bc2053	[MTD] MTD_DEBUG always does compile-time typechecks The current style for debug messages is to ensure they're always parsed by the compiler and then subjected to dead code removal. That way builds won't break only when debug options get enabled, which is common when they are stripped out early by CPP. This patch makes CONFIG_MTD_DEBUG adopt that convention. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-07-30 14:21:05 +01:00
Alexey Korolev	96d8b647cf	[MTD] [NAND] fix subpage read for small page NAND Current implementation of subpage read feature for NAND has issues with small page devices. Small page NAND do not support RNDOUT command. So subpage feature is not applicable for them. This patch disables support of subpage for small page NAND. The code is verified on nandsim(SP NAND simulation) and on LP NAND devices. Thanks a lot to Artem for finding this issue. Signed-off-by: Alexey Korolev <akorolev@infradead.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>	2008-07-30 11:59:24 +01:00
David S. Miller	785957d3e8	tcp: MD5: Use MIB counter instead of warning for MD5 mismatch. From a report by Matti Aarnio, and preliminary patch by Adam Langley. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-07-30 03:27:25 -07:00
Philipp Zabel	0eb5d5ab3e	regulator: TI bq24022 Li-Ion Charger driver This adds a regulator driver for the TI bq24022 Single-Chip Li-Ion Charger with its nCE and ISET2 pins connected to GPIOs. Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>	2008-07-30 10:10:23 +01:00
Mark Brown	48d335ba31	regulator: fixed regulator interface This patch adds support for fixed regulators. This class of regulator is not software controllable but can coexist on machines with software controlable regulators. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com>	2008-07-30 10:10:21 +01:00
Liam Girdwood	4c1184e85c	regulator: machine driver interface This interface is for machine specific code and allows the creation of voltage/current domains (with constraints) for each regulator. It can provide regulator constraints that will prevent device damage through overvoltage or over current caused by buggy client drivers. It also allows the creation of a regulator tree whereby some regulators are supplied by others (similar to a clock tree). Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com> Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2008-07-30 10:10:21 +01:00
Liam Girdwood	571a354b15	regulator: regulator driver interface This allows regulator drivers to register their regulators and provide operations to the core. It also has a notifier call chain for propagating regulator events to clients. Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2008-07-30 10:10:20 +01:00
Liam Girdwood	e2ce4eaa76	regulator: consumer device interface Add support to allow consumer device drivers to control their regulator power supply. This uses a similar API to the kernel clock interface in that consumer drivers can get and put a regulator (like they can with clocks atm) and get/set voltage, current limit, mode, enable and disable. This should allow consumers complete control over their supply voltage and current limit. This also compiles out if not in use so drivers can be reused in systems with no regulator based power control. Signed-off-by: Liam Girdwood <lg@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>	2008-07-30 10:10:20 +01:00
Nick Piggin	ce0ad7f095	powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3) Implement lockless get_user_pages_fast for 64-bit powerpc. Page table existence is guaranteed with RCU, and speculative page references are used to take a reference to the pages without having a prior existence guarantee on them. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-30 15:26:54 +10:00
David S. Miller	e93dc4891d	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	2008-07-29 21:51:00 -07:00
Anton Vorontsov	9fec6060d9	Merge branch 'master' of /home/cbou/linux-2.6 Conflicts: drivers/power/Kconfig drivers/power/Makefile	2008-07-30 02:05:23 +04:00
Johannes Berg	d0f0980414	mac80211: partially fix skb->cb use This patch fixes mac80211 to not use the skb->cb over the queue step from virtual interfaces to the master. The patch also, for now, disables aggregation because that would still require requeuing, will fix that in a separate patch. There are two other places (software requeue and powersaving stations) where requeue can happen, but that is not currently used by any drivers/not possible to use respectively. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:55:08 -04:00
Henrique de Moraes Holschuh	f1b23361a0	rfkill: document the rfkill struct locking (v2) Reorder fields in struct rfkill and add comments to make it clear which fields are protected by rfkill->mutex. Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2008-07-29 16:36:33 -04:00
Adrian McMenamin	1795cf48b3	sh/maple: clean maple bus code This patch cleans up the handling of the maple bus queue to remove the risk of races when adding packets. It also removes references to the redundant connect and disconnect functions. Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2008-07-29 22:10:56 +09:00
FUJITA Tomonori	8978b74253	generic, x86: fix add iommu_num_pages helper function This IOMMU helper function doesn't work for some architectures: http://marc.info/?l=linux-kernel&m=121699304403202&w=2 It also breaks POWER and SPARC builds: http://marc.info/?l=linux-kernel&m=121730388001890&w=2 Currently, only x86 IOMMUs use this so let's move it to x86 for now. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-29 12:12:48 +02:00
Avi Kivity	ed84862433	KVM: Advertise synchronized mmu support to userspace Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-29 12:34:02 +03:00
Andrea Arcangeli	e930bffe95	KVM: Synchronize guest physical memory map to host virtual memory map Synchronize changes to host virtual addresses which are part of a KVM memory slot to the KVM shadow mmu. This allows pte operations like swapping, page migration, and madvise() to transparently work with KVM. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2008-07-29 12:33:53 +03:00
Linus Torvalds	5dfb66ba8c	Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd * 'for-linus' of git://git.o-hand.com/linux-mfd: mfd: accept pure device as a parent, not only platform_device mfd: add platform_data to mfd_cell mfd: Coding style fixes mfd: Use to_platform_device instead of container_of	2008-07-28 18:15:41 -07:00
Linus Torvalds	1d9b9f6a53	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits) x86/PCI: use dev_printk when possible PCI: add D3 power state avoidance quirk PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot PCI: add an option to allow ASPM enabled forcibly PCI: disable ASPM on pre-1.1 PCIe devices PCI: disable ASPM per ACPI FADT setting PCI MSI: Don't disable MSIs if the mask bit isn't supported PCI: handle 64-bit resources better on 32-bit machines PCI: rewrite PCI BAR reading code PCI: document pci_target_state PCI hotplug: fix typo in pcie hotplug output x86 gart: replace to_pages macro with iommu_num_pages x86, AMD IOMMU: replace to_pages macro with iommu_num_pages iommu: add iommu_num_pages helper function dma-coherent: add documentation to new interfaces Cris: convert to using generic dma-coherent mem allocator Sh: use generic per-device coherent dma allocator ARM: support generic per-device coherent dma mem Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE x86: use generic per-device dma coherent allocator ...	2008-07-28 18:14:24 -07:00
Dmitry Baryshkov	424f525a12	mfd: accept pure device as a parent, not only platform_device Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-07-29 01:30:26 +02:00
Hisashi Hifumi	8ab22b9abb	vfs: pagecache usage optimization for pagesize!=blocksize When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024512 / 512MB / main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT\|O_RDWR\|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	cddb8a5c14	mmu-notifiers: core With KVM/GFP/XPMEM there isn't just the primary CPU MMU pointing to pages. There are secondary MMUs (with secondary sptes and secondary tlbs) too. sptes in the kvm case are shadow pagetables, but when I say spte in mmu-notifier context, I mean "secondary pte". In GRU case there's no actual secondary pte and there's only a secondary tlb because the GRU secondary MMU has no knowledge about sptes and every secondary tlb miss event in the MMU always generates a page fault that has to be resolved by the CPU (this is not the case of KVM where the a secondary tlb miss will walk sptes in hardware and it will refill the secondary tlb transparently to software if the corresponding spte is present). The same way zap_page_range has to invalidate the pte before freeing the page, the spte (and secondary tlb) must also be invalidated before any page is freed and reused. Currently we take a page_count pin on every page mapped by sptes, but that means the pages can't be swapped whenever they're mapped by any spte because they're part of the guest working set. Furthermore a spte unmap event can immediately lead to a page to be freed when the pin is released (so requiring the same complex and relatively slow tlb_gather smp safe logic we have in zap_page_range and that can be avoided completely if the spte unmap event doesn't require an unpin of the page previously mapped in the secondary MMU). The mmu notifiers allow kvm/GRU/XPMEM to attach to the tsk->mm and know when the VM is swapping or freeing or doing anything on the primary MMU so that the secondary MMU code can drop sptes before the pages are freed, avoiding all page pinning and allowing 100% reliable swapping of guest physical address space. Furthermore it avoids the code that teardown the mappings of the secondary MMU, to implement a logic like tlb_gather in zap_page_range that would require many IPI to flush other cpu tlbs, for each fixed number of spte unmapped. To make an example: if what happens on the primary MMU is a protection downgrade (from writeable to wrprotect) the secondary MMU mappings will be invalidated, and the next secondary-mmu-page-fault will call get_user_pages and trigger a do_wp_page through get_user_pages if it called get_user_pages with write=1, and it'll re-establishing an updated spte or secondary-tlb-mapping on the copied page. Or it will setup a readonly spte or readonly tlb mapping if it's a guest-read, if it calls get_user_pages with write=0. This is just an example. This allows to map any page pointed by any pte (and in turn visible in the primary CPU MMU), into a secondary MMU (be it a pure tlb like GRU, or an full MMU with both sptes and secondary-tlb like the shadow-pagetable layer with kvm), or a remote DMA in software like XPMEM (hence needing of schedule in XPMEM code to send the invalidate to the remote node, while no need to schedule in kvm/gru as it's an immediate event like invalidating primary-mmu pte). At least for KVM without this patch it's impossible to swap guests reliably. And having this feature and removing the page pin allows several other optimizations that simplify life considerably. Dependencies: 1) mm_take_all_locks() to register the mmu notifier when the whole VM isn't doing anything with "mm". This allows mmu notifier users to keep track if the VM is in the middle of the invalidate_range_begin/end critical section with an atomic counter incraese in range_begin and decreased in range_end. No secondary MMU page fault is allowed to map any spte or secondary tlb reference, while the VM is in the middle of range_begin/end as any page returned by get_user_pages in that critical section could later immediately be freed without any further ->invalidate_page notification (invalidate_range_begin/end works on ranges and ->invalidate_page isn't called immediately before freeing the page). To stop all page freeing and pagetable overwrites the mmap_sem must be taken in write mode and all other anon_vma/i_mmap locks must be taken too. 2) It'd be a waste to add branches in the VM if nobody could possibly run KVM/GRU/XPMEM on the kernel, so mmu notifiers will only enabled if CONFIG_KVM=m/y. In the current kernel kvm won't yet take advantage of mmu notifiers, but this already allows to compile a KVM external module against a kernel with mmu notifiers enabled and from the next pull from kvm.git we'll start using them. And GRU/XPMEM will also be able to continue the development by enabling KVM=m in their config, until they submit all GRU/XPMEM GPLv2 code to the mainline kernel. Then they can also enable MMU_NOTIFIERS in the same way KVM does it (even if KVM=n). This guarantees nobody selects MMU_NOTIFIER=y if KVM and GRU and XPMEM are all =n. The mmu_notifier_register call can fail because mm_take_all_locks may be interrupted by a signal and return -EINTR. Because mmu_notifier_reigster is used when a driver startup, a failure can be gracefully handled. Here an example of the change applied to kvm to register the mmu notifiers. Usually when a driver startups other allocations are required anyway and -ENOMEM failure paths exists already. struct kvm kvm_arch_create_vm(void) { struct kvm kvm = kzalloc(sizeof(struct kvm), GFP_KERNEL); + int err; if (!kvm) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); + kvm->arch.mmu_notifier.ops = &kvm_mmu_notifier_ops; + err = mmu_notifier_register(&kvm->arch.mmu_notifier, current->mm); + if (err) { + kfree(kvm); + return ERR_PTR(err); + } + return kvm; } mmu_notifier_unregister returns void and it's reliable. The patch also adds a few needed but missing includes that would prevent kernel to compile after these changes on non-x86 archs (x86 didn't need them by luck). [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix mm/filemap_xip.c build] [akpm@linux-foundation.org: fix mm/mmu_notifier.c build] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	7906d00cd1	mmu-notifiers: add mm_take_all_locks() operation mm_take_all_locks holds off reclaim from an entire mm_struct. This allows mmu notifiers to register into the mm at any time with the guarantee that no mmu operation is in progress on the mm. This operation locks against the VM for all pte/vma/mm related operations that could ever happen on a certain mm. This includes vmtruncate, try_to_unmap, and all page faults. The caller must take the mmap_sem in write mode before calling mm_take_all_locks(). The caller isn't allowed to release the mmap_sem until mm_drop_all_locks() returns. mmap_sem in write mode is required in order to block all operations that could modify pagetables and free pages without need of altering the vma layout (for example populate_range() with nonlinear vmas). It's also needed in write mode to avoid new anon_vmas to be associated with existing vmas. A single task can't take more than one mm_take_all_locks() in a row or it would deadlock. mm_take_all_locks() and mm_drop_all_locks are expensive operations that may have to take thousand of locks. mm_take_all_locks() can fail if it's interrupted by signals. When mmu_notifier_register returns, we must be sure that the driver is notified if some task is in the middle of a vmtruncate for the 'mm' where the mmu notifier was registered (mmu_notifier_invalidate_range_start/end is run around the vmtruncation but mmu_notifier_register can run after mmu_notifier_invalidate_range_start and before mmu_notifier_invalidate_range_end). Same problem for rmap paths. And we've to remove page pinning to avoid replicating the tlb_gather logic inside KVM (and GRU doesn't work well with page pinning regardless of needing tlb_gather), so without mm_take_all_locks when vmtruncate frees the page, kvm would have no way to notice that it mapped into sptes a page that is going into the freelist without a chance of any further mmu_notifier notification. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:21 -07:00
Andrea Arcangeli	6beeac76f5	mmu-notifiers: add list_del_init_rcu() Introduce list_del_init_rcu() and document it. Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kanoj Sarcar <kanojsarcar@yahoo.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Avi Kivity <avi@qumranet.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Chris Wright <chrisw@redhat.com> Cc: Marcelo Tosatti <marcelo@kvack.org> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Izik Eidus <izike@qumranet.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-28 16:30:20 -07:00
Mike Rapoport	56edb58be1	mfd: add platform_data to mfd_cell Adding platform_data to mfd_cell allows passing of platform data directly to the platform_device created for each cell and thus reuse of existing drivers. On the other side it can be used as a hook to mfd_cell itself removing the need in mfd_get_cell method. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Samuel Ortiz <sameo@openedhand.com>	2008-07-29 01:23:32 +02:00
Alan Cox	979b1791e5	PCI: add D3 power state avoidance quirk Libata has some hacks to deal with certain controllers going silly in D3 state. The right way to handle this is to keep a PCI device flag for such devices. That can then be generalised for no ATA devices with power problems. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 15:12:11 -07:00
Shaohua Li	149e16372a	PCI: disable ASPM on pre-1.1 PCIe devices Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it correctly. Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 14:56:57 -07:00
Shaohua Li	5fde244d39	PCI: disable ASPM per ACPI FADT setting The ACPI FADT table includes an ASPM control bit. If the bit is set, do not enable ASPM since it may indicate that the platform doesn't actually support the feature. Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2008-07-28 14:56:09 -07:00
Ingo Molnar	9e3ee1c39c	Merge branch 'linus' into cpus4096 Conflicts: kernel/stop_machine.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 23:32:00 +02:00
Jesse Barnes	29111f579f	Merge branch 'x86/iommu' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into for-linus	2008-07-28 14:31:10 -07:00
Linus Torvalds	e56b3bc794	cpu masks: optimize and clean up cpumask_of_cpu() Clean up and optimize cpumask_of_cpu(), by sharing all the zero words. Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns creating a huge array of constant bitmasks, realize that the zero words can be shared. In other words, on a 64-bit architecture, we only ever need 64 of these arrays - with a different bit set in one single world (with enough zero words around it so that we can create any bitmask by just offsetting in that big array). And then we just put enough zeroes around it that we can point every single cpumask to be one of those things. So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each, with one bit set in each array - 2MB memory total), we have exactly 64 arrays instead, each 8k bits in size (64kB total). And then we just point cpumask(n) to the right position (which we can calculate dynamically). Once we have the right arrays, getting "cpumask(n)" ends up being: static inline const cpumask_t get_cpu_mask(unsigned int cpu) { const unsigned long p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG]; p -= cpu / BITS_PER_LONG; return (const cpumask_t *)p; } This brings other advantages and simplifications as well: - we are not wasting memory that is just filled with a single bit in various different places - we don't need all those games to re-create the arrays in some dense format, because they're already going to be dense enough. if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory is a non-issue (especially since by doing this "overlapping" trick we probably get better cache behaviour anyway). [ mingo@elte.hu: Converted Linus's mails into a commit. See: http://lkml.org/lkml/2008/7/27/156 http://lkml.org/lkml/2008/7/28/320 Also applied a family filter - which also has the side-effect of leaving out the bits where Linus calls me an idio... Oh, never mind ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-28 22:20:41 +02:00

... 3 4 5 6 7 ...