Guenter Roeck says:
====================
net: dsa: Fixes and enhancements
Patch 01/15 addresses a bug indicated by an annoying and unhelpful
log message.
Patches 02/15 and 03/15 are minor enhancements, adding support for
known switch revisions.
Patches 04/15 and 05/15 add support for MV88E6352 and MV88E6176.
Patch 06/15 adds support for hardware monitoring, specifically for
reporting the chip temperature, to the dsa subsystem.
Patches 07/15 and 08/15 implement hardware monitoring for MV88E6352,
MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
Patches 09/15 and 10/15 add support for EEPROM access to the DSA subsystem.
Patch 11/15 implements EEPROM access for MV88E6352 and MV88E6176.
Patch 12/15 adds support for reading switch registers to the DSA
subsystem.
Patches 13/15 and 14/15 implement support for reading switch registers
to the drivers for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
Patch 15/15 adds support for reading additional RMON registers to the drivers
for MV88E6352, MV88E6176, MV88E6123, MV88E6161, and MV88E6165.
The series was tested on top of v3.18-rc2 on an x86 system with MV88E6352.
Testing on systems with 88E6131, 88E6060 and MV88E6165 was done earlier
(I don't have access to those systems right now). The series was also
build-tested using my build system at http://server.roeck-us.net:8010/builders.
Look in the 'dsa' column for build results.
The series merges cleanly into net-next as of today (10/29).
v3:
- Fix a bug in the EEPROM patches seen if devicetree is enabled:
the eeprom-length property is attached to the switch devicetree node,
not to the dsa node, and there was a compile error.
v2:
- Made reporting chip temperatures through the hwmon subsystem optional
with new Kconfig option
- Changed the hwmon chip name to <network device name>_dsa<index>
- Made EEPROM presence and size configurable through platform and devicetree
data
- Various minor changes and fixes (see individual patches for details)
====================
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Display sw_in_discards, sw_in_filtered, and sw_out_filtered for chips
supported by mv88e6123_61_65 and mv88e6352 drivers.
The variables are provided in port registers, not the normal status registers.
Mark by adding 0x100 to the register offset and add special handling code
to mv88e6xxx_get_ethtool_stats.
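For illustration only, a minimal sketch of the offset-marker idea, assuming a stats table in which any offset at or above 0x100 refers to a port register rather than a statistics-unit counter. The names STAT_PORT_REG_MARK, struct stat_reg and decode_stat_reg() are hypothetical and not taken from the driver:

```c
#include <stdbool.h>

/* Hypothetical marker: table offsets at or above this value denote
 * per-port registers instead of statistics-unit counters. */
#define STAT_PORT_REG_MARK 0x100

struct stat_reg {
	bool is_port_reg; /* true: read from the port register block */
	int reg;          /* register offset with the marker removed */
};

/* Split an encoded table offset into register kind and real offset. */
static struct stat_reg decode_stat_reg(int encoded)
{
	struct stat_reg r;

	if (encoded >= STAT_PORT_REG_MARK) {
		r.is_port_reg = true;
		r.reg = encoded - STAT_PORT_REG_MARK;
	} else {
		r.is_port_reg = false;
		r.reg = encoded;
	}
	return r;
}
```

With this scheme, an entry of 0x110 would be read from port register 0x10, while 0x1f stays an ordinary statistics counter.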
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The infrastructure can now report switch registers to ethtool.
Add support for it to the mv88e6123_61_65 driver.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading switch registers with 'ethtool -d'.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6352 supports read and write access to its configuration EEPROM.
There is no means to detect if an EEPROM is connected to the switch.
Also, the switch supports EEPROMs of different sizes, but cannot detect
or report the type or size of a connected EEPROM. Therefore, do not implement
the get_eeprom_len callback; instead, depend on platform or devicetree data to
provide information about EEPROM presence and size.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa core now supports reading from and writing to a switch EEPROM
if connected. Describe optional devicetree property indicating that
an EEPROM is present and its size.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some chips it is possible to access the switch EEPROM.
Add infrastructure support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6123 and compatible chips support reading the chip temperature
from PHY register 6:26.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6352 supports reading the chip temperature from two PHY registers,
6:26 and 6:27. Report it using the more accurate register 6:27.
Also report temperature limit and alarm.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switches provide chip temperature data.
Add support for reporting it through the hwmon subsystem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
MV88E6176 is mostly compatible with MV88E6352 and is documented
in the same functional specification. Add support for it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell 88E6352 is mostly compatible with MV88E6123/61/65,
but requires indirect PHY access. Also, its configuration
registers are a bit different.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report known silicon revisions when probing Marvell 88E6131 switches.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report known silicon revisions when probing Marvell 88E6060 switches.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting skb->protocol to a private protocol type may result in warning
messages such as
e1000e 0000:00:19.0 em1: checksum_partial proto=dada!
This happens if the L3 protocol is IP or IPv6 and skb->ip_summed is set
to CHECKSUM_PARTIAL. Looking through the code, it appears that changing
skb->protocol for transmitted packets is not necessary and may actually
be harmful. For example, it prevents purposely unmodified (from a DSA
perspective) network drivers from properly setting up their transmit
checksum offload pointers since they inspect skb->protocol to set up the
IPv4 header or IPv6 header pointers. So don't unnecessarily change the
protocol field.
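A simplified, hypothetical model of why a private protocol value breaks checksum offload: a driver that picks its IPv4 or IPv6 header layout from the protocol field finds no match for 0xDADA. The helper csum_l3_kind() is illustrative only, and it compares host-order values for brevity, whereas real drivers compare skb->protocol against htons(ETH_P_IP) and similar:

```c
#include <stdint.h>

/* EtherType values; 0xDADA is the private EDSA tag type. */
#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD
#define ETH_P_EDSA 0xDADA

enum l3_kind { L3_UNKNOWN, L3_IPV4, L3_IPV6 };

/* Simplified model of a driver choosing which L3 header layout to use
 * for transmit checksum offload, based only on the protocol field. */
static enum l3_kind csum_l3_kind(uint16_t protocol)
{
	switch (protocol) {
	case ETH_P_IP:
		return L3_IPV4;
	case ETH_P_IPV6:
		return L3_IPV6;
	default:
		/* e.g. 0xDADA: no header pointers, offload cannot be set up */
		return L3_UNKNOWN;
	}
}
```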
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in commit 5136b2da77 ("PCI: convert bus code to use dev_groups"),
I mistyped the 'enable' sysfs filename as 'enabled', which broke the
userspace API. This patch fixes that issue by renaming the file back.
Fixes: 5136b2da77 ("PCI: convert bus code to use dev_groups")
Reported-by: Jeff Epler <jepler@unpythonic.net>
Tested-by: Jeff Epler <jepler@unpythonic.net> # on v3.14-rt
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # 3.13
Merge tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Although the diffstat looks scary, it's just because of the removal of
the dead code (s6000), so it should not affect anything serious.
Other than that, all small fixes. The only core fix is zero-clear for
a PCM compat ioctl. The rest are driver-specific, bebob, sgtl500,
adau1761, intel-sst, ad1889 and a few HD-audio quirks as usual"
* tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add workaround for CMI8888 snoop behavior
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
ALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get
ALSA: hda/realtek - New SSID for Headset quirk
ALSA: ad1889: Fix probable mask then right shift defects
ALSA: bebob: fix wrong decoding of clock information for Terratec PHASE 88 Rack FW
ALSA: hda/realtek - Update restore default value for ALC283
ALSA: hda/realtek - Update restore default value for ALC282
ASoC: fsl: use strncpy() to prevent copying of over-long names
ASoC: adau1761: Fix input PGA volume
ASoC: s6000: remove driver
ASoC: Intel: HSW/BDW only support S16 and S24 formats.
ASoC: sgtl500: Document the required supplies
ext4_ext_convert_to_initialized() can return more blocks than are
actually allocated from map->m_lblk in the case where the initial part of
the on-disk extent is zeroed out. Luckily this doesn't have serious
consequences because the caller currently uses the return value
only to unmap metadata buffers. Anyway, this is a data
corruption/exposure problem waiting to happen, so fix it.
Coverity-id: 1226848
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When clearing the inode journal flag, we call jbd2_journal_flush() to force
all the journalled data to their final locations. Currently we ignore a
failure of this call and continue clearing the inode journal flag. This
isn't a big problem because when jbd2_journal_flush() fails, the journal
is likely aborted anyway. But it can still lead to somewhat confusing
results, so bail out early instead.
Coverity-id: 989044
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fails, there's really something wrong with the fs and there's no point in
continuing further. Just return an error from make_indexed_dir() in that
case. Also initialize the frames array so that if we return early due to
an error, dx_release() doesn't try to dereference uninitialized memory
(which could also happen due to an error in do_split()).
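A minimal sketch of why the initialization matters, using a hypothetical struct frame in place of the real dx_frame: if the array is zeroed up front, a cleanup routine modeled on dx_release() can skip slots that were never filled instead of dereferencing garbage. All names here are illustrative:

```c
#include <stdlib.h>

#define MAX_LEVELS 3

/* Hypothetical stand-in for one level of an htree lookup path. */
struct frame {
	void *bh; /* buffer owned by this level, or NULL if never filled */
};

/* Release every buffer that was actually acquired. This relies on unused
 * slots being NULL, which only holds if the array was initialized. */
static int release_frames(struct frame *frames, int n)
{
	int released = 0;

	for (int i = 0; i < n; i++) {
		if (frames[i].bh) {
			free(frames[i].bh);
			frames[i].bh = NULL;
			released++;
		}
	}
	return released;
}
```

Declaring the array as struct frame frames[MAX_LEVELS] = {{NULL}}; before the first possible early return is what makes the NULL checks in release_frames() meaningful.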
Coverity-id: 741300
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
The old hash function didn't work well for 64-bit block numbers, and
relied on undefined behavior (a right shift by a negative count). Use the
generic 64-bit hash function instead.
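As an illustration of a generic 64-bit multiplicative hash, the sketch below hashes a 64-bit block number by keeping the top bits of a 64-bit product, so the shift count is always well defined. The constant is the GOLDEN_RATIO_64 value used by later kernels' hash_64(); hash_block64() is a hypothetical name, and the hash_64() available to this patch may be computed differently:

```c
#include <stdint.h>

/* Multiplier used by later kernels' hash_64(); illustrative here. */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/* Hash a 64-bit block number into `bits` bits (1 <= bits <= 32).
 * The shift count 64 - bits stays within [32, 63], so it is never
 * negative or as wide as the type, unlike the undefined case above. */
static uint32_t hash_block64(uint64_t block, unsigned int bits)
{
	return (uint32_t)((block * GOLDEN_RATIO_64) >> (64 - bits));
}
```

Because the multiply mixes all 64 input bits into the high bits of the product, block numbers above 2^32 still spread across buckets.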
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Andrey Ryabinin <a.ryabinin@samsung.com>
If we can't load the journal, remove the procfs entry for the extent
status information to avoid leaking resources.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
ext4 does not permit changing the metadata or journal checksum feature
flag while mounted. Until we decide to support that, don't allow a
remount to change the journal_csum flag (right now we silently fail to
change anything).
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If metadata checksumming is turned on for the FS, we need to tell the
journal to use checksumming too.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
When we fail to load the block bitmap in __ext4_new_inode(), we will
dereference a NULL pointer in ext4_journal_get_write_access(). So check
for an error from ext4_read_block_bitmap().
Coverity-id: 989065
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When there are no meta block groups, update_backups() will compute the
backup block in 32-bit arithmetic, thus possibly overflowing the block
number and corrupting the filesystem. OTOH, filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the computation in 64-bit arithmetic.
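A minimal sketch of the overflow, ignoring the first-data-block offset and the other details of update_backups(); both helper names are hypothetical. With 32768 blocks per group, group 131072 starts at block 2^32, which is exactly where a 32-bit product wraps (2^32 blocks of 4 KiB is 16 TiB):

```c
#include <stdint.h>

/* Buggy variant: both operands are 32-bit, so the multiply is done in
 * 32 bits and wraps before the result is widened on return. */
static uint64_t backup_block_32(uint32_t group, uint32_t blocks_per_group)
{
	return group * blocks_per_group;
}

/* Fixed variant: widen one operand first so the multiply is 64-bit. */
static uint64_t backup_block_64(uint32_t group, uint32_t blocks_per_group)
{
	return (uint64_t)group * blocks_per_group;
}
```

For group 131072 and 32768 blocks per group, the first helper returns 0 while the second returns 4294967296.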
Coverity-id: 741252
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Without this fix, the connector-analog-tv driver isn't probed when compiled
as a module.
Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
The following commands:
  modprobe ixgbe
  ifconfig ethX up
  ethtool -s ethX advertise 0x020
can lead to a "setup link failed with code -14" error due to the setup_link
call racing with the SFP detection routine in the watchdog.
This patch resolves this issue by protecting the setup_link call with a check
for __IXGBE_IN_SFP_INIT.
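A hypothetical sketch of the guarding pattern, using a C11 atomic flag in place of the driver's __IXGBE_IN_SFP_INIT state bit; struct adapter and guarded_setup_link() are illustrative names. Returning -EBUSY is a design choice made for this sketch; a driver could instead wait for the flag to clear before setting up the link:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical adapter state: one "SFP init in progress" flag. */
struct adapter {
	atomic_bool in_sfp_init;
	int link_setups; /* counts successful (simulated) link setups */
};

/* Claim the flag before touching the link, so SFP detection and link
 * setup never run at the same time. */
static int guarded_setup_link(struct adapter *a)
{
	if (atomic_exchange(&a->in_sfp_init, true))
		return -EBUSY; /* SFP detection (or another caller) owns it */
	a->link_setups++;  /* placeholder for the real link setup */
	atomic_store(&a->in_sfp_init, false);
	return 0;
}
```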
Reported-by: Scott Harrison <scoharr2@cisco.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.
Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.
If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copied into memory allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.
This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropped even if all memory issues are gone and there is plenty
of free memory.
In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.
Fix this by avoiding reuse of such pages.
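A hypothetical sketch of the reuse decision described above; struct rx_page and can_reuse_rx_page() are illustrative, and the NUMA-node and reference-count conditions are assumptions added to show where a pfmemalloc check would sit among other reuse criteria:

```c
#include <stdbool.h>

/* Hypothetical model of an RX buffer page. */
struct rx_page {
	bool pfmemalloc; /* allocated from emergency (pfmemalloc) reserves */
	int node;        /* NUMA node the page lives on */
	int refcount;    /* 1 means the driver holds the only reference */
};

/* Decide whether the driver may recycle the page for the next receive.
 * A pfmemalloc page is never recycled, so once memory pressure is gone
 * new allocations replace it and packets stop being dropped. */
static bool can_reuse_rx_page(const struct rx_page *p, int local_node)
{
	if (p->pfmemalloc)
		return false;
	if (p->node != local_node)
		return false;
	return p->refcount == 1;
}
```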
Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull two RCU fixes from Paul E. McKenney:
" - Complete the work of commit dd56af42bd (rcu: Eliminate deadlock
between CPU hotplug and expedited grace periods), which was
intended to allow synchronize_sched_expedited() to be safely
used when holding locks acquired by CPU-hotplug notifiers.
This commit makes the put_online_cpus() avoid the deadlock
instead of just handling the get_online_cpus().
- Complete the work of commit 35ce7f29a4 (rcu: Create rcuo
kthreads only for onlined CPUs), which was intended to allow
RCU to avoid allocating unneeded kthreads on systems where the
firmware says that there are more CPUs than are really present.
This commit makes rcu_barrier() aware of the mismatch, so that
it doesn't hang waiting for non-existent CPUs. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix report -F (abort, in_tx, mispredict, etc) segfaults for sample.data files
without branch info (Jiri Olsa)
- Add a patch that should have gone into a previous patchkit to use the global cache
provided by libunwind (Namhyung Kim)
- Make CPUINFO_PROC an array to support different kernels, a problem
detected when the information reported via /proc/cpuinfo changed on ARM (Wang Nan)
- 'perf probe' --demangle typo fix and a new --quiet option (Masami Hiramatsu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge misc fixes from Andrew Morton:
"21 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm/balloon_compaction: fix deflation when compaction is disabled
sh: fix sh770x SCIF memory regions
zram: avoid NULL pointer access in concurrent situation
mm/slab_common: don't check for duplicate cache names
ocfs2: fix d_splice_alias() return code checking
mm: rmap: split out page_remove_file_rmap()
mm: memcontrol: fix missed end-writeback page accounting
mm: page-writeback: inline account_page_dirtied() into single caller
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
drivers/rtc/rtc-bq32k.c: fix register value
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
kernel/kmod: fix use-after-free of the sub_info structure
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
mm, thp: fix collapsing of hugepages on madvise
drivers: of: add return value to of_reserved_mem_device_init()
mm: free compound page with correct order
gcov: add ARM64 to GCOV_PROFILE_ALL
fsnotify: next_i is freed during fsnotify_unmount_inodes.
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
...
If CONFIG_BALLOON_COMPACTION=n, balloon_page_insert() does not link pages
with the balloon and doesn't set the PagePrivate flag; as a result,
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction, nobody can isolate
ballooned pages, so it's safe to remove this check.
Fixes: d6d86c0a7f ("mm/balloon_compaction: redesign ballooned pages management").
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Matt Mullins <mmullins@mmlx.us>
Cc: <stable@vger.kernel.org> [3.17]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Resources scif1_resources and scif2_resources overlap. The actual SCIF
region size is 0x10.
This is a regression from commit d850acf975 ("sh: Declare SCIF register
base and IRQ as resources").
Signed-off-by: Andriy Skulysh <askulysh@gmail.com>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() in a concurrent situation like this:
zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B tries to init zram.

  process A                                      process B
  access meta, get a NULL value
                                                 init zram, done
  init_done() is true
  access meta->mem_pool, get a NULL pointer BUG

This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The SLUB cache merges caches with the same size and alignment, and there
was a long-standing bug with this behavior:
- create the cache named "foo"
- create the cache named "bar" (which is merged with "foo")
- delete the cache named "foo" (but it stays allocated because "bar"
uses it)
- create the cache named "foo" again - it fails because the name "foo"
is already used
That bug was fixed in commit 694617474e ("slab_common: fix the check
for duplicate slab names") by not warning on duplicate cache names when
the SLUB subsystem is used.
Recently, cache merging was implemented for the SLAB subsystem too, in
12220dea07 ("mm/slab: support slab merge"). Therefore we need to stop
checking for duplicate names even for the SLAB subsystem.
This patch fixes the bug by removing the check.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
Currently the code does not check for ERR_PTR and will cause an oops in
ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL().
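To make the three possible return values concrete, here is a userspace re-creation of the kernel's error-pointer convention, assuming MAX_ERRNO is 4095 as in the kernel. It is a sketch for illustration, not the kernel's header:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The top MAX_ERRNO addresses encode negative errno values. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the reserved error range [-MAX_ERRNO, -1]. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* True for NULL as well as for an encoded error. */
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}
```

With these helpers, IS_ERR_OR_NULL() is true both for NULL and for an encoded error such as ERR_PTR(-12), and false for a valid pointer.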
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
page_remove_rmap() has too many branches on PageAnon() and is hard to
follow. Move the file part into a separate function.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0a31bc97c8 ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away. The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:
  test_clear_page_writeback()              migration
                                             wait_on_page_writeback()
    TestClearPageWriteback()
                                             mem_cgroup_migrate()
                                               clear PCG_USED
    mem_cgroup_update_page_stat()
      if (PageCgroupUsed(pc))
        decrease memcg pages under writeback

  release pc->mem_cgroup->move_lock

The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.
Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout. The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set. The RCU lock will protect the
memcg past uncharge.
As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:
old: 33.195102545 seconds time elapsed ( +- 0.01% )
new: 33.199231369 seconds time elapsed ( +- 0.03% )
The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:
# Children Self Command Shared Object Symbol
old: 0.12% 0.11% filemapstress [kernel.kallsyms] [k] page_remove_rmap
new: 0.12% 0.08% filemapstress [kernel.kallsyms] [k] page_remove_rmap
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A follow-up patch would have changed the call signature. To save the
trouble, just fold it instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: <stable@vger.kernel.org> [3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits which is undefined. Change the functions to avoid
the undefined shift.
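A sketch of a multi-word left shift that avoids the undefined case, with hypothetical names; unlike the kernel's __bitmap_shift_left(), it assumes the bitmap length is a whole number of words and does no masking of a partial last word:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Shift an nwords-long bitmap left by `shift` bits (towards higher bit
 * numbers); word 0 holds the lowest bits. Words are processed from the
 * top down, so dst may equal src. */
static void bitmap_shift_left(unsigned long *dst, const unsigned long *src,
			      size_t shift, size_t nwords)
{
	size_t off = shift / BITS_PER_LONG; /* whole words to move */
	size_t rem = shift % BITS_PER_LONG; /* leftover bit shift */

	for (size_t k = nwords; k-- > 0;) {
		unsigned long upper = 0, lower = 0;

		if (k >= off) {
			upper = src[k - off] << rem;
			/* Carry bits from the next lower word. When rem == 0
			 * this would be a shift by BITS_PER_LONG, which is
			 * undefined, so skip it. */
			if (rem && k >= off + 1)
				lower = src[k - off - 1] >> (BITS_PER_LONG - rem);
		}
		dst[k] = upper | lower;
	}
}
```

When shift is an exact multiple of BITS_PER_LONG, rem is 0 and the carry term is skipped rather than computed as a shift by BITS_PER_LONG.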
Coverity id: 1192175
Coverity id: 1192174
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix register value in bq32000 trickle charging.
Mike reported that I was using the wrong value in one trickle-charging case,
and after checking the docs, I must admit he's right.
Signed-off-by: Pavel Machek <pavel@denx.de>
Reported-by: Mike Bremford <mike@bfo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When hot adding the same memory after hot removal, the following
messages are shown:
WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
...
Call Trace:
dump_stack+0x46/0x58
warn_slowpath_common+0x81/0xa0
warn_slowpath_null+0x1a/0x20
free_area_init_node+0x3fe/0x426
hotadd_new_pgdat+0x90/0x110
add_memory+0xd4/0x200
acpi_memory_device_add+0x1aa/0x289
acpi_bus_attach+0xfd/0x204
acpi_bus_attach+0x178/0x204
acpi_bus_scan+0x6a/0x90
acpi_device_hotplug+0xe8/0x418
acpi_hotplug_work_fn+0x1f/0x2b
process_one_work+0x14e/0x3f0
worker_thread+0x11b/0x510
kthread+0xe1/0x100
ret_from_fork+0x7c/0xb0
The detailed explanation is as follows:
When hot removing memory, pgdat is set to 0 in try_offline_node(). But
if the pgdat is allocated by the bootmem allocator, the clearing step is
skipped.
And when hot adding the same memory, the uninitialized pgdat is reused.
But free_area_init_node() checks whether pgdat is set to zero. As a
result, free_area_init_node() hits WARN_ON().
This patch clears pgdat which is allocated by bootmem allocator in
try_offline_node().
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix unconditional initialization failure on non-exynos3250 SoCs.
Commit df9e26d093 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
introduced rtc source clock support, but also caused initialization
failures on SoCs that don't need such a clock.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>