commit 513f86d73855ce556ea9522b6bfd79f87356dc3a upstream.
If an inode points to a block which is also some other type of
metadata block (such as a block allocation bitmap), the
buffer_verified flag can be set when it was validated as that other
metadata block type; however, it would make a really terrible external
attribute block. The reason why we use the verified flag is to avoid
constantly reverifying the block. However, it doesn't take much
overhead to make sure the magic number of the xattr block is correct,
and this will avoid potential crashes.
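
As an illustration only, the idea can be sketched in userspace C. The
struct, the helper names and the stubbed "expensive" check are
assumptions made for this sketch and are not the kernel code; the magic
value is taken to be that of EXT4_XATTR_MAGIC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match EXT4_XATTR_MAGIC; everything else here is a toy model. */
#define XATTR_MAGIC 0xEA020000u

struct fake_xattr_block {
    uint32_t h_magic;    /* stored host-endian here for simplicity */
    uint32_t h_blocks;   /* number of blocks used; expected to be 1 */
    bool verified;       /* models the buffer_verified flag */
    bool fully_checked;  /* set when the expensive path ran */
};

/* Hypothetical expensive validation (checksum, entry walk); stubbed out. */
int expensive_check(struct fake_xattr_block *b)
{
    b->fully_checked = true;
    return 0;
}

/* Returns 0 if the block is usable as an xattr block, -1 otherwise. */
int check_xattr_block(struct fake_xattr_block *b)
{
    /* Cheap header checks run every time, even if the flag is already set. */
    if (b->h_magic != XATTR_MAGIC || b->h_blocks != 1)
        return -1;
    if (b->verified)
        return 0;        /* skip only the expensive part */
    if (expensive_check(b) != 0)
        return -1;
    b->verified = true;
    return 0;
}
```

A block that was marked verified as a bitmap is still rejected here,
because the header comparison no longer hides behind the flag.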
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
[Backported to 4.4: adjust context]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I06728150aefd0fffbdb6bd7cbce0858221ff6f74
(cherry picked from commit 62a28a64d87fbdce5c0a988b440a4ae6dd37b41e)
commit 8bc1379b82b8e809eef77a9fedbb75c6c297be19 upstream.
Use a separate journal transaction if it turns out that we need to
convert an inline file to use a data block. Otherwise we could end
up failing due to not having journal credits.
This addresses CVE-2018-10883.
https://bugzilla.kernel.org/show_bug.cgi?id=200071
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
[fengc@google.com: 4.4 backport: adjust context]
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: I75f040b4276587a6a234a6a53fd1d3d70be6ae09
(cherry picked from commit d49dc6f1d53479bca01900540a89639eea8b154e)
commit 5369a762c882c0b6e9599e4ebbb3a9ba9eee7e2d upstream.
In theory this should have been caught earlier when the xattr list was
verified, but in case it got missed, it's simple enough to add a check
to make sure we don't overrun the xattr buffer.
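
A minimal sketch of such a bounds check, assuming a simplified entry
layout (a 4-byte header followed by the name, padded to 4 bytes, with a
zero word as terminator). The names toy_entry and count_entries are
hypothetical and the layout is not the real ext4 one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified entry: 4-byte header followed by name_len name bytes, padded to 4. */
struct toy_entry {
    uint8_t name_len;
    uint8_t pad[3];
};

#define TOY_ROUND 3u
#define TOY_ENTRY_LEN(name_len) \
    ((sizeof(struct toy_entry) + (name_len) + TOY_ROUND) & ~(size_t)TOY_ROUND)

/*
 * Walk the entry list in buf[0..size). A header of four zero bytes ends the
 * list. Returns the number of entries, or -1 if the walk would leave the buffer.
 */
int count_entries(const uint8_t *buf, size_t size)
{
    size_t off = 0;
    int n = 0;

    for (;;) {
        uint32_t word;
        struct toy_entry e;

        if (off + sizeof(word) > size)
            return -1;              /* terminator would be read out of bounds */
        memcpy(&word, buf + off, sizeof(word));
        if (word == 0)
            return n;               /* end-of-list marker */
        memcpy(&e, buf + off, sizeof(e));
        if (TOY_ENTRY_LEN(e.name_len) > size - off)
            return -1;              /* entry overruns the buffer */
        off += TOY_ENTRY_LEN(e.name_len);
        n++;
    }
}
```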
This addresses CVE-2018-10879.
https://bugzilla.kernel.org/show_bug.cgi?id=200001
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
[bwh: Backported to 3.16:
- Add inode parameter to ext4_xattr_set_entry() and update callers
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
[adjusted for 4.4 context]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change-Id: Ife3baeba57d5e63e7745ee8d5f4b01c6e9de4bc6
(cherry picked from commit ff3692e264d5c34ca9a15ab995808f98d9f874a8)
Merge 4.4.129 into android-4.4
Changes in 4.4.129
media: v4l2-compat-ioctl32: don't oops on overlay
parisc: Fix out of array access in match_pci_device()
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
perf intel-pt: Fix sync_switch
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix timestamp following overflow
radeon: hide pointless #warning when compile testing
Revert "perf tests: Decompress kernel module before objdump"
block/loop: fix deadlock after loop_set_status
s390/qdio: don't retry EQBS after CCQ 96
s390/qdio: don't merge ERROR output buffers
s390/ipl: ensure loadparm valid flag is set
getname_kernel() needs to make sure that ->name != ->iname in long case
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
hwmon: (ina2xx) Fix access to uninitialized mutex
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
slip: Check if rstate is initialized before uncompressing
lan78xx: Correctly indicate invalid OTP
x86/hweight: Get rid of the special calling convention
x86/hweight: Don't clobber %rdi
tty: make n_tty_read() always abort if hangup is in progress
ubifs: Check ubifs_wbuf_sync() return code
ubi: fastmap: Don't flush fastmap work on detach
ubi: Fix error for write access
ubi: Reject MLC NAND
fs/reiserfs/journal.c: add missing resierfs_warning() arg
resource: fix integer overflow at reallocation
ipc/shm: fix use-after-free of shm file via remap_file_pages()
mm, slab: reschedule cache_reap() on the same CPU
usb: musb: gadget: misplaced out of bounds check
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
ARM: dts: at91: sama5d4: fix pinctrl compatible string
xen-netfront: Fix hang on device removal
regmap: Fix reversed bounds check in regmap_raw_write()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: dwc3: pci: Properly cleanup resource
HID: i2c-hid: fix size check and type usage
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
HID: Fix hid_report_len usage
HID: core: Fix size as type u32
ASoC: ssm2602: Replace reg_default_raw with reg_default
thunderbolt: Resume control channel after hibernation image is created
random: use a tighter cap in credit_entropy_bits_safe()
jbd2: if the journal is aborted then don't allow update of the log tail
ext4: don't update checksum of new initialized bitmaps
ext4: fail ext4_iget for root directory if unallocated
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ALSA: pcm: Fix UAF at PCM release via PCM timer access
IB/srp: Fix srp_abort()
IB/srp: Fix completion vector assignment algorithm
dmaengine: at_xdmac: fix rare residue corruption
um: Use POSIX ucontext_t instead of struct ucontext
iommu/vt-d: Fix a potential memory leak
mmc: jz4740: Fix race condition in IRQ mask update
clk: mvebu: armada-38x: add support for 1866MHz variants
clk: mvebu: armada-38x: add support for missing clocks
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
thermal: imx: Fix race condition in imx_thermal_probe()
watchdog: f71808e_wdt: Fix WD_EN register read
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
vfio-pci: Virtualize PCIe & AF FLR
vfio/pci: Virtualize Maximum Payload Size
vfio/pci: Virtualize Maximum Read Request Size
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
drm/radeon: Fix PCIe lane width calculation
ext4: fix crashes in dioread_nolock mode
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ALSA: line6: Use correct endpoint type for midi output
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: hda - New VIA controller suppor no-snoop path
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
MIPS: uaccess: Add micromips clobbers to bzero invocation
MIPS: memset.S: EVA & fault support for small_memset
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: Fix clobber of v1 in last_fixup
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/lib: Fix off-by-one in alternate feature patching
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
rpc_pipefs: fix double-dput()
Don't leak MNT_INTERNAL away from internal mounts
autofs: mount point create should honour passed in mode
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
fanotify: fix logic of events on child
writeback: safer lock nesting
Linux 4.4.129
Change-Id: I8806d2cc92fe512f27a349e8f630ced0cac9a8d7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Merge 4.4.122 into android-4.4
Changes in 4.4.122
RDMA/ucma: Limit possible option size
RDMA/ucma: Check that user doesn't overflow QP state
RDMA/mlx5: Fix integer overflow while resizing CQ
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
workqueue: Allow retrieval of current task's work struct
drm: Allow determining if current task is output poll worker
drm/nouveau: Fix deadlock on runtime suspend
drm/radeon: Fix deadlock on runtime suspend
drm/amdgpu: Fix deadlock on runtime suspend
drm/amdgpu: Notify sbios device ready before send request
drm/radeon: fix KV harvesting
drm/amdgpu: fix KV harvesting
MIPS: BMIPS: Do not mask IPIs during suspend
MIPS: ath25: Check for kzalloc allocation failure
MIPS: OCTEON: irq: Check for null return on kzalloc allocation
Input: matrix_keypad - fix race when disabling interrupts
loop: Fix lost writes caused by missing flag
kbuild: Handle builtin dtb file names containing hyphens
bcache: don't attach backing with duplicate UUID
x86/MCE: Serialize sysfs changes
ALSA: hda/realtek - Fix dock line-out volume on Dell Precision 7520
ALSA: seq: Don't allow resizing pool in use
ALSA: seq: More protection for concurrent write and ioctl races
ALSA: hda: add dock and led support for HP EliteBook 820 G3
ALSA: hda: add dock and led support for HP ProBook 640 G2
nospec: Include <asm/barrier.h> dependency
watchdog: hpwdt: SMBIOS check
watchdog: hpwdt: Check source of NMI
watchdog: hpwdt: fix unused variable warning
netfilter: nfnetlink_queue: fix timestamp attribute
ARM: omap2: hide omap3_save_secure_ram on non-OMAP3 builds
Input: tca8418_keypad - remove double read of key event register
tc358743: fix register i2c_rd/wr function fix
netfilter: add back stackpointer size checks
netfilter: x_tables: fix missing timer initialization in xt_LED
netfilter: nat: cope with negative port range
netfilter: IDLETIMER: be syzkaller friendly
netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets
netfilter: bridge: ebt_among: add missing match size checks
netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt
netfilter: use skb_to_full_sk in ip_route_me_harder
netfilter: x_tables: pass xt_counters struct instead of packet counter
netfilter: x_tables: pass xt_counters struct to counter allocator
netfilter: x_tables: pack percpu counter allocations
ext4: inplace xattr block update fails to deduplicate blocks
ubi: Fix race condition between ubi volume creation and udev
scsi: qla2xxx: Replace fcport alloc with qla2x00_alloc_fcport
NFS: Fix an incorrect type in struct nfs_direct_req
Revert "ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux"
x86/module: Detect and skip invalid relocations
x86: Treat R_X86_64_PLT32 as R_X86_64_PC32
serial: sh-sci: prevent lockup on full TTY buffers
tty/serial: atmel: add new version check for usart
uas: fix comparison for error code
staging: comedi: fix comedi_nsamples_left.
staging: android: ashmem: Fix lockdep issue during llseek
USB: storage: Add JMicron bridge 152d:2567 to unusual_devs.h
usb: quirks: add control message delay for 1b1c:1b20
USB: usbmon: remove assignment from IS_ERR argument
usb: usbmon: Read text within supplied buffer size
usb: gadget: f_fs: Fix use-after-free in ffs_fs_kill_sb()
serial: 8250_pci: Add Brainboxes UC-260 4 port serial device
fixup: sctp: verify size of a new chunk in _sctp_make_chunk()
Linux 4.4.122
Change-Id: I0946c4a7c59be33f18bed6498c3cdb748e82bbaf
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit ec00022030da5761518476096626338bd67df57a upstream.
When an xattr block has a single reference, the block is updated in place
and it is reinserted to the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just inserted entry so
deduplication is not achieved.
Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:
mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
mount /dev/sdb /mnt/sdb
touch /mnt/sdb/{x,y}
setfattr -n user.1 -v aaa /mnt/sdb/x
setfattr -n user.2 -v bbb /mnt/sdb/x
setfattr -n user.1 -v aaa /mnt/sdb/y
setfattr -n user.2 -v bbb /mnt/sdb/y
debugfs -R 'stat x' /dev/sdb | cat
debugfs -R 'stat y' /dev/sdb | cat
This patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.
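
The effect of the ordering can be modelled with a toy cache. This is a
sketch under stated assumptions: the array-backed cache, the
"most recent match wins" lookup and all function names are invented for
illustration and do not mirror mbcache.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 8

struct toy_cache {
    uint32_t hash[CACHE_SLOTS];
    int block[CACHE_SLOTS];
    size_t used;
};

void cache_insert(struct toy_cache *c, uint32_t hash, int block)
{
    if (c->used < CACHE_SLOTS) {
        c->hash[c->used] = hash;
        c->block[c->used] = block;
        c->used++;
    }
}

/* Return the most recently inserted block with this hash, or -1. */
int cache_find(const struct toy_cache *c, uint32_t hash)
{
    for (size_t i = c->used; i-- > 0; )
        if (c->hash[i] == hash)
            return c->block[i];
    return -1;
}

/* Old order: reinsert the updated block first, then look for a duplicate. */
int update_insert_first(struct toy_cache *c, uint32_t new_hash, int self)
{
    cache_insert(c, new_hash, self);
    return cache_find(c, new_hash);   /* finds "self", so nothing is shared */
}

/* Deferred order: look for another block first, reinsert only if none exists. */
int update_lookup_first(struct toy_cache *c, uint32_t new_hash, int self)
{
    int other = cache_find(c, new_hash);

    if (other >= 0)
        return other;                 /* share the existing block */
    cache_insert(c, new_hash, self);
    return self;
}
```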
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge 4.4.66 into android-4.4
Changes in 4.4.66:
f2fs: do more integrity verification for superblock
xc2028: unlock on error in xc2028_set_config()
ARM: OMAP2+: timer: add probe for clocksources
clk: sunxi: Add apb0 gates for H3
crypto: testmgr - fix out of bound read in __test_aead()
drm/amdgpu: fix array out of bounds
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
md:raid1: fix a dead loop when read from a WriteMostly disk
MIPS: Fix crash registers on non-crashing CPUs
net: cavium: liquidio: Avoid dma_unmap_single on uninitialized ndata
net_sched: close another race condition in tcf_mirred_release()
RDS: Fix the atomicity for congestion map update
regulator: core: Clear the supply pointer if enabling fails
usb: gadget: f_midi: Fixed a bug when buflen was smaller than wMaxPacketSize
xen/x86: don't lose event interrupts
sparc64: kern_addr_valid regression
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
net: neigh: guard against NULL solicit() method
net: phy: handle state correctly in phy_stop_machine
l2tp: purge socket queues in the .destruct() callback
net/packet: fix overflow in check for tp_frame_nr
net/packet: fix overflow in check for tp_reserve
l2tp: take reference on sessions being dumped
l2tp: fix PPP pseudo-wire auto-loading
net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given
sctp: listen on the sock only when it's state is listening or closed
tcp: clear saved_syn in tcp_disconnect()
dp83640: don't recieve time stamps twice
net: ipv6: RTF_PCPU should not be settable from userspace
netpoll: Check for skb->queue_mapping
ip6mr: fix notification device destruction
macvlan: Fix device ref leak when purging bc_queue
ipv6: check skb->protocol before lookup for nexthop
ipv6: check raw payload size correctly in ioctl
ALSA: firewire-lib: fix inappropriate assignment between signed/unsigned type
ALSA: seq: Don't break snd_use_lock_sync() loop by timeout
MIPS: KGDB: Use kernel context for sleeping threads
MIPS: Avoid BUG warning in arch_check_elf
p9_client_readdir() fix
Input: i8042 - add Clevo P650RS to the i8042 reset list
nfsd: check for oversized NFSv2/v3 arguments
ARCv2: save r30 on kernel entry as gcc uses it for code-gen
ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram
Linux 4.4.66
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream.
We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.
This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(Cherry-pick from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272)
The conversion is generally straightforward. The only tricky part is
that xattr block corresponding to found mbcache entry can get freed
before we get buffer lock for that block. So we have to check whether
the entry is still valid after getting buffer lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Bug: 32461228
commit b47820edd1634dc1208f9212b7ecfb4230610a23 upstream.
We temporarily change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.
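
One way to verify without writing to the buffer is to feed a zeroed
stand-in for the checksum field into the computation. The sketch below
assumes a toy rolling checksum (toy_csum) instead of crc32c, and the
function names and field layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy rolling checksum; ext4 itself uses crc32c, which is not reproduced here. */
uint32_t toy_csum(uint32_t seed, const uint8_t *p, size_t len)
{
    while (len--)
        seed = seed * 31u + *p++;
    return seed;
}

/* Checksum of the block with the 4-byte field at csum_off treated as zero. */
uint32_t block_csum(const uint8_t *blk, size_t len, size_t csum_off)
{
    static const uint8_t zero[4];
    uint32_t calc = toy_csum(0, blk, csum_off);

    calc = toy_csum(calc, zero, sizeof(zero));   /* stand-in for the field */
    return toy_csum(calc, blk + csum_off + 4, len - csum_off - 4);
}

/* Verification only reads the block; the stored field is never overwritten. */
int verify_block(const uint8_t *blk, size_t len, size_t csum_off)
{
    uint32_t stored;

    memcpy(&stored, blk + csum_off, sizeof(stored));
    return block_csum(blk, len, csum_off) == stored;
}

/* Writer side: compute the same value and store it in the field. */
void seal_block(uint8_t *blk, size_t len, size_t csum_off)
{
    uint32_t calc = block_csum(blk, len, csum_off);

    memcpy(blk + csum_off, &calc, sizeof(calc));
}
```

Because verify_block() takes a const pointer, a concurrent writeback of
the same buffer cannot observe a zeroed checksum field.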
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Török Edwin <edwin@etorok.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2e81a4eeedcaa66e35f58b81e0755b87057ce392 upstream.
When we need to move xattrs into external xattr block, we call
ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end
up calling ext4_mark_inode_dirty() again which will recurse back into
the inode expansion code leading to deadlocks.
Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
its management into ext4_expand_extra_isize_ea() since its manipulation
is safe there (due to xattr_sem) from possible races with
ext4_xattr_set_handle() which plays with it as well.
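
The recursion guard can be sketched as follows. The toy_inode struct and
both functions are stand-ins invented for this example; locking
(xattr_sem) is deliberately left out.

```c
#include <stdbool.h>

struct toy_inode {
    bool no_expand;     /* models EXT4_STATE_NO_EXPAND */
    int expand_calls;   /* how many times expansion actually ran */
};

int toy_expand(struct toy_inode *inode);

/* Marking the inode dirty tries to expand it unless expansion is blocked. */
int toy_mark_dirty(struct toy_inode *inode)
{
    if (inode->no_expand)
        return 0;
    return toy_expand(inode);
}

/* Expansion moves xattrs out, which dirties the inode again. */
int toy_expand(struct toy_inode *inode)
{
    bool was_set = inode->no_expand;
    int err;

    inode->expand_calls++;
    inode->no_expand = true;        /* block re-entry for the nested dirtying */
    err = toy_mark_dirty(inode);    /* stands in for ext4_xattr_block_set() */
    if (!was_set)
        inode->no_expand = false;   /* restore only if we set it ourselves */
    return err;
}
```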
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 443a8c41cd49de66a3fda45b32b9860ea0292b84 upstream.
We did not account for the padding of the xattr value when computing the desired
shift of xattrs in the inode when expanding i_extra_isize. As a result
we could create unaligned start of inline xattrs. Account for alignment
properly.
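
A small sketch of the padding arithmetic, assuming values are padded to
a 4-byte boundary. The macro and function names are illustrative, not
the kernel's.

```c
#include <stddef.h>

#define TOY_XATTR_PAD   4u
#define TOY_XATTR_ROUND (TOY_XATTR_PAD - 1)

/* Space a value of the given size occupies once padded to a 4-byte boundary. */
size_t padded_value_size(size_t size)
{
    return (size + TOY_XATTR_ROUND) & ~(size_t)TOY_XATTR_ROUND;
}

/* Total space taken by a set of values, counting padding for each one. */
size_t total_value_space(const size_t *sizes, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += padded_value_size(sizes[i]);
    return total;
}
```

Summing raw sizes instead (3 + 5 = 8 rather than 4 + 8 = 12) is the
kind of miscount that leaves the following data unaligned.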
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 418c12d08dc64a45107c467ec1ba29b5e69b0715 upstream.
When multiple xattrs need to be moved out of inode, we did not properly
recompute total size of xattr headers in the inode and the new header
position. Thus when moving the second and further xattr we asked
ext4_xattr_shift_entries() to move too much and from the wrong place,
resulting in possible xattr value corruption or general memory
corruption.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0141191a20289f8955c1e03dad08e42e6f71ca9 upstream.
The code in ext4_expand_extra_isize_ea() treated new_extra_isize
argument sometimes as the desired target i_extra_isize and sometimes as
the amount by which we need to grow current i_extra_isize. These happen
to coincide when i_extra_isize is 0 which used to be the common case and
so nobody noticed this until recently when we added i_projid to the
inode and so i_extra_isize now needs to grow from 28 to 32 bytes.
The result of these bugs was that we sometimes unnecessarily decided to
move xattrs out of inode even if there was enough space and we often
ended up corrupting in-inode xattrs because arguments to
ext4_xattr_shift_entries() were just wrong. This could demonstrate
itself as BUG_ON in ext4_xattr_shift_entries() triggering.
Fix the problem by introducing new isize_diff variable and use it where
appropriate.
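
The distinction between the target size and the growth can be shown
with a short sketch. The helper names are hypothetical; only the 28 and
32 byte figures come from the text above.

```c
/* Growth needed to reach the target extra size; 0 if already large enough. */
int isize_diff(int cur_extra_isize, int new_extra_isize)
{
    if (new_extra_isize <= cur_extra_isize)
        return 0;
    return new_extra_isize - cur_extra_isize;
}

/* Whether in-inode free space can absorb the growth without moving xattrs out. */
int fits_in_inode(int free_bytes, int cur_extra_isize, int new_extra_isize)
{
    return free_bytes >= isize_diff(cur_extra_isize, new_extra_isize);
}
```

With 8 free bytes, growing from 28 to 32 fits (the difference is 4),
whereas comparing the free space against the target of 32 would wrongly
conclude that xattrs must be moved out.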
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example. In some operations, it would be useful to also have
access to the handler prefix. To allow that, pass a pointer to the handler
to operations instead of the flags value alone.
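
A rough userspace model of the interface change, assuming a cut-down
handler structure. The toy_ names and the callback signature are
invented for this sketch and differ from the real xattr_handler.

```c
#include <stdio.h>

struct toy_xattr_handler;

typedef int (*toy_get_fn)(const struct toy_xattr_handler *handler,
                          const char *name, char *out, size_t out_len);

struct toy_xattr_handler {
    const char *prefix;   /* e.g. "user." */
    int flags;            /* file system specific namespace index */
    toy_get_fn get;
};

/*
 * One shared callback can rebuild the full attribute name because it sees
 * the handler (prefix and flags), not just the flags value.
 * Returns the handler's flags on success, -1 if the name does not fit.
 */
int toy_generic_get(const struct toy_xattr_handler *handler,
                    const char *name, char *out, size_t out_len)
{
    int n = snprintf(out, out_len, "%s%s", handler->prefix, name);

    return (n < 0 || (size_t)n >= out_len) ? -1 : handler->flags;
}
```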
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros. Furthermore, clean out the
places where we open-coded feature tests.
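
As a sketch of what per-feature predicate helpers look like, assuming a
simplified superblock with a host-endian feature word. The toy_ names
are hypothetical; the bit value is assumed to be that of the
metadata_csum RO_COMPAT feature.

```c
#include <stdint.h>

struct toy_super {
    uint32_t s_feature_ro_compat;   /* host-endian here; on disk it is le32 */
};

#define TOY_FEATURE_RO_COMPAT_METADATA_CSUM 0x0400u

/* Test, set and clear helpers replace open-coded mask arithmetic at call sites. */
int toy_has_feature_metadata_csum(const struct toy_super *sb)
{
    return (sb->s_feature_ro_compat & TOY_FEATURE_RO_COMPAT_METADATA_CSUM) != 0;
}

void toy_set_feature_metadata_csum(struct toy_super *sb)
{
    sb->s_feature_ro_compat |= TOY_FEATURE_RO_COMPAT_METADATA_CSUM;
}

void toy_clear_feature_metadata_csum(struct toy_super *sb)
{
    sb->s_feature_ro_compat &= ~TOY_FEATURE_RO_COMPAT_METADATA_CSUM;
}
```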
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.
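
For illustration, the mapping can be sketched like this. It assumes the
XFS convention of EBADMSG for checksum failures and EUCLEAN for corrupt
structures; the TOY_ macro names and classify_block() are hypothetical.

```c
#include <errno.h>

/* EUCLEAN is Linux-specific; fall back to a placeholder value elsewhere. */
#ifndef EUCLEAN
#define EUCLEAN 117
#endif

#define TOY_EFSBADCRC    EBADMSG  /* bad checksum */
#define TOY_EFSCORRUPTED EUCLEAN  /* structure fails sanity checks */

/* Distinguish the two failure classes instead of returning -EIO for both. */
int classify_block(int csum_ok, int structure_ok)
{
    if (!csum_ok)
        return -TOY_EFSBADCRC;
    if (!structure_ok)
        return -TOY_EFSCORRUPTED;
    return 0;
}
```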
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In this if statement, the former condition is redundant; the latter one
already covers it.
Signed-off-by: Weiyuan <weiyuan.wei@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Remove unused header files and header files which are included in
ext4.h.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Besides the fact that this replacement improves code readability,
it also protects from errors caused by direct EXT4_S(sb)->s_es manipulation,
which may result in an attempt to use uninitialized csum machinery.
#Testcase_BEGIN
IMG=/dev/ram0
MNT=/mnt
mkfs.ext4 $IMG
mount $IMG $MNT
#Enable feature directly on disk, on mounted fs
tune2fs -O metadata_csum $IMG
# Provoke metadata update, likely results in OOPS
touch $MNT/test
umount $MNT
#Testcase_END
# Replacement script
@@
expression E;
@@
- EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM)
+ ext4_has_metadata_csum(E)
https://bugzilla.kernel.org/show_bug.cgi?id=82201
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
When loading extended attributes, check each entry's value offset to
make sure it doesn't collide with the entries.
Without this check it is easy to crash the kernel by mounting a
malicious FS containing a file with an EA wherein e_value_offs = 0 and
e_value_size > 0 and then deleting the EA, which corrupts the name
list.
(See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
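
A minimal sketch of such a check, assuming each entry carries a value
offset and size relative to the start of the xattr region. The struct
and function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_value_ref {
    uint16_t e_value_offs;   /* offset of the value from the start of the region */
    uint32_t e_value_size;   /* length of the value in bytes */
};

/*
 * entries_end: offset just past the entry table (including its terminator).
 * region_size: total bytes in the xattr region.
 * Returns 0 if every non-empty value lies after the entries and inside the
 * region, -1 otherwise.
 */
int check_value_offsets(const struct toy_value_ref *e, size_t n,
                        size_t entries_end, size_t region_size)
{
    for (size_t i = 0; i < n; i++) {
        size_t offs = e[i].e_value_offs;
        size_t size = e[i].e_value_size;

        if (size == 0)
            continue;                   /* empty values occupy no space */
        if (offs < entries_end)
            return -1;                  /* value collides with the entries */
        if (offs > region_size || size > region_size - offs)
            return -1;                  /* value runs past the region */
    }
    return 0;
}
```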
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
because it was too hard to make sure the mballoc and get_block flags
could be reliably passed down through all of the codepaths that end up
calling ext4_mb_new_blocks().
Since then, we have mb_flags passed down through most of the code
paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
as it used to be.
This commit plumbs in the last of what is required, and then adds a
WARN_ON check to make sure we haven't missed anything. If this passes
a full regression test run, we can then drop
EXT4_STATE_DELALLOC_RESERVED.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
In ext4_xattr_set_handle() we have checked the xattr name's length. So
we should also check it in ext4_xattr_get() to avoid unneeded lookup
caused by invalid name.
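
Sketched in userspace C, the early rejection might look like the
following. The 255-byte limit is assumed from the one-byte name length
field; toy_xattr_get() and its return values are illustrative only.

```c
#include <errno.h>
#include <string.h>

#define TOY_NAME_MAX 255

/* Reject over-long names before any lookup work is done. */
int toy_xattr_get(const char *name)
{
    if (name == NULL)
        return -EINVAL;
    if (strlen(name) > TOY_NAME_MAX)
        return -ERANGE;     /* no on-disk structure is touched */
    /* Hypothetical: a real lookup would follow here. */
    return -ENODATA;
}
```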
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:
WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()
CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1
Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
Call Trace:
[<ffffffff816311b0>] dump_stack+0x19/0x1b
[<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
[<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
[<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
[<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
[<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
[<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
[<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
[<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
[<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
[<ffffffff811935ce>] generic_setxattr+0x6e/0x90
[<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
[<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
[<ffffffff8119421e>] setxattr+0x13e/0x1e0
[<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118c65c>] ? fget_light+0x3c/0x130
[<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
[<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
[<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
[<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
The reason for the warning is that buffer_head passed into
jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
caused by the following race of two ext4_xattr_release_block() calls:
CPU1                                CPU2
ext4_xattr_release_block()          ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
  le32_add_cpu(&BHDR(bh)->h_refcount, -1);
  unlock_buffer(bh);
                                    lock_buffer(bh);
                                    /* True */
                                    if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                      get_bh(bh);
                                      ext4_free_blocks()
                                        ...
                                        jbd2_journal_forget()
                                          jbd2_journal_unfile_buffer()
                                          -> JH is gone
  error = ext4_handle_dirty_xattr_block(handle, inode, bh);
  -> triggers the warning
We fix the problem by moving ext4_handle_dirty_xattr_block() under the
buffer lock. Sadly this cannot be done in nojournal mode as that
function can call sync_dirty_buffer() which would deadlock. Luckily in
nojournal mode the race is harmless (we only dirty already freed buffer)
and thus for nojournal mode we leave the dirtying outside of the buffer
lock.
Reported-by: Sage Weil <sage@inktank.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
This patch adds new interfaces to create and destroy the cache,
ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and removes
the cache creation and destruction calls from ext4_init_xattr() and
ext4_exit_xattr() in fs/ext4/xattr.c.
fs/ext4/super.c has been changed so that when a filesystem is mounted
a cache is allocated and attached to its ext4_sb_info structure.
fs/mbcache.c has been changed so that only one slab allocator is
allocated and used by all mbcache structures.
Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
The function ext4_expand_extra_isize_ea() doesn't need the size of all
of the extended attribute headers. So if we don't calculate it when
it is unneeded, we can skip some unneeded memory references, and as
a bonus, we eliminate some kvetching by static code analysis tools.
Addresses-Coverity-Id: #741291
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
potentially return from the function without having freed these
allocations. If we don't do the return, we over-write the previous
allocation pointers, so we leak either way.
Spotted with Coverity.
[ Fixed by tytso to set is and bs to NULL after freeing these
pointers, in case in the retry loop we later end up triggering an
error causing a jump to cleanup, at which point we could have a double
free bug. -- Ted ]
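
The free-then-NULL pattern in a retry loop can be sketched as below.
This is an assumption-laden model: toy_state, the fail_on knob and the
control flow are invented to show the pattern, not the real function.

```c
#include <stdlib.h>

struct toy_state { int dummy; };

/*
 * Retry loop that allocates two scratch objects per attempt. fail_on picks
 * the attempt (1-based) on which an error is simulated; 0 means never fail.
 * Returns the number of attempts made, or -1 on error.
 */
int toy_expand_with_retry(int attempts_needed, int fail_on)
{
    struct toy_state *is = NULL, *bs = NULL;
    int attempt = 0, ret = -1;

retry:
    attempt++;
    if (fail_on == attempt)
        goto cleanup;               /* is/bs are NULL here, so no double free */
    is = calloc(1, sizeof(*is));
    bs = calloc(1, sizeof(*bs));
    if (!is || !bs)
        goto cleanup;
    if (attempt < attempts_needed) {
        free(is);                   /* without this, the retry would leak */
        free(bs);
        is = bs = NULL;             /* keep cleanup safe if a later step fails */
        goto retry;
    }
    ret = attempt;
cleanup:
    free(is);                       /* free(NULL) is a no-op */
    free(bs);
    return ret;
}
```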
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
Currently when a new xattr block is created or released we call
dquot_alloc_block() or dquot_free_block() respectively, which among
other things increments or decrements the number of blocks assigned to
the inode by one block.
This however does not work for bigalloc file systems, because we always
allocate/free a whole cluster, so we have to account for that in
dquot_free_block() and dquot_alloc_block() as well.
Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
blocks to the dquot_alloc/free functions to fix the problem.
The problem has been revealed by xfstests #117 (and possibly others).
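
The conversion amounts to a shift by the cluster size exponent. A
minimal sketch, assuming a toy superblock that stores
log2(blocks per cluster); toy_c2b() and xattr_block_quota_charge() are
hypothetical names.

```c
#include <stdint.h>

struct toy_sb {
    unsigned cluster_bits;   /* log2(blocks per cluster); 0 without bigalloc */
};

/* Clusters to blocks, in the spirit of EXT4_C2B(). */
uint64_t toy_c2b(const struct toy_sb *sb, uint64_t clusters)
{
    return clusters << sb->cluster_bits;
}

/* Blocks to charge to quota for one xattr block, which occupies one cluster. */
uint64_t xattr_block_quota_charge(const struct toy_sb *sb)
{
    return toy_c2b(sb, 1);
}
```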
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable@vger.kernel.org
Operations which modify extended attributes may need extra journal
credits if inline data is used, since there is a chance that some
extended attributes may need to get pushed to an external attribute
block.
Changes to reflect this were made in xattr.c, but they were missed in
fs/ext4/acl.c. To fix this, abstract the calculation of the number of
credits needed for xattr operations to an inline function defined in
ext4_jbd2.h, and use it in acl.c and xattr.c.
Also move the function declarations used in inline.c out of xattr.h
(where they are non-obviously hidden, and caused problems since
ext4_jbd2.h needs to use the function ext4_has_inline_data) and into
ext4.h.
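The shape of such a helper might look like the sketch below. It is a userspace model under stated assumptions: struct toy_credit_inode and its fields are hypothetical, and the exact amount added for the inline data case is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical, simplified inode: only what the estimate needs. */
struct toy_credit_inode {
	bool has_inline_data;
	int data_trans_blocks;		/* base credits for an xattr update */
	int writepage_trans_blocks;	/* credits to write out one page */
};

/* One shared place to compute the credits, usable by every caller. */
static inline int toy_credits_xattr(const struct toy_credit_inode *inode)
{
	int credits = inode->data_trans_blocks;

	/* With inline data, the data may be pushed out to a block,
	 * so reserve extra credits for that eventuality. */
	if (inode->has_inline_data)
		credits += inode->writepage_trans_blocks + 1;
	return credits;
}
```

Keeping the calculation in a single inline function means a caller such as acl.c cannot drift out of step with xattr.c again.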
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Tao Ma <boyu.mt@taobao.com>
Reviewed-by: Jan Kara <jack@suse.cz>
So we can better understand what bits of ext4 are responsible for
long-running jbd2 handles, use jbd2__journal_start() so we can pass
context information for logging purposes.
The recommended way for finding the longer-running handles is:
    T=/sys/kernel/debug/tracing
    EVENT=$T/events/jbd2/jbd2_handle_stats
    echo "interval > 5" > $EVENT/filter
    echo 1 > $EVENT/enable
    ./run-my-fs-benchmark
    cat $T/trace > /tmp/problem-handles
This will list handles that were active for longer than 20ms. Having
longer-running handles is bad, because a commit started at the wrong
time could stall for those 20+ milliseconds, which could delay an
fsync() or an O_SYNC operation. Here is an example line from the
trace file describing a handle which lived on for 311 jiffies, or over
1.2 seconds:
    postmark-2917 [000] .... 196.435786: jbd2_handle_stats: dev 254,32
    tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
    dirtied_blocks 0
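The interval field is in jiffies. The figures quoted above (5 jiffies for 20ms, 311 jiffies for over 1.2 seconds) imply a 4ms tick, i.e. HZ=250; that value is inferred from the numbers rather than stated. A small sketch of the arithmetic, with toy_jiffies_to_ms() being a hypothetical helper:

```c
/* Convert a jiffies interval to milliseconds for a given tick rate (HZ). */
static unsigned long toy_jiffies_to_ms(unsigned long jiffies, unsigned int hz)
{
	return jiffies * 1000UL / hz;
}
```

On a kernel built with a different HZ the filter threshold would need to be scaled accordingly.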
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Because the function 'sb_getblk' seldom fails and returns NULL,
it is better to use 'unlikely' to optimize the check.
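A userspace sketch of the idea, assuming a GCC or Clang compiler (for __builtin_expect); toy_unlikely(), toy_getblk() and toy_read_block() are hypothetical names and the error code returned is just an example:

```c
#include <stddef.h>
#include <errno.h>

/* Same shape as the kernel's unlikely() hint. */
#define toy_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for a buffer lookup that rarely returns NULL. */
static void *toy_getblk(void *backing)
{
	return backing;
}

static int toy_read_block(void *backing)
{
	void *bh = toy_getblk(backing);

	if (toy_unlikely(!bh))	/* cold path: the lookup failed */
		return -ENOMEM;
	return 0;
}
```

The hint does not change behaviour; it only tells the compiler which branch to lay out as the fall-through path.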
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The only reason for sb_getblk() failing is if it can't allocate the
buffer_head. So ENOMEM is more appropriate than EIO. In addition,
make sure that the file system is marked as being inconsistent if
sb_getblk() fails.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Not all architectures (in particular, sparc64) have empty_zero_page.
So instead of copying from empty_zero_page, use memset to clear the
inline data by signalling to ext4_xattr_set_entry() via a magic
pointer value, EXT4_ZERO_ATTR_VALUE, which is defined by casting -1 to
a pointer.
This fixes a build failure on sparc64, and the memset() should be more
efficient than using memcpy() anyway.
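The sentinel approach can be sketched in userspace like this. The names TOY_ZERO_ATTR_VALUE and toy_set_value() are hypothetical; only the idea of a -1 pointer selecting memset() over memcpy() comes from the description above.

```c
#include <string.h>

/* Sentinel in the spirit of EXT4_ZERO_ATTR_VALUE: -1 cast to a pointer. */
#define TOY_ZERO_ATTR_VALUE ((const void *)-1)

/* Store an attribute value, or zero-fill when the sentinel is passed. */
static void toy_set_value(void *dst, const void *value, size_t len)
{
	if (value == TOY_ZERO_ATTR_VALUE)
		memset(dst, 0, len);	/* no source buffer is ever read */
	else
		memcpy(dst, value, len);
}
```

Because the sentinel is only compared and never dereferenced, no architecture-specific page of zeroes is needed.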
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Now that we store data in the inode, if we need to store some
xattrs and the inode doesn't have enough space, Andreas suggested that we
should keep the xattrs (metadata) in the inode and push the data out. This
patch does that work.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
The inline data feature will need some inline xattr functions, so
export them from fs/ext4/xattr.c so that inline.c can use them.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4_handle_release_buffer() was intended to remove journal
write access from a buffer, but it doesn't actually do anything
at all other than add a BUFFER_TRACE point, but it's not reliably
used for that either. Remove all the associated dead code.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
In xattr block operations, we use h_refcount to indicate whether the
xattr block is shared among many inodes. The xattr block csum uses
s_csum_seed if the block is shared and i_csum_seed if it belongs to
one inode. But this has a problem. Consider a block that is first
shared between inodes A and B; B then updates some xattr and CoWs
the xattr block. When it updates the *old* xattr block (because
of the h_refcount change) and calls ext4_xattr_release_block, we
have no idea that inode A is the real owner of the *old* xattr
block, and we can't use the i_csum_seed of inode A in the xattr
block csum calculation either. And I don't think we have an easy way to
find inode A.
So this patch just removes the tricky i_csum_seed and we now use
s_csum_seed every time for the xattr block csum. The corresponding
change for e2fsprogs will be sent in another patch.
This was spotted by xfstests 117.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Calculate and verify the checksums of extended attribute blocks. This
only applies to separate EA blocks that are pointed to by
inode->i_file_acl (i.e. external EA blocks); the checksum lives in
the EA header.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Add argument validation to debug functions.
Use ##__VA_ARGS__.
Fix format and argument mismatches.
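For illustration, a debug macro built on ##__VA_ARGS__ might look like the sketch below. toy_debug() is a hypothetical name and the output goes to a caller-supplied buffer so it can be inspected; "##__VA_ARGS__" is a GNU extension supported by GCC and Clang.

```c
#include <stdio.h>

/*
 * Hypothetical debug macro. "##__VA_ARGS__" drops the preceding comma
 * when no variadic arguments are given, so the macro also works with a
 * bare format string.
 */
#define toy_debug(buf, size, fmt, ...) \
	snprintf((buf), (size), "debug: " fmt, ##__VA_ARGS__)
```

Since the arguments are passed through to a printf-style function, a compiler with format checking enabled can flag mismatches between the format string and its arguments.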
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Processes hang forever on a sync-mounted ext2 file system that
is mounted with the ext4 module (default in Fedora 16).
I can reproduce this reliably by mounting an ext2 partition with
"-o sync" and opening a new file on that partition with vim. vim
will hang in "D" state forever. The same happens on ext4 without
a journal.
I am attaching a small patch here that solves this issue for me.
In the sync mounted case without a journal,
ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
can't be called with buffer lock held.
Also move mb_cache_entry_release inside lock to avoid race
fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix
Note too that ext2 fixed this same problem in 2006 with
b2f49033 [PATCH] fix deadlock in ext2
Signed-off-by: Martin.Wilck@ts.fujitsu.com
[sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
We could return directly from ext4_xattr_check_block(). Thus, we
shouldn't need to define an 'error' variable.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Ceph users reported that when using Ceph on ext4, the filesystem
would often become corrupted, containing inodes with incorrect
i_blocks counters.
I managed to reproduce this with a very hacked-up "streamtest"
binary from the Ceph tree.
Ceph is doing a lot of xattr writes, to out-of-inode blocks.
There is also another thread which does sync_file_range and close,
of the same files. The problem appears to happen due to this race:
sync/flush thread               xattr-set thread
-----------------               ----------------
do_writepages                   ext4_xattr_set
ext4_da_writepages              ext4_xattr_set_handle
mpage_da_map_blocks             ext4_xattr_block_set
    set DELALLOC_RESERVE
                                ext4_new_meta_blocks
                                    ext4_mb_new_blocks
                                        if (!i_delalloc_reserved_flag)
                                            vfs_dq_alloc_block
ext4_get_blocks
    down_write(i_data_sem)
     set i_delalloc_reserved_flag
    ...
    up_write(i_data_sem)
                                        if (i_delalloc_reserved_flag)
                                            vfs_dq_alloc_block_nofail
In other words, the sync/flush thread pops in and sets
i_delalloc_reserved_flag on the inode, which makes the xattr thread
think that it's in a delalloc path in ext4_new_meta_blocks(),
and add the block for a second time, after already having added
it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks.
The real problem is that we shouldn't be using the DELALLOC_RESERVED
state flag; instead we should be passing
EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks(). We'll fix
this for now by using i_data_sem to prevent this race, but this is
really not the right way to fix things.
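The double accounting can be modelled with a deterministic, single-threaded sketch. This is not the kernel code: struct toy_race_inode and the toy_* functions are hypothetical, and the "interleaving" is simulated by a flag rather than by real threads.

```c
#include <stdbool.h>

struct toy_race_inode {
	bool delalloc_reserved;	/* models i_delalloc_reserved_flag */
	long charged_blocks;	/* blocks charged to the inode */
};

/* Models the check in ext4_mb_new_blocks(): charge if not delalloc. */
static void toy_mb_new_blocks(struct toy_race_inode *inode)
{
	if (!inode->delalloc_reserved)
		inode->charged_blocks++;
}

/* Models the later check in ext4_new_meta_blocks(): charge if delalloc. */
static void toy_new_meta_blocks_tail(struct toy_race_inode *inode)
{
	if (inode->delalloc_reserved)
		inode->charged_blocks++;
}

/* Allocate one xattr block; optionally let the writeback path set the
 * flag between the two checks, as in the diagram above. */
static long toy_xattr_alloc(struct toy_race_inode *inode,
			    bool writeback_interleaves)
{
	toy_mb_new_blocks(inode);
	if (writeback_interleaves)
		inode->delalloc_reserved = true;
	toy_new_meta_blocks_tail(inode);
	return inode->charged_blocks;
}
```

Without the interleaving exactly one block is charged; with it the same block is charged twice, which matches the incorrect i_blocks counters reported. Holding a lock across both checks, as the i_data_sem fix does, keeps the flag from changing in between.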
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org