-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlrQ7ekACgkQONu9yGCS
aT6Znw//VtAP82BGP/+H6X6gt0rBRIYseEJkHOpKRu5PK+Vpx7mMQFIfBId95P6R
buq1QyzY9yz8ixbByg/w60WA2jK/I9i0tDGBnSlZzNmUvbk01oBN+cc/weZDynF7
rFbSvD3aTmPB4nm9VE+n7V/tgGeuu/bwi04zulAm/B0/zA+w9GZv/aAto3WlLdjF
ogZPSo5y6ifm6Qryq9sTR42LyDBXOy1klRSIK5EXY1OnIvPL1HSYR3ea2yj3AMXB
RPvpCCY8j7zC9yVifX1c+Gfv2tXVHb9kjgheJixP2J4M3fFlR5tjLQXtTP2S2I8G
cuMcdT6MiQw31rMoLcpej66dMtkL3k6sEpzcnSPPNenTuDIolz7BLEyaO/hhgi9J
6vIXAd4Xm9D8HkH3iG/L3GtD3JXpVPtHyli/X1M3hz/VNUSOUPENIjMmGoxfBOtQ
d7c8VGxDjnqmafri3fBAm4c603qW7O1wqJ7vLs9z7vgOIxlOLoJ/uiazoJKgW6O0
z0S/BABWqpAUAI9jgm2GvRDR2keM2mhQIgIrY0+ZpnaLSGe3MugB+GbK6xdBCuYA
anOv9VTEAPlTc8gb+GlusbUVjQyacEDwXoT6f9mELCW8cqpMgh+3TiKFihbYkUTN
ly/DxZH3jpva0dq94Mgjv1u/nlg9ac3zqGeo9buQQFC7MSoZKEM=
=LiZa
-----END PGP SIGNATURE-----
Merge 4.4.128 into android-4.4
Changes in 4.4.128
cfg80211: make RATE_INFO_BW_20 the default
md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock
rtc: snvs: fix an incorrect check of return value
x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
IB/srpt: Fix abort handling
af_key: Fix slab-out-of-bounds in pfkey_compile_policy.
mac80211: bail out from prep_connection() if a reconfig is ongoing
bna: Avoid reading past end of buffer
qlge: Avoid reading past end of buffer
ipmi_ssif: unlock on allocation failure
net: cdc_ncm: Fix TX zero padding
net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
lockd: fix lockd shutdown race
drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests
pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()
s390: move _text symbol to address higher than zero
net/mlx4_en: Avoid adding steering rules with invalid ring
NFSv4.1: Work around a Linux server bug...
CIFS: silence lockdep splat in cifs_relock_file()
blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
net: qca_spi: Fix alignment issues in rx path
netxen_nic: set rcode to the return status from the call to netxen_issue_cmd
Input: elan_i2c - check if device is there before really probing
Input: elantech - force relative mode on a certain module
KVM: PPC: Book3S PR: Check copy_to/from_user return values
vmxnet3: ensure that adapter is in proper state during force_close
SMB2: Fix share type handling
bus: brcmstb_gisb: Use register offsets with writes too
bus: brcmstb_gisb: correct support for 64-bit address output
PowerCap: Fix an error code in powercap_register_zone()
ARM: dts: imx53-qsrb: Pulldown PMIC IRQ pin
staging: wlan-ng: prism2mgmt.c: fixed a double endian conversion before calling hfa384x_drvr_setconfig16, also fixes relative sparse warning
x86/tsc: Provide 'tsc=unstable' boot parameter
ARM: dts: imx6qdl-wandboard: Fix audio channel swap
ipv6: avoid dad-failures for addresses with NODAD
async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
usb: dwc3: keystone: check return value
btrfs: fix incorrect error return ret being passed to mapping_set_error
ata: libahci: properly propagate return value of platform_get_irq()
neighbour: update neigh timestamps iff update is effective
arp: honour gratuitous ARP _replies_
usb: chipidea: properly handle host or gadget initialization failure
USB: ene_usb6250: fix first command execution
net: x25: fix one potential use-after-free issue
USB: ene_usb6250: fix SCSI residue overwriting
serial: 8250: omap: Disable DMA for console UART
serial: sh-sci: Fix race condition causing garbage during shutdown
sh_eth: Use platform device for printing before register_netdev()
scsi: csiostor: fix use after free in csio_hw_use_fwconfig()
powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
ath5k: fix memory leak on buf on failed eeprom read
selftests/powerpc: Fix TM resched DSCR test with some compilers
xfrm: fix state migration copy replay sequence numbers
iio: hi8435: avoid garbage event at first enable
iio: hi8435: cleanup reset gpio
ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors
md-cluster: fix potential lock issue in add_new_disk
ARM: davinci: da8xx: Create DSP device only when assigned memory
ray_cs: Avoid reading past end of buffer
leds: pca955x: Correct I2C Functionality
sched/numa: Use down_read_trylock() for the mmap_sem
net/mlx5: Tolerate irq_set_affinity_hint() failures
selinux: do not check open permission on sockets
block: fix an error code in add_partition()
mlx5: fix bug reading rss_hash_type from CQE
net: ieee802154: fix net_device reference release too early
libceph: NULL deref on crush_decode() error path
netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize
pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
ASoC: rsnd: SSI PIO adjust to 24bit mode
scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats()
fix race in drivers/char/random.c:get_reg()
ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
tcp: better validation of received ack sequences
net: move somaxconn init from sysctl code
Input: elan_i2c - clear INT before resetting controller
bonding: Don't update slave->link until ready to commit
KVM: nVMX: Fix handling of lmsw instruction
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node
thermal: power_allocator: fix one race condition issue for thermal_instances list
perf probe: Add warning message if there is unexpected event name
l2tp: fix missing print session offset info
rds; Reset rs->rs_bound_addr in rds_add_bound() failure path
hwmon: (ina2xx) Make calibration register value fixed
media: videobuf2-core: don't go out of the buffer range
ASoC: Intel: cht_bsw_rt5645: Analog Mic support
scsi: libiscsi: Allow sd_shutdown on bad transport
scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag.
vfb: fix video mode and line_length being set when loaded
gpio: label descriptors using the device name
ASoC: Intel: sst: Fix the return value of 'sst_send_byte_stream_mrfld()'
wl1251: check return from call to wl1251_acx_arp_ip_filter
hdlcdrv: Fix divide by zero in hdlcdrv_ioctl
ovl: filter trusted xattr for non-admin
powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE]
dmaengine: imx-sdma: Handle return value of clk_prepare_enable
arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage
net/mlx5: avoid build warning for uniprocessor
cxgb4: FW upgrade fixes
rtc: opal: Handle disabled TPO in opal_get_tpo_time()
rtc: interface: Validate alarm-time before handling rollover
SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
net: freescale: fix potential null pointer dereference
KVM: SVM: do not zero out segment attributes if segment is unusable or not present
clk: scpi: fix return type of __scpi_dvfs_round_rate
clk: Fix __set_clk_rates error print-string
powerpc/spufs: Fix coredump of SPU contexts
perf trace: Add mmap alias for s390
qlcnic: Fix a sleep-in-atomic bug in qlcnic_82xx_hw_write_wx_2M and qlcnic_82xx_hw_read_wx_2M
mISDN: Fix a sleep-in-atomic bug
drm/omap: fix tiled buffer stride calculations
cxgb4: fix incorrect cim_la output for T6
Fix serial console on SNI RM400 machines
bio-integrity: Do not allocate integrity context for bio w/o data
skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
sit: reload iphdr in ipip6_rcv
net/mlx4: Fix the check in attaching steering rules
net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
perf header: Set proper module name when build-id event found
perf report: Ensure the perf DSO mapping matches what libdw sees
tags: honor COMPILED_SOURCE with apart output directory
e1000e: fix race condition around skb_tstamp_tx()
cx25840: fix unchecked return values
mceusb: sporadic RX truncation corruption fix
net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support
ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull
e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails
perf/core: Correct event creation with PERF_FORMAT_GROUP
MIPS: mm: fixed mappings: correct initialisation
MIPS: mm: adjust PKMAP location
MIPS: kprobes: flush_insn_slot should flush only if probe initialised
Fix loop device flush before configure v3
net: emac: fix reset timeout with AR8035 phy
perf tests: Decompress kernel module before objdump
skbuff: only inherit relevant tx_flags
xen: avoid type warning in xchg_xen_ulong
bnx2x: Allow vfs to disable txvlan offload
sctp: fix recursive locking warning in sctp_do_peeloff
sparc64: ldc abort during vds iso boot
iio: magnetometer: st_magn_spi: fix spi_device_id table
Bluetooth: Send HCI Set Event Mask Page 2 command only when needed
cpuidle: dt: Add missing 'of_node_put()'
ACPICA: Events: Add runtime stub support for event APIs
ACPICA: Disassembler: Abort on an invalid/unknown AML opcode
s390/dasd: fix hanging safe offline
vxlan: dont migrate permanent fdb entries during learn
bcache: stop writeback thread after detaching
bcache: segregate flash only volume write streams
scsi: libsas: fix memory leak in sas_smp_get_phy_events()
scsi: libsas: fix error when getting phy events
scsi: libsas: initialize sas_phy status according to response of DISCOVER
blk-mq: fix kernel oops in blk_mq_tag_idle()
tty: n_gsm: Allow ADM response in addition to UA for control dlci
EDAC, mv64x60: Fix an error handling path
cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
perf tools: Fix copyfile_offset update of output offset
ipsec: check return value of skb_to_sgvec always
rxrpc: check return value of skb_to_sgvec always
virtio_net: check return value of skb_to_sgvec always
virtio_net: check return value of skb_to_sgvec in one more location
random: use lockless method of accessing and updating f->reg_idx
futex: Remove requirement for lock_page() in get_futex_key()
Kbuild: provide a __UNIQUE_ID for clang
arp: fix arp_filter on l3slave devices
net: fix possible out-of-bound read in skb_network_protocol()
net/ipv6: Fix route leaking between VRFs
netlink: make sure nladdr has correct size in netlink_connect()
net/sched: fix NULL dereference in the error path of tcf_bpf_init()
pptp: remove a buggy dst release in pptp_connect()
sctp: do not leak kernel memory to user space
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
sky2: Increase D3 delay to sky2 stops working after suspend
vhost: correctly remove wait queue during poll failure
vlan: also check phy_driver ts_info for vlan's real device
bonding: fix the err path for dev hwaddr sync in bond_enslave
bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
bonding: process the err returned by dev_set_allmulti properly in bond_enslave
net: fool proof dev_valid_name()
ip_tunnel: better validate user provided tunnel names
ipv6: sit: better validate user provided tunnel names
ip6_gre: better validate user provided tunnel names
ip6_tunnel: better validate user provided tunnel names
vti6: better validate user provided tunnel names
r8169: fix setting driver_data after register_netdev
net sched actions: fix dumping which requires several messages to user space
net/ipv6: Increment OUTxxx counters after netfilter hook
ipv6: the entire IPv6 header chain must fit the first fragment
vrf: Fix use after free and double free in vrf_finish_output
Revert "xhci: plat: Register shutdown for xhci_plat"
Linux 4.4.128
Change-Id: I9c1e58f634cc18f15a840c9d192c892dfcc5ff73
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 7bd897cfce1eb373892d35d7f73201b0f9b221c4 ]
We don't set an error code on this path. It means that we return NULL
instead of an error pointer and the caller does a NULL dereference.
Fixes: 6d1d8050b4 ("block, partition: add partition_meta_info to hd_struct")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 19b7ccf8651df09d274671b53039c672a52ad84d upstream.
Commit 25520d55cd ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:
if (bi->profile)
disk->queue->backing_dev_info->capabilities |=
BDI_CAP_STABLE_WRITES;
else
disk->queue->backing_dev_info->capabilities &=
~BDI_CAP_STABLE_WRITES;
It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time. This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.
Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there. This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.
Fixes: 25520d55cd ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[idryomov@gmail.com: backport to < 4.11: bdi is embedded in queue]
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b30a337ca27c4f40439e4bfb290cba5f88d73bb7 upstream.
The initialization of partition's percpu_ref should have been done before
sending out KOBJ_ADD uevent, which may cause userspace to read partition
table. So the uninitialized percpu_ref may be accessed in data path.
This patch fixes this issue reported by Naveen.
Reported-by: Naveen Kaje <nkaje@codeaurora.org>
Tested-by: Naveen Kaje <nkaje@codeaurora.org>
Fixes: 6c71013ecb7e2(block: partition: convert percpu ref)
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For disk devices, a new uevent parameter 'NPARTS' specifies the number
of partitions detected by the kernel. Partition devices get 'PARTN' which
specifies the partitions index in the table, and 'PARTNAME', which
specifies PARTNAME specifices the partition name of a partition device
Signed-off-by: Dima Zavin <dima@android.com>
Today, blockdev --rereadpt /dev/sda will fail with EBUSY if any
partition of sda is mounted (and will fail with EINVAL if pointed
at a partition). But it will pass if the entire block device is
formatted with a filesystem and mounted. I don't think this makes
sense; partitioning should surely not ever change out from under
a mounted device.
So check for bdev->bd_super, and fail that with -EBUSY as well.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Up until now the_integrity profile has been dynamically allocated and
attached to struct gendisk after the disk has been made active.
This causes problems because NVMe devices need to register the profile
prior to the partition table being read due to a mandatory metadata
buffer requirement. In addition, DM goes through hoops to deal with
preallocating, but not initializing integrity profiles.
Since the integrity profile is small (4 bytes + a pointer), Christoph
suggested moving it to struct gendisk proper. This requires several
changes:
- Moving the blk_integrity definition to genhd.h.
- Inlining blk_integrity in struct gendisk.
- Removing the dynamic allocation code.
- Adding helper functions which allow gendisk to set up and tear down
the integrity sysfs dir when a disk is added/deleted.
- Adding a blk_integrity_revalidate() callback for updating the stable
pages bdi setting.
- The calls that depend on whether a device has an integrity profile or
not now key off of the bi->profile pointer.
- Simplifying the integrity support routines in DM (Mike Snitzer).
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Percpu refcount is the perfect match for partition's case,
and the conversion is quite straight.
With the convertion, one pair of atomic inc/dec can be saved
for accounting block I/O, which is run in hot path of block I/O.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
So the helper can be used in both generic partition
case and part0 case.
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Releases the dev_t minor when all references are closed to prevent
another device from acquiring the same major/minor.
Since the partition's release may be invoked from call_rcu's soft-irq
context, the ext_dev_idr's mutex had to be replaced with a spinlock so
as not so sleep.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
This reverts commit 8761a3dc1f.
There are situations where the destruction path is called
with the bdev->bd_mutex already held, which then deadlocks in
loop_clr_fd(). The normal partition cleanup does a trylock()
on the mutex, but it'd be nice to have a more bullet proof
method in loop. So punt this more involved fix to the next
merge window, and just back out this buggy fix for now.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Any partitions added by user space to the loop device were being
left in place after detaching the loop device. This was because
the detach path issued a BLKRRPART to clean up partitions if
LO_FLAGS_PARTSCAN was set, meaning that the partitions were auto
scanned on attach. Replace this BLKRRPART with code that
unconditionally cleans up partitions on detach instead.
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Modified by Jens to export delete_partition().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently, sizeof(struct parsed_partitions) may be 64KB in 32bit arch, so
it is easy to trigger page allocation failure by check_partition,
especially in hotplug block device situation(such as, USB mass storage,
MMC card, ...), and Felipe Balbi has observed the failure.
This patch does below optimizations on the allocation of struct
parsed_partitions to try to address the issue:
- make parsed_partitions.parts as pointer so that the pointed memory can
fit in 32KB buffer, then approximate 32KB memory can be saved
- vmalloc the buffer pointed by parsed_partitions.parts because 32KB is
still a bit big for kmalloc
- given that many devices have the partition count limit, so only
allocate disk_max_parts() partitions instead of 256 partitions always
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reported-by: Felipe Balbi <balbi@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While adding and removing a lot of disks disks and partitions this
sometimes shows up:
WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
Hardware name:
sysfs: cannot create duplicate filename '/dev/block/259:751'
Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
Call Trace:
warn_slowpath_common+0x87/0xc0
warn_slowpath_fmt+0x46/0x50
sysfs_add_one+0xc9/0x130
sysfs_do_create_link+0x12b/0x170
sysfs_create_link+0x13/0x20
device_add+0x317/0x650
idr_get_new+0x13/0x50
add_partition+0x21c/0x390
rescan_partitions+0x32b/0x470
sd_open+0x81/0x1f0 [sd_mod]
__blkdev_get+0x1b6/0x3c0
blkdev_get+0x10/0x20
register_disk+0x155/0x170
add_disk+0xa6/0x160
sd_probe_async+0x13b/0x210 [sd_mod]
add_wait_queue+0x46/0x60
async_thread+0x102/0x250
default_wake_function+0x0/0x20
async_thread+0x0/0x250
kthread+0x96/0xa0
child_rip+0xa/0x20
kthread+0x0/0xa0
child_rip+0x0/0x20
This most likely happens because dev_t is freed while the number is
still used and idr_get_new() is not protected on every use. The fix
adds a mutex where it wasn't before and moves the dev_t free function so
it is called after device del.
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.
This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.
Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().
v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.
However it ends up calling driver's revalidate_disk without open
and could cause oops.
In the case of SCSI:
process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052
This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>