Commit graph

1261 commits

Author SHA1 Message Date
Alex Estrin
cabf4efd07 IB/ipoib: Fix for potential no-carrier state
[ Upstream commit 1029361084d18cc270f64dfd39529fafa10cfe01 ]

On reboot SM can program port pkey table before ipoib registered its
event handler, which could result in missing pkey event and leave root
interface with initial pkey value from index 0.

Since OPA port starts with invalid pkey in index 0, root interface will
fail to initialize and stay down with no-carrier flag.

For IB ipoib interface may end up with pkey different from value
opensm put in pkey table idx 0, resulting in connectivity issues
(different mcast groups, for example).

Close the window by calling event handler after registration
to make sure ipoib pkey is in sync with port pkey table.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:48:55 +02:00
Bart Van Assche
7a113a311b IB/srp: Fix completion vector assignment algorithm
commit 3a148896b24adf8688dc0c59af54531931677a40 upstream.

Ensure that cv_end is equal to ibdev->num_comp_vectors for the
NUMA node with the highest index. This patch improves spreading
of RDMA channels over completion vectors and thereby improves
performance, especially on systems with only a single NUMA node.
This patch drops support for the comp_vector login parameter by
ignoring the value of that parameter since I have not found a
good way to combine support for that parameter and automatic
spreading of RDMA channels over completion vectors.

Fixes: d92c0da71a ("IB/srp: Add multichannel support")
Reported-by: Alexander Schmid <alex@modula-shop-systems.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Alexander Schmid <alex@modula-shop-systems.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:32:08 +02:00
Bart Van Assche
6931ced5d5 IB/srp: Fix srp_abort()
commit e68088e78d82920632eba112b968e49d588d02a2 upstream.

Before commit e494f6a728 ("[SCSI] improved eh timeout handler") it
did not really matter whether or not abort handlers like srp_abort()
called .scsi_done() when returning another value than SUCCESS. Since
that commit however this matters. Hence only call .scsi_done() when
returning SUCCESS.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:32:08 +02:00
Bart Van Assche
727631ffc0 IB/srpt: Fix abort handling
[ Upstream commit 55d694275f41a1c0eef4ef49044ff29bc3999490 ]

Let the target core check the CMD_T_ABORTED flag instead of the SRP
target driver. Hence remove the transport_check_aborted_status()
call. Since state == SRPT_STATE_CMD_RSP_SENT is something that really
should not happen, do not try to recover if srpt_queue_response() is
called for an I/O context that is in that state. This patch is a bug
fix because the srpt_abort_cmd() call is misplaced - if that function
is called from srpt_queue_response() it should either be called
before the command state is changed or after the response has been
sent.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:01 +02:00
Erez Shitrit
64e3d45533 IB/ipoib: Avoid memory leak if the SA returns a different DGID
[ Upstream commit 439000892ee17a9c92f1e4297818790ef8bb4ced ]

The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.

Ensure the database key does not change after the SA query completes.

Demonstration of the bug is as  follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib  will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24 10:58:47 +01:00
Feras Daoud
8716c87ec2 IB/ipoib: Update broadcast object if PKey value was changed in index 0
[ Upstream commit 9a9b8112699d78e7f317019b37f377e90023f3ed ]

Update the broadcast address in the priv->broadcast object when the
Pkey value changes in index 0, otherwise the multicast GID value will
keep the previous value of the PKey, and will not be updated.
This leads to interface state down because the interface will keep the
old PKey value.

For example, in SR-IOV environment, if the PF changes the value of PKey
index 0 for one of the VFs, then the VF receives PKey change event that
triggers heavy flush. This flush calls update_parent_pkey that update the
broadcast object and its relevant members. If in this case the multicast
GID will not be updated, the interface state will be down.

Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24 10:58:42 +01:00
Feras Daoud
cd6a157b38 IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
[ Upstream commit 3e31a490e01a6e67cbe9f6e1df2f3ff0fbf48972 ]

Before calling ipoib_stop, rtnl_lock should be taken, then
the flow clears the IPOIB_FLAG_ADMIN_UP and IPOIB_FLAG_OPER_UP
flags, and waits for mcast completion if IPOIB_MCAST_FLAG_BUSY
is set.

On the other hand, the flow of multicast join task initializes
a mcast completion, sets the IPOIB_MCAST_FLAG_BUSY and calls
ipoib_mcast_join. If IPOIB_FLAG_OPER_UP flag is not set, this
call returns EINVAL without setting the mcast completion and
leads to a deadlock.

    ipoib_stop                          |
        |                               |
    clear_bit(IPOIB_FLAG_ADMIN_UP)      |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join_task
        |                               |
        |                       spin_lock_irq(lock)
        |                               |
        |                       init_completion(mcast)
        |                               |
        |                       set_bit(IPOIB_MCAST_FLAG_BUSY)
        |                               |
        |                       Context Switch
        |                               |
    clear_bit(IPOIB_FLAG_OPER_UP)       |
        |                               |
    spin_lock_irqsave(lock)             |
        |                               |
    Context Switch                      |
        |                       ipoib_mcast_join
        |                       return (-EINVAL)
        |                               |
        |                       spin_unlock_irq(lock)
        |                               |
        |                       Context Switch
        |                               |
    ipoib_mcast_dev_flush               |
    wait_for_completion(mcast)          |

ipoib_stop will wait for mcast completion for ever, and will
not release the rtnl_lock. As a result panic occurs with the
following trace:

    [13441.639268] Call Trace:
    [13441.640150]  [<ffffffff8168b579>] schedule+0x29/0x70
    [13441.641038]  [<ffffffff81688fc9>] schedule_timeout+0x239/0x2d0
    [13441.641914]  [<ffffffff810bc017>] ? complete+0x47/0x50
    [13441.642765]  [<ffffffff810a690d>] ? flush_workqueue_prep_pwqs+0x16d/0x200
    [13441.643580]  [<ffffffff8168b956>] wait_for_completion+0x116/0x170
    [13441.644434]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
    [13441.645293]  [<ffffffffa05af170>] ipoib_mcast_dev_flush+0x150/0x190 [ib_ipoib]
    [13441.646159]  [<ffffffffa05ac967>] ipoib_ib_dev_down+0x37/0x60 [ib_ipoib]
    [13441.647013]  [<ffffffffa05a4805>] ipoib_stop+0x75/0x150 [ib_ipoib]

Fixes: 08bc327629cb ("IB/ipoib: fix for rare multicast join race condition")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-24 10:58:42 +01:00
Erez Shitrit
fed4d0f0ee IB/ipoib: Fix race condition in neigh creation
[ Upstream commit 16ba3defb8bd01a9464ba4820a487f5b196b455b ]

When using enhanced mode for IPoIB, two threads may execute xmit in
parallel to two different TX queues while the target is the same.
In this case, both of them will add the same neighbor to the path's
neigh link list and we might see the following message:

  list_add double add: new=ffff88024767a348, prev=ffff88024767a348...
  WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70
  ipoib_start_xmit+0x477/0x680 [ib_ipoib]
  dev_hard_start_xmit+0xb9/0x3e0
  sch_direct_xmit+0xf9/0x250
  __qdisc_run+0x176/0x5d0
  __dev_queue_xmit+0x1f5/0xb10
  __dev_queue_xmit+0x55/0xb10

Analysis:
Two SKB are scheduled to be transmitted from two cores.
In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get.
Two calls to neigh_add_path are made. One thread takes the spin-lock
and calls ipoib_neigh_alloc which creates the neigh structure,
then (after the __path_find) the neigh is added to the path's neigh
link list. When the second thread enters the critical section it also
calls ipoib_neigh_alloc but in this case it gets the already allocated
ipoib_neigh structure, which is already linked to the path's neigh
link list and adds it again to the list. Which beside of triggering
the list, it creates a loop in the linked list. This loop leads to
endless loop inside path_rec_completion.

Solution:
Check list_empty(&neigh->list) before adding to the list.
Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send"

Fixes: b63b70d877 ('IPoIB: Use a private hash table for path lookup in xmit path')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-03 10:19:43 +01:00
Bart Van Assche
6c2c83eb1b IB/srpt: Disable RDMA access by the initiator
commit bec40c26041de61162f7be9d2ce548c756ce0f65 upstream.

With the SRP protocol all RDMA operations are initiated by the target.
Since no RDMA operations are initiated by the initiator, do not grant
the initiator permission to submit RDMA reads or writes to the target.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-17 09:35:24 +01:00
Sagi Grimberg
52cd7920b7 RDMA/iser: Fix possible mr leak on device removal event
[ Upstream commit ea174c9573b0e0c8bc1a7a90fe9360ccb7aa9cbb ]

When the rdma device is removed, we must cleanup all
the rdma resources within the DEVICE_REMOVAL event
handler to let the device teardown gracefully. When
this happens with live I/O, some memory regions are
occupied. Thus, track them too and dereg all the mr's.

We are safe with mr access by iscsi_iser_cleanup_task.

Reported-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:12 +01:00
Alex Vesker
26c66554d7 IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
[ Upstream commit b4b678b06f6eef18bff44a338c01870234db0bc9 ]

When ndo_open and ndo_stop are called RTNL lock should be held.
In this specific case ipoib_ib_dev_open calls the offloaded ndo_open
which re-sets the number of TX queue assuming RTNL lock is held.
Since RTNL lock is not held, RTNL assert will fail.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:05:01 +01:00
Bart Van Assche
ecc5e89147 IB/srp: Avoid that a cable pull can trigger a kernel crash
commit 8a0d18c62121d3c554a83eb96e2752861d84d937 upstream.

This patch fixes the following kernel crash:

general protection fault: 0000 [#1] PREEMPT SMP
Workqueue: ib_mad2 timeout_sends [ib_core]
Call Trace:
 ib_sa_path_rec_callback+0x1c4/0x1d0 [ib_core]
 send_handler+0xb2/0xd0 [ib_core]
 timeout_sends+0x14d/0x220 [ib_core]
 process_one_work+0x200/0x630
 worker_thread+0x4e/0x3b0
 kthread+0x113/0x150

Fixes: commit aef9ec39c4 ("IB: Add SCSI RDMA Protocol (SRP) initiator")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:37:23 +00:00
Bart Van Assche
3e32b40435 IB/srpt: Do not accept invalid initiator port names
commit c70ca38960399a63d5c048b7b700612ea321d17e upstream.

Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.

Fixes: commit a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-30 08:37:23 +00:00
Feras Daoud
3652b0b6f2 IB/ipoib: Change list_del to list_del_init in the tx object
[ Upstream commit 27d41d29c7f093f6f77843624fbb080c1b4a8b9c ]

Since ipoib_cm_tx_start function and ipoib_cm_tx_reap function
belong to different work queues, they can run in parallel.
In this case if ipoib_cm_tx_reap calls list_del and release the
lock, ipoib_cm_tx_start may acquire it and call list_del_init
on the already deleted object.
Changing list_del to list_del_init in ipoib_cm_tx_reap fixes the problem.

Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-15 17:13:11 +01:00
Feras Daoud
bf184ddd21 IB/ipoib: Replace list_del of the neigh->list with list_del_init
[ Upstream commit c586071d1dc8227a7182179b8e50ee92cc43f6d2 ]

In order to resolve a situation where a few process delete
the same list element in sequence and cause panic, list_del
is replaced with list_del_init. In this case if the first
process that calls list_del releases the lock before acquiring
it again, other processes who can acquire the lock will call
list_del_init.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08 10:14:17 +02:00
Feras Daoud
f1d53c6d48 IB/ipoib: rtnl_unlock can not come after free_netdev
[ Upstream commit 89a3987ab7a923c047c6dec008e60ad6f41fac22 ]

The ipoib_vlan_add function calls rtnl_unlock after free_netdev,
rtnl_unlock not only releases the lock, but also calls netdev_run_todo.
The latter function browses the net_todo_list array and completes the
unregistration of all its net_device instances. If we call free_netdev
before rtnl_unlock, then netdev_run_todo call over the freed device causes
panic.
To fix, move rtnl_unlock call before free_netdev call.

Fixes: 9baa0b0364 ("IB/ipoib: Add rtnl_link_ops support")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08 10:14:17 +02:00
Feras Daoud
9326a1374b IB/ipoib: Fix deadlock over vlan_mutex
[ Upstream commit 1c3098cdb05207e740715857df7b0998e372f527 ]

This patch fixes Deadlock while executing ipoib_vlan_delete.

The function takes the vlan_rwsem semaphore and calls
unregister_netdevice. The later function calls
ipoib_mcast_stop_thread that cause workqueue flush.

When the queue has one of the ipoib_ib_dev_flush_xxx events,
a deadlock occur because these events also tries to catch the
same vlan_rwsem semaphore.

To fix, unregister_netdevice should be called after releasing
the semaphore.

Fixes: cbbe1efa49 ("IPoIB: Fix deadlock between ipoib_open() and child interface create")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-08 10:14:17 +02:00
Nicholas Bellinger
9745cbec9c iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
commit fce50a2fa4e9c6e103915c351b6d4a98661341d6 upstream.

This patch fixes a NULL pointer dereference in isert_login_recv_done()
of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error()
resetting isert_conn->cm_id = NULL during a failed login attempt.

As per Sagi, we will always see the completion of all recv wrs posted
on the qp (given that we assigned a ->done handler), this is a FLUSH
error completion, we just don't get to verify that because we deref
NULL before.

The issue here, was the assumption that dereferencing the connection
cm_id is always safe, which is not true since:

    commit 4a579da258
    Author: Sagi Grimberg <sagig@mellanox.com>
    Date:   Sun Mar 29 15:52:04 2015 +0300

         iser-target: Fix possible deadlock in RDMA_CM connection error

As I see it, we have a direct reference to the isert_device from
isert_conn which is the one-liner fix that we actually need like
we do in isert_rdma_read_done() and isert_rdma_write_done().

Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11 09:08:50 -07:00
Shamir Rabinovitch
1360f4301c IB/IPoIB: ibX: failed to create mcg debug file
commit 771a52584096c45e4565e8aabb596eece9d73d61 upstream.

When udev renames the netdev devices, ipoib debugfs entries does not
get renamed. As a result, if subsequent probe of ipoib device reuse the
name then creating a debugfs entry for the new device would fail.

Also, moved ipoib_create_debug_files and ipoib_delete_debug_files as part
of ipoib event handling in order to avoid any race condition between these.

Fixes: 1732b0ef3b ([IPoIB] add path record information in debugfs)
Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:27:00 +02:00
Bart Van Assche
696255449b IB/srp: Fix race conditions related to task management
commit 0a6fdbdeb1c25e31763c1fb333fa2723a7d2aba6 upstream.

Avoid that srp_process_rsp() overwrites the status information
in ch if the SRP target response timed out and processing of
another task management function has already started. Avoid that
issuing multiple task management functions concurrently triggers
list corruption. This patch prevents that the following stack
trace appears in the system log:

WARNING: CPU: 8 PID: 9269 at lib/list_debug.c:52 __list_del_entry_valid+0xbc/0xc0
list_del corruption. prev->next should be ffffc90004bb7b00, but was ffff8804052ecc68
CPU: 8 PID: 9269 Comm: sg_reset Tainted: G        W       4.10.0-rc7-dbg+ #3
Call Trace:
 dump_stack+0x68/0x93
 __warn+0xc6/0xe0
 warn_slowpath_fmt+0x4a/0x50
 __list_del_entry_valid+0xbc/0xc0
 wait_for_completion_timeout+0x12e/0x170
 srp_send_tsk_mgmt+0x1ef/0x2d0 [ib_srp]
 srp_reset_device+0x5b/0x110 [ib_srp]
 scsi_ioctl_reset+0x1c7/0x290
 scsi_ioctl+0x12a/0x420
 sd_ioctl+0x9d/0x100
 blkdev_ioctl+0x51e/0x9f0
 block_ioctl+0x38/0x40
 do_vfs_ioctl+0x8f/0x700
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 09:57:13 +08:00
Bart Van Assche
944690cdb5 IB/srp: Avoid that duplicate responses trigger a kernel bug
commit 6cb72bc1b40bb2c1750ee7a5ebade93bed49a5fb upstream.

After srp_process_rsp() returns there is a short time during which
the scsi_host_find_tag() call will return a pointer to the SCSI
command that is being completed. If during that time a duplicate
response is received, avoid that the following call stack appears:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: srp_recv_done+0x450/0x6b0 [ib_srp]
Oops: 0000 [#1] SMP
CPU: 10 PID: 0 Comm: swapper/10 Not tainted 4.10.0-rc7-dbg+ #1
Call Trace:
 <IRQ>
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_poll_handler+0x1d/0x70 [ib_core]
 irq_poll_softirq+0xba/0x120
 __do_softirq+0xba/0x4c0
 irq_exit+0xbe/0xd0
 smp_apic_timer_interrupt+0x38/0x50
 apic_timer_interrupt+0x90/0xa0
 </IRQ>
RIP: srp_recv_done+0x450/0x6b0 [ib_srp] RSP: ffff88046f483e20

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Israel Rukshin <israelr@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Steve Feeley <Steve.Feeley@sandisk.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 09:57:13 +08:00
Erez Shitrit
bb4a21dcb6 IB/IPoIB: Add destination address when re-queue packet
commit 2b0841766a898aba84630fb723989a77a9d3b4e6 upstream.

When sending packet to destination that was not resolved yet
via path query, the driver keeps the skb and tries to re-send it
again when the path is resolved.

But when re-sending via dev_queue_xmit the kernel doesn't call
to dev_hard_header, so IPoIB needs to keep 20 bytes in the skb
and to put the destination address inside them.

In that way the dev_start_xmit will have the correct destination,
and the driver won't take the destination from the skb->data, while
nothing exists there, which causes to packet be be dropped.

The test flow is:
1. Run the SM on remote node,
2. Restart the driver.
4. Ping some destination,
3. Observe that first ICMP request will be dropped.

Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Tested-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 09:57:13 +08:00
Feras Daoud
10beca5374 IB/ipoib: Fix deadlock between rmmod and set_mode
commit 0a0007f28304cb9fc87809c86abb80ec71317f20 upstream.

When calling set_mode from sys/fs, the call flow locks the sys/fs lock
first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
On the other hand, the rmmod call flow takes the rtnl_lock first
(when calling unregister_netdev) and then tries to take the sys/fs
lock. Deadlock a->b, b->a.

The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
to get it after that.

    set_mod:
    [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
    [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
    [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
    [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
    [<ffffffff81448675>] rtnl_lock+0x15/0x20
    [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
    [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
    [<ffffffff8134b840>] dev_attr_store+0x20/0x30
    [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
    [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
    [<ffffffff8117ba81>] sys_write+0x51/0x90
    [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

    rmmod:
    [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
    [<ffffffff8127a2ee>] ? number+0x2ee/0x320
    [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
    [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
    [<ffffffff8127b550>] ? string+0x40/0x100
    [<ffffffff814fe323>] wait_for_common+0x123/0x180
    [<ffffffff81060250>] ? default_wake_function+0x0/0x20
    [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
    [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
    [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
    [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
    [<ffffffff81273f66>] kobject_del+0x16/0x40
    [<ffffffff8134cd14>] device_del+0x184/0x1e0
    [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
    [<ffffffff8143c05e>] rollback_registered+0xae/0x130
    [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
    [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
    [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
    [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
    [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
    [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]

Fixes: 862096a8bb ("IB/ipoib: Add more rtnl_link_ops callbacks")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 09:57:13 +08:00
Paolo Abeni
e114e66eec IB/ipoib: move back IB LL address into the hard header
commit fc791b6335152c5278dc4a4991bcb2d329f806f9 upstream.

After the commit 9207f9d45b0a ("net: preserve IP control block
during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
That destroy the IPoIB address information cached there,
causing a severe performance regression, as better described here:

http://marc.info/?l=linux-kernel&m=146787279825501&w=2

This change moves the data cached by the IPoIB driver from the
skb control lock into the IPoIB hard header, as done before
the commit 936d7de3d7 ("IPoIB: Stop lying about hard_header_len
and use skb->cb to stash LL addresses").
In order to avoid GRO issue, on packet reception, the IPoIB driver
stash into the skb a dummy pseudo header, so that the received
packets have actually a hard header matching the declared length.
To avoid changing the connected mode maximum mtu, the allocated
head buffer size is increased by the pseudo header length.

After this commit, IPoIB performances are back to pre-regression
value.

v2 -> v3: rebased
v1 -> v2: avoid changing the max mtu, increasing the head buf size

Fixes: 9207f9d45b0a ("net: preserve IP control block during GSO segmentation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vasiliy Tolstov <v.tolstov@selfip.ru>
Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-01 08:30:53 +01:00
Kamal Heib
aa02f29e95 IB/IPoIB: Remove can't use GFP_NOIO warning
commit 0b59970e7d96edcb3c7f651d9d48e1a59af3c3b0 upstream.

Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.

This print become more annoying when the IPoIB interface is configured
to work in connected mode.

Fixes: 09b93088d7 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:23:47 +01:00
Bart Van Assche
7b13692156 IPoIB: Avoid reading an uninitialized member variable
commit 11b642b84e8c43e8597de031678d15c08dd057bc upstream.

This patch avoids that Coverity reports the following:

    Using uninitialized value port_attr.state when calling printk

Fixes: commit 94232d9ce8 ("IPoIB: Start multicast join process only on active ports")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:51 +01:00
Alex Vesker
87160fb51b IB/ipoib: Don't allow MC joins during light MC flush
commit 344bacca8cd811809fc33a249f2738ab757d327f upstream.

This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411]  0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960]  0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547]  ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015]  [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831]  [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403]  [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706]  [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384]  [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031]  [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220]  [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290]  [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911]  [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741]  [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091]  [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880]  [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848]  [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474]  [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510]  [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349]  [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471]  [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152]  [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274]  [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896]  [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632]  [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709]  [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:46 +02:00
Erez Shitrit
79b993c132 IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream.

When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
	neigh_add_path --> ipoib_cm_create_tx -->
	queue_work (pointer to path is in the cm/tx struct)
	#while the work is still in the queue,
	#the port goes down and causes the ipoib_flush_paths:
	ipoib_flush_paths --> path_free --> kfree(path)
	#at this point the work scheduled starts.
	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
	 -> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:46 +02:00
Carol L Soto
917f84b8df IB/IPoIB: Do not set skb truesize since using one linearskb
[ Upstream commit bb6a777369449d15a4a890306d2f925cae720e1c ]

We are seeing this warning: at net/core/skbuff.c:4174
and before commit a44878d100 ("IB/ipoib: Use one linear skb in RX flow")
skb truesize was not being set when ipoib was using just one skb.
Removing this line avoids the warning when running tcp tests like iperf.

Fixes: a44878d100 ("IB/ipoib: Use one linear skb in RX flow")
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-15 08:27:49 +02:00
Erez Shitrit
9bb807338a IB/IPoIB: Don't update neigh validity for unresolved entries
commit 61c78eea9516a921799c17b4c20558e2aa780fd3 upstream.

ipoib_neigh_get unconditionally updates the "alive" variable member on
any packet send.  This prevents the neighbor garbage collection from
cleaning out a dead neighbor entry if we are still queueing packets
for it.  If the queue for this neighbor is full, then don't update the
alive timestamp.  That way the neighbor can time out even if packets
are still being queued as long as none of them are being sent.

Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-20 18:09:25 +02:00
Bart Van Assche
a03870181d IB/srp: Fix a debug kernel crash
commit 54f5c9c52d69afa55abf2b034df8d45f588466c3 upstream.

Avoid that the following BUG() is triggered against a debug
kernel:

kernel BUG at include/linux/scatterlist.h:92!
RIP: 0010:[<ffffffffa0467199>]  [<ffffffffa0467199>] srp_map_idb+0x199/0x1a0 [ib_srp]
Call Trace:
 [<ffffffffa04685fa>] srp_map_data+0x84a/0x890 [ib_srp]
 [<ffffffffa0469674>] srp_queuecommand+0x1e4/0x610 [ib_srp]
 [<ffffffff813f5a5e>] scsi_dispatch_cmd+0x9e/0x180
 [<ffffffff813f8b07>] scsi_request_fn+0x477/0x610
 [<ffffffff81298ffe>] __blk_run_queue+0x2e/0x40
 [<ffffffff81299070>] blk_delay_work+0x20/0x30
 [<ffffffff81071f07>] process_one_work+0x197/0x480
 [<ffffffff81072239>] worker_thread+0x49/0x490
 [<ffffffff810787ea>] kthread+0xea/0x100
 [<ffffffff8159b632>] ret_from_fork+0x22/0x40

Fixes: f7f7aab1a5 ("IB/srp: Convert to new registration API")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-01 12:15:53 -07:00
Jenny Derzhavetz
48f447bceb iser-target: Rework connection termination
commit 6d1fba0c2cc7efe42fd761ecbba833ed0ea7b07e upstream.

When we receive an event that triggers connection termination,
we have a a couple of things we may want to do:
1. In case we are already terminating, bailout early
2. In case we are connected but not bound, disconnect and schedule
   a connection cleanup silently (don't reinstate)
3. In case we are connected and bound, disconnect and reinstate the connection

This rework fixes a bug that was detected against a mis-behaved
initiator which rejected our rdma_cm accept, in this stage the
isert_conn is no bound and reinstate caused a bogus dereference.

What's great about this is that we don't need the
post_recv_buf_count anymore, so get rid of it.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:03 -07:00
Jenny Derzhavetz
60f0f01da7 iser-target: Separate flows for np listeners and connections cma events
commit f81bf458208ef6d12b2fc08091204e3859dcdba4 upstream.

No need to restrict this check to specific events.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:03 -07:00
Jenny Derzhavetz
b0d31bbebb iser-target: Add new state ISER_CONN_BOUND to isert_conn
commit aea92980601f7ddfcb3c54caa53a43726314fe46 upstream.

We need an indication that isert_conn->iscsi_conn binding has
happened so we'll know not to invoke a connection reinstatement
on an unbound connection which will lead to a bogus isert_conn->conn
dereferece.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:03 -07:00
Jenny Derzhavetz
a91eb042e1 iser-target: Fix identification of login rx descriptor type
commit b89a7c25462b164db280abc3b05d4d9d888d40e9 upstream.

Once connection request is accepted, one rx descriptor
is posted to receive login request. This descriptor has rx type,
but is outside the main pool of rx descriptors, and thus
was mistreated as tx type.

Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:02 -07:00
Alex Estrin
bcda0fd9a8 IB/ipoib: fix for rare multicast join race condition
commit 08bc327629cbd63bb2f66677e4b33b643695097c upstream.

A narrow window for race condition still exist between
multicast join thread and *dev_flush workers.
A kernel crash caused by prolong erratic link state changes
was observed (most likely a faulty cabling):

[167275.656270] BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225]  [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024]  [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490
[ib_ipoib]
[167276.002149]  [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754]  [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088]  [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737]  [<ffffffff810a5aef>] kthread+0xcf/0xe0
Here was a hit spot:
ipoib_mcast_join() {
..............
      rec.qkey      = priv->broadcast->mcmember.qkey;
                                       ^^^^^^^
.....
 }
Proposed patch should prevent multicast join task to continue
if link state change is detected.

Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Chages from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:59 -07:00
Bart Van Assche
84512e476c IB/srpt: Simplify srpt_handle_tsk_mgmt()
commit 51093254bf879bc9ce96590400a87897c7498463 upstream.

Let the target core check task existence instead of the SRP target
driver. Additionally, let the target core check the validity of the
task management request instead of the ib_srpt driver.

This patch fixes the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
Oops: 0002 [#1] SMP
Call Trace:
 [<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
 [<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
 [<ffffffff8109726f>] kthread+0xcf/0xe0
 [<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: 3e4f574857 ("ib_srpt: Convert TMR path to target_submit_tmr")
Tested-by: Alex Estrin <alex.estrin@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
Sagi Grimberg
f1a47d37fb iser-target: Remove explicit mlx4 work-around
The driver now exposes sufficient limits so we can
avoid having mlx4 specific work-around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-08 13:01:11 -05:00
Bart Van Assche
57b0be9c0f IB/srp: Fix srp_map_sg_fr()
After dma_map_sg() has been called the return value of that function
must be used as the number of elements in the scatterlist instead of
scsi_sg_count().

Fixes: commit f7f7aab1a5 ("IB/srp: Convert to new registration API")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.4+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Bart Van Assche
a745f4f410 IB/srp: Fix indirect data buffer rkey endianness
Detected by sparse.

Fixes: commit 330179f2fa ("IB/srp: Register the indirect data buffer descriptor")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable <stable@vger.kernel.org> # v4.3+
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Christoph Hellwig
fc925518aa IB/srp: Initialize dma_length in srp_map_idb
Without this sg_dma_len will return 0 on architectures tha have
the dma_length field.

Fixes: commit f7f7aab1a5 ("IB/srp: Convert to new registration API")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Sagi Grimberg
09c0c0bea5 IB/srp: Fix possible send queue overflow
When using work request based memory registration (fast_reg)
we must reserve SQ entries for registration and invalidation
in addition to send operations. Each IO consumes 3 SQ entries
(registration, send, invalidation) so we need to allocate 3x
larger send-queue instead of 2x.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Bart Van Assche
4d59ad2995 IB/srp: Fix a memory leak
If srp_connect_ch() returns a positive value then that is considered
by its caller as a connection failure but this does not result in a
scsi_host_put() call and additionally causes the srp_create_target()
function to return a positive value while it should return a negative
value. Avoid all this confusion and additionally fix a memory leak by
ensuring that srp_connect_ch() always returns a value that is <= 0.
This patch avoids that a rejected login triggers the following memory
leak:

unreferenced object 0xffff88021b24a220 (size 8):
  comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s)
  hex dump (first 8 bytes):
    68 6f 73 74 35 38 00 a5                          host58..
  backtrace:
    [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0
    [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160
    [<ffffffff81260d2b>] kvasprintf+0x5b/0x90
    [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0
    [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0
    [<ffffffff81337e3c>] dev_set_name+0x3c/0x40
    [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0
    [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp]
    [<ffffffff8133778b>] dev_attr_store+0x1b/0x20
    [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60
    [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180
    [<ffffffff81176eef>] __vfs_write+0x2f/0xf0
    [<ffffffff811771e4>] vfs_write+0xa4/0x100
    [<ffffffff81177c64>] SyS_write+0x54/0xc0
    [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 17:20:11 -05:00
Arnd Bergmann
2c63d1072a IB/iser: use sector_div instead of do_div
do_div is the wrong way to divide a sector_t, as it is less
efficient when sector_t is 32-bit wide. With the upcoming
do_div optimizations, the kernel starts warning about this:

drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div'
include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type

This changes the code to use sector_div instead, which always
produces optimal code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07 16:42:33 -05:00
Linus Torvalds
d83763f4a6 SCSI misc on 20151113
Sorry for the delay in this patch which was mostly caused by getting the
 merger of the mpt2/mpt3sas driver, which was seen as an essential item of
 maintenance work to do before the drivers diverge too much.  Unfortunately,
 this caused a compile failure (detected by linux-next), which then had to be
 fixed up and incubated.  In addition to the mpt2/3sas rework, there are
 updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc
 and ufs plus an assortment of changes including some year 2038 issues, a fix
 for a remove before detach issue in some drivers and a couple of other minor
 issues.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWRnpiAAoJEDeqqVYsXL0MJd0IAIkNP1q/6ksMw/lam2UlbdxN
 zFsFOIhGM3xLnBFehbCx7JW3cmybA7WC5jhMjaa1OeJmYHLpwbTzvVwrLs2l/0y1
 /S+G0wwxROZfIKj/1EO3oKbSPCu9N3lStxOnmNXt5PSUEzAXrTqSNTtnMf2ZGh7j
 bQxTWEJe+66GckgGw4ozTXJHWXqM/Zs/FsYjn2h/WzFhFv8utr7zRSzHMVjcqpLG
 TK/Lt03hiTGqiignKrV9W6JzdGTWf2LGIsj/njgR0dxv59cNH8PrHIjZyknSBxgn
 lblinsCjxDHWnP4BSfTi9MQG1lEiZiWO3Y6TKkKJTgxZ9M0Eitspc+cLOiJ1mpg=
 =HvQf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull final round of SCSI updates from James Bottomley:
 "Sorry for the delay in this patch which was mostly caused by getting
  the merger of the mpt2/mpt3sas driver, which was seen as an essential
  item of maintenance work to do before the drivers diverge too much.
  Unfortunately, this caused a compile failure (detected by linux-next),
  which then had to be fixed up and incubated.

  In addition to the mpt2/3sas rework, there are updates from pm80xx,
  lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus
  an assortment of changes including some year 2038 issues, a fix for a
  remove before detach issue in some drivers and a couple of other minor
  issues"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
  mpt3sas: fix inline markers on non inline function declarations
  sd: Clear PS bit before Mode Select.
  ibmvscsi: set max_lun to 32
  ibmvscsi: display default value for max_id, max_lun and max_channel.
  mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl()
  scsi: pmcraid: replace struct timeval with ktime_get_real_seconds()
  mvumi: 64bit value for seconds_since1970
  be2iscsi: Fix bogus WARN_ON length check
  scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice
  mpt3sas: Bump mpt3sas driver version to 09.102.00.00
  mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs
  mpt2sas, mpt3sas: Update the driver versions
  mpt3sas: setpci reset kernel oops fix
  mpt3sas: Added OEM Gen2 PnP ID branding names
  mpt3sas: Refcount fw_events and fix unsafe list usage
  mpt3sas: Refcount sas_device objects and fix unsafe list usage
  mpt3sas: sysfs attribute to report Backup Rail Monitor Status
  mpt3sas: Ported WarpDrive product SSS6200 support
  mpt3sas: fix for driver fails EEH, recovery from injected pci bus error
  mpt3sas: Manage MSI-X vectors according to HBA device type
  ...
2015-11-13 20:35:54 -08:00
Linus Torvalds
9aa3d651a9 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "This series contains HCH's changes to absorb configfs attribute
  ->show() + ->store() function pointer usage from it's original
  tree-wide consumers, into common configfs code.

  It includes usb-gadget, target w/ drivers, netconsole and ocfs2
  changes to realize the improved simplicity, that now renders the
  original include/target/configfs_macros.h CPP magic for fabric drivers
  and others, unnecessary and obsolete.

  And with common code in place, new configfs attributes can be added
  easier than ever before.

  Note, there are further improvements in-flight from other folks for
  v4.5 code in configfs land, plus number of target fixes for post -rc1
  code"

In the meantime, a new user of the now-removed old configfs API came in
through the char/misc tree in commit 7bd1d4093c ("stm class: Introduce
an abstraction for System Trace Module devices").

This merge resolution comes from Alexander Shishkin, who updated his stm
class tracing abstraction to account for the removal of the old
show_attribute and store_attribute methods in commit 517982229f
("configfs: remove old API") from this pull.  As Alexander says about
that patch:

 "There's no need to keep an extra wrapper structure per item and the
  awkward show_attribute/store_attribute item ops are no longer needed.

  This patch converts policy code to the new api, all the while making
  the code quite a bit smaller and easier on the eyes.

  Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>"

That patch was folded into the merge so that the tree should be fully
bisectable.

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits)
  configfs: remove old API
  ocfs2/cluster: use per-attribute show and store methods
  ocfs2/cluster: move locking into attribute store methods
  netconsole: use per-attribute show and store methods
  target: use per-attribute show and store methods
  spear13xx_pcie_gadget: use per-attribute show and store methods
  dlm: use per-attribute show and store methods
  usb-gadget/f_serial: use per-attribute show and store methods
  usb-gadget/f_phonet: use per-attribute show and store methods
  usb-gadget/f_obex: use per-attribute show and store methods
  usb-gadget/f_uac2: use per-attribute show and store methods
  usb-gadget/f_uac1: use per-attribute show and store methods
  usb-gadget/f_mass_storage: use per-attribute show and store methods
  usb-gadget/f_sourcesink: use per-attribute show and store methods
  usb-gadget/f_printer: use per-attribute show and store methods
  usb-gadget/f_midi: use per-attribute show and store methods
  usb-gadget/f_loopback: use per-attribute show and store methods
  usb-gadget/ether: use per-attribute show and store methods
  usb-gadget/f_acm: use per-attribute show and store methods
  usb-gadget/f_hid: use per-attribute show and store methods
  ...
2015-11-13 20:04:17 -08:00
Christoph Hellwig
64d513ac31 scsi: use host wide tags by default
This patch changes the !blk-mq path to the same defaults as the blk-mq
I/O path by always enabling block tagging, and always using host wide
tags.  We've had blk-mq available for a few releases so bugs with
this mode should have been ironed out, and this ensures we get better
coverage of over tagging setup over different configs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-11-09 17:11:57 -08:00
Sagi Grimberg
9a21be531c IB/srp: Dont allocate a page vector when using fast_reg
The new fast registration API does not reuqire a page vector
so we can't avoid allocating it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg
51c2b8e2ac IB/srp: Remove srp_finish_mapping
No callers left, remove it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00
Sagi Grimberg
f7f7aab1a5 IB/srp: Convert to new registration API
Instead of constructing a page list, call ib_map_mr_sg
and post a new ib_reg_wr. srp_map_finish_fr now returns
the number of sg elements registered.

Remove srp_finish_mapping since no one is calling it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:19 -04:00