Commit graph

566 commits

Author SHA1 Message Date
Steffen Maier
f723101e1c scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread
commit 6a76550841d412330bd86aed3238d1888ba70f0e upstream.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : .......
LUN            : 0x...
WWPN           : 0x...
D_ID           : 0x...
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x...
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x0.                   ZFCP_ERP_ACTION_REOPEN_...
ERP need       : 0xc0                   ZFCP_ERP_ACTION_NONE

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
1d7e23543a scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED
commit 8c3d20aada70042a39c6a6625be037c1472ca610 upstream.

That other commit introduced an inconsistency because it would trace on
ERP_FAILED for all callers of port forced reopen triggers (not just
terminate_rport_io), but it would not trace on ERP_FAILED for all callers of
other ERP triggers such as adapter, port regular, LUN.

Therefore, generalize that other commit. zfcp_erp_action_enqueue() already
had two early outs which re-used the one zfcp_dbf_rec_trig() call.  All ERP
trigger functions finally run through zfcp_erp_action_enqueue().  So move
the special handling for ZFCP_STATUS_COMMON_ERP_FAILED into
zfcp_erp_action_enqueue() and add another early out with new trace marker
for pseudo ERP need in this case. This removes all early returns from all
ERP trigger functions so we always end up at zfcp_dbf_rec_trig().

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : .......
LUN            : 0x...
WWPN           : 0x...
D_ID           : 0x...
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x...
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x0.                   ZFCP_ERP_ACTION_REOPEN_...
ERP need       : 0xe0                   ZFCP_ERP_ACTION_FAILED

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
24f2f5d726 scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED
commit d70aab55924b44f213fec2b900b095430b33eec6 upstream.

For problem determination we always want to see when we were invoked on the
terminate_rport_io callback whether we perform something or not.

Temporal event sequence of interest with a long fast_io_fail_tmo of 27 sec:

loose remote port

t   workqueue
[s] zfcp_q_<dev>       IRQ                 zfcperp<dev>

=== ================== =================== ============================

  0                    recv RSCN
                       q p.test_link_work
    block rport
     start fast_io_fail_tmo
    send ADISC ELS
  4                    recv ADISC fail
                       block zfcp_port
                                           port forced reopen
                                           send open port
 12                    recv open port fail
                                           q p.gid_pn_work
                                           zfcp_erp_wakeup
                                           (zfcp_erp_wait would return)
    GID_PN fail

Before this point, we got a SCSI trace with tag "sctrpi1" on fast_io_fail,
e.g. with the typical 5 sec setting.

    port.status |= ERP_FAILED

If fast_io_fail_tmo triggers after this point, we missed a SCSI trace.

    workqueue
    fc_dl_<host>
    ==================
 27 fc_timeout_fail_rport_io
    fc_terminate_rport_io
    zfcp_scsi_terminate_rport_io
    zfcp_erp_port_forced_reopen
    _zfcp_erp_port_forced_reopen
     if (port.status & ERP_FAILED)
      return;

Therefore, write a trace before above early return.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1                      ZFCP_DBF_REC_TRIG
Tag            : sctrpi1                SCSI terminate rport I/O
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x<wwpn>
D_ID           : 0x<n_port_id>
Adapter status : 0x...
Port status    : 0x...
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x03                   ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need       : 0xe0                   ZFCP_ERP_ACTION_FAILED

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
24558a878a scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return
commit 96d9270499471545048ed8a6d7f425a49762283d upstream.

get_device() and its internally used kobject_get() only return NULL if they
get passed NULL as argument. zfcp_get_port_by_wwpn() loops over
adapter->port_list so the iteration variable port is always non-NULL.
Struct device is embedded in struct zfcp_port so &port->dev is always
non-NULL. This is the argument to get_device().  However, if we get an
fc_rport in terminate_rport_io() for which we cannot find a match within
zfcp_get_port_by_wwpn(), the latter can return NULL.  v2.6.30 commit
70932935b6 ("[SCSI] zfcp: Fix oops when port disappears") introduced an
early return without adding a trace record for this case.  Even if we don't
need recovery in this case, for debugging we should still see that our
callback was invoked originally by scsi_transport_fc.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : sctrpin        SCSI terminate rport I/O, no zfcp port
LUN            : 0xffffffffffffffff                     none (invalid)
WWPN           : 0x<wwpn>               WWPN
D_ID           : 0x<n_port_id>          N_Port-ID
Adapter status : 0x...
Port status    : 0xffffffff             unknown (-1)
LUN status     : 0x00000000                             none (invalid)
Ready count    : 0x...
Running count  : 0x...
ERP want       : 0x03                   ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
ERP need       : 0xc0                   ZFCP_ERP_ACTION_NONE

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 70932935b6 ("[SCSI] zfcp: Fix oops when port disappears")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
9ffe2bc9c3 scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed
commit 512857a795cbbda5980efa4cdb3c0b6602330408 upstream.

If a SCSI device is deleted during scsi_eh host reset, we cannot get a
reference to the SCSI device anymore since scsi_device_get returns !=0 by
design. Assuming the recovery of adapter and port(s) was successful,
zfcp_erp_strategy_followup_success() attempts to trigger a LUN reset for the
half-gone SCSI device. Unfortunately, it causes the following confusing
trace record which states that zfcp will do a LUN recovery as "ERP need" is
ZFCP_ERP_ACTION_REOPEN_LUN == 1 and equals "ERP want".

Old example trace record formatted with zfcpdbf from s390-tools:

Tag:           : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN            : 0x<FCP_LUN>
WWPN           : 0x<WWPN>
D_ID           : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x40000000     ZFCP_STATUS_COMMON_RUNNING
                                but not ZFCP_STATUS_COMMON_UNBLOCKED as it
                                was closed on close part of adapter reopen
ERP want       : 0x01
ERP need       : 0x01           misleading

However, zfcp_erp_setup_act() returns NULL as it cannot get the reference.
Hence, zfcp_erp_action_enqueue() takes an early goto out and _NO_ recovery
actually happens.

We always do want the recovery trigger trace record even if no erp_action
could be enqueued as in this case. For other cases where we did not enqueue
an erp_action, 'need' has always been zero to indicate this. In order to
indicate above goto out, introduce an eyecatcher "flag" to mark the "ERP
need" as 'not needed' but still keep the information which erp_action type,
that zfcp_erp_required_act() had decided upon, is needed.  0xc_ is chosen to
be visibly different from 0x0_ in "ERP want".

New example trace record formatted with zfcpdbf from s390-tools:

Tag:           : ersfs_3 ERP, trigger, unit reopen, port reopen succeeded
LUN            : 0x<FCP_LUN>
WWPN           : 0x<WWPN>
D_ID           : 0x<N_Port-ID>
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x40000000
ERP want       : 0x01
ERP need       : 0xc1           would need LUN ERP, but no action set up
                   ^

Before v2.6.38 commit ae0904f60f ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.") we could detect this case because the
"erp_action" field in the trace was NULL. The rework removed erp_action as
argument and field from the trace.

This patch here is for tracing. A fix to allow LUN recovery in the case at
hand is a topic for a separate patch.

See also commit fdbd1c5e27 ("[SCSI] zfcp: Allow running unit/LUN shutdown
without acquiring reference") for a similar case and background info.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: ae0904f60f ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
9d8aee0700 scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF
commit 81979ae63e872ef650a7197f6ce6590059d37172 upstream.

We already have a SCSI trace for the end of abort and scsi_eh TMF. Due to
zfcp_erp_wait() and fc_block_scsi_eh() time can pass between the start of
our eh callback and an actual send/recv of an abort / TMF request.  In order
to see the temporal sequence including any abort / TMF send retries, add a
trace before the above two blocking functions.  This supports problem
determination with scsi_eh and parallel zfcp ERP.

No need to explicitly trace the beginning of our eh callback, since we
typically can send an abort / TMF and see its HBA response (in the worst
case, it's a pseudo response on dismiss all of adapter recovery, e.g. due to
an FSF request timeout [fsrth_1] of the abort / TMF). If we cannot send, we
now get a trace record for the first "abrt_wt" or "[lt]r_wait" which denotes
almost the beginning of the callback.

No need to explicitly trace the wakeup after the above two blocking
functions because the next retry loop causes another trace in any case and
that is sufficient.

Example trace records formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : abrt_wt        abort, before zfcp_erp_wait()
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x<scsi_id>
SCSI LUN       : 0x<scsi_lun>
SCSI LUN high  : 0x<scsi_lun_high>
SCSI result    : 0x<scsi_result_of_cmd_to_be_aborted>
SCSI retries   : 0x<retries_of_cmd_to_be_aborted>
SCSI allowed   : 0x<allowed_retries_of_cmd_to_be_aborted>
SCSI scribble  : 0x<req_id_of_cmd_to_be_aborted>
SCSI opcode    : <CDB_of_cmd_to_be_aborted>
FCP rsp inf cod: 0x..                                   none (invalid)
FCP rsp IU     : ...                                    none (invalid)

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_wait        LUN reset, before zfcp_erp_wait()
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0x<scsi_id>
SCSI LUN       : 0x<scsi_lun>
SCSI LUN high  : 0x<scsi_lun_high>
SCSI result    : 0x...                                  unrelated
SCSI retries   : 0x..                                   unrelated
SCSI allowed   : 0x..                                   unrelated
SCSI scribble  : 0x...                                  unrelated
SCSI opcode    : ...                                    unrelated
FCP rsp inf cod: 0x..                                   none (invalid)
FCP rsp IU     : ...                                    none (invalid)

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: 63caf367e1 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Fixes: af4de36d91 ("[SCSI] zfcp: Block scsi_eh thread for rport state BLOCKED")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Steffen Maier
faeba52abe scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler
commit df30781699f53e4fd4c494c6f7dd16e3d5c21d30 upstream.

For problem determination we need to see whether and why we were successful
or not. This allows deduction of scsi_eh escalation.

Example trace record formatted with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : schrh_r        SCSI host reset handler result
Request ID     : 0x0000000000000000                     none (invalid)
SCSI ID        : 0xffffffff                             none (invalid)
SCSI LUN       : 0xffffffff                             none (invalid)
SCSI LUN high  : 0xffffffff                             none (invalid)
SCSI result    : 0x00002002     field re-used for midlayer value: SUCCESS
                                or in other cases: 0x2009 == FAST_IO_FAIL
SCSI retries   : 0xff                                   none (invalid)
SCSI allowed   : 0xff                                   none (invalid)
SCSI scribble  : 0xffffffffffffffff                     none (invalid)
SCSI opcode    : ffffffff ffffffff ffffffff ffffffff    none (invalid)
FCP rsp inf cod: 0xff                                   none (invalid)
FCP rsp IU     : 00000000 00000000 00000000 00000000    none (invalid)
                 00000000 00000000

v2.6.35 commit a1dbfddd02 ("[SCSI] zfcp: Pass return code from
fc_block_scsi_eh to scsi eh") introduced the first return with something
other than the previously hardcoded single SUCCESS return path.

Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Fixes: a1dbfddd02 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:31 +02:00
Jens Remus
3679713402 scsi: zfcp: fix infinite iteration on ERP ready list
commit fa89adba1941e4f3b213399b81732a5c12fd9131 upstream.

zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's
rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen
adapter ERP action via zfcp_erp_action_enqueue(). Both are separately
processed asynchronously and concurrently.

Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It
calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via
zfcp_dbf_rec_trig().  zfcp_dbf_rec_trig() acquires the DBF REC spin lock
and then iterates with list_for_each() over the adapter's ERP ready list
without holding the ERP lock. This opens a race window in which the
current list entry can be moved to another list, causing list_for_each()
to iterate forever on the wrong list, as the erp_ready_head is never
encountered as terminal condition.

Meanwhile the ERP action can be processed in the ERP thread by
zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP
lock and then calls zfcp_erp_action_to_running() to move the ERP action
from the ready to the running list.  zfcp_erp_action_to_running() can
move the ERP action using list_move() just during the aforementioned
race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run().
zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is
held by the infinitely looping kworker, it effectively spins forever.

Example Sequence Diagram:

Process                ERP Thread             rport_work
-------------------    -------------------    -------------------
zfcp_erp_adapter_reopen()
zfcp_erp_adapter_block()
zfcp_scsi_schedule_rports_block()
lock ERP                                      zfcp_scsi_rport_work()
zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER)
list_add_tail() on ready                      !(rport_task==RPORT_ADD)
wake_up() ERP thread                          zfcp_scsi_rport_block()
zfcp_dbf_rec_trig()    zfcp_erp_strategy()    zfcp_dbf_rec_trig()
unlock ERP                                    lock DBF REC
zfcp_erp_wait()        lock ERP
|                      zfcp_erp_action_to_running()
|                                             list_for_each() ready
|                      list_move()              current entry
|                        ready to running
|                      zfcp_dbf_rec_run()       endless loop over running
|                      zfcp_dbf_rec_run_lvl()
|                      lock DBF REC spins forever

Any adapter recovery can trigger this, such as setting the device offline
or reboot.

V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport
during rport gone") introduced additional tracing of (un)blocking of
rports. It missed that the adapter->erp_lock must be held when calling
zfcp_dbf_rec_trig().

This fix uses the approach formerly introduced by commit aa0fec6239
("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got
later removed by commit ae0904f60f ("[SCSI] zfcp: Redesign of the debug
tracing for recovery actions.").

Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that
acquires and releases the adapter->erp_lock for read.

Reported-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Jens Remus <jremus@linux.ibm.com>
Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone")
Cc: <stable@vger.kernel.org> # 2.6.32+
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26 08:49:00 +02:00
Steffen Maier
6e9abbc964 scsi: zfcp: fix erp_action use-before-initialize in REC action trace
commit ab31fd0ce65ec93828b617123792c1bb7c6dcc42 upstream.

v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN
recovery") extended accessing parent pointer fields of struct
zfcp_erp_action for tracing.  If an erp_action has never been enqueued
before, these parent pointer fields are uninitialized and NULL. Examples
are zfcp objects freshly added to the parent object's children list,
before enqueueing their first recovery subsequently. In
zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action
fields can cause a NULL pointer dereference.  Since the kernel can read
from lowcore on s390, it does not immediately cause a kernel page
fault. Instead it can cause hangs on trying to acquire the wrong
erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl()
                      ^bogus^
while holding already other locks with IRQs disabled.

Real life example from attaching lots of LUNs in parallel on many CPUs:

crash> bt 17723
PID: 17723  TASK: ...               CPU: 25  COMMAND: "zfcperp0.0.1800"
 LOWCORE INFO:
  -psw      : 0x0404300180000000 0x000000000038e424
  -function : _raw_spin_lock_wait_flags at 38e424
...
 #0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp]
 #1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp]
 #2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp]
 #3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp]
 #4 [fdde8fe60] kthread at 173550
 #5 [fdde8feb8] kernel_thread_starter at 10add2

zfcp_adapter
 zfcp_port
  zfcp_unit <address>, 0x404040d600000000
  scsi_device NULL, returning early!
zfcp_scsi_dev.status = 0x40000000
0x40000000 ZFCP_STATUS_COMMON_RUNNING

crash> zfcp_unit <address>
struct zfcp_unit {
  erp_action = {
    adapter = 0x0,
    port = 0x0,
    unit = 0x0,
  },
}

zfcp_erp_action is always fully embedded into its container object. Such
container object is never moved in its object tree (only add or delete).
Hence, erp_action parent pointers can never change.

To fix the issue, initialize the erp_action parent pointers before
adding the erp_action container to any list and thus before it becomes
accessible from outside of its initializing function.

In order to also close the time window between zfcp_erp_setup_act()
memsetting the entire erp_action to zero and setting the parent pointers
again, drop the memset and instead explicitly initialize individually
all erp_action fields except for parent pointers. To be extra careful
not to introduce any other unintended side effect, even keep zeroing the
erp_action fields for list and timer. Also double-check with
WARN_ON_ONCE that erp_action parent pointers never change, so we get to
know when we would deviate from previous behavior.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 09:40:49 +01:00
Steffen Maier
4dd6cbbc21 scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
commit 5d4a3d0a2ff23799b956e5962b886287614e7fad upstream.

Complements debugging aspects of the otherwise functionally complete
v3.17 commit 9cb78c16f5 ("scsi: use 64-bit LUNs").

While I don't have access to a target exporting 3 or 4 level LUNs,
I did test it by explicitly attaching a non-existent fake 4 level LUN
by means of zfcp sysfs attribute "unit_add".
In order to see corresponding trace records of otherwise successful
events, we had to increase the trace level of area SCSI and HBA to 6.

$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level
$ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level

$ echo 0x4011402240334044 > \
  /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add

Example output formatted by an updated zfcpdbf from the s390-tools
package interspersed with kernel messages at scsi_logging_level=4605:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : scsla_1
LUN            : 0x4011402240334044
WWPN           : 0x50050763031bd327
D_ID           : 0x00......
Adapter status : 0x5400050b
Port status    : 0x54000001
LUN status     : 0x41000000
Ready count    : 0x00000001
Running count  : 0x00000000
ERP want       : 0x01
ERP need       : 0x01

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fs_norm
Request ID     : 0x<inquiry2-req-id>
Request status : 0x00000010
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 6
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_nor
Request ID     : 0x<inquiry2-req-id>
SCSI ID        : 0x00000000
SCSI LUN       : 0x40224011
SCSI LUN high  : 0x40444033 <=======================
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<inquiry2-req-id>
SCSI opcode    : 12000000 a4000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 00000000 00000000

scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164
scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0
scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \
no device added

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 9cb78c16f5 ("scsi: use 64-bit LUNs")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:15 +02:00
Steffen Maier
1e6c640a75 scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
commit fdb7cee3b9e3c561502e58137a837341f10cbf8b upstream.

At the default trace level, we only trace unsuccessful events including
FSF responses.

zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to
decide on an unsuccessful response. However, this is only one of multiple
possible sources determining a failed struct zfcp_fsf_req.

An FSF request can also "fail" if its response runs into an ERP timeout
or if it gets dismissed because a higher level recovery was triggered
[trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()].
FSF requests with ERP timeout are:
FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or
FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports,
FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN.
One example is slow queue processing which can cause follow-on errors,
e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out.
In order to see the root cause, we need to see late responses even if the
channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fcegpf1
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Ready count    : 0x00000001
Running count  : 0x...
ERP want       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
ERP need       : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
|
Timestamp      : ...				30 seconds later
Area           : REC
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 2
Tag            : erscf_2
LUN            : 0xffffffffffffffff
WWPN           : 0x<WWPN>
D_ID           : 0x00<D_ID>
Adapter status : 0x5400050b
Port status    : 0x41200000
LUN status     : 0x00000000
Request ID     : 0x<request_ID>
ERP status     : 0x10000000			ZFCP_STATUS_ERP_TIMEDOUT
ERP step       : 0x0800				ZFCP_ERP_STEP_PORT_OPENING
ERP action     : 0x02				ZFCP_ERP_ACTION_REOPEN_PORT
ERP count      : 0x00
|
Timestamp      : ...				later than previous record
Area           : HBA
Subarea        : 00
Level          : 5	> default level		=> 3	<= default level
Exception      : -
CPU ID         : 00
Caller         : ...
Record ID      : 1
Tag            : fs_qtcb			=> fs_rerr
Request ID     : 0x<request_ID>
Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
						| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000005
FSF sequence no: 0x...
FSF issued     : ...				> 30 seconds ago
FSF stat       : 0x00000000			FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001			FSF_PROT_GOOD
Prot stat qual : 00000000 00000000 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x00000000
QTCB log length: ...
QTCB log info  : ...

In case of problems detecting that new responses are waiting on the input
queue, we sooner or later trigger adapter recovery due to an FSF request
timeout (trace tag "fsrth_1").
FSF requests with FSF request timeout are:
typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also
FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs,
FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports,
FSF_QTCB_FCP_CMND for task management function (LUN / target reset).
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.

In a theroretical case, inject code can create an erroneous FSF request
on purpose. If data router is enabled, it uses deferred error reporting.
A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and
SAM_STAT_GOOD. But on writing the read data to host memory via DMA,
it can still fail, e.g. if an intentionally wrong scatter list does not
provide enough space. Rather than getting an unsuccessful response,
we get a QDIO activate check which in turn triggers adapter recovery.
One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD
because the channel filled in the response via DMA into the request's QTCB.
Example trace records formatted with zfcpdbf from the s390-tools package:

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 6	> default level		=> 3	<= default level
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : fs_norm			=> fs_rerr
Request ID     : 0x<request_ID2>
Request status : 0x00001010			ZFCP_STATUS_FSFREQ_DISMISSED
						| ZFCP_STATUS_FSFREQ_CLEANUP
FSF cmnd       : 0x00000001
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000000			FSF_GOOD
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000001			FSF_PROT_GOOD
Prot stat qual : ........ ........ 00000000 00000000
Port handle    : 0x...
LUN handle     : 0x...
|
Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : ...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_ID2>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x000e0000			DID_TRANSPORT_DISRUPTED
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_ID2>
SCSI opcode    : 28...				Read(10)
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000000 00000000
                                         ^^	SAM_STAT_GOOD
                 00000000 00000000

Only with luck in both above cases, we could see a follow-on trace record
of an unsuccesful event following a successful but late FSF response with
FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests
resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED
[On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets
ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure].
However, the reason for this follow-on trace was invisible because the
corresponding HBA trace record was missing at the default trace level
(by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open").

On adapter recovery, after we had shut down the QDIO queues, we perform
unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED
for each pending FSF request in zfcp_fsf_req_dismiss_all().
In order to find the root cause, we need to see all pseudo responses even
if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD.

Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED
or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr".

It does not matter that there are numerous places which set
ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response
early. These cases are based on protocol status != FSF_PROT_GOOD or
== FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default
as trace tag "fs_perr" or "fs_ferr" respectively.

NB: The trace record with tag "fssrh_1" for status read buffers on dismiss
all remains. zfcp_fsf_req_complete() handles this and returns early.
All other FSF request types are handled separately and as described above.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532e ("[SCSI] zfcp: enhancement of zfcp debug features")
Fixes: 2e261af84c ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:15 +02:00
Steffen Maier
7194822422 scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
commit 12c3e5754c8022a4f2fd1e9f00d19e99ee0d3cc1 upstream.

If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and
thus does not fit into the fsp_rsp field built into a SCSI trace record,
trace the full FCP_RSP UI with all optional parts as payload record
instead of just FCP_SNS_INFO as payload and
a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record.

That way we would also get the full FCP_SNS_INFO in case a
target would ever send more than
min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96.

The mandatory part of FCP_RSP IU is only 24 bytes.
PAYload costs at least one full PAY record of 256 bytes anyway.
We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128.
So we can just put the whole FCP_RSP IU with any optional parts into
PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb
("zfcp: trace full payload of all SAN records (req,resp,iels)").
This does not cause any additional trace records wasting memory.

Decoded trace records were confusing because they showed a hard-coded
sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed
actually less.

Since the same commit, we set pl_len for SAN traces to the full length of a
request/response even if we cap the corresponding trace.
In contrast, here for SCSI traces we set pl_len to the pre-computed
length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid.
Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128
if there were optional parts.
This makes it easier for the zfcpdbf tool to format only the relevant
part of the long FCP_RSP UI buffer. And any trailing information is still
available in the payload trace record just in case.

Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new
content explicit to zfcpdbf which can then pick a suitable field name such
as "FCP rsp IU all:" instead of "Sense info :"
Also, the same zfcpdbf can still be backwards compatible with "fcp_sns".

Old example trace record before this fix, formatted with the tool zfcpdbf
from s390-tools:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU id         : ..
Caller         : 0x...
Record id      : 1
Tag            : rsl_err
Request id     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : 00000000 00000000 00000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000202 00000000
                                       ^^==FCP_SNS_LEN_VALID
                 00000020 00000000
                 ^^^^^^^^==FCP_SNS_LEN==32
Sense len      : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC)
Sense info     : 70000600 00000018 00000000 29000000
                 00000400 00000000 00000000 00000000
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous
                 00000000 00000000 00000000 00000000<==superfluous

New example trace records with this fix:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000002
SCSI retries   : 0x00
SCSI allowed   : 0x03
SCSI scribble  : 0x<request_id>
SCSI opcode    : a30c0112 00000000 02000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000a02 00000200
                 00000020 00000000
FCP rsp IU len : 56
FCP rsp IU all : 00000000 00000000 00000a02 00000200
                                       ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID
                 00000020 00000000 70000500 00000018
                 ^^^^^^^^==FCP_SNS_LEN
                                   ^^^^^^^^^^^^^^^^^
                 00000000 240000cb 00011100 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_okay
Request ID     : 0x<request_id>
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00000000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x<request_id>
SCSI opcode    : <CDB of unrelated SCSI command passed to eh handler>
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000100 00000000
                 00000000 00000008
FCP rsp IU len : 32
FCP rsp IU all : 00000000 00000000 00000100 00000000
                                       ^^==FCP_RSP_LEN_VALID
                 00000000 00000008 00000000 00000000
                          ^^^^^^^^==FCP_RSP_LEN
                                   ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b9 ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:15 +02:00
Steffen Maier
d0fbe221b8 scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
commit 1a5d999ebfc7bfe28deb48931bb57faa8e4102b6 upstream.

For problem determination we need to see that we were in scsi_eh
as well as whether and why we were successful or not.

The following commits introduced new early returns without adding
a trace record:

v2.6.35 commit a1dbfddd02
("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL,

v2.6.30 commit 63caf367e1
("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
on not having gotten an FSF request after the maximum number of retry
attempts and thus could not issue a TMF and has to return FAILED.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a1dbfddd02 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh")
Fixes: 63caf367e1 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:15 +02:00
Steffen Maier
1a84736948 scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
commit 9fe5d2b2fd30aa8c7827ec62cbbe6d30df4fe3e3 upstream.

Without this fix we get SCSI trace records on task management functions
which cannot be correlated to HBA trace records because all fields
related to the FSF request are empty (zero).
Also, the FCP_RSP_IU is missing as well as any sense data if available.

This was caused by v2.6.14 commit 8a36e4532e ("[SCSI] zfcp: enhancement
of zfcp debug features") introducing trace records for TMFs but
hard coding NULL for a possibly existing TMF FSF request.
The scsi_cmnd scribble is also zero or unrelated for the TMF request
so it also could not lookup a suitable FSF request from there.

A broken example trace record formatted with zfcpdbf from the s390-tools
package:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : lr_fail
Request ID     : 0x0000000000000000
                   ^^^^^^^^^^^^^^^^ no correlation to HBA record
SCSI ID        : 0x<scsitarget>
SCSI LUN       : 0x<scsilun>
SCSI result    : 0x000e0000
SCSI retries   : 0x00
SCSI allowed   : 0x05
SCSI scribble  : 0x0000000000000000
SCSI opcode    : 2a000017 3bb80000 08000000 00000000
FCP rsp inf cod: 0x00
                   ^^ no TMF response
FCP rsp IU     : 00000000 00000000 00000000 00000000
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 00000000 00000000
                 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU
Sense len      : ...
^^^^^^^^^^^^^^^^^^^^ no sense data length
Sense info     : ...
^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present

There are some true cases where we really do not have an FSF request:
"rsl_fai" from zfcp_dbf_scsi_fail_send() called for early
returns / completions in zfcp_scsi_queuecommand(),
"abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from
zfcp_scsi_eh_abort_handler() where we did not get as far,
"lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're
successful and do not need to do anything because adapter stopped.
For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi().

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8a36e4532e ("[SCSI] zfcp: enhancement of zfcp debug features")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:15 +02:00
Steffen Maier
52661717ee scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
commit 975171b4461be296a35e83ebd748946b81cf0635 upstream.

v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records
(req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

While it already saves space by not dumping any empty residual entries
of the large successful GPN_FT response (4 pages), there are seldom cases
where the GPN_FT response is unsuccessful and likely does not have
FC_NS_FID_LAST set in fp_flags so we did not cap the trace record.
We typically see such case for an initiator WWPN, which is not in any zone.

Cap unsuccessful responses to at least the actual basic CT_IU response
plus whatever fits the SAN trace record built-in "payload" buffer
just in case there's trailing information
of which we would at least see the existence and its beginning.

In order not to erroneously cap successful responses, we need to swap
calling the trace function and setting the CT / ELS status to success (0).

Example trace record pair formatted with zfcpdbf:

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : fssct_1
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN req short  : 01000000 fc020000 01720ffc 00000000
                 00000008
SAN req length : 20
|
Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 2
Tag            : fsscth2
Request ID     : 0x<request_id>
Destination ID : 0x00fffffc
SAN resp short : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
SAN resp length: 16384
San resp info  : 01000000 fc020000 80010000 00090700
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]
                 00000000 00000000 00000000 00000000 [trailing info]

The fix saves all but one of the previously associated 64 PAYload trace
record chunks of size 256 bytes each.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:14 +02:00
Benjamin Block
d0c02c6f3e scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
commit a099b7b1fc1f0418ab8d79ecf98153e1e134656e upstream.

Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP
response IU. When this flag is set, it is possible, in regards to the
FCP standard, that the storage-server processes the command normally, up
to the point where data is missing and simply ignores those.

In this case no CHECK CONDITION would be set, and because we ignored the
FCP_RESID_OVER flag we resulted in at least a data loss or even
-corruption as a follow-up error, depending on how the
applications/layers on top behave. To prevent this, we now set the
host-byte of the corresponding scsi_cmnd to DID_ERROR.

Other storage-behaviors, where the same condition results in a CHECK
CONDITION set in the answer, don't need to be changed as they are
handled in the mid-layer already.

Following is an example trace record decoded with zfcpdbf from the
s390-tools package. We forcefully injected a fc_dl which is one byte too
small:

Timestamp      : ...
Area           : SCSI
Subarea        : 00
Level          : 3
Exception      : -
CPU ID         : ..
Caller         : 0x...
Record ID      : 1
Tag            : rsl_err
Request ID     : 0x...
SCSI ID        : 0x...
SCSI LUN       : 0x...
SCSI result    : 0x00070000
                     ^^DID_ERROR
SCSI retries   : 0x..
SCSI allowed   : 0x..
SCSI scribble  : 0x...
SCSI opcode    : 2a000000 00000000 08000000 00000000
FCP rsp inf cod: 0x00
FCP rsp IU     : 00000000 00000000 00000400 00000001
                                       ^^fr_flags==FCP_RESID_OVER
                                         ^^fr_status==SAM_STAT_GOOD
                                            ^^^^^^^^fr_resid
                 00000000 00000000

As of now, we don't actively handle to possibility that a response IU
has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once.

Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 553448f6c4 ("[SCSI] zfcp: Message cleanup")
Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:14 +02:00
Steffen Maier
cfc4996743 scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
commit 71b8e45da51a7b64a23378221c0a5868bd79da4f upstream.

Since commit db007fc5e2 ("[SCSI] Command protection operation"),
scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to
SCSI_PROT_NORMAL.
Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand()
to only access any of scsi_prot_sg...() if
(scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL).

Do the same thing for zfcp, which introduced DIX support with
commit ef3eb71d8b ("[SCSI] zfcp: Introduce experimental support for
DIF/DIX").

Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp,
because the regular SCSI command with DIX protection data, that scsi_eh
re-uses in scsi_send_eh_cmnd(), of course still has
(scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the
FCP channel hardware.

This causes scsi_eh_test_devices() to have (finish_cmds == 0)
[not SCSI device is online or not scsi_eh_tur() failed]
so regular SCSI commands, that caused / were affected by scsi_eh,
are moved to work_q and scsi_eh_test_devices() itself returns false.
In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs()
beyond host reset to finally scsi_eh_offline_sdevs()
which sets affected SCSI devices offline with the following kernel message:

"kernel: sd H:0:T:L: Device offlined - not ready after error recovery"

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ef3eb71d8b ("[SCSI] zfcp: Introduce experimental support for DIF/DIX")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:14 +02:00
Steffen Maier
52e02d6bf3 scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send
commit 2dfa6688aafdc3f74efeb1cf05fb871465d67f79 upstream.

Dan Carpenter kindly reported:
<quote>
The patch d27a7cb91960: "zfcp: trace on request for open and close of
WKA port" from Aug 10, 2016, leads to the following static checker
warning:

	drivers/s390/scsi/zfcp_fsf.c:1615 zfcp_fsf_open_wka_port()
	warn: 'req' was already freed.

drivers/s390/scsi/zfcp_fsf.c
  1609          zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
  1610          retval = zfcp_fsf_req_send(req);
  1611          if (retval)
  1612                  zfcp_fsf_req_free(req);
                                          ^^^
Freed.

  1613  out:
  1614          spin_unlock_irq(&qdio->req_q_lock);
  1615          if (req && !IS_ERR(req))
  1616                  zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
                                                                  ^^^^^^^^^^^
Use after free.

  1617          return retval;
  1618  }

Same thing for zfcp_fsf_close_wka_port() as well.
</quote>

Rather than relying on req being NULL (or ERR_PTR) for all cases where
we don't want to trace or should not trace,
simply check retval which is unconditionally initialized with -EIO != 0
and it can only become 0 on successful retval = zfcp_fsf_req_send(req).
With that we can also remove the then again unnecessary unconditional
initialization of req which was introduced with that earlier commit.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-14 15:22:52 -08:00
Steffen Maier
565ae61d89 scsi: zfcp: fix rport unblock race with LUN recovery
commit 6f2ce1c6af37191640ee3ff6e8fc39ea10352f4c upstream.

It is unavoidable that zfcp_scsi_queuecommand() has to finish requests
with DID_IMM_RETRY (like fc_remote_port_chkready()) during the time
window when zfcp detected an unavailable rport but
fc_remote_port_delete(), which is asynchronous via
zfcp_scsi_schedule_rport_block(), has not yet blocked the rport.

However, for the case when the rport becomes available again, we should
prevent unblocking the rport too early.  In contrast to other FCP LLDDs,
zfcp has to open each LUN with the FCP channel hardware before it can
send I/O to a LUN.  So if a port already has LUNs attached and we
unblock the rport just after port recovery, recoveries of LUNs behind
this port can still be pending which in turn force
zfcp_scsi_queuecommand() to unnecessarily finish requests with
DID_IMM_RETRY.

This also opens a time window with unblocked rport (until the followup
LUN reopen recovery has finished).  If a scsi_cmnd timeout occurs during
this time window fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents a clean and
timely path failover.  This should not happen if the path issue can be
recovered on FC transport layer such as path issues involving RSCNs.

Fix this by only calling zfcp_scsi_schedule_rport_register(), to
asynchronously trigger fc_remote_port_add(), after all LUN recoveries as
children of the rport have finished and no new recoveries of equal or
higher order were triggered meanwhile.  Finished intentionally includes
any recovery result no matter if successful or failed (still unblock
rport so other successful LUNs work).  For simplicity, we check after
each finished LUN recovery if there is another LUN recovery pending on
the same port and then do nothing.  We handle the special case of a
successful recovery of a port without LUN children the same way without
changing this case's semantics.

For debugging we introduce 2 new trace records written if the rport
unblock attempt was aborted due to still unfinished or freshly triggered
recovery. The records are only written above the default trace level.

Benjamin noticed the important special case of new recovery that can be
triggered between having given up the erp_lock and before calling
zfcp_erp_action_cleanup() within zfcp_erp_strategy().  We must avoid the
following sequence:

ERP thread                 rport_work      other context
-------------------------  --------------  --------------------------------
port is unblocked, rport still blocked,
 due to pending/running ERP action,
 so ((port->status & ...UNBLOCK) != 0)
 and (port->rport == NULL)
unlock ERP
zfcp_erp_action_cleanup()
case ZFCP_ERP_ACTION_REOPEN_LUN:
zfcp_erp_try_rport_unblock()
((status & ...UNBLOCK) != 0) [OLD!]
                                           zfcp_erp_port_reopen()
                                           lock ERP
                                           zfcp_erp_port_block()
                                           port->status clear ...UNBLOCK
                                           unlock ERP
                                           zfcp_scsi_schedule_rport_block()
                                           port->rport_task = RPORT_DEL
                                           queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task != RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_block()
                           if (!port->rport) return
zfcp_scsi_schedule_rport_register()
port->rport_task = RPORT_ADD
queue_work(rport_work)
                           zfcp_scsi_rport_work()
                           (port->rport_task == RPORT_ADD)
                           port->rport_task = RPORT_NONE
                           zfcp_scsi_rport_register()
                           (port->rport == NULL)
                           rport = fc_remote_port_add()
                           port->rport = rport;

Now the rport was erroneously unblocked while the zfcp_port is blocked.
This is another situation we want to avoid due to scsi_eh
potential. This state would at least remain until the new recovery from
the other context finished successfully, or potentially forever if it
failed.  In order to close this race, we take the erp_lock inside
zfcp_erp_try_rport_unblock() when checking the status of zfcp_port or
LUN.  With that, the possible corresponding rport state sequences would
be: (unblock[ERP thread],block[other context]) if the ERP thread gets
erp_lock first and still sees ((port->status & ...UNBLOCK) != 0),
(block[other context],NOP[ERP thread]) if the ERP thread gets erp_lock
after the other context has already cleard ...UNBLOCK from port->status.

Since checking fields of struct erp_action is unsafe because they could
have been overwritten (re-used for new recovery) meanwhile, we only
check status of zfcp_port and LUN since these are only changed under
erp_lock elsewhere. Regarding the check of the proper status flags (port
or port_forced are similar to the shown adapter recovery):

[zfcp_erp_adapter_shutdown()]
zfcp_erp_adapter_reopen()
 zfcp_erp_adapter_block()
  * clear UNBLOCK ---------------------------------------+
 zfcp_scsi_schedule_rports_block()                       |
 write_lock_irqsave(&adapter->erp_lock, flags);-------+  |
 zfcp_erp_action_enqueue()                            |  |
  zfcp_erp_setup_act()                                |  |
   * set ERP_INUSE -----------------------------------|--|--+
 write_unlock_irqrestore(&adapter->erp_lock, flags);--+  |  |
.context-switch.                                         |  |
zfcp_erp_thread()                                        |  |
 zfcp_erp_strategy()                                     |  |
  write_lock_irqsave(&adapter->erp_lock, flags);------+  |  |
  ...                                                 |  |  |
  zfcp_erp_strategy_check_target()                    |  |  |
   zfcp_erp_strategy_check_adapter()                  |  |  |
    zfcp_erp_adapter_unblock()                        |  |  |
     * set UNBLOCK -----------------------------------|--+  |
  zfcp_erp_action_dequeue()                           |     |
   * clear ERP_INUSE ---------------------------------|-----+
  ...                                                 |
  write_unlock_irqrestore(&adapter->erp_lock, flags);-+

Hence, we should check for both UNBLOCK and ERP_INUSE because they are
interleaved.  Also we need to explicitly check ERP_FAILED for the link
down case which currently does not clear the UNBLOCK flag in
zfcp_fsf_link_down_info_eval().

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 8830271c48 ("[SCSI] zfcp: Dont fail SCSI commands when transitioning to blocked fc_rport")
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e1 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:48 +01:00
Steffen Maier
3b3739dfa6 scsi: zfcp: do not trace pure benign residual HBA responses at default level
commit 56d23ed7adf3974f10e91b643bd230e9c65b5f79 upstream.

Since quite a while, Linux issues enough SCSI commands per scsi_device
which successfully return with FCP_RESID_UNDER, FSF_FCP_RSP_AVAILABLE,
and SAM_STAT_GOOD.  This floods the HBA trace area and we cannot see
other and important HBA trace records long enough.

Therefore, do not trace HBA response errors for pure benign residual
under counts at the default trace level.

This excludes benign residual under count combined with other validity
bits set in FCP_RSP_IU, such as FCP_SNS_LEN_VAL.  For all those other
cases, we still do want to see both the HBA record and the corresponding
SCSI record by default.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:48 +01:00
Benjamin Block
1cc0f9488b scsi: zfcp: fix use-after-"free" in FC ingress path after TMF
commit dac37e15b7d511e026a9313c8c46794c144103cd upstream.

When SCSI EH invokes zFCP's callbacks for eh_device_reset_handler() and
eh_target_reset_handler(), it expects us to relent the ownership over
the given scsi_cmnd and all other scsi_cmnds within the same scope - LUN
or target - when returning with SUCCESS from the callback ('release'
them).  SCSI EH can then reuse those commands.

We did not follow this rule to release commands upon SUCCESS; and if
later a reply arrived for one of those supposed to be released commands,
we would still make use of the scsi_cmnd in our ingress tasklet. This
will at least result in undefined behavior or a kernel panic because of
a wrong kernel pointer dereference.

To fix this, we NULLify all pointers to scsi_cmnds (struct zfcp_fsf_req
*)->data in the matching scope if a TMF was successful. This is done
under the locks (struct zfcp_adapter *)->abort_lock and (struct
zfcp_reqlist *)->lock to prevent the requests from being removed from
the request-hashtable, and the ingress tasklet from making use of the
scsi_cmnd-pointer in zfcp_fsf_fcp_cmnd_handler().

For cases where a reply arrives during SCSI EH, but before we get a
chance to NULLify the pointer - but before we return from the callback
-, we assume that the code is protected from races via the CAS operation
in blk_complete_request() that is called in scsi_done().

The following stacktrace shows an example for a crash resulting from the
previous behavior:

Unable to handle kernel pointer dereference at virtual kernel address fffffee17a672000
Oops: 0038 [#1] SMP
CPU: 2 PID: 0 Comm: swapper/2 Not tainted
task: 00000003f7ff5be0 ti: 00000003f3d38000 task.ti: 00000003f3d38000
Krnl PSW : 0404d00180000000 00000000001156b0 (smp_vcpu_scheduled+0x18/0x40)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 EA:3
Krnl GPRS: 000000200000007e 0000000000000000 fffffee17a671fd8 0000000300000015
           ffffffff80000000 00000000005dfde8 07000003f7f80e00 000000004fa4e800
           000000036ce8d8f8 000000036ce8d9c0 00000003ece8fe00 ffffffff969c9e93
           00000003fffffffd 000000036ce8da10 00000000003bf134 00000003f3b07918
Krnl Code: 00000000001156a2: a7190000        lghi    %r1,0
           00000000001156a6: a7380015        lhi    %r3,21
          #00000000001156aa: e32050000008    ag    %r2,0(%r5)
          >00000000001156b0: 482022b0        lh    %r2,688(%r2)
           00000000001156b4: ae123000        sigp    %r1,%r2,0(%r3)
           00000000001156b8: b2220020        ipm    %r2
           00000000001156bc: 8820001c        srl    %r2,28
           00000000001156c0: c02700000001    xilf    %r2,1
Call Trace:
([<0000000000000000>] 0x0)
 [<000003ff807bdb8e>] zfcp_fsf_fcp_cmnd_handler+0x3de/0x490 [zfcp]
 [<000003ff807be30a>] zfcp_fsf_req_complete+0x252/0x800 [zfcp]
 [<000003ff807c0a48>] zfcp_fsf_reqid_check+0xe8/0x190 [zfcp]
 [<000003ff807c194e>] zfcp_qdio_int_resp+0x66/0x188 [zfcp]
 [<000003ff80440c64>] qdio_kick_handler+0xdc/0x310 [qdio]
 [<000003ff804463d0>] __tiqdio_inbound_processing+0xf8/0xcd8 [qdio]
 [<0000000000141fd4>] tasklet_action+0x9c/0x170
 [<0000000000141550>] __do_softirq+0xe8/0x258
 [<000000000010ce0a>] do_softirq+0xba/0xc0
 [<000000000014187c>] irq_exit+0xc4/0xe8
 [<000000000046b526>] do_IRQ+0x146/0x1d8
 [<00000000005d6a3c>] io_return+0x0/0x8
 [<00000000005d6422>] vtime_stop_cpu+0x4a/0xa0
([<0000000000000000>] 0x0)
 [<0000000000103d8a>] arch_cpu_idle+0xa2/0xb0
 [<0000000000197f94>] cpu_startup_entry+0x13c/0x1f8
 [<0000000000114782>] smp_start_secondary+0xda/0xe8
 [<00000000005d6efe>] restart_int_handler+0x56/0x6c
 [<0000000000000000>] 0x0
Last Breaking-Event-Address:
 [<00000000003bf12e>] arch_spin_lock_wait+0x56/0xb0

Suggested-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Fixes: ea127f9754 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git)
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:47 +01:00
Dan Carpenter
63411d9862 scsi: zfcp: spin_lock_irqsave() is not nestable
commit e7cb08e894a0b876443ef8fdb0706575dc00a5d2 upstream.

We accidentally overwrite the original saved value of "flags" so that we
can't re-enable IRQs at the end of the function.  Presumably this
function is mostly called with IRQs disabled or it would be obvious in
testing.

Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
84386a526e zfcp: trace full payload of all SAN records (req,resp,iels)
commit aceeffbb59bb91404a0bda32a542d7ebf878433a upstream.

This was lost with commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
but is necessary for problem determination, e.g. to see the
currently active zone set during automatic port scan.

For the large GPN_FT response (4 pages), save space by not dumping
any empty residual entries.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
918637a5c0 zfcp: fix payload trace length for SAN request&response
commit 94db3725f049ead24c96226df4a4fb375b880a77 upstream.

commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
started to add FC_CT_HDR_LEN which made zfcp dump random data
out of bounds for RSPN GS responses because u.rspn.rsp
is the largest and last field in the union of struct zfcp_fc_req.
Other request/response types only happened to stay within bounds
due to the padding of the union or
due to the trace capping of u.gspn.rsp to ZFCP_DBF_SAN_MAX_PAYLOAD.

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : ...
Record id      : 2
Tag            : fsscth2
Request id     : 0x...
Destination ID : 0x00fffffc
Payload short  : 01000000 fc020000 80020000 00000000
                 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx <===
                 00000000 00000000 00000000 00000000
Payload length : 32                                  <===

struct zfcp_fc_req {
    [0] struct zfcp_fsf_ct_els ct_els;
   [56] struct scatterlist sg_req;
   [96] struct scatterlist sg_rsp;
        union {
            struct {req; rsp;} adisc;    SIZE: 28+28=   56
            struct {req; rsp;} gid_pn;   SIZE: 24+20=   44
            struct {rspsg; req;} gpn_ft; SIZE: 40*4+20=180
            struct {req; rsp;} gspn;     SIZE: 20+273= 293
            struct {req; rsp;} rspn;     SIZE: 277+16= 293
  [136] } u;
}
SIZE: 432

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Reviewed-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
3125f1f490 zfcp: fix D_ID field with actual value on tracing SAN responses
commit 771bf03537ddfa4a4dde62ef9dfbc82e4f77ab20 upstream.

With commit 2c55b750a8
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc19681
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.

GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 2c55b750a8 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Fixes: 7c7dc19681 ("[SCSI] zfcp: Simplify handling of ct and els requests")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
aab264b6e3 zfcp: restore tracing of handle for port and LUN with HBA records
commit 7c964ffe586bc0c3d9febe9bf97a2e4b2866e5b7 upstream.

This information was lost with
commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
but is required to debug e.g. invalid handle situations.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
b5752b0db0 zfcp: trace on request for open and close of WKA port
commit d27a7cb91960cf1fdd11b10071e601828cbf4b1f upstream.

Since commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
HBA records no longer contain WWPN, D_ID, or LUN
to reduce duplicate information which is already in REC records.
In contrast to "regular" target ports, we don't use recovery to open
WKA ports such as directory/nameserver, so we don't get REC records.
Therefore, introduce pseudo REC running records without any
actual recovery action but including D_ID of WKA port on open/close.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
f9fbf66c6d zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
commit 0102a30a6ff60f4bb4c07358ca3b1f92254a6c25 upstream.

bring back
commit d21e9daa63
("[SCSI] zfcp: Dont use 0 to indicate invalid LUN in rec trace")
which was lost with
commit ae0904f60f
("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: ae0904f60f ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
31eaad7244 zfcp: retain trace level for SCSI and HBA FSF response records
commit 35f040df97fa0e94c7851c054ec71533c88b4b81 upstream.

While retaining the actual filtering according to trace level,
the following commits started to write such filtered records
with a hardcoded record level of 1 instead of the actual record level:
commit 250a1352b9
("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
commit a54ca0f62f
("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")

Now we can distinguish written records again for offline level filtering.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 250a1352b9 ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.")
Fixes: a54ca0f62f ("[SCSI] zfcp: Redesign of the debug tracing for HBA records.")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
a01fae0faf zfcp: close window with unblocked rport during rport gone
commit 4eeaa4f3f1d6c47b69f70e222297a4df4743363e upstream.

On a successful end of reopen port forced,
zfcp_erp_strategy_followup_success() re-uses the port erp_action
and the subsequent zfcp_erp_action_cleanup() now
sees ZFCP_ERP_SUCCEEDED with
erp_action->action==ZFCP_ERP_ACTION_REOPEN_PORT
instead of ZFCP_ERP_ACTION_REOPEN_PORT_FORCED
but must not perform zfcp_scsi_schedule_rport_register().

We can detect this because the fresh port reopen erp_action
is in its very first step ZFCP_ERP_STEP_UNINITIALIZED.

Otherwise this opens a time window with unblocked rport
(until the followup port reopen recovery would block it again).
If a scsi_cmnd timeout occurs during this time window
fc_timed_out() cannot work as desired and such command
would indeed time out and trigger scsi_eh. This prevents
a clean and timely path failover.
This should not happen if the path issue can be recovered
on FC transport layer such as path issues involving RSCNs.

Also, unnecessary and repeated DID_IMM_RETRY for pending and
undesired new requests occur because internally zfcp still
has its zfcp_port blocked.

As follow-on errors with scsi_eh, it can cause,
in the worst case, permanently lost paths due to one of:
sd <scsidev>: [<scsidisk>] Medium access timeout failure. Offlining disk!
sd <scsidev>: Device offlined - not ready after error recovery

For fix validation and to aid future debugging with other recoveries
we now also trace (un)blocking of rports.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 5767620c38 ("[SCSI] zfcp: Do not unblock rport from REOPEN_PORT_FORCED")
Fixes: a2fa0aede0 ("[SCSI] zfcp: Block FC transport rports early on errors")
Fixes: 5f852be9e1 ("[SCSI] zfcp: Fix deadlock between zfcp ERP and SCSI")
Fixes: 338151e066 ("[SCSI] zfcp: make use of fc_remote_port_delete when target port is unavailable")
Fixes: 3859f6a248 ("[PATCH] zfcp: add rports to enable scsi_add_device to work again")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:29 -04:00
Steffen Maier
c9f34228ac zfcp: fix ELS/GS request&response length for hardware data router
commit 70369f8e15b220f50a16348c79a61d3f7054813c upstream.

In the hardware data router case, introduced with kernel 3.2
commit 86a9668a8d ("[SCSI] zfcp: support for hardware data router")
the ELS/GS request&response length needs to be initialized
as in the chained SBAL case.

Otherwise, the FCP channel rejects ELS requests with
FSF_REQUEST_SIZE_TOO_LARGE.

Such ELS requests can be issued by user space through BSG / HBA API,
or zfcp itself uses ADISC ELS for remote port link test on RSCN.
The latter can cause a short path outage due to
unnecessary remote target port recovery because the always
failing ADISC cannot detect extremely short path interruptions
beyond the local FCP channel.

Below example is decoded with zfcpdbf from s390-tools:

Timestamp      : ...
Area           : SAN
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : zfcp_dbf_san_req+0408
Record id      : 1
Tag            : fssels1
Request id     : 0x<reqid>
Destination ID : 0x00<target d_id>
Payload info   : 52000000 00000000 <our wwpn       >           [ADISC]
                 <our wwnn       > 00<s_id> 00000000
                 00000000 00000000 00000000 00000000

Timestamp      : ...
Area           : HBA
Subarea        : 00
Level          : 1
Exception      : -
CPU id         : ..
Caller         : zfcp_dbf_hba_fsf_res+0740
Record id      : 1
Tag            : fs_ferr
Request id     : 0x<reqid>
Request status : 0x00000010
FSF cmnd       : 0x0000000b               [FSF_QTCB_SEND_ELS]
FSF sequence no: 0x...
FSF issued     : ...
FSF stat       : 0x00000061		  [FSF_REQUEST_SIZE_TOO_LARGE]
FSF stat qual  : 00000000 00000000 00000000 00000000
Prot stat      : 0x00000100
Prot stat qual : 00000000 00000000 00000000 00000000

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 86a9668a8d ("[SCSI] zfcp: support for hardware data router")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:28 -04:00
Steffen Maier
4abdfdd09e zfcp: fix fc_host port_type with NPIV
commit bd77befa5bcff8c51613de271913639edf85fbc2 upstream.

For an NPIV-enabled FCP device, zfcp can erroneously show
"NPort (fabric via point-to-point)" instead of "NPIV VPORT"
for the port_type sysfs attribute of the corresponding
fc_host.
s390-tools that can be affected are dbginfo.sh and ziomon.

zfcp_fsf_exchange_config_evaluate() ignores
fsf_qtcb_bottom_config.connection_features indicating NPIV
and only sets fc_host_port_type to FC_PORTTYPE_NPORT if
fsf_qtcb_bottom_config.fc_topology is FSF_TOPO_FABRIC.

Only the independent zfcp_fsf_exchange_port_evaluate()
evaluates connection_features to overwrite fc_host_port_type
to FC_PORTTYPE_NPIV in case of NPIV.
Code was introduced with upstream kernel 2.6.30
commit 0282985da5
("[SCSI] zfcp: Report fc_host_port_type as NPIV").

This works during FCP device recovery (such as set online)
because it performs FSF_QTCB_EXCHANGE_CONFIG_DATA followed by
FSF_QTCB_EXCHANGE_PORT_DATA in sequence.

However, the zfcp-specific scsi host sysfs attributes
"requests", "megabytes", or "seconds_active" trigger only
zfcp_fsf_exchange_config_evaluate() resetting fc_host
port_type to FC_PORTTYPE_NPORT despite NPIV.

The zfcp-specific scsi host sysfs attribute "utilization"
triggers only zfcp_fsf_exchange_port_evaluate() correcting
the fc_host port_type again in case of NPIV.

Evaluate fsf_qtcb_bottom_config.connection_features
in zfcp_fsf_exchange_config_evaluate() where it belongs to.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 0282985da5 ("[SCSI] zfcp: Report fc_host_port_type as NPIV")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:28 -04:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Peter Senna Tschudin
3b97487445 s390: remove unneeded semicolon
Remove unneeded semicolon.

The semantic patch that detects this change is available
at scripts/coccinelle/misc/semicolon.cocci.

Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-08-07 09:57:00 +02:00
Peter Zijlstra
805de8f43c atomic: Replace atomic_{set,clear}_mask() usage
Replace the deprecated atomic_{set,clear}_mask() usage with the now
ubiquous atomic_{or,andnot}() functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-27 14:06:24 +02:00
Hannes Reinecke
b84b1d522f scsi: Do not set cmd_per_lun to 1 in the host template
'0' is now used as the default cmd_per_lun value,
so there's no need to explicitly set it to '1' in the
host template.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-05-31 18:06:28 -07:00
David Rientjes
11d8336045 mm, mempool: do not allow atomic resizing
Allocating a large number of elements in atomic context could quickly
deplete memory reserves, so just disallow atomic resizing entirely.

Nothing currently uses mempool_resize() with anything other than
GFP_KERNEL, so convert existing callers to drop the gfp_mask.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Steffen Maier <maier@linux.vnet.ibm.com>	[zfcp]
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:06 -07:00
Christoph Hellwig
db5ed4dfd5 scsi: drop reason argument from ->change_queue_depth
Drop the now unused reason argument from the ->change_queue_depth method.
Also add a return value to scsi_adjust_queue_depth, and rename it to
scsi_change_queue_depth now that it can be used as the default
->change_queue_depth implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-24 14:45:27 +01:00
Christoph Hellwig
c40ecc12cf scsi: avoid ->change_queue_depth indirection for queue full tracking
All drivers use the implementation for ramping the queue up and down, so
instead of overloading the change_queue_depth method call the
implementation diretly if the driver opts into it by setting the
track_queue_depth flag in the host template.

Note that a few drivers validated the new queue depth in their
change_queue_depth method, but as we never go over the queue depth
set during slave_configure or the sysfs file this isn't nessecary
and can safely be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
2014-11-24 14:45:12 +01:00
Martin Peschke
18f87a67e6 zfcp: auto port scan resiliency
This patch improves the Fibre Channel port scan behaviour of the zfcp lldd.
Without it the zfcp device driver may churn up the storage area network by
excessive scanning and scan bursts, particularly in big virtual server
environments, potentially resulting in interference of virtual servers and
reduced availability of storage connectivity.

The two main issues as to the zfcp device drivers automatic port scan in
virtual server environments are frequency and simultaneity.
On the one hand, there is no point in allowing lots of ports scans
in a row. It makes sense, though, to make sure that a scan is conducted
eventually if there has been any indication for potential SAN changes.
On the other hand, lots of virtual servers receiving the same indication
for a SAN change had better not attempt to conduct a scan instantly,
that is, at the same time.

Hence this patch has a two-fold approach for better port scanning:
the introduction of a rate limit to amend frequency issues, and the
introduction of a short random backoff to amend simultaneity issues.
Both approaches boil down to deferred port scans, with delays
comprising parts for both approaches.

The new port scan behaviour is summarised best by:

                                               NEW:    NEW:
                          no_auto_port_rescan  random  rate    flush
                                               backoff limit   =wait

adapter resume/thaw       yes                  yes     no      yes*
adapter online (user)     no                   yes     no      yes*
port rescan (user)        no                   no      no      yes
adapter recovery (user)   yes                  yes     yes     no
adapter recovery (other)  yes                  yes     yes     no
incoming ELS              yes                  yes     yes     no
incoming ELS lost         yes                  yes     yes     no

Implementation is straight-forward by converting an existing worker to
a delayed worker. But care is needed whenever that worker is going to be
flushed (in order to make sure work has been completed), since a flush
operation cancels the timer set up for deferred execution (see * above).

There is a small race window whenever a port scan work starts
running up to the point in time of storing the time stamp for that port
scan. The impact is negligible. Closing that gap isn't trivial, though, and
would the destroy the beauty of a simple work-to-delayed-work conversion.

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-20 09:11:30 +01:00
Steffen Maier
c8bba14435 zfcp: bring back unit sysfs attributes for automatic LUN scan
Through sysfs attributes, zfcp unit objects
provide a trigger for manual LUN recovery
and export information for problem determination.

With commit
f8210e3488
"[SCSI] zfcp: Allow midlayer to scan for LUNs when running in NPIV mode"
and when attaching SCSI devices through this new optional method,
no more zfcp unit objects are allocated for such SCSI devices.
Hence, the above-mentioned trigger and information were missing.

The information and context is located in SCSI transport device data since
b62a8d9b45
"[SCSI] zfcp: Use SCSI device data zfcp_scsi_dev instead of zfcp_unit"
57c237731b
"[SCSI] zfcp: Add zfcp private struct as SCSI device driver data"
Hence, introduce the trigger and the information unconditionally
for all SCSI devices attached through zfcp.

We prefix the attribute names with 'zfcp_' to prevent collisions and
to avoid mix-ups such as with the common 'state' attribute.

Since some of the new attribute views do not need zfcp_port
in the ZFCP_DEFINE_SCSI_ATTR helper macro, remove zfcp_port
to avoid compiler warnings on unused variable.
It's easy to open code the conversion from zfcp_scsi_dev to zfcp_port
for the two already existing attributes hba_id and wwpn.

Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-20 09:11:29 +01:00
Martin Peschke
1b33ef2394 zfcp: remove access control tables interface (port leftovers)
This patch removes some leftovers for commit
663e0890e3
"[SCSI] zfcp: remove access control tables interface".

The "access denied" case for ports is gone, as well.
The corresponding flag was cleared, but never set.
So clean it up.

Sysfs flag is kept, though, for backward-compatibility.
Now it returns always 0.

Signed-off-by: Martin Peschke <mpeschke@linux.vnet.ibm.com>
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-11-20 09:11:28 +01:00
Christoph Hellwig
c8b09f6fb6 scsi: don't set tagging state from scsi_adjust_queue_depth
Remove the tagged argument from scsi_adjust_queue_depth, and just let it
handle the queue depth.  For most drivers those two are fairly separate,
given that most modern drivers don't care about the SCSI "tagged" status
of a command at all, and many old drivers allow queuing of multiple
untagged commands in the driver.

Instead we start out with the ->simple_tags flag set before calling
->slave_configure, which is how all drivers actually looking at
->simple_tags except for one worke anyway.  The one other case looks
broken, but I've kept the behavior as-is for now.

Except for that we only change ->simple_tags from the ->change_queue_type,
and when rejecting a tag message in a single driver, so keeping this
churn out of scsi_adjust_queue_depth is a clear win.

Now that the usage of scsi_adjust_queue_depth is more obvious we can
also remove all the trivial instances in ->slave_alloc or ->slave_configure
that just set it to the cmd_per_lun default.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2014-11-12 11:19:43 +01:00
Christoph Hellwig
5066863337 scsi: remove abuses of scsi_populate_tag
Unless we want to build a SPI tag message we should just check SCMD_TAGGED
instead of reverse engineering a tag type through the use of
scsi_populate_tag_msg.

Also rename the function to spi_populate_tag_msg, make it behave like the
other spi message helpers, and move it to the spi transport class.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2014-11-12 11:19:41 +01:00
Linus Torvalds
ebb067d2f4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "Mostly cleanups and bug-fixes, with two exceptions.

  The first is lazy flushing of I/O-TLBs for PCI to improve performance,
  the second is software dirty bits in the pmd for the madvise-free
  implementation"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (24 commits)
  s390/locking: Reenable optimistic spinning
  s390/mm: implement dirty bits for large segment table entries
  KVM: s390/mm: Fix page table locking vs. split pmd lock
  s390/dasd: fix camel case
  s390/3215: fix hanging console issue
  s390/irq: improve displayed interrupt order in /proc/interrupts
  s390/seccomp: fix error return for filtered system calls
  s390/pci: introduce lazy IOTLB flushing for DMA unmap
  dasd: fix error recovery for alias devices during format
  dasd: fix list_del corruption during format
  dasd: fix unresponsive device during format
  dasd: use aliases for formatted devices during format
  s390/pci: fix kmsg component
  s390/kdump: Return NOTIFY_OK for all actions other than MEM_GOING_OFFLINE
  s390/watchdog: Fix module name in Kconfig help text
  s390/dasd: replace seq_printf by seq_puts
  s390/dasd: replace pr_warning by pr_warn
  s390/dasd: Move EXPORT_SYMBOL after function/variable
  s390/dasd: remove unnecessary null test before debugfs_remove
  s390/zfcp: use qdio buffer helpers
  ...
2014-08-07 08:41:00 -07:00
Sebastian Ott
852eb1aa2b s390/zfcp: use qdio buffer helpers
Use qdio buffer helpers to manage the buffers used for the request
and response queues.

No functional change.

Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-07-22 09:26:16 +02:00
Hannes Reinecke
9cb78c16f5 scsi: use 64-bit LUNs
The SCSI standard defines 64-bit values for LUNs, and large arrays
employing large or hierarchical LUN numbers become more and more
common.

So update the linux SCSI stack to use 64-bit LUN numbers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ewan Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2014-07-17 22:07:37 +02:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Hendrik Brueckner
8e6a828566 s390/s390dbf: use debug_level_enabled() where applicable
Refactor direct debug level comparisons with the (internal) s390db->level
member.  Use the debug_level_enabled() function instead.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-10-24 17:16:54 +02:00
Maxime Jayat
3f79410c7c treewide: Fix common typo in "identify"
Correct common misspelling of "identify" as "indentify" throughout
the kernel

Signed-off-by: Maxime Jayat <maxime@artisandeveloppeur.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:31:06 +02:00