Monday, January 21, 2013

"errDisabled" state on fibre channel port

Last week I received an alert that one of our HP servers had lost a path to its SAN disk. Luckily, there were 4 paths (1 active, 3 redundant), so there was no outage. The first thing I did was check our SAN, where I saw a bunch of warning messages:


Checking the connectivity status in Unisphere, I could see that there was an issue with the SPA-0 port, as none of our hosts were logged in through that port.


At this point, I was pretty sure there was an issue either with the SFP on the EMC or with the MDS switch it was connected to. I logged into the MDS switch that the SPA-0 port was connected to and ran "sh interface brief". The interface had a status of "errDisabled", and the switch logs showed the following errors:

2013 Jan 14 22:46:46 MDS01 %PORT-5-IF_DOWN_BIT_ERR_RT_THRES_EXCEEDED: %$VSAN 10%$ Interface fc1/3 is down (Error disabled - bit error rate too high)  

2013 Jan 14 22:46:46 MDS01 %MODULE-4-MOD_WARNING: Module 1 (serial: JAF1537ARCA) reported warning on ports 1/3-1/3 (Fibre Channel) due to MAC Bit error exceeded threshold in device 92 (device error 0xc5c00503)
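If you want to catch this condition without eyeballing the switch log, a small parser can pull the interface and the reason out of the %PORT line. The following is a minimal Python sketch; the regular expression is an assumption built only from the single sample line above (timestamp, switch name, %PORT-<severity>-<MNEMONIC>, then "Interface ... is down (...)"), and other NX-OS releases may format the message differently.

```python
import re
from typing import NamedTuple, Optional


class PortDownEvent(NamedTuple):
    timestamp: str  # e.g. "2013 Jan 14 22:46:46"
    switch: str     # hostname that logged the message, e.g. "MDS01"
    mnemonic: str   # e.g. "IF_DOWN_BIT_ERR_RT_THRES_EXCEEDED"
    interface: str  # e.g. "fc1/3"
    reason: str     # text inside the trailing parentheses


# ASSUMPTION: layout inferred from the one %PORT sample line in this post.
PORT_DOWN_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<switch>\S+) "
    r"%PORT-\d-(?P<mnemonic>[A-Z0-9_]+):.*?"
    r"Interface (?P<iface>fc\d+/\d+) is down \((?P<reason>[^)]*)\)"
)


def parse_port_down(line: str) -> Optional[PortDownEvent]:
    """Return a PortDownEvent for a %PORT interface-down line, else None."""
    m = PORT_DOWN_RE.search(line)
    if m is None:
        return None
    return PortDownEvent(m["ts"], m["switch"], m["mnemonic"], m["iface"], m["reason"])


def is_error_disabled(event: PortDownEvent) -> bool:
    """True when the switch reports the port as error disabled."""
    return event.reason.startswith("Error disabled")


if __name__ == "__main__":
    sample = (
        "2013 Jan 14 22:46:46 MDS01 %PORT-5-IF_DOWN_BIT_ERR_RT_THRES_EXCEEDED: "
        "%$VSAN 10%$ Interface fc1/3 is down (Error disabled - bit error rate too high)"
    )
    event = parse_port_down(sample)
    print(event.interface, is_error_disabled(event))
```

The %MODULE warning line does not match this pattern and returns None; it could be handled with a second, similarly hedged pattern if needed.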


After a couple of Google searches, these errors appeared to be associated with a layer 1 (physical) problem. However, running "sh interface fc1/3 counters" confirmed that there were no CRC errors:

fc1/3
    5 minutes input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    5 minutes output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    176164352568 frames input, 318207931191012 bytes
      0 class-2 frames, 0 bytes
      176164352568 class-3 frames, 318207931191012 bytes
      0 class-f frames, 0 bytes
      0 discards, 0 errors, 0 CRC
      0 unknown class, 0 too long, 0 too short
    74714008703 frames output, 134557513043104 bytes
      0 class-2 frames, 0 bytes
      74714008703 class-3 frames, 134557513043104 bytes
      0 class-f frames, 0 bytes
      72 discards, 0 errors
    42 input OLS, 3 LRR, 0 NOS, 0 loop inits
    11 output OLS, 2 LRR, 49 NOS, 0 loop inits
    1 link failures, 5 sync losses, 1 signal losses
     2975871457 transmit B2B credit transitions from zero
     9404246064 receive B2B credit transitions from zero
      16 receive B2B credit remaining
      0 transmit B2B credit remaining
      0 low priority transmit B2B credit remaining
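A CRC count of 0 doesn't tell the whole story: the same output also shows 1 link failure, 5 sync losses, 1 signal loss and billions of B2B credit transitions from zero. The sketch below is a minimal Python parser that pulls those fields into a dictionary so they can be compared between runs. It assumes the exact counter wording shown above; other NX-OS versions may print it differently (the output in the second reader comment, for example, has a single "BB credit transitions from zero" line, which this sketch simply skips).

```python
import re
from typing import Dict

# ASSUMPTION: counter wording copied from the "sh interface fc1/3 counters"
# output above; fields that are not found are left out of the result.
COUNTER_PATTERNS = {
    "crc": r"(\d+) CRC\b",
    "link_failures": r"(\d+) link failures",
    "sync_losses": r"(\d+) sync losses",
    "signal_losses": r"(\d+) signal losses",
    "tx_b2b_transitions_from_zero": r"(\d+) transmit B2B credit transitions from zero",
    "rx_b2b_transitions_from_zero": r"(\d+) receive B2B credit transitions from zero",
}


def parse_fc_counters(text: str) -> Dict[str, int]:
    """Extract selected link-error and B2B flow-control counters from CLI output."""
    counters: Dict[str, int] = {}
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, text)  # first occurrence in the output
        if match:
            counters[name] = int(match.group(1))
    return counters


if __name__ == "__main__":
    sample = """
      0 discards, 0 errors, 0 CRC
    1 link failures, 5 sync losses, 1 signal losses
     2975871457 transmit B2B credit transitions from zero
     9404246064 receive B2B credit transitions from zero
    """
    print(parse_fc_counters(sample))
```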


At this point, I opened an SR with Cisco TAC. What the engineer noticed, and I didn't, was the large number of "B2B credit transitions from zero". After we cleared the counters, we could see the receive B2B credit transitions from zero increasing rapidly (over 1 million in less than 10 minutes). He then ran "sh int fc1/3 transceiver details", but it showed no indication of Tx or Rx errors. At this point, we concluded that the problem was with the SFP on the EMC. I then contacted EMC, who confirmed that there was an issue with the SFP:
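To put "over 1 million in less than 10 minutes" into perspective, the rate is just the counter delta divided by the elapsed time: 1,000,000 / 600 s is roughly 1,667 transitions per second, and since the post reports more than that in less time, this is a lower bound. A minimal Python helper for comparing two counter snapshots taken a known number of seconds apart (the snapshot dictionaries and key names are hypothetical):

```python
from typing import Dict


def rate_per_second(before: int, after: int, elapsed_seconds: float) -> float:
    """Average increase per second between two readings of one counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if after < before:
        # The counter was cleared (or wrapped) between the two readings.
        raise ValueError("counter decreased between snapshots")
    return (after - before) / elapsed_seconds


def snapshot_rates(before: Dict[str, int], after: Dict[str, int],
                   elapsed_seconds: float) -> Dict[str, float]:
    """Per-second rates for every counter present in both snapshots."""
    return {
        name: rate_per_second(before[name], after[name], elapsed_seconds)
        for name in after
        if name in before
    }


if __name__ == "__main__":
    # Lower bound from the post: counters cleared, then >1,000,000 in <10 minutes.
    # Hypothetical key name; only the arithmetic matters here.
    cleared = {"rx_b2b_transitions_from_zero": 0}
    later = {"rx_b2b_transitions_from_zero": 1_000_000}
    print(snapshot_rates(cleared, later, 600))
```

Raising on a decreasing counter is a deliberate choice: a clear between snapshots would otherwise produce a misleading negative rate.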


A       01/14/13 22:52:52 scsitarg         7117001b SFP Diagnostic condition "Received power low alarm" detected for physical port 0 .
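The EMC entry is timestamped about six minutes after the MDS error-disabled the port (22:46:46 vs 22:52:52 on the same day, assuming both clocks are in sync and in the same time zone). The sketch below parses that SP log line in Python and computes the gap. The field layout is inferred from this single line, and reading the leading "A" as the storage processor (SP A, matching the SPA-0 port) is an assumption.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class SfpAlarm(NamedTuple):
    sp: str           # leading column; ASSUMED to name the storage processor
    when: datetime    # timestamp of the log entry
    event_code: str   # e.g. "7117001b"
    condition: str    # text inside the quotes
    port: int         # physical port number


# ASSUMPTION: layout inferred from the one SP log line shown above.
SFP_ALARM_RE = re.compile(
    r'^(?P<sp>[AB])\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'\S+\s+(?P<code>[0-9a-fA-F]{8})\s+SFP Diagnostic condition '
    r'"(?P<cond>[^"]+)" detected for physical port (?P<port>\d+)'
)


def parse_sfp_alarm(line: str) -> Optional[SfpAlarm]:
    """Return an SfpAlarm for an 'SFP Diagnostic condition' line, else None."""
    m = SFP_ALARM_RE.match(line.strip())
    if m is None:
        return None
    when = datetime.strptime(f"{m['date']} {m['time']}", "%m/%d/%y %H:%M:%S")
    return SfpAlarm(m["sp"], when, m["code"].lower(), m["cond"], int(m["port"]))


if __name__ == "__main__":
    line = ('A       01/14/13 22:52:52 scsitarg         7117001b SFP Diagnostic '
            'condition "Received power low alarm" detected for physical port 0 .')
    alarm = parse_sfp_alarm(line)
    mds_errdisable = datetime(2013, 1, 14, 22, 46, 46)  # from the MDS syslog above
    print(alarm.condition, alarm.port, alarm.when - mds_errdisable)
```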

The SFP was then replaced, and restarting port fc1/3 on the MDS switch resolved the issue. In general, this error points to a layer 1 problem, so try replacing the SFPs and FC cable on the affected ports.
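The post doesn't list the exact commands used to restart fc1/3. On an MDS switch, a port is normally bounced from interface configuration mode with `shutdown` followed by `no shutdown` (a "shut/no shut"). As a minimal Python sketch, the hypothetical helper below only builds that command sequence as text for pasting into the CLI; it does not connect to the switch.

```python
import re
from typing import List

# Matches Fibre Channel interface names such as "fc1/3" (slot/port).
FC_INTERFACE_RE = re.compile(r"fc\d+/\d+")


def bounce_commands(interface: str) -> List[str]:
    """Return the NX-OS command lines for a shut / no shut of one FC port."""
    if not FC_INTERFACE_RE.fullmatch(interface):
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [
        "configure terminal",
        f"interface {interface}",
        "shutdown",
        "no shutdown",
        "end",
    ]


if __name__ == "__main__":
    print("\n".join(bounce_commands("fc1/3")))
```

Validating the interface name up front keeps a typo from producing a command sequence aimed at the wrong port.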


3 comments:

  1. Thank you for sharing. We got the same error on our MDS; let me check the port status on the storage box.

  2. Hi,
    I got the same error on the port, checked the logs and found the same error. But when I run the command "sh interface fc1/3 counters", I see 0 discards, 1 errors, 0 CRC.
    ----------
    fc1/6
    5 minutes input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    5 minutes output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    13 frames input, 976 bytes
    0 class-2 frames, 0 bytes
    13 class-3 frames, 976 bytes
    0 class-f frames, 0 bytes
    0 discards, 1 errors, 0 CRC
    0 unknown class, 0 too long, 1 too short
    33 frames output, 1736 bytes
    0 class-2 frames, 0 bytes
    33 class-3 frames, 1736 bytes
    0 class-f frames, 0 bytes
    0 discards, 0 errors
    1 input OLS, 1 LRR, 0 NOS, 12 loop inits
    2 output OLS, 0 LRR, 1 NOS, 3 loop inits
    0 link failures, 1 sync losses, 0 signal losses
    5 BB credit transitions from zero
    16 receive B2B credit remaining
    0 transmit B2B credit remaining
    0 low priority transmit B2B credit remaining

    ----------
    What might be the problem here? Please advise.

  3. Do a shut/no shut on the interface to bring it back up. If it happens again, try another fibre cable/interface as this appears to be a layer 1 issue.
