Solved: WAAS disk state 'abnormal'

cstockwe · 01-09-2008

Hi

One of our core WAEs is reporting a disk error, but we cannot seem to correct it.

"show disk details" output is as follows:

Software RAID devices:

DEVICE NAME TYPE STATUS PHYSICAL DEVICES AND STATUS

/dev/md0 RAID-1 NORMAL OPERATION disk00/00[GOOD] disk01/00[GOOD]

/dev/md1 RAID-1 NORMAL OPERATION disk00/01[GOOD] disk01/01[GOOD]

/dev/md2 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/02[GOOD]

/dev/md3 RAID-1 NORMAL OPERATION disk00/03[GOOD] disk01/03[GOOD]

/dev/md4 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/04[GOOD]

/dev/md5 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/05[GOOD]

/dev/md6 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/06[GOOD]
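In this output, every array flagged ONE OR MORE DRIVES ABNORMAL lists only its disk01 partition, so the disk00 half of md2, md4, md5 and md6 is the part that is missing. The following is a minimal Python sketch, assuming the line format shown here, that pulls that out of the text. The function name `degraded_arrays` and the regular expressions are hypothetical and are not part of the WAAS CLI.

```python
import re

# Hypothetical parser for the software-RAID lines printed by the WAAS
# "show disk details" command, assuming the line format seen in this thread.
# Other software releases may format the output differently.
RAID_LINE = re.compile(
    r"^(/dev/md\d+)\s+(RAID-\d+)\s+(.+?)\s+((?:disk\d+/\d+\[\w+\]\s*)+)$"
)
MEMBER = re.compile(r"(disk\d+)/\d+\[(\w+)\]")


def degraded_arrays(output, expected_disks=("disk00", "disk01")):
    """Map each array not in NORMAL OPERATION to the expected disks it lacks."""
    degraded = {}
    for line in output.splitlines():
        match = RAID_LINE.match(line.strip())
        if not match:
            continue  # header, blank line or unrelated text
        device, _raid_level, status, members = match.groups()
        present = {disk for disk, _state in MEMBER.findall(members)}
        if status != "NORMAL OPERATION":
            degraded[device] = [d for d in expected_disks if d not in present]
    return degraded


# Status lines copied from the "show disk details" output above.
SHOW_DISK_DETAILS = """\
/dev/md0 RAID-1 NORMAL OPERATION disk00/00[GOOD] disk01/00[GOOD]
/dev/md1 RAID-1 NORMAL OPERATION disk00/01[GOOD] disk01/01[GOOD]
/dev/md2 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/02[GOOD]
/dev/md3 RAID-1 NORMAL OPERATION disk00/03[GOOD] disk01/03[GOOD]
/dev/md4 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/04[GOOD]
/dev/md5 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/05[GOOD]
/dev/md6 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/06[GOOD]
"""

if __name__ == "__main__":
    for device, missing in degraded_arrays(SHOW_DISK_DETAILS).items():
        print(device, "missing", ", ".join(missing))
```

Run against the output above, this reports disk00 as the missing member of md2, md4, md5 and md6, which matches the disk the replies below focus on.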

The output of 'show alarms crit detail support' is 'none'.

I ran the disk_check.sh script (we plan to upgrade this WAE first, from 4.0.13.b.12 to the 4.0.15 release), but that check passed OK:

#type disk_status.txt

Thu Jan 10 11:06:28 EST 2008

device /dev/md1 (/swstore) is OK

device /dev/md0 (/sw) is OK

device /dev/md2 (/state) is OK

device /dev/md6 (/local/local1/spool) is OK

device /dev/md5 (/local/local1) is OK

device /dev/md4 (/disk00-04) is OK

Question: Is there anything we can do to remove the 'abnormal' state? Is it safe to proceed with the software upgrade?

Thanks!

Cameron

Zach Seils · 01-09-2008

Cameron,

Was the drive recently replaced? Can you please provide the output of the command 'sh di t d' (show disks tech-support details)?

Thanks,

Zach

cstockwe · 01-14-2008

Zach,

Thanks for your reply. Here is the output:

sh disks tech-support details

=== disk00 ===

Device: IBM-ESXS ST3300555SS Version: BA33

Serial number: 3LM1JY51000098037C11

Device type: disk

Transport protocol: SAS

Local Time is: Tue Jan 15 08:10:26 2008 EST

Device supports SMART and is Enabled

Temperature Warning Enabled

SMART Health Status: OK

Current Drive Temperature: 39 C

Drive Trip Temperature: 68 C

Vendor (Seagate) cache information

Blocks sent to initiator = 652770078

Blocks received from initiator = 666414147

Blocks read from cache and sent to initiator = 8823800

Number of read and write commands whose size <= segment size = 3463715

Number of read and write commands whose size > segment size = 0

Error counter log:

         Errors corrected  Total      Total      Correction   Gigabytes     Total
         delay:            [rereads/  errors     algorithm    processed     uncorrected
         minor | major     rewrites]  corrected  invocations  [10^9 bytes]  errors
write:   0       0         0          0          0            341.940       0
verify:  194     0         0          194        194          0.675         0

Non-medium error count: 1

=== disk01 ===

Device: IBM-ESXS ST3300555SS Version: BA33

Serial number: 3LM1GA1Q00009803ZAFF

Device type: disk

Transport protocol: SAS

Local Time is: Tue Jan 15 08:10:28 2008 EST

Device supports SMART and is Enabled

Temperature Warning Enabled

SMART Health Status: OK

Current Drive Temperature: 36 C

Drive Trip Temperature: 68 C

Vendor (Seagate) cache information

Blocks sent to initiator = 1262471149

Blocks received from initiator = 309317146

Blocks read from cache and sent to initiator = 14889473

Number of read and write commands whose size <= segment size = 18130992

Number of read and write commands whose size > segment size = 0

Error counter log:

         Errors corrected  Total      Total      Correction   Gigabytes     Total
         delay:            [rereads/  errors     algorithm    processed     uncorrected
         minor | major     rewrites]  corrected  invocations  [10^9 bytes]  errors
write:   0       0         0          0          0            159.395       0
verify:  552     0         0          552        552          2.814         0

Non-medium error count: 0
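Both reports show SMART Health Status OK and a total of 0 uncorrected errors for every row; only the verify row has corrected errors (194 on disk00, 552 on disk01). A minimal Python sketch for pulling those numbers out of the error counter log follows. It assumes the seven-column layout printed here; the column names in `COLUMNS` and the helper functions are hypothetical paraphrases of the header, not smartctl's own identifiers.

```python
# Hypothetical helper for the "Error counter log" rows of the SMART report
# shown above. The seven-column layout is an assumption taken from this
# output; other smartctl versions label the columns differently.
COLUMNS = (
    "corrected_delay_minor",
    "corrected_delay_major",
    "rereads_rewrites",
    "total_corrected",
    "correction_invocations",
    "gigabytes_processed",
    "total_uncorrected",
)


def parse_error_counters(text):
    """Return {'write': {...}, 'verify': {...}} for rows with seven numbers."""
    rows = {}
    for line in text.splitlines():
        label, sep, rest = line.strip().partition(":")
        values = rest.split()
        if sep and label in ("read", "write", "verify") and len(values) == len(COLUMNS):
            rows[label] = dict(zip(COLUMNS, (float(v) for v in values)))
    return rows


def uncorrected_total(text):
    """Sum of the 'Total uncorrected errors' column across all parsed rows."""
    return sum(row["total_uncorrected"] for row in parse_error_counters(text).values())


# Rows copied from the disk00 and disk01 reports above.
DISK00 = "write: 0 0 0 0 0 341.940 0\nverify: 194 0 0 194 194 0.675 0"
DISK01 = "write: 0 0 0 0 0 159.395 0\nverify: 552 0 0 552 552 2.814 0"

if __name__ == "__main__":
    for name, log in (("disk00", DISK00), ("disk01", DISK01)):
        rows = parse_error_counters(log)
        print(name, "corrected:", rows["verify"]["total_corrected"],
              "uncorrected:", uncorrected_total(log))
```

Lines that do not carry exactly seven numeric values after a read/write/verify label, such as "Non-medium error count: 1", are ignored.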

We've successfully applied the 4.0.15 software to this WAE in any case, and it seems to be operating normally.

No disks have been replaced as far as I am aware.

Thanks for your time, Zach.

Cheers

Cameron

Zach Seils · 01-15-2008

Cameron,

You can try the following process prior to replacing disk00:

1. From config mode, remove disk00 from the RAID array:

di d disk00 s

2. From config mode, re-add disk00 to the RAID array:

no di d disk00 s f

3. You will be asked to reload if this is a WAE-512; otherwise, the disk should be added back into the array.

If this does not correct the disk state, I would recommend replacing the physical drive.

Zach

cstockwe · 01-16-2008

Many thanks, Zach. That fixed it!

DEVICE NAME TYPE STATUS PHYSICAL DEVICES AND STATUS

/dev/md0 RAID-1 NORMAL OPERATION disk00/00[GOOD] disk01/00[GOOD]

/dev/md1 RAID-1 NORMAL OPERATION disk00/01[GOOD] disk01/01[GOOD]

/dev/md2 RAID-1 NORMAL OPERATION disk00/02[GOOD] disk01/02[GOOD]

/dev/md3 RAID-1 NORMAL OPERATION disk00/03[GOOD] disk01/03[GOOD]

/dev/md4 RAID-1 NORMAL OPERATION disk00/04[GOOD] disk01/04[GOOD]

/dev/md5 RAID-1 NORMAL OPERATION disk00/05[GOOD] disk01/05[GOOD]

/dev/md6 RAID-1 NORMAL OPERATION disk00/06[GOOD] disk01/06[GOOD]
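After re-adding disk00, the fix can be confirmed by checking that every array reads NORMAL OPERATION with both disks marked GOOD, as in the output above. A minimal Python sketch of that check follows, assuming the same line format; the function name `all_arrays_healthy` is hypothetical.

```python
import re

# Hypothetical post-fix check: every software RAID line from "show disk
# details" should read NORMAL OPERATION with each expected disk marked GOOD.
# The line format is assumed from the output in this thread.
MEMBER = re.compile(r"(disk\d+)/\d+\[(\w+)\]")


def all_arrays_healthy(output, expected_disks=("disk00", "disk01")):
    """True only if at least one /dev/md line exists and all of them are healthy."""
    found = False
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("/dev/md"):
            continue  # skip headers and other text
        found = True
        members = dict(MEMBER.findall(line))  # e.g. {'disk00': 'GOOD', 'disk01': 'GOOD'}
        if "NORMAL OPERATION" not in line:
            return False
        if any(members.get(disk) != "GOOD" for disk in expected_disks):
            return False
    return found


# One md2 line from before and one from after the fix, copied from this thread.
BEFORE = "/dev/md2 RAID-1 ONE OR MORE DRIVES ABNORMAL disk01/02[GOOD]"
AFTER = "/dev/md2 RAID-1 NORMAL OPERATION disk00/02[GOOD] disk01/02[GOOD]"

if __name__ == "__main__":
    print(all_arrays_healthy(BEFORE), all_arrays_healthy(AFTER))  # False True
```

Returning False when no /dev/md lines are found is a deliberate choice, so that pasting the wrong command output does not look like a healthy system.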
