waas 7341 disk drive problem

DARYLE DIANIS
Level 1

My disk00 went defunct and there is no /local file system present. If disk00 is part of the array, why would this failure cause the waas to go offline or not find /local? TAC is suggesting that I need to rebuild the box from the rescue cd and I want to avoid that.

thanks,

5 Replies

sachinga.hcl
Level 4

Hi,

You can obtain a detailed software RAID status, including disk utilization, the number of disks on the WAE, and operational status, by using the show disks details EXEC command. (Please send me the output of this command.) The following example shows a two-disk system in which the disks are operating normally:

WAE# show disks details

Physical disk information:

 disk00: Normal (h00 c00 i00 l00 - DAS) 76324MB (74.5GB)
 disk01: Normal (h01 c00 i00 l00 - DAS) 76324MB (74.5GB)

Mounted filesystems:

 MOUNT POINT       TYPE        DEVICE     SIZE     INUSE  FREE     USE%
 /                 root        /dev/root  34MB     28MB   6MB      82%
 /swstore          internal    /dev/md1   495MB    212MB  283MB    42%
 /state            internal    /dev/md2   4031MB   65MB   3966MB   1%
 /disk00-04        WAFSFS      /dev/md4   63035MB  32MB   63003MB  0%
 /local/local1     SYSFS       /dev/md5   3967MB   313MB  3654MB   7%
 .../local1/spool  PRINTSPOOL  /dev/md6   991MB    16MB   975MB    1%
 /sw               internal    /dev/md0   991MB    289MB  702MB    29%

Software RAID devices:

 DEVICE NAME  TYPE    STATUS            PHYSICAL DEVICES AND STATUS
 /dev/md0     RAID-1  NORMAL OPERATION  disk00/00[GOOD] disk01/00[GOOD]
 /dev/md1     RAID-1  NORMAL OPERATION  disk00/01[GOOD] disk01/01[GOOD]
 /dev/md2     RAID-1  NORMAL OPERATION  disk00/02[GOOD] disk01/02[GOOD]
 /dev/md3     RAID-1  NORMAL OPERATION  disk00/03[GOOD] disk01/03[GOOD]
 /dev/md4     RAID-1  NORMAL OPERATION  disk00/04[GOOD] disk01/04[GOOD]
 /dev/md5     RAID-1  NORMAL OPERATION  disk00/05[GOOD] disk01/05[GOOD]
 /dev/md6     RAID-1  NORMAL OPERATION  disk00/06[GOOD] disk01/06[GOOD]

Currently SW-RAID is not configured to change.

More common than a total disk failure is a partial disk failure. A partial failure has occurred when errors occur in one or a few sectors of a disk. Because the RAID devices are configured on a partition-by-partition basis, some partitions may continue to operate using the respective disk partitions from both disk drives. Sector errors are typically detected when the software attempts to read an affected sector. After retrying the read operation internally a number of times, the disk drive eventually gives up and returns an error to the operating system and the RAID driver code. The kernel RAID-1 code then stops accessing the affected physical partition. These errors are visible in the output of the show disks details EXEC command.

To attempt recovery from a partial disk failure, follow these steps:

--------------------------------------------------------------------------------

Step 1 Review the syslog.txt file or run the show alarms critical EXEC command to determine the name of the disk drive experiencing the errors.

Step 2 Run the disk delete-partitions EXEC command on the drive with the failures.

Step 3 Reboot the WAE using the reload EXEC command.

--------------------------------------------------------------------------------

Upon reboot, the standard RAID-1 resynchronization is performed. Resynchronization overwrites all of the failed drive's contents, giving the disk drive a chance to remap any bad sectors. If additional disk I/O errors subsequently occur in a short period of time, or if the disk drive cannot be detected by the software after a reboot, the disk drive has probably failed past the point of repair, and replacement is needed.
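If you have the show disks details output saved to a file, a quick way to spot a member partition that is no longer healthy is to flag any RAID line whose bracketed status is something other than [GOOD]. A minimal sketch; the sample lines below are illustrative (the REBUILDING/[BAD] row is hypothetical, not from the affected WAE):

```shell
#!/bin/sh
# Illustrative "Software RAID devices" rows, as they appear in `show disks details`;
# in practice you would paste or redirect the real output into this file.
cat > raid_status.txt <<'EOF'
/dev/md0 RAID-1 NORMAL OPERATION disk00/00[GOOD] disk01/00[GOOD]
/dev/md1 RAID-1 REBUILDING disk00/01[BAD] disk01/01[GOOD]
EOF

# Strip every [GOOD] token from the line; if any bracketed status remains,
# that device has a member needing attention.
awk '{ line = $0; gsub(/\[GOOD\]/, "", line); if (line ~ /\[/) print $1 " needs attention" }' raid_status.txt
```

This only inspects the bracketed per-disk statuses, so a device in resync with both members still marked [GOOD] would not be flagged.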

On systems with SCSI disk drives, another recovery option for partial failures is available. This option involves marking the disk as bad and reformatting the disk.


To attempt recovery from a partial disk failure on a system with SCSI drives, follow these steps:

--------------------------------------------------------------------------------

Step 1 Review the syslog.txt file or run the show alarms critical EXEC command to determine the name of the disk drive experiencing the errors.

Step 2 Run the disk mark diskname bad EXEC command on the drive with errors.

Step 3 Reboot the WAE using the reload EXEC command.

Step 4 Run the disk reformat diskname EXEC command.

Step 5 Reboot the WAE using the reload EXEC command.

--------------------------------------------------------------------------------

Note This process removes all data and the partition table on the specified disk. The standard RAID-1 resynchronization is performed after the second reboot.

Kindly refer to the following document for recovering from disk failures:

http://www.cisco.com/en/US/docs/app_ntwk_services/waas/wafs/v30/configuration/guide/sysparms.html

You must run a script (the WAAS disk check tool) that checks the file system for errors that can result from a RAID synchronization failure.

You can obtain the WAAS disk check tool from the following URL:

http://www.cisco.com/pcgi-bin/tablebuild.pl/waas40

When you run the WAAS disk check tool, you will be logged out of the device. The device automatically reboots after it has completed checking the file system. Because this operation results in a reboot, we recommend that you perform this operation after normal business hours.

Copy the script to your WAE device by using the copy ftp disk command.

WAE# copy ftp disk disk_check.sh

Run the script from the CLI, as shown in the following example:

WAE# script execute disk_check.sh

This script will check if there is any file system issue on the attached disks

Activating the script will result in:

Stopping all services. This will log you out.

Perform file system check for few minutes.

and record the result in the following files:

/local1/disk_status.txt - result summary

/local1/disk_check_log.txt - detailed log

System reboot

If the system doesn't reboot in 10 minutes, please re-login and check the result files.

Continue?[yes/no] yes

Please disk_status.txt after reboot for result summary

umount: /state: device is busy

umount: /local/lPAM_unix[26162]: ### pam_unix: pam_sm_close_session (su) session closed

for user root

waitpid returns error: No child processes

No child alive.

After the device reboots and you log in, locate and open the following two files to view the file system status:

• disk_status.txt - Lists each file system and shows whether it is "OK" or contains an error that requires attention.

• disk_check_log.txt - Contains a detailed log for each file system checked.

If no repair is needed, then each file system will be listed as "OK," as shown in the following example:

WAE# type disk_status.txt

Thu Feb 1 00:40:01 UTC 2007

device /dev/md1 (/swstore) is OK

device /dev/md0 (/sw) is OK

device /dev/md2 (/state) is OK

device /dev/md6 (/local/local1/spool) is OK

device /dev/md5 (/local/local1) is OK

device /dev/md4 (/disk00-04) is OK

If any file system contains errors, the disk_status.txt file instructs you to repair it.
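To scan a long disk_status.txt quickly, you can filter out the healthy entries and keep only the devices that need repair. A minimal sketch; the sample file below is illustrative, and the exact wording of an error line in a real disk_status.txt may differ:

```shell
#!/bin/sh
# Illustrative disk_status.txt in the format shown above; the error line
# is hypothetical sample data, not actual tool output.
cat > disk_status.txt <<'EOF'
device /dev/md1 (/swstore) is OK
device /dev/md5 (/local/local1) has errors, please repair
EOF

# Keep only device lines that the check did NOT report as OK,
# printing the device node and its mount point.
grep '^device' disk_status.txt | grep -v 'is OK$' | awk '{ print $2, $3 }'
```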

If an upgrade cannot be performed immediately, the customer should reload the system after the RAID resync is complete. RAID resync status can be checked in the show disks details output.

Kindly let me know if any of this is useful to you, or tell me what I can do further to resolve it for you.

Sachin Garg

Hi Sachin, thanks for the very complete reply. TAC chose to replace the drive; the array rebuilt itself and I am up and running again.

Hi,

Thanks for your quick response; your rating is very valuable to me.

Kind Regards,

Sachin Garg

WAE-7341-K9 :

We are having a high disk utilization issue, but all disks are healthy. Is there any issue due to this? How can we reduce it?

RAID Physical disk information:
disk00: Online BJ5008GG 286102 MB
disk01: Online 3LM0L256 286102 MB
disk02: Online 3LM3APQR 286102 MB
disk03: Online J8W39LMC 286102 MB

RAID Logical drive information:
raid-disk: RAID-5 Okay 857075 MB
Enabled (read-cache) Enabled (write-back)

Mounted file systems:
MOUNT POINT TYPE DEVICE SIZE INUSE FREE USE%
/sw internal /dev/sda1 991MB 927MB 64MB 93%
/swstore internal /dev/sda2 991MB 533MB 458MB 53%
/state internal /dev/sda3 7935MB 250MB 7685MB 3%
/local/local1 SYSFS /dev/sda6 33229MB 15226MB 18003MB 45%
/state/likewise/sw internal /dev/sda1 991MB 927MB 64MB 93%
/state/likewise/local/local1 internal /dev/sda6 33229MB 15226MB 18003MB 45%
/local/local1/spool PRINTSPOOL /dev/data1/spool 991MB 32MB 959MB 3%
/obj1 CONTENT /dev/data1/obj 224176MB 180233MB 43943MB 80%
/ackq1 internal /dev/data1/ackq 2379MB 102MB 2277MB 4%
/plz1 internal /dev/data1/plz 7141MB 0MB 7141MB 0%
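To see at a glance which mounts are driving the utilization numbers, the mounted-filesystems table can be filtered on its USE% column. A minimal sketch; the three sample rows below are copied from the output above, and the 80% threshold is an arbitrary illustrative cutoff:

```shell
#!/bin/sh
# A few rows from the "Mounted file systems" table above
# (columns: mount point, type, device, size, in-use, free, use%).
cat > mounts.txt <<'EOF'
/sw internal /dev/sda1 991MB 927MB 64MB 93%
/swstore internal /dev/sda2 991MB 533MB 458MB 53%
/obj1 CONTENT /dev/data1/obj 224176MB 180233MB 43943MB 80%
EOF

# Print each mount point whose utilization is at or above 80%.
awk '{ u = $NF; sub(/%/, "", u); if (u + 0 >= 80) print $1, $NF }' mounts.txt
```

Filtering like this separates mounts that are genuinely tight on space from large cache file systems that are expected to run mostly full.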