VSMS recover from failure

Danny Mainprize
Level 1

Once again I had multiple hard drives fail all at once, which forced me to replace them and do a factory restore (for the sixth time). I got SUSE reinstalled, updated the VSMS server to 6.3.3, and restored from backup. Now the VSOM server sees the VSMS as up, but it cannot see a version number or capacity, and it says no repositories are available. The VSMS server has 2 local repositories defined and active.

Has anyone seen this before, and does anyone have an idea how to resolve it?

Thanks for any ideas/suggestions.

1 Accepted Solution

driker023
Level 1

Danny, I have VSOM on a separate server with 55 media servers, and yes, I have seen this countless times. The first thing you should do is upgrade the firmware (FW) package on your servers to 11.0.1-0046; this will fix a lot of your issues with multiple drive failures.

If this issue happens after the FW upgrade, follow the procedure below. More than likely your operating system is still intact and all your data is still good. Believe me, I have fixed this many, many times.

If you have a hard drive fail and a second drive fails after the rebuild, go into WebBIOS (CTRL-H) and look at the drive states. Normally one drive will say OFFLINE and the other FAILED. Set the offline drive to ONLINE and reboot the server. After the reboot, check the state of the hard drives in WebBIOS again; it will probably say Unconfigured Spun Up.

Scan for foreign configurations:

MegaCli -CfgForeign -Scan -aALL

Clear the foreign configuration:

MegaCli -CfgForeign -Clear -aALL

Now make the foreign drive a hot spare:

MegaCli -PDHSP -Set -PhysDrv [E:S] -aALL

where E is the enclosure ID and S is the slot number of the drive.
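For anyone who wants to script the three MegaCli steps above, here is a minimal sketch, not a tested recovery tool. It assumes the binary can be called as MegaCli (the install path varies, so the MEGACLI variable is an assumption you may need to change). The names run, recover_foreign_as_hotspare, and DRY_RUN are hypothetical and exist only for illustration. By default it prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Sketch only: wraps the three MegaCli commands from the post above.
# MEGACLI, DRY_RUN, run and recover_foreign_as_hotspare are illustrative
# names, not part of MegaCli itself.
MEGACLI="${MEGACLI:-MegaCli}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_foreign_as_hotspare() {
    if [ "$#" -ne 2 ]; then
        echo "usage: recover_foreign_as_hotspare ENCLOSURE_ID SLOT" >&2
        return 2
    fi
    # Scan for, then clear, the foreign configuration on all adapters.
    run "$MEGACLI" -CfgForeign -Scan -aALL
    run "$MEGACLI" -CfgForeign -Clear -aALL
    # Mark the drive at [enclosure:slot] as a hot spare.
    run "$MEGACLI" -PDHSP -Set -PhysDrv "[$1:$2]" -aALL
}
```

For example, recover_foreign_as_hotspare 252 3 prints the three commands for enclosure 252, slot 3 (both numbers are made up). Set DRY_RUN=0 only once the printed commands look right for your controller.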

Hope this helps, because I know your pain times ten!

13 Replies

Scott Olsen
Level 6

Is this a co-located server, i.e., are both VSOM and VSMS running on the same server?

The best place to start for something like this is to ensure you have the proper host entries in /etc/hosts.

It usually ends up looking something like this:

# special IPv6 addresses
::1             ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
172.19.171.23   linux.site linux
172.19.171.23   BSIFREVS3.bulletproofsi.com BSIFREVS3
172.19.171.23   VSM7.bulletproofsi.com VSM7
BSIFREVS3:/etc #

It wouldn't hurt to restart services after this change, or just perform a full reboot.
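A quick way to confirm that a hosts file really contains the mapping you expect is a small check like the one below. This is only a sketch: hosts_has_entry is a hypothetical helper name, and it does nothing beyond matching an address and a name on the same non-comment line.

```shell
#!/bin/sh
# Sketch only: hosts_has_entry is a hypothetical helper, not a Cisco tool.
# Usage: hosts_has_entry FILE IP NAME
# Succeeds (exit 0) if FILE has a non-comment line mapping IP to NAME.
hosts_has_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $1 == ip {
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example (address taken from the listing above; substitute your own):
# hosts_has_entry /etc/hosts 172.19.171.23 "$(hostname)" || echo "entry missing"
```

The check is deliberately read-only, so it is safe to run before and after editing /etc/hosts.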

Hope this helps.

Cheers.

Scott Olsen Solutions Specialist Bulletproof Solutions Inc. Web: www.bulletproofsi.com

I forgot to update this posting. It turns out the Ethernet port on the VSMS server was failing. I swapped over to the other port and everything picked right back up.

Another issue on my lemon server.

Thanks for reading and the reply.

Daniel, 

Can you elaborate on the FW package you mentioned (11.0.1-0046)?  I'm assuming this is a firmware upgrade for the discrete RAID adapter in the server?  Which RAID HBA model?  What Cisco chassis model?

Scott Olsen Solutions Specialist Bulletproof Solutions Inc. Web: www.bulletproofsi.com

It is the LSI MegaRAID FW upgrade for the RAID controller in CIVS-MSP-4RU media servers. The earlier FW package, 11.0.1-0036, has issues with marking known-good hard drives as failed. The upgrade to 11.0.1-0046 will fix a lot of issues like this, and it is a simple upgrade process. If you need the firmware zip file, it can be found on the LSI MegaRAID website, or I can upload it to your FTP server.

I do recommend first opening the VSMC page and performing a media server backup.

  1. Use WinSCP to drop the zip file on root
  2. Open PuTTY and run chmod 777 firmware_upgrade.sh
  3. Run ./firmware_upgrade.sh
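Steps 2 and 3 can be combined into a small wrapper like the sketch below. The function name run_firmware_upgrade is hypothetical; only the script name firmware_upgrade.sh comes from the steps above. It assumes the script has already been copied to the server and that the media server backup has been taken first. It sets the owner-execute bit rather than 777, which is enough when you run the script as its owner.

```shell
#!/bin/sh
# Sketch only: run_firmware_upgrade is a hypothetical wrapper. The script
# name firmware_upgrade.sh comes from the steps above.
run_firmware_upgrade() {
    script="${1:-./firmware_upgrade.sh}"
    # A bare file name would be looked up in PATH, so anchor it to the
    # current directory.
    case "$script" in
        */*) ;;
        *) script="./$script" ;;
    esac
    if [ ! -f "$script" ]; then
        echo "not found: $script" >&2
        return 1
    fi
    # u+x is enough to run the script as its owner; the post above uses 777.
    chmod u+x "$script" || return 1
    "$script"
}
```

Called with no argument, it looks for firmware_upgrade.sh in the current directory and refuses to continue if the file is missing.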

Best Regards,

Thanks for the firmware tip. I've raised several MegaRAID firmware issues with TAC in the past, but they didn't advise upgrading the firmware on the chassis. The most notable were 'failure lights lit on slots that aren't in use' (FW bug) and two events where the RAID adapter firmware simply 'hung' and stopped responding. (FYI: what appears to happen in this case is that the whole disk subsystem disappears and the filesystems get re-mounted READ-ONLY. Hilariously, the Cisco services still report a 'running' state, despite the complete chaos that a RO filesystem causes.)
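Since the services keep reporting 'running' in that state, checking the mount table directly is one way to catch it. Here is a minimal sketch, assuming a Linux host where /proc/mounts is available. The function name list_ro_mounts is hypothetical, and limiting the check to /dev-backed filesystems is an assumption made to skip pseudo-filesystems that are read-only by design.

```shell
#!/bin/sh
# Sketch only: list_ro_mounts is a hypothetical helper.
# Prints the mount point of every /dev-backed filesystem whose mount
# options include "ro". Reads /proc/mounts unless another file is given.
list_ro_mounts() {
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}
```

Empty output means no disk-backed filesystem is currently mounted read-only; any mount point printed is worth investigating before trusting the service status.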

I'll keep this in mind and make a renewed effort to get some of our deployed chassis upgraded.

Cheers!

Scott Olsen Solutions Specialist Bulletproof Solutions Inc. Web: www.bulletproofsi.com

Cisco TAC is pretty much hit or miss depending on who owns your case. I found a great TAC case engineer, Alan Mattson, who knows about everything concerning these servers.

Just keep in mind which firmware you download, because the CIVS-MSP-2RU and CIVS-MSP-4RU have two different upgrade procedures.

I had so many issues with this system (Cisco VSOM 6.3.2, running a lot of third-party software to operate efficiently enough for a casino) that I recently got approval to rip and replace the entire system. I'm migrating to Surveillus (real casino VSM software) on UCS, with EMC Isilon OneFS for storage at 2 petabytes. I can't wait to rip this Cisco system out.

Best of luck and upgrade now before you have another failure!

You are the man!!!  I followed your steps and the OS booted back up again and allowed me to upgrade the firmware.

GREAT NEWS!!!!

Did you follow the steps for recovering from multiple drive failures, and then upgrade the firmware?

I'm asking because I'm curious! When I found these forums I was blown away, because I thought I was the only one in the world dealing with all these issues. (This is one of many, many issues I have had in the past 5 years with this system.) TAC never seemed to have the answer, and since my system is regulated by the state and subject to fines for equipment failure, I really had to find my own ways to recover these servers from failure.

Hey Daniel,

I noticed above that you mentioned you are using WinSCP for SFTP transfers in Windows. I used to as well. Give FileZilla a try: https://filezilla-project.org/download.php I've found it to be substantially faster than WinSCP.

Cheers!

Scott Olsen Solutions Specialist Bulletproof Solutions Inc. Web: www.bulletproofsi.com

Well, I spoke too soon. I finally got back to messing with the server, and now it tells me that the system has exceeded the maximum limit of devices per quad.

Ever seen that one before?

Danny, no, I haven't seen this message before. Can you post some screenshots?

Finally made it work, sort of. Had to remove the second row of hard drives...

IMG_0798.JPG
