sschmidt
Cisco Employee
 
The following procedure is not supported by TAC, the Wireless
Networking Business Unit or any other entity at Cisco.
 
The issue seems to be related to a problem identified by VMware:
 
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=51306
 
Linux based file systems become read-only
 
VMware has identified a problem where file systems may become read-only after encountering busy
I/O retry or SAN or iSCSI path failover errors.  NCS users have also encountered this issue after the 
storage has been uncleanly removed, usually brought on by a power outage.
 
If you can get to the shell prior to a reboot, you can try issuing the following command.  If you don't have access 
to a CLI because the VM is in a boot loop, proceed to the next section:
 
mount -o remount /
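
Before remounting, it can help to confirm that the root file system really is read-only. The following is a minimal sketch, not part of the original procedure: the function name is_readonly is hypothetical, it assumes a Linux shell where /proc/mounts is readable, and spelling out the rw option in the suggested command is an addition to the command shown above.

```shell
# Return 0 if the given mount point is currently mounted read-only.
# $1 = mount point, $2 = mounts table to read (defaults to /proc/mounts)
is_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }                     # the last entry for a mount point wins
        END { if (("," opts ",") ~ /,ro,/) exit 0; exit 1 }
    ' "${2:-/proc/mounts}"
}

# Only prints a hint; it does not change anything on the system.
if is_readonly /; then
    echo "root is read-only; try: mount -o remount,rw /"
fi
```

If the hint is printed and the remount succeeds, running the check again should print nothing.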
 
Recovering an NCS Virtual Machine stuck in a boot loop:
 
1. Download a live linux distro locally to your machine.  Users have reported success with Fedora.
2. In vSphere, left click the NCS VM -> Summary tab -> Storage -> right click the storage -> browse datastore -> click the icon to upload a file -> browse to the ISO. 
3. Exit out of the datastore browser.
4. Right click the NCS VM -> edit settings -> CD/DVD drive -> enable 'Connected' and 'Connect at power on' -> select the radio button 'Datastore ISO File' -> browse to the ISO you just uploaded -> save
5. Reload the VM and boot to the ISO
6. Get to the CLI (the exact steps to do so depend entirely on the Linux distro).
7. Determine which device name the live distro has given to the disk we need to repair.  In the output below, this 
particular distro has assigned it the 'sdb' designation. This can vary.  The disk will have three partitions (sdb1, sdb2, sdb3).
 
# fdisk -l
 
Disk /dev/sdb: 209.7 GB, 209715200000 bytes
255 heads, 63 sectors/track, 25496 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          64      512000   83  Linux
Partition 1 does not end on cylinder boundary.
/dev/sdb2              64          77      102400   83  Linux
Partition 2 does not end on cylinder boundary.
/dev/sdb3              77       25497   204184576   8e  Linux LVM
 
8. Scan for volume groups:
 
# lvm vgscan -v
 
  Wiping cache of LVM-capable devices
  Wiping internal VG cache
  Reading all physical volumes.  This may take a while...
  Finding all volume groups
  Finding volume group "smosvg"
  Found volume group "smosvg" using metadata type lvm2
  Archiving volume group "smosvg" metadata (seqno 12).
  Creating volume group backup "/etc/lvm/backup/smosvg" (seqno 12).
 
9. Activate all volume groups:
 
# lvm vgchange -a y
 
  11 logical volume(s) in volume group "smosvg" now active
 
10. List logical volumes:
 
# lvm lvs -a
 
  LV             VG     Attr   LSize
  altrootvol     smosvg -wi-a-  96.00M
  home           smosvg -wi-a-  96.00M
  localdiskvol   smosvg -wi-a-  29.28G
  optvol         smosvg -wi-a- 123.22G
  recvol         smosvg -wi-a-  96.00M
  rootvol        smosvg -wi-a-   3.91G
  storeddatavol  smosvg -wi-a-   9.75G
  swapvol        smosvg -wi-a-  15.62G
  tmpvol         smosvg -wi-a-   1.94G
  usrvol         smosvg -wi-a-   6.81G
  varvol         smosvg -wi-a-   3.91G
 
11. Use fsck to check all the partitions on the drive.  It is ok if you receive an error for one of these: the third partition (/dev/sdb3 here) is the LVM physical volume rather than an ext3 file system, so fsck is expected to fail on it.
 
# fsck -t ext3 -y /dev/sdb1
# fsck -t ext3 -y /dev/sdb2
# fsck -t ext3 -y /dev/sdb3
 
12. Perform the same check on every logical volume listed in step 10 (remember to use the -y flag in all cases).
 
# fsck -t ext3 -y /dev/smosvg/altrootvol
# fsck -t ext3 -y /dev/smosvg/home
 
# fsck (repeat for all the others from step 10)
 
You are looking for similar output to this:
 
fsck 1.39 (29-May-2006)
e2fsck 1.39 (29-May-2006)
/home: clean, 34/128016 files, 33751/512000 blocks
 
13.  Cleanly shut down the VM, remove the ISO from the VM's CD/DVD settings, and restart the server.  It should now boot successfully.
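
Steps 7 and 10 through 12 involve a fair amount of repetitive typing, so they can be partly scripted. The following is a minimal sketch, not part of the original procedure. The function names lvm_partitions and fsck_commands are hypothetical, and skipping swapvol is an assumption (swap space does not hold an ext3 file system). It relies on the --noheadings and -o lv_name options of lvs to print bare volume names.

```shell
# Print the partitions that "fdisk -l" output (read from stdin) marks as Linux LVM.
lvm_partitions() {
    awk '$1 ~ /^\/dev\// && /Linux LVM/ { print $1 }'
}

# Print one fsck command per logical volume name read from stdin.
# $1 = volume group name (smosvg in this article)
fsck_commands() {
    vg="$1"
    while read -r lv _; do
        [ -n "$lv" ] || continue          # skip blank lines
        [ "$lv" = swapvol ] && continue   # assumption: swap is not ext3, so skip it
        echo "fsck -t ext3 -y /dev/$vg/$lv"
    done
}

# Typical use from the live CD shell (as root), left as comments so that
# nothing runs before the generated commands have been reviewed:
#   fdisk -l | lvm_partitions
#   lvm lvs --noheadings -o lv_name smosvg | fsck_commands smosvg        # review
#   lvm lvs --noheadings -o lv_name smosvg | fsck_commands smosvg | sh   # run
```

Printing the commands first, rather than running fsck directly, makes it easy to check that the device paths match your own volume group before anything touches the disk.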
 
With the device names from step 7 and the volume group activated in step 9, you should also be able to mount the partitions and volumes, for example to retrieve a backup before redeploying:
 
a. mount /dev/sdb1 /media/boot (a plain partition, not the Linux LVM)
b. mount /dev/sdb2 /media/storedconfig (a plain partition, not the Linux LVM)
c. mount /dev/smosvg/localdiskvol /media/NCSbackup
d. Move the most current backup file off the localdiskvol volume, along with the startup config from the
storedconfig volume, then redeploy the VM using the OVA file and restore from the
backup archive:
 
http://www.cisco.com/en/US/customer/docs/wireless/ncs/1.1/configuration/guide/tasks.html#wp1201560
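
The mount commands in a through c assume that the mount points already exist. The following is a minimal sketch that prints the full command sequence for review. The function name mount_plan is hypothetical, and mounting read-only (-o ro) is an assumption made here to avoid writing to volumes you only want to copy files from; the /media paths are the ones used above.

```shell
# Print the commands that create the mount points and mount the volumes.
# $1 = disk prefix found in step 7 (e.g. /dev/sdb), $2 = base directory (e.g. /media)
mount_plan() {
    disk="$1"; base="$2"
    echo "mkdir -p $base/boot $base/storedconfig $base/NCSbackup"
    echo "mount -o ro ${disk}1 $base/boot"
    echo "mount -o ro ${disk}2 $base/storedconfig"
    echo "mount -o ro /dev/smosvg/localdiskvol $base/NCSbackup"
}

# Only prints the commands; pipe the output to sh (as root) to run them.
mount_plan /dev/sdb /media
```

If your disk showed up as sda instead of sdb, pass /dev/sda as the first argument.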
 

 

 

Comments

Instead of deploying a parallel Linux VM, the NCS virtual system can be booted with a Linux live ISO image.

I tried this today with an Ubuntu Server 12.04.1 64-bit image, going into rescue mode to fsck our broken virtual file systems. Our NCS had not shut down cleanly during a scheduled migration by our VM department.

Yahya Jaber
Cisco Employee

Thank you Thomas

that was really helpful.

patoberli
VIP Alumni

Thanks @sschmidt and Thomas, this fixed the reboot loop of my CPI 1.2. The issue happened because the NFS-connected storage of the ESX host had a CPU problem and the CPI switched to a read-only file system. Login was no longer possible, so I had to reboot, which caused the reboot loop.

Booting with a Fedora 17 Live ISO image and issuing the lvm and fsck commands fixed the server.

tdaly
Level 1

I find myself in the same boat with NCS.  I have attached to the VM using Fedora live ISO but not having much Linux experience I'm not sure how to run the commands. I was able to secure the backup file but if I could fix the VM as you described that would be ideal. Any help you could offer would be greatly appreciated.  Thanks in advance for your time.

Scott McKellar
Level 1

Thanks, I can confirm these steps worked on using an Ubuntu 13.04 Live Boot Image

In my case the volumes were sda1-3 not sdb1-3

richard.borgia
Level 1

Taking the time to thank Steve Schmidt for the excellent doc!

We used CentOS 6.5 live CD ISO

We had to enter the "su" command to get to the correct (root) prompt, and our volumes were also "sda" 1-3.

Thanks Steve!!

Did anyone have success running  fsck -t ext3 -y /dev/sdb3 ?

richard.borgia
Level 1

Dzmitryj Jakavuk

We went through the entire process; it found a few issues and "fixed" them. I don't know for sure what output that specific command gave, but I will ask. Is the partition name correct? You wrote "sdb"; my partitions were "sda" (with an "a").

Rich

I mean, is it possible to check the third partition, which is Linux LVM, with fsck?

aviwollman
Community Member

Worked fine (mine was also sda) using Fedora 15 live.

Steps #10 & #11 have to be done more than 10 times, so here is a script:

lvnam=$(lvm lvs -a | awk '{print $1}' | grep -v LV)

for index in $lvnam; do fsck -t ext3 -y /dev/smosvg/$index; done

jb800113707
Community Member

Fedora 17 Live Disc worked for us as well; the latest release (20) was unable to see any of the disks.

Had to run the steps and restart three times before our ADS VM would boot correctly, and the first boot after the repair took a little over an hour.

Thank you for the Fix, I wish that Cisco would add this to their official documentation. We were referred to this post by Cisco support.

gfedergr
Community Member

Also happened to me with PI2.2

Here is my workaround:

1. After the VM entered a boot loop, I downloaded the PI ISO file and uploaded it to the datastore.

2. I chose "Connect to ISO Image on a Datastore...".

3. On the next loop, the VM booted from the ISO.

4. I started a fresh PI installation. The first step formats the existing OVA installation (including the Linux OS), and after that the PI installation starts over.

When the installation ended, the VM rebooted successfully.

regards,

Gadi.

After running out of disk space we also hit this problem

Thanks very much, your instructions fixed our problem.

We used Ubuntu 13.04 Desktop Live CD

 

Jonathan Raper
Level 1

More than 6 years later, and this still saved my bacon. Was running PI 3.1 and for whatever reason ended up in a boot loop that was asking for a root password in order to run a File System Check:

 

[Screenshot: Prime Boot Loop1.png]

 

We attempted to follow Cisco Document ID:200760, but its syntax was incorrect in multiple places. Even after finding a second article here in the Cisco Community stating that the syntax in the first article was incorrect, we were still at a loss, as the issue did not seem to be resolvable. By luck, a Google search returned the page you are reading now. I used SystemRescueCd 6.0.2 (released on Feb 21, 2019), which you can download here: http://www.system-rescue-cd.org/Download/

 

When I booted, I was presented with this screen, and I chose the first option, which got me to a CLI:

 

[Screenshots: SystemRescueCd.png, SystemRescueCd2.png]

 

At this point, I picked up with step 7:

 

7. Determine which designation has been given to the volumes we need to repair.  In the output below, this 
particular linux distro has given the volumes the 'sdb' designation. This can vary.  There will be three of them (sdb1, sdb2, sdb3)
 
# fdisk -l
 
The only problem was that the output scrolled completely off the screen. If you aren't familiar with Linux, this can be frustrating, but not to worry: just pipe the command stated above into "less", which shows the output one screen at a time:
 
# fdisk -l | less
 
Doing this gave me the following output (you have to hit Q to exit back to the CLI):
 
[Screenshot: SystemRescueCd3.png]
From that screen above, this (below) is all we care about:
 
 
[Screenshot: SystemRescueCd4.png]
 
From here, we can continue on with steps 8 through to completion. Not all of the output will look like what is posted above, but that's ok. I've posted mine for examples, yours may look a little different. I didn't show all of my volume fscks, because only one needed fixing.
[Screenshots: SystemRescueCd5.png, SystemRescueCd6.png, SystemRescueCd8.png, SystemRescueCd9.png]
 
 
As you can see above, I re-ran the fsck on the /dev/smosvg/optvol, and the second time it ran completely clean.
 
Happiness!
 
So, I then cleanly shut down so that I could disconnect the ISO, and then the system booted properly.
 
# shutdown -h now
 
If you wait patiently, that command should shut down any linux OS properly, cleanly, and fully.
 
Good luck!
 
 
 
 
 
 
 