NSS 326 Not Booting

Unanswered Question
Jan 2nd, 2014

Hello All, first post here.


I searched for this issue before starting a new post, but had no luck.


So, on to the problem.


Two days ago, one of our NSS 326s became inaccessible. There were no failed drives indicated, nor had I received any email alerts for the device.


I could not access the web interface or any of the folders through the network.


It was making a kind of whining noise that I had not heard before.


I powered down the device, then restarted it. It stayed on "System Booting" for several hours, which is unusual. It normally boots up in less than 30 minutes.


I powered down again, and restarted again.


I came in this morning and it still says "System Booting".



Beyond that, the Status light is solid green.


The LAN light is blinking Orange.


The light for Drive 6 is solid green, while the lights for Drives 1-5 are off.


Can anyone provide any guidance on how to proceed?


Thanks,

Chris

mpyhala Thu, 01/02/2014 - 10:24
User Badges:
  • Gold, 750 points or more

Chris,


Power the NAS down and remove ALL drives. Make sure you keep them in the correct order.


Try to boot with no drives in the unit and see if you get to the setup wizard. If you do, then the chassis is good but the configuration is corrupt. The configuration resides on the hard drives.


Keep in mind that if you boot it again with drives, make sure that ALL drives are inserted in the correct order before you power it on. If you power it on with one drive for example, it may be formatted and data loss is guaranteed.


It may be possible to recover some or all data in this situation but it is likely that you will need to factory reset by holding the reset button for 10 seconds or more. You would then need to rebuild the configuration and restore the data from backups.


If the chassis does not boot up without drives, call support at 1-866-606-1866 for replacement under warranty. If the configuration and drives are good you can simply place them into the new unit (in the correct order) and boot normally with everything intact.


- Marty

Chris Denison Thu, 01/02/2014 - 14:02

Thank you Marty.


I ended up calling support shortly after posting this to start a new case.


When a tech called back, he walked me through what you described. The chassis booted up fine without the drives.


When I inserted the drives I had the same issue. We tried removing only drive 6, since that was the only one with the indicator light that was lit, but the issue persisted.


He recommended repeating the process with the other drives.


I got lucky: when I removed the next drive, number 5, the system booted up as it normally does, although in degraded mode.


I am in the process of backing up all the files to the other NAS, and have ordered a new hard drive to replace drive 5.


The odd thing is that there was no indication of a failed drive, so I'm not 100% confident that is the issue. Once all the files are backed up, I am going to repeat the test with the other drives, just to see if it will boot.


Would it be worth reinserting drive 5 to see if the problem resolves itself? Or formatting the drive and reinserting it to see if the RAID rebuilds with the formatted drive? Or should I not take any chances and just install the new drive when it arrives and be done with it?


Thanks for your help.


Chris

mpyhala Thu, 01/02/2014 - 14:26

Chris,


Do nothing until your data is backed up.


I have seen this happen once or twice where there is no indication that a drive went bad but the unit fails to boot.


I would do a SMART scan on all of the remaining drives first to make sure they are all OK. The status should be "Good".


You could boot the NAS with NO drives installed, start the wizard, and use the "bad" drive to create a new single-drive volume. The NAS will let you know if that drive is bad. If it is good you could boot with ALL drives again and the array should rebuild. If it is bad, boot with all BUT that drive.


"Once all the files are backed up, I am going to repeat the test with the other drives, just to see if it will boot."


Keep in mind that if your RAID array is degraded and you remove another drive, you are at high risk of losing the array and having to start again from scratch. I would test the "bad" drive by itself and run the SMART test on the intact drives. Install the new drive when it arrives and confirm that the array rebuilds properly.
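If SSH access is enabled on the NAS (the QNAP-based firmware uses the standard Linux md RAID layer, so this is an assumption about your configuration, not a documented feature), the kernel's /proc/mdstat shows the array state directly. A sketch of reading it, using illustrative sample output rather than a real system:

```python
# Sketch: detect a degraded md RAID array from /proc/mdstat output.
# SAMPLE_MDSTAT is illustrative (a RAID 5 of 6 members with one missing);
# on the NAS you would read the real /proc/mdstat over SSH instead.
import re

SAMPLE_MDSTAT = """\
Personalities : [raid5]
md0 : active raid5 sda3[0] sdb3[1] sdc3[2] sdd3[3] sdf3[5]
      4883799040 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUU_U]
"""

def degraded_arrays(mdstat_text):
    """Return (name, total, active) for arrays with missing members."""
    results = []
    # Each md entry reports member counts as [total/active] on its detail line.
    for name, total, active in re.findall(
            r"^(md\d+).*?\[(\d+)/(\d+)\]", mdstat_text, re.M | re.S):
        if int(active) < int(total):
            results.append((name, int(total), int(active)))
    return results

print(degraded_arrays(SAMPLE_MDSTAT))  # [('md0', 6, 5)] -> md0 is degraded
```

In the `[6/5]` field, 6 is the designed member count and 5 the active count; the `[UUUU_U]` map shows which slot is missing. One missing member in a RAID 5 means no remaining redundancy, which is why pulling a second drive at that point risks the whole array.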


- Marty

Chris Denison Mon, 01/06/2014 - 09:03

Thanks Marty,


I am definitely not doing anything until I have the files backed up. Speaking of which, what is the best way to back up the device? I started dragging and dropping files, but was concerned files would be missed. I'm also dealing with multiple shared folders, so it's a pain. Is there a way to back up everything at once to an external HD?


About doing a SMART scan, is that something done manually? The reason I ask is that under Volume Management in the web interface, the other five drives say GOOD in the SMART Information column.


Thanks,

Chris

mpyhala Mon, 01/06/2014 - 09:59

Chris,


I use Backup > External Drive and select Backup Now. You can choose what folders to back up to the drive. You can select 5 folders per copy job and can create multiple jobs. I have found it to be much easier and more efficient than manually copying through the PC.


Go to Disk Management > HDD SMART and select the drive from the dropdown in the upper right. Select the test tab and run the short test. Each drive should show a status of "Good"; a status of "Normal" indicates a problem. I recommend scheduling a long SMART test at least once per month (at a time of low activity if possible).
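Under the hood, those UI statuses come from the drives' SMART self-assessment. If you have shell access, smartmontools' `smartctl -H` reports the same overall-health line (assuming `smartctl` is present on the firmware, which I have not verified). A sketch of mapping that line to the UI wording, using illustrative output:

```python
# Sketch: map smartctl's overall-health line onto the UI's Good/Normal.
# SAMPLE is illustrative output; on the NAS you would capture the real
# stdout of e.g. `smartctl -H /dev/sda` (assuming smartctl is installed).
# The Good<->PASSED mapping is an assumption based on how the UI behaves.
SAMPLE = """\
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
"""

def ui_status(smartctl_output):
    """Translate a PASSED/FAILED health line to the NAS UI's labels."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment" in line:
            # "Good" for a passing drive; anything else warrants attention.
            return "Good" if line.rstrip().endswith("PASSED") else "Normal"
    return "Unknown"

print(ui_status(SAMPLE))  # Good
```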


- Marty

Chris Denison Mon, 01/06/2014 - 12:55

Marty,


The first copy job is underway!


Are the Shared Folders the only place data is stored?



I've tested all five drives. The first drive came back as NORMAL, but the other four came back GOOD. Does that mean the SMART Information under Disk Management > Volume Management is not a real-time status, but the status from the last time the drives were manually tested?


I also noticed that drives 3, 4 and 6 said they had not been tested, but had a status of GOOD. Does that mean that the status of an untested drive displays as GOOD, or that perhaps those three drives were replacements and the previous drives in those slots tested GOOD?



So, do I replace Drive 1 now, or wait until it fails completely? Should I do a full scan on it and see what that says? If the status doesn't change, is there anything I can do to try to repair it, or should I not chance it?


Your earlier comments:

You could boot the NAS with NO drives installed, start the wizard, and use the "bad" drive to create a new single-drive volume. The NAS will let you know if that drive is bad. If it is good you could boot with ALL drives again and the array should rebuild. If it is bad, boot with all BUT that drive.


Is there any step-by-step documentation on this procedure, or is it pretty straightforward?


Thanks,
Chris

mpyhala Mon, 01/06/2014 - 13:15

Chris,


First priority is backup. After that, make sure that the original faulty drive (#5) is replaced and the RAID array rebuilt completely. After the RAID is stable, replace drive 1. I would not wait for it to fail completely unless you have a spare ready to go.


I schedule a fast SMART test weekly (in the middle of the night) and a long test monthly. I set up email notifications so I will know immediately if there is a problem.


What I wrote about booting without drives is just basic troubleshooting to see if there is a problem with the NAS or drives. It is not documented as far as I know.


- Marty

Chris Denison Thu, 01/23/2014 - 06:14

Marty,


I completed the backup, replaced Drive #5, ordered a new drive to replace #1, and set up the test schedule.


I was surprised to find that, first of all, the schedule was not set up, but also that some drives had not been tested in over two years, and some had never been tested at all.


In any event, during the first weekly Rapid Test, the status of Drive #4 changed to Normal.


Then yesterday, after I replaced Drive #1 and while the RAID was being rebuilt, Drive #4 failed.


I received several emails about it, with the first one stating "Level: Warning" and "Rebuilding Skipped".


I've already ordered a replacement drive, and hopefully won't lose any more drives before I am able to replace that drive and rebuild the RAID.


This makes 3 drive replacements in less than a month. Is there something I need to be looking at that could cause multiple drive failures in such a short period of time?


Our other NAS, which is identical except for Hard Drive sizes and Firmware version, is not having these issues.


Thanks,

Chris

mpyhala Thu, 01/23/2014 - 08:21

Chris,


I'm sorry to hear that you are having so many issues. Usually when there are multiple failures it happens within the first few months of use and possibly indicates a bad batch of drives. Check your serial numbers and see if they are close to the same. That might indicate a manufacturing defect that causes all of the drives to fail at about the same time.


I found some advice years ago that has served me well regarding RAID: buy your drives at different times from different distributors to avoid getting drives manufactured at the same plant at the same time. If there is a problem during manufacturing on any given day/week, it is likely to be replicated to all drives manufactured during that period.

Before I found that information I purchased five 2TB drives in the same order from a reputable online retailer and they were all DOA. The serial numbers were nearly consecutive. I created an RMA with the manufacturer and they sent me five drives that are still working fine today (four in an NSS324 and one as a backup, currently running 24/7 in a PC). The replacement drives were manufactured in different plants at different times (possibly refurbished).


Are your drives on the list of supported drives for the NSS300 series NAS?


http://www.cisco.com/en/US/docs/storage/nass/csbcdp/smart_storage/avl/Smart_Storage_AVL.pdf


Another thing to look at is your power source. Make sure that the NAS has a battery backup to avoid drive damage from outages and surges.


- Marty

Chris Denison Thu, 01/23/2014 - 08:59

Marty,


I don't believe the serial numbers are that close, but then again, I'm not familiar with their numbering system.


SN: 6VPEN7L4

SN: 9VP5B1L6

SN: 9VP56XTB


All three drives are Seagate ST31000528AS 1TB drives. Two of them are firmware CC38 and one is CC46.



I'll definitely take that advice. I've actually been doing that unknowingly already: all three drives I've ordered have come from different places. I've almost bought a spare a couple of times, but didn't. In the back of my mind I've been thinking I should have, but now I'm glad I didn't.



I have been checking that list when I order drives, although I wonder when it was last updated. When I search the web for some of the drive models, some of them do not seem to be very prevalent. This leads me to believe they may be becoming outdated and there could be newer models available that are suitable replacements, although not listed.



Thanks,

Chris

mpyhala Wed, 02/05/2014 - 11:16

Chris,


Do the Seagate drives have a manufacture date?


The approved drive list is fairly dated. In most cases the newer model of the same drive is compatible. It can be hard to find model numbers from 3 years ago. You might also try the QNAP list:


http://www.qnap.com/i/useng/product_x_grade/cat_intro.php?g_cat=1&hf=old


NSS322 = TS-259 Pro+

NSS324 = TS-459 Pro+

NSS326 = TS-659 Pro+


Keep in mind that the Cisco firmware 1.5 is the equivalent of the QNAP firmware 3.5, so only drives up to 3TB are officially supported. You could of course switch to the QNAP firmware and have wider compatibility.


Let me know if you have any questions.


- Marty

Chris Denison Wed, 02/05/2014 - 11:47

Marty,

I didn't see anything that specifically said Manufacture Date, but on the label all three of them had:


Date Code: 10356   Site Code: TK


It appears that they were made around the same time and at the same place?
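Seagate date codes from that era are widely reported to encode fiscal year, week, and day (YYWWD), with Seagate's fiscal year starting around July 1 of the prior calendar year. A rough decode sketch; the July 1 start is an approximation (the real boundary is a particular Saturday), so the result can be off by a few days:

```python
# Rough decode of a Seagate date code (format YYWWD: fiscal year, week,
# day of week). Assumption: the fiscal year is approximated as starting
# July 1 of the prior calendar year, so results may be off by a few days.
from datetime import date, timedelta

def decode_seagate_date_code(code):
    year = 2000 + int(code[:2])                  # "10" -> fiscal year 2010
    week = int(code[2:4])                        # "35" -> 35th fiscal week
    day = int(code[4]) if len(code) > 4 else 1   # "6"  -> 6th day of week
    fy_start = date(year - 1, 7, 1)              # approximate FY start
    return fy_start + timedelta(weeks=week - 1, days=day - 1)

print(decode_seagate_date_code("10356"))  # ~early March 2010
```

So a shared "Date Code: 10356 / Site Code: TK" would indeed mean all three drives came off the line in the same week at the same site.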



Thanks for the link to the QNAP list! I didn't even think to check their site.


The box that had the three failed drives was using firmware 3.5 or 3.6, but I upgraded it to 4.0.3 last week.


Our other NSS 326 was still running firmware 1.1.0, but I've upgraded it to QNAP 3.5.0 this week, and will probably take it up to 4.0.3 soon.


Thanks again for all your help. I really can't thank you enough.


Thanks,

Chris
