Event Logs
I looked through the EVA event logs and found the following relevant entries:
- Temperature within an HSV300 controller becoming too hot.
View corrective actions. Corrective action code: 2e - A drive enclosure temperature sensor out of range condition has been reported by one of the drive enclosure link modules.
- A physical disk drive has disappeared.
View corrective actions. Corrective action code: 42 - A Volume has transitioned to the MISSING state.
View corrective actions. Corrective action code: bf
In retrospect it was a fairly simple sequence of events, as the entries above show. The air conditioner failed, which caused the temperature within the drive shelf to rise (the event log attributes this to the HSV300 controller, although corrective action code 2e points at a drive enclosure temperature sensor). To prevent damage to themselves, the drives then switched themselves off, which prompted the log entry about a physical disk drive disappearing.
We then started seeing volumes transitioning to the missing state, i.e. our VDisks went missing. Hardly surprising considering that the drives containing them switched themselves off.
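To keep the chain of events straight, the corrective action codes quoted above can be put into a small lookup. This is only a sketch in Python: the descriptions are copied from the log entries in this post, the names `CORRECTIVE_ACTIONS` and `describe` are made up for illustration, and code bf is deliberately left without a description because the log did not give one.

```python
# Corrective action codes as quoted in the event log entries above.
# Only 2e and 42 came with a description; bf did not, so it is not filled in.
CORRECTIVE_ACTIONS = {
    "2e": ("A drive enclosure temperature sensor out of range condition has "
           "been reported by one of the drive enclosure link modules."),
    "42": "A Volume has transitioned to the MISSING state.",
}

UNKNOWN = "no description recorded in this post"


def describe(code: str) -> str:
    """Return the description quoted in the log for a code, if there was one."""
    return CORRECTIVE_ACTIONS.get(code.strip().lower(), UNKNOWN)


if __name__ == "__main__":
    # The order in which the codes showed up during the incident.
    for code in ["2e", "42", "bf"]:
        print(f"{code}: {describe(code)}")
```

Read top to bottom, the codes tell the same story as the paragraphs above: temperature out of range first, then volumes going missing.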
Resolution
- Restored Air Conditioning (goes without saying I guess)
- Powered off the EVA and all attached disk shelves
- Powered on the disk shelves and waited for the numeric ID LEDs at the back to display the proper IDs.
- Powered up the Controller
- Lo and behold! All the previously failed physical disks came on-line, meaning that my missing VDisks also made a most welcome return
- Unfortunately my Hyper-V hosts still couldn't access the VDisks, so I had to unpresent and re-present them via Command View. I assume the EVA assigned new WWNs to the LUNs.
- Re-scanned for storage from the Disk Management MMC on the Hyper-V hosts
- Brought the disks and CSVs online via Failover Cluster Manager
- Started up the VMs
This was quite a harrowing experience, obviously. What struck me as ridiculous is that HP does not have *ANY* thermal shutdown logic / capabilities on the EVA controller itself. It keeps on trucking until the drives themselves shut down, causing a very ungraceful failure of the VDisks. There is also no guarantee that your drives and VDisks will come back online. In essence - if your EVA overheats there is a distinct possibility that you lose your data. Caveat Emptor...
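For what it's worth, the kind of thermal shutdown logic I was expecting is not complicated. Below is a minimal Python sketch of the decision only. Everything in it is an assumption for illustration: the thresholds are invented rather than HP figures, `ThermalPolicy` and `decide` are hypothetical names, and it says nothing about how you would actually read the sensors or shut an EVA down cleanly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ThermalPolicy:
    # Hypothetical thresholds in degrees Celsius, not vendor values.
    warn_c: float = 35.0
    shutdown_c: float = 45.0


def decide(readings_c: Sequence[float],
           policy: ThermalPolicy = ThermalPolicy()) -> str:
    """Return 'ok', 'warn', 'shutdown' or 'unknown' for a set of readings.

    The hottest sensor wins: one overheated enclosure is enough to act on.
    """
    if not readings_c:
        return "unknown"  # no sensor data, so no decision can be made
    hottest = max(readings_c)
    if hottest >= policy.shutdown_c:
        return "shutdown"  # stop I/O gracefully before drives power off
    if hottest >= policy.warn_c:
        return "warn"
    return "ok"


if __name__ == "__main__":
    print(decide([22.0, 24.5]))
    print(decide([30.0, 36.0]))
    print(decide([50.0, 20.0]))
```

The point of the sketch is the ordering: an array that acts on the shutdown threshold can take its VDisks offline cleanly, instead of discovering the problem when the drives power themselves off.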
Storage / RAID systems always stir up big emotions in me like this. But that's our life, we choose to work with these machines, right?