SLab:Todo
From CCGB
Revision as of 10:35, 26 March 2010
CRITICAL DISK STORAGE PROBLEM WITH MD1K-2
There is a serious problem with md1k-2, one of the PowerVault MD1000s connected to s3. It gets into a state where two disks appear to have failed. RAID-5 can survive only one failed disk, so a real two-disk failure would mean complete data loss. Fortunately, I've found a remedy that is letting us copy the data to a different array.
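RAID-5 keeps one XOR parity block per stripe, and that is why it tolerates exactly one failed disk. The Python sketch below illustrates the arithmetic on toy 2-byte blocks. The block contents and the helper name `xor_blocks` are made up for illustration. The sketch also ignores parity rotation and everything else the PERC controller actually does on disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe: three data blocks and the parity block computed from them.
data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(data)

# One failed disk (data[1]): XOR of every surviving block, parity included,
# rebuilds the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

# Two failed disks (data[0] and data[1]): the survivors only reveal the XOR of
# the two missing blocks, so different pairs fit equally well.
known = xor_blocks([data[2], parity])
guess_a = (data[0], data[1])
guess_b = (bytes(2), known)  # an all-zero block plus `known` also matches
ambiguous = xor_blocks(guess_a) == xor_blocks(guess_b) == known

print(rebuilt.hex(), ambiguous)  # 1020 True
```

With one block missing there is exactly one answer. With two blocks missing, every pair whose XOR equals `known` is consistent with the survivors, so the data cannot be rebuilt.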
Dell Higher Education Support: 1-800-274-7799. Enter Express Service Code 43288304365 when prompted on the call.
host | service tag | contract end | description | notes |
c8 | CPM1NF1 | 02/15/2011 | PowerEdge 1950 | old schuster storage, moved PERC 5/E to s3 |
s3 | GHNCVH1 | 06/22/2012 | PowerEdge 1950 | connected to md1k-2 via PERC 5/E |
md1k-1 | FVWQLF1 | 02/07/2011 | PowerVault MD1000 | enclosure 3 |
md1k-2 | JVWQLF1 | 02/07/2011 | PowerVault MD1000 | enclosure 2 |
md1k-3 | 4X9NLF1 | 03/06/2011 | PowerVault MD1000 | enclosure 1 |
I've narrowed down the error by looking through the adapter event logs. There are two errors, the second about 45 seconds after the first:
<pre>
Tue Mar 23 23:09:56 2010 Error on PD 31(e2/s0) (Error f0)
Tue Mar 23 23:10:43 2010 Error on PD 30(e2/s1) (Error f0)
</pre>
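To spot the same pattern in other log dumps, a minimal Python sketch could pair up PD errors on one enclosure that arrive close together. It assumes the one-line format of the two log lines shown here. It also assumes that the `e2` in the log is the "enclosure 2" listed for md1k-2 in the table. The function names and the 120-second `window_s` threshold are hypothetical choices and are not provided by MegaCli.

```python
import re
from datetime import datetime

# Assumed format, copied from the two adapter event log lines shown above.
EVENT_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}) "
    r"Error on PD (?P<pd>\d+)\(e(?P<encl>\d+)/s(?P<slot>\d+)\)"
)

# Assumption: enclosure numbers in the log match the hardware table.
ENCLOSURE_NAMES = {1: "md1k-3", 2: "md1k-2", 3: "md1k-1"}


def pd_errors(lines):
    """Yield (timestamp, pd, enclosure, slot) for each 'Error on PD' line."""
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m:
            when = datetime.strptime(m["when"], "%a %b %d %H:%M:%S %Y")
            yield when, int(m["pd"]), int(m["encl"]), int(m["slot"])


def paired_failures(lines, window_s=120):
    """Return consecutive errors on different slots of one enclosure within window_s seconds."""
    events = sorted(pd_errors(lines))
    pairs = []
    for (t1, _, e1, s1), (t2, _, e2, s2) in zip(events, events[1:]):
        gap = (t2 - t1).total_seconds()
        if e1 == e2 and s1 != s2 and gap <= window_s:
            pairs.append((ENCLOSURE_NAMES.get(e1, f"e{e1}"), s1, s2, gap))
    return pairs


log = [
    "Tue Mar 23 23:09:56 2010 Error on PD 31(e2/s0) (Error f0)",
    "Tue Mar 23 23:10:43 2010 Error on PD 30(e2/s1) (Error f0)",
]
print(paired_failures(log))  # [('md1k-2', 0, 1, 47.0)]
```

On the two lines above, the gap works out to 47 seconds, which is consistent with "about 45 seconds". The sketch leaves out how the event log is dumped from the adapter.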
After that, the virtual disk goes offline.
<pre>
s3% MegaCli -LDInfo L1 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Stripe Size: 64 KB
Number Of Drives:14
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Number of Dedicated Hot Spares: 1
    0 : EnclId - 18 SlotId - 1
</pre>
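A script can check the virtual disk's state by parsing the `Key: Value` lines of that output. The Python sketch below is a hypothetical helper and is not part of MegaCli. It assumes a single virtual disk per dump, so only the first value for each key is kept. It also assumes that any `State` other than `Optimal` needs attention. The sample is an abridged copy of the output shown above.

```python
# Hypothetical helper: parses MegaCli -LDInfo text for ONE virtual disk.
def parse_ldinfo(text):
    """Map each 'Key: Value' (or 'Key:Value') line to a dict entry."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip()
        # Skip lines without a colon and numbered sub-entries like "0 : EnclId ...".
        if sep and key and not key.isdigit():
            info.setdefault(key, value.strip())  # keep the first occurrence
    return info


def needs_attention(info):
    """Assumption: 'Optimal' is the only healthy state string."""
    return info.get("State") != "Optimal"


SAMPLE = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Stripe Size: 64 KB
Number Of Drives:14
Span Depth:1
Number of Dedicated Hot Spares: 1
    0 : EnclId - 18 SlotId - 1
"""

info = parse_ldinfo(SAMPLE)
print(info["Name"], info["State"], needs_attention(info))  # md1k-2 Offline True
```

In practice, the text could come from running the same MegaCli command through `subprocess.run(..., capture_output=True, text=True)` and reading `.stdout`.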
If you go into the server room, you'll see flashing amber lights on the disks in md1k-2 slots 0 and 1. I can bring md1k-2 back using the following procedure (replace slots 0 and 1 with whichever slots are appropriate):
1. Take the disks in md1k-2 slots 0 and 1 about half-way out and then push them back in.
2. Wait a few seconds for the lights on md1k-2 slots 0 and 1 to return to green.
3. Press the power button on s3 until it turns off.
4. Press the power button on s3 again to turn it back on.