Difference between revisions of "SLab:Todo"

From CCGB
(CRITICAL)
(CRITICAL)
Line 1: Line 1:
== CRITICAL ==
+
== CRITICAL DISK STORAGE PROBLEM WITH MD1K-2 ==
  
 
There is a serious problem with md1k-2, one of the PowerVault MD1000s connected to s3.  It gets into a state where two disks appear to have failed.  With RAID-5, a two-disk failure would mean complete data loss.  Fortunately, I've found a remedy that lets us copy the data to a different array.
 
There is a serious problem with md1k-2, one of the PowerVault MD1000s connected to s3.  It gets into a state where two disks appear to have failed.  With RAID-5, a two-disk failure would mean complete data loss.  Fortunately, I've found a remedy that lets us copy the data to a different array.
Line 48: Line 48:
  
  
I've narrowed down the error by looking through the adapter event logs.  There will be two errors, one followed about 45 seconds after the first/
+
I've narrowed down the error by looking through the adapter event logs.  There will be two errors, one followed about 45 seconds after the first:
 
<pre>
 
<pre>
 
Tue Mar 23 23:09:56 2010  Error on PD 31(e2/s0) (Error f0)
 
Tue Mar 23 23:09:56 2010  Error on PD 31(e2/s0) (Error f0)
 
Tue Mar 23 23:10:43 2010  Error on PD 30(e2/s1) (Error f0)
 
Tue Mar 23 23:10:43 2010  Error on PD 30(e2/s1) (Error f0)
 
</pre>
 
</pre>
 +
 +
After that, the virtual disk goes offline.
 +
 +
<pre>
 +
s3% MegaCli -LDInfo L1 -a0
 +
 +
Adapter 0 -- Virtual Drive Information:
 +
Virtual Disk: 1 (Target Id: 1)
 +
Name:md1k-2
 +
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
 +
Size:8.862 TB
 +
State: Offline
 +
Stripe Size: 64 KB
 +
Number Of Drives:14
 +
Span Depth:1
 +
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
 +
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
 +
Access Policy: Read/Write
 +
Disk Cache Policy: Disk's Default
 +
Encryption Type: None
 +
Number of Dedicated Hot Spares: 1
 +
    0 : EnclId - 18 SlotId - 1
 +
</pre>
 +
 +
If you go into the server room, you'll see flashing amber lights on the disks in md1k-2 slots 0 and 1.  I can bring md1k-2 back using the following procedure (replace slots 0 and 1 with whichever slots are appropriate):
 +
 +
# Take the disks in md1k-2 slots 0 and 1 about half-way out and then push them back in.
 +
# Wait a few seconds for the lights on md1k-2 slots 0 and 1 to return to green.
 +
# Press the power button on s3 until it turns off.
 +
# Press the power button on s3 again to turn it back on.
  
 
== Miscellaneous ==
 
== Miscellaneous ==

Revision as of 11:35, 26 March 2010

CRITICAL DISK STORAGE PROBLEM WITH MD1K-2

There is a serious problem with md1k-2, one of the PowerVault MD1000s connected to s3. It gets into a state where two disks appear to have failed. With RAID-5, a two-disk failure would mean complete data loss. Fortunately, I've found a remedy that lets us copy the data to a different array.
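
To see why a second failed disk is fatal, here is a minimal sketch of single-parity striping, the idea behind RAID-5. It is an illustration only: the block contents and the helper name `xor_blocks` are made up, and a real controller rotates parity across the disks and works on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks in one stripe, plus their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One disk lost: XOR of the survivors and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Two disks lost: the survivors only yield the XOR of the two missing
# blocks, which is not enough to recover either one on its own.
leftover = xor_blocks([data[2], parity])
assert leftover == xor_blocks([data[0], data[1]])
```

With 14 drives in the array the arithmetic is the same: any one missing block per stripe can be rebuilt from the rest, but two missing blocks cannot.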

Dell Higher Education Support
1-800-274-7799
Enter Express Service Code 43288304365 when prompted on the call.
host    service tag  contract end  description        notes
c8      CPM1NF1      02/15/2011    PowerEdge 1950     old schuster storage, moved PERC 5/E to s3
s3      GHNCVH1      06/22/2012    PowerEdge 1950     connected to md1k-2 via PERC 5/E
md1k-1  FVWQLF1      02/07/2011    PowerVault MD1000  enclosure 3
md1k-2  JVWQLF1      02/07/2011    PowerVault MD1000  enclosure 2
md1k-3  4X9NLF1      03/06/2011    PowerVault MD1000  enclosure 1


I've narrowed down the error by looking through the adapter event logs. There are two errors, the second about 45 seconds after the first:

Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)
Tue Mar 23 23:10:43 2010   Error on PD 30(e2/s1) (Error f0)
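
If the event log text is saved to a file, the pattern can be picked out automatically. The following is a rough sketch, assuming the lines look exactly like the two quoted above and appear in chronological order; the function names and the 60-second window are my own choices for illustration, not anything MegaCli provides.

```python
import re
from datetime import datetime, timedelta

# Matches lines in the format quoted above, e.g.
# "Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)"
LINE_RE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"Error on PD (?P<pd>\d+)\(e(?P<encl>\d+)/s(?P<slot>\d+)\)"
)

def parse_pd_errors(text):
    """Return (timestamp, enclosure, slot) for every PD error line in text."""
    errors = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match["when"], "%a %b %d %H:%M:%S %Y")
            errors.append((when, int(match["encl"]), int(match["slot"])))
    return errors

def find_double_failures(errors, window=timedelta(seconds=60)):
    """Pair consecutive errors on different slots of one enclosure within window.

    Assumes errors are already in chronological order, as in the log.
    """
    pairs = []
    for first, second in zip(errors, errors[1:]):
        same_enclosure = first[1] == second[1]
        different_slot = first[2] != second[2]
        if same_enclosure and different_slot and second[0] - first[0] <= window:
            pairs.append((first, second))
    return pairs

sample = """\
Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)
Tue Mar 23 23:10:43 2010   Error on PD 30(e2/s1) (Error f0)
"""
for first, second in find_double_failures(parse_pd_errors(sample)):
    gap = (second[0] - first[0]).seconds
    print(f"enclosure {first[1]}: slots {first[2]} and {second[2]} failed {gap}s apart")
```

For the two lines above, the gap works out to 47 seconds.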

After that, the virtual disk goes offline.

s3% MegaCli -LDInfo L1 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Stripe Size: 64 KB
Number Of Drives:14
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Number of Dedicated Hot Spares: 1
    0 : EnclId - 18 SlotId - 1 
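
For monitoring, the output above can be checked for an offline state with a few lines of parsing. This is only a sketch: `parse_ldinfo` and `is_offline` are hypothetical helper names, the sample is a shortened copy of the output above, and the code assumes the simple "Key: value" layout shown there.

```python
def parse_ldinfo(text):
    """Turn the 'Key: value' lines of the LDInfo output into a dict."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info

def is_offline(info):
    """True if the parsed virtual disk reports State: Offline."""
    return info.get("State", "").lower() == "offline"

# Shortened copy of the output shown above.
sample = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Number Of Drives:14
"""

info = parse_ldinfo(sample)
if is_offline(info):
    print(f"virtual disk {info['Name']} is offline")
```

On s3 the text would presumably come from running the command shown above and capturing its output; that step is left out here because it needs the controller.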

If you go into the server room, you'll see flashing amber lights on the disks in md1k-2 slots 0 and 1. I can bring md1k-2 back using the following procedure (replace slots 0 and 1 with whichever slots are appropriate):

  1. Take the disks in md1k-2 slots 0 and 1 about half-way out and then push them back in.
  2. Wait a few seconds for the lights on md1k-2 slots 0 and 1 to return to green.
  3. Press the power button on s3 until it turns off.
  4. Press the power button on s3 again to turn it back on.

Miscellaneous