Difference between revisions of "SLab:Todo"

Revision as of 11:44, 26 March 2010

CRITICAL DISK STORAGE PROBLEM WITH MD1K-2

There is a serious problem with md1k-2, one of the PowerVault MD1000s connected to s3. It gets into a state where two disks appear to have failed. A two-disk failure in a RAID-5 array would mean complete data loss. Fortunately, I've found a remedy that allows us to copy the data to a different array.

Dell Higher Education Support
1-800-274-7799
Enter Express Service Code 43288304365 when prompted on the call.

host	service tag	contract end	description	notes
c8	CPM1NF1	02/15/2011	PowerEdge 1950	old schuster storage, moved PERC 5/E to s3
s3	GHNCVH1	06/22/2012	PowerEdge 1950	connected to md1k-2 via PERC 5/E
md1k-1	FVWQLF1	02/07/2011	PowerVault MD1000	enclosure 3
md1k-2	JVWQLF1	02/07/2011	PowerVault MD1000	enclosure 2
md1k-3	4X9NLF1	03/06/2011	PowerVault MD1000	enclosure 1

Problem Description

I've narrowed down the error by looking through the adapter event logs. Two errors appear, the second about 45 seconds after the first:

Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)
Tue Mar 23 23:10:43 2010   Error on PD 30(e2/s1) (Error f0)

After that, the virtual disk goes offline.
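
To spot this signature in a saved copy of the event log, the two-error pattern can be checked mechanically. The following is a minimal sketch, assuming the log lines have exactly the format shown above; the function names and the 120-second window are illustrative choices, not part of any existing tool on s3.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)"
ERROR_LINE = re.compile(
    r"^(?P<when>\w{3} \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"Error on PD (?P<pd>\d+)\(e(?P<encl>\d+)/s(?P<slot>\d+)\)"
)

SAMPLE_LOG = """\
Tue Mar 23 23:09:56 2010   Error on PD 31(e2/s0) (Error f0)
Tue Mar 23 23:10:43 2010   Error on PD 30(e2/s1) (Error f0)
"""

def parse_pd_errors(log_text):
    """Return a list of (timestamp, enclosure, slot) for each PD error line."""
    events = []
    for line in log_text.splitlines():
        m = ERROR_LINE.match(line.strip())
        if m:
            # Collapse repeated spaces so single-digit days parse cleanly.
            stamp = " ".join(m.group("when").split())
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            events.append((when, int(m.group("encl")), int(m.group("slot"))))
    return events

def paired_failures(events, window_seconds=120):
    """Find consecutive errors on two different slots of the same enclosure.

    window_seconds is an assumed threshold, chosen to be comfortably
    larger than the gap of roughly 45 seconds described above.
    """
    pairs = []
    for first, second in zip(events, events[1:]):
        gap = (second[0] - first[0]).total_seconds()
        same_enclosure = first[1] == second[1]
        different_slot = first[2] != second[2]
        if same_enclosure and different_slot and 0 <= gap <= window_seconds:
            pairs.append((first, second, gap))
    return pairs

if __name__ == "__main__":
    for first, second, gap in paired_failures(parse_pd_errors(SAMPLE_LOG)):
        print(f"enclosure {first[1]}: slots {first[2]} and {second[2]}, {gap:.0f} s apart")
```

For the two sample lines, the gap works out to 47 seconds.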

[root@s3: ~/storage]# MegaCli -LDInfo L1 -a0

Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Stripe Size: 64 KB
Number Of Drives:14
Span Depth:1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Access Policy: Read/Write
Disk Cache Policy: Disk's Default
Encryption Type: None
Number of Dedicated Hot Spares: 1
    0 : EnclId - 18 SlotId - 1
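
The State field in this output is the one to watch. As a rough sketch, assuming the output keeps the "Key: Value" layout shown above, it can be turned into a dictionary and checked. The function names here are hypothetical, and the sketch only parses text that has already been captured; it does not run MegaCli itself.

```python
SAMPLE_LDINFO = """\
Adapter 0 -- Virtual Drive Information:
Virtual Disk: 1 (Target Id: 1)
Name:md1k-2
RAID Level: Primary-5, Secondary-0, RAID Level Qualifier-3
Size:8.862 TB
State: Offline
Stripe Size: 64 KB
Number Of Drives:14
"""

def parse_ld_info(text):
    """Collect 'Key: Value' lines into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and header lines with nothing after it.
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info

def is_offline(info):
    """True when the virtual disk reports State: Offline."""
    return info.get("State", "").lower() == "offline"

if __name__ == "__main__":
    info = parse_ld_info(SAMPLE_LDINFO)
    print(info["Name"], "offline" if is_offline(info) else "not offline")
```

Splitting on the first colon only keeps values such as "1 (Target Id: 1)" intact.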

If you go into the server room, you'll see flashing amber lights on the disks in md1k-2 slots 0 and 1. I can bring md1k-2 back using the following procedure (replace slots 0 and 1 with whichever slots are appropriate):

Take the disks in md1k-2 slots 0 and 1 about half-way out and then push them back in.
Wait a few seconds for the lights on md1k-2 slots 0 and 1 to return to green.
Press the power button on s3 until it turns off.
Press the power button on s3 again to turn it back on.
Log into s3 as root after it has finished booting.

I then import the foreign configuration. The disk in md1k-2 slot 1 is fine, but the disk in slot 0 needs to be rebuilt.

[root@s3: ~/storage]# MegaCli -CfgForeign -Scan -a0
                                     
There are 2 foreign configuration(s) on controller 0.

Exit Code: 0x00
[root@s3: ~/storage]# MegaCli -CfgForeign -Import -a0
                                     
Foreign configuration is imported on controller 0.

Exit Code: 0x00
[root@s3: ~/storage]# ./check_disk_states 
PhysDrv [ 18:0 ] in md1k-2 is in Rebuild state.

command to check rebuild progress:
MegaCli -PDRbld -ShowProg -PhysDrv [ 18:0 ] -a0

command to estimate remaining rebuild time:
./time_left 18 0
[root@s3: ~/storage]# MegaCli -PDRbld -ShowProg -PhysDrv [ 18:0 ] -a0
                                     
Rebuild Progress on Device at Enclosure 18, Slot 0 Completed 56% in 139 Minutes.

Exit Code: 0x00
[root@s3: ~/storage]# ./time_left 18 0
time_left: PhysDrv [ 18:0 ] will be done rebuilding in about 1:49:12
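
The 1:49:12 estimate above is consistent with a straight-line extrapolation from the progress line (56% in 139 minutes). The sketch below shows that arithmetic. Whether time_left actually works this way is an assumption, since its source is not shown here, and the function name is illustrative.

```python
import re
from datetime import timedelta

# Matches the progress line printed by MegaCli -PDRbld -ShowProg above.
PROGRESS = re.compile(r"Completed (\d+)% in (\d+) Minutes")

def estimate_remaining(progress_line):
    """Estimate remaining rebuild time, assuming a constant rebuild rate."""
    m = PROGRESS.search(progress_line)
    if not m:
        raise ValueError("no rebuild progress found in line")
    percent, minutes = int(m.group(1)), int(m.group(2))
    if percent == 0:
        return None  # no progress yet, so there is nothing to extrapolate from
    # remaining = elapsed * (percent left) / (percent done), truncated to seconds
    remaining_seconds = int(minutes * 60 * (100 - percent) / percent)
    return timedelta(seconds=remaining_seconds)

if __name__ == "__main__":
    line = ("Rebuild Progress on Device at Enclosure 18, Slot 0 "
            "Completed 56% in 139 Minutes.")
    print(estimate_remaining(line))
```

Because the rebuild rate can change with load on the array, the result should be treated as a rough guide only.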

Once the disk has finished rebuilding, reboot s3; md1k-2 will then be available again.
