RAID Recovery: Do's and Don'ts

2013-06-15 00:00:00

From time to time we receive RAID systems for data recovery.
Recovering data from a RAID is often difficult in itself, but thoughtless actions by the user or IT manager can make the matter much more complex.

First, a brief description of the different RAID levels:

RAID 0: Requires at least two disks. The data is divided into 'stripes' across the disks. If one disk fails, the data is no longer accessible.
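To illustrate why every disk of a RAID 0 is needed, here is a minimal Python sketch of striping. It assumes the simplest possible layout, in which consecutive stripe units rotate over the disks; the function name `locate_block` is hypothetical, and real controllers differ in stripe size and disk order.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe unit to (disk index, stripe unit offset on that disk)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    # Consecutive stripe units rotate over the disks (assumed layout).
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive stripe units on a two-disk array alternate between the disks.
layout = [locate_block(block, 2) for block in range(6)]
print(layout)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Because neighbouring stripe units sit on different disks, almost any file larger than one stripe unit is spread over all of them, which is why a single missing disk makes the whole volume unreadable.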

RAID 1: Consists of two disks that are a copy of each other. If one disk fails, work can continue normally with the other disk.
Disadvantage: Sometimes a disk failed months ago without anyone noticing. If the other one then fails as well, the data can no longer be accessed.

RAID 5: Consists of at least three disks. The data is divided among the disks, along with parity blocks. If one disk fails, you can continue working with the remaining disks.
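The parity idea can be shown with a small Python sketch. It assumes XOR parity over a single stripe of two data blocks plus one parity block, which is the usual textbook description of RAID 5; the helper `xor_blocks` and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks and one parity block.
data_a = b"INVOICES"
data_b = b"PAYROLL!"
parity = xor_blocks([data_a, data_b])

# The disk holding data_b fails: its content is rebuilt from the two survivors.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)  # True
```

The same calculation only works when exactly one block of the stripe is missing. With two blocks gone there is nothing left to XOR the parity against.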

RAID 6: Consists of at least four disks. The data is divided across the disks, along with two different parity blocks. Here two disks may fail and work can still continue.

There are also other RAID configurations, such as 0+1 or 5+1, but these are combinations of the RAID levels above.
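The fault tolerance of the four basic levels described above can be summarized in a short Python sketch. The table and the function `array_state` are illustrative names, not part of any real RAID tool, and the RAID 1 entry assumes the two-disk mirror described above.

```python
# Number of simultaneous disk failures each level survives, per the descriptions above.
TOLERATED_FAILURES = {0: 0, 1: 1, 5: 1, 6: 2}


def array_state(raid_level: int, failed_disks: int) -> str:
    """Classify an array as healthy, degraded or inaccessible."""
    tolerated = TOLERATED_FAILURES[raid_level]
    if failed_disks == 0:
        return "healthy"
    if failed_disks <= tolerated:
        return "degraded"  # still working, but with less or no redundancy left
    return "data inaccessible"


print(array_state(5, 1))  # degraded
print(array_state(5, 2))  # data inaccessible
```

A "degraded" array keeps working, which is exactly why a first failure so often goes unnoticed or is ignored.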

What could go wrong?

RAID 0:

One drive fails and the data is no longer accessible. All disks are needed for data recovery.

RAID 1:

Scenario 1:

One disk fails and nobody notices. After a while the second one also fails and the data is inaccessible.

The biggest mistake here is delivering only one of the two disks for recovery. It can happen that one drive has only logical errors while the other has a head crash, so both disks are needed to find the one that is recoverable and up to date.

Scenario 2:

One drive fails and nobody notices. Work continues on the other disk. After a few months, the drive that initially failed starts up again (reason unknown). The RAID 1 resynchronizes, but in the wrong direction. Both drives now contain old data.

Scenario 3:

One drive fails, and this time it is noticed. The failed drive is replaced with a new drive, and the system synchronizes both drives, but in the wrong direction. You now have two blank disks.

Practical example:

A customer brought us one disk and reused the other (which turned out to still work, although the logical volume was gone) to restore an old backup. It later turned out that the reused disk was the last to fail, and the disk delivered to us only contained old data. Since the still-working disk had been overwritten with an old backup, their data was gone.

What not to do:

Synchronizing the disks again without a backup, because the synchronization can go wrong.

RAID 5:

In a RAID 5, problems only occur if two drives fail.

Scenario 1:

One drive fails, and people continue working without replacing it, thinking that they are still safe. A second disk often fails quite soon afterwards, and then you have a problem, especially if that last disk fails with a head crash, because the disk that failed first contains old data.
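Why the old data on the first failed disk is such a problem can be shown with a small Python sketch. It assumes simple XOR parity over one stripe of a three-disk array; the variable names and sample contents are invented for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# Content of disk 0 at the moment it dropped out of the array.
disk0_stale = b"OLD-DATA"
disk1 = b"CUSTOMER"

# Work continues on the degraded array. The data that belongs on disk 0 changes,
# but only the parity block can record that change.
disk0_current = b"NEW-DATA"
parity = xor_bytes(disk0_current, disk1)

# Disk 1 then fails with a head crash. Rebuilding it needs the current disk 0
# content; the stale disk yields garbage instead.
rebuilt_from_current = xor_bytes(disk0_current, parity)
rebuilt_from_stale = xor_bytes(disk0_stale, parity)

print(rebuilt_from_current == disk1)  # True
print(rebuilt_from_stale == disk1)    # False
```

Every stripe that was modified after the first failure is affected in this way, so the longer the array runs degraded, the less the first disk is worth.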

Scenario 2:

Several disks fail at the same time, e.g. due to a power outage.

Scenario 3:

The RAID controller fails, and although the drives are still good, you can no longer access the data. This is especially a problem with older controllers, as replacements are often difficult to find.

Practical example 1:

A drive in a RAID 5 fails. The customer replaces the drive and starts a rebuild. After a while the rebuild stops with an error message about another drive.
The customer then takes that other drive out of the RAID, replaces it with a new one and forces a rebuild.

It goes without saying that this cannot end well. Since the first rebuild was never completed, the new rebuild will corrupt and even overwrite the data, so that a complete recovery is no longer possible.

Practical example 2:

Two disks of a RAID 5 drop out. The customer replaces both drives and forces a rebuild. Obviously this cannot work: with two disks gone, part of the data is missing.

If a RAID can no longer be recovered, this is in most cases due to actions the customer performed before coming to us.

What not to do:

Never perform a rebuild without first cloning the disks (sector-by-sector copy) or having a good backup.
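As a rough idea of what a sector-by-sector copy involves, here is a simplified Python sketch. The function name `clone_sector_by_sector`, the 512-byte sector size and the zero-filling of unreadable sectors are all assumptions made for illustration. Dedicated tools such as GNU ddrescue handle retries and failing hardware far better, and a disk with physical damage should not be read with a script like this at all.

```python
import os

SECTOR_SIZE = 512  # assumed sector size; many newer disks use 4096


def clone_sector_by_sector(source_path, image_path, sector_size=SECTOR_SIZE):
    """Copy source to image one sector at a time.

    Unreadable or short sectors are zero-filled in the image and their
    sector numbers are returned, so the damage stays visible.
    """
    bad_sectors = []
    with open(source_path, "rb", buffering=0) as src, open(image_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)
        for index, offset in enumerate(range(0, size, sector_size)):
            length = min(sector_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(length)
            except OSError:
                chunk = b""  # read error: treat the whole sector as unreadable
            if len(chunk) < length:
                bad_sectors.append(index)
                chunk = chunk.ljust(length, b"\0")
            dst.write(chunk)
    return bad_sectors
```

The source is only ever opened for reading. Any rebuild attempt can then be made on the images or on disks restored from them, leaving the original drives untouched.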

RAID 6:

With a RAID 6, problems only occur if three disks fail. Given the high redundancy, we rarely receive a RAID 6. Given the greater complexity, these recoveries are also more expensive.

What not to do:

Never perform a rebuild without first cloning the drives (sector-by-sector copy) or having a good backup.

Why choose Datarecuperatie®?

High-tech laboratory

100% safe

80% success rate

24h / 7d service