I just finished building a four-drive software RAID10 on marmaduke and wanted to jot down my thoughts on RAID failure. In particular, I read a number of postings on the difference between RAID 01 and RAID 10. None of them satisfactorily described the differences, or how those differences change as more drives are added.
Marmaduke has only four drives in its array. Most of the web postings dealt with four drives, but I also wanted to see what happens with six. Here is a hypothetical set of six drives.
Six Drives:
/dev/sda a
/dev/sdb b
/dev/sdc c
/dev/sdd d
/dev/sde e
/dev/sdf f
For clarity, I will refer to /dev/sda simply as ‘a’, and so on.
Recall that RAID 0 ‘stripes’ two or more drives and RAID 1 ‘mirrors’ two drives.
RAID 01 Composition
Four Drive RAID 01
STRIPE: a b as 0
STRIPE: c d as 1
MIRROR: 0 1 as R01 (RAID 0+1)
Six Drive RAID 01
STRIPE: a b c as 0
STRIPE: d e f as 1
MIRROR: 0 1 as R01 (RAID 0+1)
In both the four and six disk arrays, RAID 01 mirrors two striped arrays. Each striped array can contain two or more drives but there are always only two striped arrays. (A mirror has only two subarrays.)
RAID 10 Composition
Four Drive RAID 10
MIRROR: a b as 0
MIRROR: c d as 1
STRIPE: 0 1 as R10 (RAID 1+0)
Six Drive RAID 10
MIRROR: a b as 0
MIRROR: c d as 1
MIRROR: e f as 2
STRIPE: 0 1 2 as R10 (RAID 1+0)
In both the four and six disk arrays, RAID 10 stripes two or more mirrored subarrays. Each mirror has exactly two disks.
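To make the two compositions concrete, here is a rough Python sketch that builds both layouts for any even number of drives. The function names are made up for this illustration and have nothing to do with mdadm; the grouping simply follows the listings above (halves for RAID 01, adjacent pairs for RAID 10).

```python
from string import ascii_lowercase


def check_drive_count(drives):
    # Assumption for this sketch: an even number of drives, at least four.
    if len(drives) < 4 or len(drives) % 2:
        raise ValueError("need an even number of drives, at least four")


def raid01_layout(drives):
    """RAID 0+1: split the drives into two stripes, then mirror the stripes."""
    check_drive_count(drives)
    half = len(drives) // 2
    return [drives[:half], drives[half:]]


def raid10_layout(drives):
    """RAID 1+0: mirror adjacent pairs of drives, then stripe the mirrors."""
    check_drive_count(drives)
    return [drives[i:i + 2] for i in range(0, len(drives), 2)]


for n in (4, 6):
    drives = list(ascii_lowercase[:n])
    print(n, "drives")
    print("  R01 stripes:", raid01_layout(drives))
    print("  R10 mirrors:", raid10_layout(drives))
```

However many drives are added, RAID 01 always ends up with two subarrays that keep getting wider, while RAID 10 gets more two-drive subarrays.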
01 vs. 10
Which is better? Both cost the same in terms of disk drives. Both yield the same final RAID capacity. Performance is (for my purposes) the same.
I conclude that the difference is primarily in the failure rates of the two configurations. There are two types of failures.
First is a failure that takes out a drive but not the array. Replace the drive and you can rebuild the array.
Second is a failure that takes out a drive and the array. Nothing you can do. The array is lost. Game over.
Which drive, or set of drives, causes a catastrophic array loss? I’ve created two tables (ah, the beauty of pure 7-bit ASCII) that detail every scenario for 4-drive and 6-drive arrays under both RAID 01 and RAID 10.
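An asterisk denotes a failure: a failed drive in the DRIVES column, a failed subarray in the R01 and R10 columns, and a catastrophic loss of the whole array in the RAIDS column.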
Column 1 : “F”, number of drives that failed in the array.
Column 2 : “DRIVES”, each drive in the array.
Column 3 : “R01”, subarrays for RAID 01.
Column 4 : “R10”, subarrays for RAID 10.
Column 5 : “RAIDS”, the two final arrays.
Four Drive Arrays
     DRIVES    R01   R10    RAIDS
     .......   ...   ...   ... ...
F    a b c d   0 1   0 1   R01 R10
----------------------------------
0  |         |     |     |
----------------------------------
   | *       | *   |     |
1  |   *     | *   |     |
   |     *   |   * |     |
   |       * |   * |     |
----------------------------------
   | * *     | *   | *   |      *
   | *   *   | * * |     |  *
2  |   * *   | * * |     |  *
   | *     * | * * |     |  *
   |   *   * | * * |     |  *
   |     * * |   * |   * |      *
----------------------------------
   | * * *   | * * | *   |  *   *
3  | * *   * | * * | *   |  *   *
   | *   * * | * * |   * |  *   *
   |   * * * | * * |   * |  *   *
----------------------------------
4  | * * * * | * * | * * |  *   *
----------------------------------
Neither RAID configuration can survive a three- or four-drive failure.
Both configurations can survive a single drive failure. In RAID 01, a single drive failure always takes down one of the striped subarrays, but it doesn’t bring down the array. In RAID 10, the subarray doesn’t fail because the subarray is a mirror.
With four drives, there are six possible combinations of two drive failures. In this case, RAID 10 has twice the survival rate of RAID 01: RAID 10 is lost in two of the six combinations (two failure points), while RAID 01 is lost in four (four failure points).
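The rule behind those counts can be written down in a few lines. This is only a sketch of my reasoning, with hypothetical function names: RAID 01 is lost once both stripes have lost at least one drive, and RAID 10 is lost once any mirror has lost both of its drives.

```python
from itertools import combinations

# Four-drive layouts from the composition section above.
STRIPES = [{"a", "b"}, {"c", "d"}]  # RAID 01: two stripes, mirrored
MIRRORS = [{"a", "b"}, {"c", "d"}]  # RAID 10: two mirrors, striped


def raid01_lost(failed):
    # A stripe dies with any one member; the mirror dies when every stripe is dead.
    return all(stripe & failed for stripe in STRIPES)


def raid10_lost(failed):
    # A mirror dies only when both members are gone; one dead mirror kills the stripe.
    return any(mirror <= failed for mirror in MIRRORS)


pairs = [set(p) for p in combinations("abcd", 2)]
r01 = [p for p in pairs if raid01_lost(p)]
r10 = [p for p in pairs if raid10_lost(p)]

print(len(pairs), len(r01), len(r10))  # prints: 6 4 2
```

The two subarray lists are identical for four drives; the only thing that differs is which level is the mirror and which is the stripe, and that alone doubles the number of fatal pairs.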
Six Drive Arrays
       DRIVES      R01    R10     RAIDS
     ...........   ...   .....   ... ...
F    a b c d e f   0 1   0 1 2   R01 R10
----------------------------------------
0  |             |     |       |
----------------------------------------
   | *           | *   |       |
   |   *         | *   |       |
1  |     *       | *   |       |
   |       *     |   * |       |
   |         *   |   * |       |
   |           * |   * |       |
----------------------------------------
   | * *         | *   | *     |      *
   | *   *       | *   |       |
   |   * *       | *   |       |
   | *     *     | * * |       |  *
   |   *   *     | * * |       |  *
   |     * *     | * * |   *   |  *   *
2  | *       *   | * * |       |  *
   |   *     *   | * * |       |  *
   |     *   *   | * * |       |  *
   |       * *   |   * |       |
   | *         * | * * |       |  *
   |   *       * | * * |       |  *
   |     *     * | * * |       |  *
   |       *   * |   * |       |
   |         * * |   * |     * |      *
----------------------------------------
   | * * *       | *   | *     |      *
   | * *   *     | * * | *     |  *   *
   | *   * *     | * * |   *   |  *   *
   |   * * *     | * * |   *   |  *   *
   | * *     *   | * * | *     |  *   *
   | *   *   *   | * * |       |  *
   |   * *   *   | * * |       |  *
   | *     * *   | * * |       |  *
   |   *   * *   | * * |       |  *
   |     * * *   | * * |   *   |  *   *
3  | * *       * | * * | *     |  *   *
   | *   *     * | * * |       |  *
   |   * *     * | * * |       |  *
   | *     *   * | * * |       |  *
   |   *   *   * | * * |       |  *
   |     * *   * | * * |   *   |  *   *
   | *       * * | * * |     * |  *   *
   |   *     * * | * * |     * |  *   *
   |     *   * * | * * |     * |  *   *
   |       * * * |   * |     * |      *
----------------------------------------
   | * * * *     | * * | * *   |  *   *
   | * * *   *   | * * | *     |  *   *
   | * *   * *   | * * | *     |  *   *
   | *   * * *   | * * |   *   |  *   *
   |   * * * *   | * * |   *   |  *   *
   | * * *     * | * * | *     |  *   *
4  | * *   *   * | * * | *     |  *   *
   | *   * *   * | * * |   *   |  *   *
   |   * * *   * | * * |   *   |  *   *
   | * *     * * | * * | *   * |  *   *
   | *   *   * * | * * |     * |  *   *
   |   * *   * * | * * |     * |  *   *
   | *     * * * | * * |     * |  *   *
   |   *   * * * | * * |     * |  *   *
   |     * * * * | * * |   * * |  *   *
----------------------------------------
   | * * * * *   | * * | * *   |  *   *
   | * * * *   * | * * | * *   |  *   *
5  | * * *   * * | * * | *   * |  *   *
   | * *   * * * | * * | *   * |  *   *
   | *   * * * * | * * |   * * |  *   *
   |   * * * * * | * * |   * * |  *   *
----------------------------------------
6  | * * * * * * | * * | * * * |  *   *
----------------------------------------
With a six drive array, RAID 10 has three failure points if two drives fail. However, RAID 01 has nine failure points.
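Finally, if three drives fail, RAID 10 has 12 failure points compared to 18 for RAID 01. In the following table, ‘prm’ is the total number of possible drive combinations for that number of drive failures (strictly speaking these are combinations rather than permutations, since the order in which the drives fail doesn’t matter).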
RAID Failure Points
        4-drive            6-drive
     .............      .............
F    R01  R10  prm      R01  R10  prm
-------------------------------------
0      0    0    1        0    0    1
1      0    0    4        0    0    6
2      4    2    6        9    3   15
3      4    4    4       18   12   20
4      1    1    1       15   15   15
5      -    -    -        6    6    6
6      -    -    -        1    1    1
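The counts in this table can be double-checked by brute force. The sketch below is my own illustration (the function name and the integer drive numbering are assumptions): it enumerates every combination of failed drives and applies the same two rules, namely that RAID 01 is lost when both stripes contain a failed drive and RAID 10 is lost when any mirror has lost both drives.

```python
from itertools import combinations


def failure_points(n):
    """Return {failed_count: (r01_lost, r10_lost, total)} for an n-drive array."""
    drives = list(range(n))
    # RAID 01: first half and second half are the two stripes.
    stripes = [set(drives[: n // 2]), set(drives[n // 2:])]
    # RAID 10: adjacent pairs are the mirrors.
    mirrors = [set(drives[i:i + 2]) for i in range(0, n, 2)]
    table = {}
    for k in range(n + 1):
        r01 = r10 = total = 0
        for combo in combinations(drives, k):
            failed = set(combo)
            total += 1
            r01 += all(s & failed for s in stripes)   # every stripe is broken
            r10 += any(m <= failed for m in mirrors)  # some mirror is fully gone
        table[k] = (r01, r10, total)
    return table


for n in (4, 6):
    print(f"{n}-drive")
    for k, (r01, r10, total) in failure_points(n).items():
        print(f"  F={k}  R01={r01:2d}  R10={r10:2d}  prm={total:2d}")
```

Brute force is fine at this size: six drives give only 64 failure combinations in total.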
It is my conclusion that the likelihood of a catastrophic array failure is substantially greater for RAID 01 and prudence suggests a preference for RAID 10.
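For arrays larger than six drives, the same counts can be had without enumerating anything. This is a hedged sketch of my own, assuming the layouts used throughout this post (two equal stripes for RAID 01, two-drive mirrors for RAID 10); the function names are hypothetical. RAID 01 survives only if every failed drive sits in the same stripe, and RAID 10 survives only if no mirror loses both drives.

```python
from math import comb


def lost_raid01(n, k):
    """Combinations of k failed drives (out of n) that lose a RAID 01 array."""
    if k == 0:
        return 0
    # Survivable cases: all k failures fall inside one of the two stripes.
    return comb(n, k) - 2 * comb(n // 2, k)


def lost_raid10(n, k):
    """Combinations of k failed drives (out of n) that lose a RAID 10 array."""
    # Survivable cases: pick k distinct mirrors, then one drive from each.
    return comb(n, k) - comb(n // 2, k) * 2 ** k


for n in (4, 6, 8, 12):
    total = comb(n, 2)
    print(f"{n:2d} drives, 2 failed: "
          f"R01 {lost_raid01(n, 2)}/{total}, R10 {lost_raid10(n, 2)}/{total}")
```

For a two-drive failure, the RAID 10 count only ever equals the number of mirrors, while the RAID 01 count grows with the square of the stripe width, so the gap widens as drives are added.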