ELI5: How do RAID 0 and the other RAID numbers work, and when is each one beneficial?

I keep hearing about these over and over, but I never quite got how they work, since each number apparently does a different thing, and Wikipedia doesn’t really explain the benefits and downsides. Could someone explain how these RAID systems work?

Anonymous

So first, let me give you some pragmatic advice.

Your data is the most valuable, precious thing you own on a computer. You can always buy a new computer, but money can’t buy your data back once it’s gone. Any photos, documents, or unique files that are lost are lost forever. They say you should act as if you always have `N - 1` copies of your data, where `N` is the number of copies you actually have. So if you don’t back up, your data is as good as gone already.

The way I would do it is to build my own NAS (Network Attached Storage), which is any cheapo computer with hard drives attached to it. Some of these you can buy, some you can assemble yourself. If I were to build it, I would recommend a Raspberry Pi and some solid state drives, because it’s going to be on all the time, so the energy footprint becomes significant. I would put the system on a battery backup (a UPS, uninterruptible power supply), so that in the event of a power loss the system can finish writing to disk and shut down gracefully.

I would recommend AT LEAST 4 drives running Btrfs configured as a software RAID 6, under Linux (check the current Btrfs documentation first, since its RAID 5/6 modes have had known stability issues). I’ll get to what some of that is. But the key takeaway here is that software RAIDs have some advantages. First, they’re cheap: enterprise RAID hardware knows no upper bound in how much you can spend, while software RAIDs have no additional cost. Second, and this is most important for a home user such as yourself, they offer much greater data security. Hardware RAIDs are actually very brittle (we call them “snowflakes”), and can get into states where they fail and the whole array is unrecoverable. Also, if you buy RAID hardware, you actually buy 2 (at least) and put the spares in your closet, because if your RAID hardware fails, you need a nearly exact match in order to recover your array: RAIDs aren’t compatible between vendors. With a software RAID, if your computer dies, you can just plug the hard drives into ANY OTHER COMPUTER running the same software and it’ll work, and you can get your data. Third, you don’t need the performance of a hardware RAID. Data access is only as fast as the slowest part of the data channel between you and your data, and that is your home network. Any computer with a software RAID will internally be fast enough to serve your house for any need, from uploading/downloading to streaming media to your TV.

Finally, you want to run a tape backup monthly, and store it in a safe deposit box off-site, like at your bank. Your tapes don’t do you much good if you lose them in a fire, or they get stolen.

So that’s the end of the practical advice. You are not an enterprise, and the necessity of RAID hardware in the home is greatly overstated; it’s actually a costly, gotcha-ridden blunder. Now let’s talk a bit about what RAID is.

It’s a means of organizing drives to increase data integrity or performance. You can organize your disks in a RAID 0, where the data of a given file is interleaved (striped) across the drives. Imagine you have a pair of drives and every other byte is written to a different drive (real arrays interleave in larger chunks, but the idea is the same). Now one file is stored between two drives, and half is on each drive. When reading or writing, you’re saturating two data buses instead of one, and you’re splitting the workload in half, so you can ostensibly get nearly a 2x speedup. RAID 0 offers no data redundancy. Lose either drive, and the whole thing is kaput.
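To make the striping idea concrete, here is a toy sketch in Python. It is not how any real RAID driver is written: the "drives" are just byte strings, the function names `stripe` and `unstripe` are made up for illustration, and it deals out single bytes where real arrays use much larger chunks.

```python
def stripe(data: bytes, n_drives: int = 2) -> list[bytes]:
    """Deal bytes out round-robin, so drive i gets bytes i, i + n, i + 2n, ..."""
    return [data[i::n_drives] for i in range(n_drives)]


def unstripe(drives: list[bytes]) -> bytes:
    """Re-interleave the per-drive pieces back into the original byte order."""
    n = len(drives)
    out = bytearray(sum(len(d) for d in drives))
    for i, piece in enumerate(drives):
        out[i::n] = piece
    return bytes(out)


drives = stripe(b"HELLO RAID")
print(drives)            # each drive holds only every other byte
print(unstripe(drives))  # you need BOTH drives to get the file back
```

Notice that neither piece is readable on its own, which is exactly why losing one drive of a RAID 0 loses everything.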

RAID 1 is mirroring. A file is written to two identical drives. Should one drive fail, you have the other drive to recover from. Writes are no faster than on a single drive (reads can be, since either drive can serve them), and there is no additional storage capacity. The problem with mirroring, though, is that if one of your drives fails, you can bet the other drive isn’t far behind. Recovery is paramount, and it can fail: the strain can kill the backup drive, too.
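Mirroring can be sketched the same way. This is a toy model under obvious assumptions: the class name `Mirror` is hypothetical, each "drive" is a Python dict, and a failed drive is simply set to `None`.

```python
class Mirror:
    """Toy RAID 1: every write goes to both drives; reads use any surviving drive."""

    def __init__(self):
        self.drives = [{}, {}]  # two identical "drives"

    def write(self, name: str, data: bytes) -> None:
        for drive in self.drives:
            if drive is not None:
                drive[name] = data  # same data lands on every working drive

    def fail(self, index: int) -> None:
        self.drives[index] = None  # simulate a dead drive

    def read(self, name: str) -> bytes:
        for drive in self.drives:
            if drive is not None:
                return drive[name]
        raise IOError("both drives failed: data lost")


array = Mirror()
array.write("photo.jpg", b"pixels")
array.fail(0)                   # one drive dies...
print(array.read("photo.jpg"))  # ...and the file is still readable
```

Two drives, one drive’s worth of space: that is the price of the redundancy.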

You can combine RAID 1 and 0 to get RAID 10. This is mirroring drives in pairs for redundancy, and then striping across the pairs for performance. So if you stripe across 2 drives, you now need 4 drives, because each one has a mirror. Again, a RAID 10 still has the weaknesses of a 0 and a 1: you pay for twice the drives, and losing both drives of one mirrored pair loses the whole array.
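A quick way to see what a 4-drive RAID 10 can and can’t survive is to try every possible pair of dead drives. The sketch below assumes the common layout where drives 0 and 1 mirror each other, drives 2 and 3 mirror each other, and data is striped across the two pairs; the drive numbering and the function name `survives` are just for illustration.

```python
from itertools import combinations

# Assumed layout: (0, 1) is one mirrored pair, (2, 3) is the other,
# and the data is striped across the two pairs.
MIRROR_PAIRS = [(0, 1), (2, 3)]


def survives(failed: set) -> bool:
    """The array lives as long as every mirrored pair still has one working drive."""
    return all(any(d not in failed for d in pair) for pair in MIRROR_PAIRS)


for failed in combinations(range(4), 2):
    print(failed, "ok" if survives(set(failed)) else "array lost")
```

Under that assumption any single failure is fine, but a second failure is a gamble: it only kills the array if it hits the partner of the drive that already died.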

Then there’s RAID 5. This uses at least 3 drives, and often a 4th is a “hot standby” (also called a hot spare). The data is stored across the drives in the array, and it’s not merely striped: additional information called parity is written alongside it (essentially an XOR of the data blocks), such that if any one disk fails, the hot standby goes live, and the data from the failed drive can be reconstituted from what remains on the other disks. The problem with RAID 5 is that, again, if one drive fails, the others probably aren’t far behind. RAID 5 tends to fail during recovery. RAID 5 is the most famous, but most vendors today, HP and Dell both come to mind, push RAID 6.
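The extra information RAID 5 stores is parity, which boils down to XOR-ing the data blocks together. Here is a minimal sketch of that trick with two data blocks and one parity block. The helper `xor_blocks` is a made-up name, and a real array also rotates which drive holds the parity, which this toy ignores.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


d0, d1 = b"ABCD", b"wxyz"      # data blocks on drives 0 and 1
parity = xor_blocks([d0, d1])  # parity block on drive 2

# Drive 1 dies: XOR everything that survived to get its block back.
rebuilt = xor_blocks([d0, parity])
print(rebuilt == d1)  # True
```

This is also why rebuilding is so taxing: recovering one drive means reading every other drive in the array from end to end.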

RAID 6 is all the same, except you need at least 4 drives and it keeps two independent sets of parity, so any 2 drives can fail without losing the array. Again, since recovery is the most taxing on a disk, that’s the critical moment when you may lose the array, so that’s what you need to guard against.
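To compare the levels side by side, here is a rough calculator for usable space and guaranteed fault tolerance. It assumes all drives are the same size, that RAID 1 mirrors every drive, and that RAID 10 uses two-way mirrors; the function name `raid_summary` and the 4 x 4 TB example are just for illustration.

```python
def raid_summary(level: str, n_drives: int, drive_tb: float) -> tuple:
    """Return (usable capacity in TB, drive failures the array is guaranteed to survive)."""
    if level == "0":
        return n_drives * drive_tb, 0          # all space, no safety net
    if level == "1":
        return drive_tb, n_drives - 1          # every drive is a full copy
    if level == "10":
        return n_drives * drive_tb / 2, 1      # half the space goes to mirrors
    if level == "5":
        return (n_drives - 1) * drive_tb, 1    # one drive's worth of parity
    if level == "6":
        return (n_drives - 2) * drive_tb, 2    # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")


for level in ["0", "1", "10", "5", "6"]:
    usable, tolerated = raid_summary(level, n_drives=4, drive_tb=4.0)
    print(f"RAID {level}: {usable} TB usable, survives {tolerated} failure(s)")
```

With four drives, RAID 6 and RAID 10 give you the same usable space, but RAID 6 survives any two failures while RAID 10 only guarantees one.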

And of course, you can combine the numbers further, like RAID 50, which stripes (the 0) across several RAID 5 arrays, or RAID 60, which does the same with RAID 6 arrays. This shit gets insane, and you can only imagine what the cost can be like.
