ELI5: How do RAID 0 and the other RAID numbers work, and when are they beneficial?

I keep hearing about these over and over, but I never quite got how they work, since each number apparently does a different thing, and Wikipedia doesn’t really explain the benefits and downsides. Could someone explain how these RAID systems work?

4 Answers

Anonymous 0 Comments

RAID is a way to make a single big virtual disk out of several smaller disks. This is both to increase the size of the file system you can put on it and also to provide redundancy in case a disk fails. The numbers are used to distinguish between different types of RAID configurations.

RAID 0 just gives you a bigger, faster disk by doing so-called striping. This is when you store the odd data blocks on the first disk and the even ones on the second disk. So you get up to double the performance and double the space, but if one disk fails all your data is lost.

RAID 1 gives you full redundancy by storing the exact same data on both disks using mirroring. So if you lose one disk, all your data is still on the other disk. The performance is a bit more complex and can range from half the performance to double the performance on two disks, depending on the configuration and the operation.

RAID 5 is used for three or more disks and uses striping like RAID 0, but it also stores a parity block (a kind of checksum) for every stripe. If you lose one disk, you can recreate the data on it using the data on the other disks. This gives you both more space and some redundancy, at the cost of one disk’s worth of space, and is kind of a compromise between RAID 0 and RAID 1.

Then there is RAID 6, which is the same as RAID 5 but with two parity blocks per stripe, so you can lose two disks without losing data, but it also costs you two disks’ worth of space.

RAID 10 is a combination of 0 and 1 where you both stripe and mirror the data. You get the capacity of half your disks and the redundancy of the other half.
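To put rough numbers on the trade-offs above, here is a minimal Python sketch. The function name `raid_summary` and the `MIN_DISKS` table are made up for illustration; it assumes all disks are the same size, that RAID 10 uses two-way mirrors, and it ignores filesystem and metadata overhead.

```python
# Minimum number of disks commonly required for each level (illustrative table).
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4}


def raid_summary(level, disks, disk_size_tb):
    """Return (usable capacity in TB, disk failures the array always survives)."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")

    if level == "0":
        # Striping only: all the space, no redundancy.
        return disks * disk_size_tb, 0
    if level == "1":
        # Every disk holds the same data: one disk of space.
        return disk_size_tb, disks - 1
    if level == "5":
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_size_tb, 1
    if level == "6":
        # Two disks' worth of space go to parity.
        return (disks - 2) * disk_size_tb, 2

    # RAID 10 with two-way mirrors: half the space. It can survive more than
    # one failure if the failures hit different mirror pairs, but only one
    # failure is guaranteed to be safe.
    if disks % 2 != 0:
        raise ValueError("this sketch assumes an even number of disks for RAID 10")
    return (disks // 2) * disk_size_tb, 1


if __name__ == "__main__":
    for level in MIN_DISKS:
        capacity, failures = raid_summary(level, disks=4, disk_size_tb=2.0)
        print(f"RAID {level}: {capacity} TB usable, survives {failures} failure(s)")
```

With four 2 TB disks this makes the compromise visible: RAID 5 keeps three disks of space, while RAID 6 and RAID 10 both keep two.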

Anonymous 0 Comments

RAID basically means using multiple hard drives to provide, primarily, data backup and recovery functionality. That’s the main benefit. The downsides are the cost of having additional hard drives and needing specialized hardware/software to implement RAID functionality.

RAID 0 basically takes your data and spreads it out evenly over multiple disks. This does not provide for backup and recovery, but can improve performance and allows the computer to access data faster. The downside is that the loss of any single hard drive in the array makes the entire array unusable, as your data is broken up across all of them.
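A toy sketch of striping in Python, in case it helps. The names `stripe` and `read_back` and the 4-byte block size are made up for illustration; real arrays use much larger blocks and work on raw devices, not Python lists.

```python
def stripe(data, num_disks, block_size=4):
    """Split data into blocks and deal them out to the disks round-robin."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_back(disks):
    """Reassemble the data by reading one block from each disk in turn."""
    blocks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


if __name__ == "__main__":
    data = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    disks = stripe(data, num_disks=2)
    print(disks[0])  # every other block lives here
    print(disks[1])  # the blocks in between live here
    print(read_back(disks) == data)

    # Simulate losing the second disk: what is left has holes everywhere.
    disks[1] = []
    print(read_back(disks))
```

Once a disk is gone, every file bigger than one block is missing pieces, which is why nothing useful can be recovered from the survivors.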

RAID 1 is a simple mirroring of one hard drive to another. This provides the simplest form of backup and recovery. If one hard drive fails, you have an exact copy of it. But it isn’t very efficient.

RAID 2 is basically just a different implementation of RAID 0. RAID 0 distributes data across individual hard drives in clumps called “blocks” whereas RAID 2 does it on a bit-by-bit basis. However it does implement a simple form of error correction whereas RAID 0 does not.

RAID 3 is like RAID 0 and RAID 2, but it operates at the byte level (rather than block or bit) and one of the disks of the array is dedicated to *parity*. The parity disk examines the data on all the other disks and stores the parity of that data. For example, with RAID 3 we’re storing information byte-by-byte. Let’s say the first byte on the first disk is 01010101 and the first byte on the second disk is 00001111 and the first byte on the third disk is 00111100. The first byte of the parity disk looks at the first byte on all the other disks and counts how many “1”s appear at each bit place. It then stores a “0” if there is an even number of “1”s and stores a “1” if there is an odd number of “1”s, like so:

|Disk|First byte|
|:-|:-|
|First Disk|01010101|
|Second Disk|00001111|
|Third Disk|00111100|
|Parity Disk|01100110|

Having a parity disk allows the recovery of a single hard drive. If a hard drive fails, you can reconstruct the missing byte from the remaining bytes and the parity: each missing bit is whatever value makes the parity come out correct.
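Counting whether the number of “1”s at each bit place is odd or even is the same thing as XOR, so the example above can be checked in a few lines of Python. This is only a sketch using the three bytes from the table; the helper name `parity` is made up for illustration.

```python
from functools import reduce


def parity(values):
    """XOR all values together: each result bit is 1 if an odd number of inputs have a 1 there."""
    return reduce(lambda a, b: a ^ b, values, 0)


if __name__ == "__main__":
    disks = [0b01010101, 0b00001111, 0b00111100]
    parity_byte = parity(disks)
    print(f"parity:  {parity_byte:08b}")

    # Pretend the second disk died. XOR the survivors with the parity byte
    # and the missing byte falls out.
    survivors = [disks[0], disks[2], parity_byte]
    rebuilt = parity(survivors)
    print(f"rebuilt: {rebuilt:08b}")
```

The same trick works no matter which single disk is lost, including the parity disk itself, but it cannot recover two missing disks at once.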

RAID 4 is like RAID 3 but operates on block level.

RAID 5 is like RAID 4 but the parity blocks are distributed across all of the hard drives in the array rather than being confined to one. Parity only needs to be read when a disk fails, but it has to be updated on every write, so keeping all of the parity blocks on a single hard drive turns that drive into a bottleneck and puts more wear and tear on it than on the others. By distributing the parity blocks throughout, you include all the hard drives and distribute this stress more evenly.
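Here is a small Python sketch of what "distributed" looks like on disk. The function name `raid5_layout` is made up, and the rotation used (parity starts on the last disk and moves one disk to the left each stripe) is just one possible scheme picked for illustration; real implementations offer several layouts.

```python
def raid5_layout(num_disks, num_stripes):
    """Return a grid of labels: rows are stripes, columns are disks."""
    rows = []
    next_data_block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity walks from the last disk toward the first.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data_block}")
                next_data_block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(num_disks=3, num_stripes=3):
        print("  ".join(f"{label:>3}" for label in row))
```

In a RAID 4 version of this grid, every `P` label would sit in the same column.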

RAID 6 extends RAID 5 by having two parity blocks per stripe and therefore can support the loss of up to two hard drives.

Anonymous 0 Comments

Raid0: you have many boxes. You throw your toys in them all. If you break a box you lose all your toys.

Raid1: you have 2 (4, 6 etc) boxes. You put your toy in box 1 and it makes a magic copy in box 2. If you lose either box, it’s ok, you still have your toy.

Raid5: you have 3 boxes. You put your first toy in box 1 and your second toy in box 2. Then you put a “check” in box 3.
Your next toy goes in box 2, then 3, then a check in box 1. Etc etc.
If you break a box, the check is used to repair your toys.

Raid10: you take your set of toys from raid1 and put them inside raid0 (or vice versa)

Tried to make it as simple as I could

Anonymous 0 Comments

So first, let me give you some pragmatic advice,

Your data is the most valuable, precious thing you own on a computer. You can always buy a new computer, but money can’t buy your data, once it’s gone. Any photos or documents or unique files that are lost, are lost forever. They say you should act like you always have `N - 1` copies of your data, where `N` is the number of backups. So if you don’t backup, your data is as good as gone already.

The way I would probably do it is to build my own NAS, a Network Attached Storage, which is any cheapo computer with hard drives attached to it. Some of these things you can buy, some of these things you can assemble yourself. I would recommend a Raspberry Pi and some solid state drives if I were to build it, because it’s going to be on all the time, and so the energy footprint becomes significant. I would put the system on a battery backup (a UPS, uninterruptible power supply), so that in the event of a power loss, the system can finish writing to disk and shut down gracefully.

I would recommend AT LEAST 4 drives running Btrfs configured as a software RAID 6, under Linux. I’ll get to what some of that is. But the key takeaway here is that software RAIDs have some advantages.

First, they’re cheap: enterprise RAID hardware knows no upper bound in how much you can spend, while software RAIDs have no additional cost.

Second, and this is most important for a home user such as yourself, they offer much greater data security. Hardware RAIDs are actually very brittle (we call them “snowflakes”), and can get into states where they fail and the whole array is unrecoverable. Also, if you buy RAID hardware, you actually buy 2 (at least) and put the spares in your closet, because if your RAID hardware fails, you need a nearly exact match in order to recover your array; RAIDs aren’t compatible between vendors. With a software RAID, if your computer dies, you can just plug the hard drives into ANY OTHER COMPUTER and it’ll work, you can get your data.

Third, you don’t need the performance of a hardware RAID. Data access is only as fast as the slowest part of the data channel between you and your data, and that is your home network. Any computer with a software RAID will internally be fast enough to serve your house for any need, from uploading/downloading to streaming media to your TV.

Finally, you want to run a tape backup monthly, and store it in a safe deposit box off-site, like at your bank. Your tapes don’t do you much good if you lose them in a fire, or they get stolen.

So that’s the end of the practical advice. You are not an enterprise, and the necessity of RAID hardware in the home is greatly overstated, it’s actually a costly and gotcha ridden blunder. Now let’s talk a bit about what RAID is.

It’s a means of organizing drives to increase data integrity or performance. You can organize your disks in a RAID 0, where the data of a given file is interleaved: imagine if every other byte was written to a different drive (imagine you have a pair of drives). Now one file is stored between two drives, and half is on each drive. When reading or writing, you’re saturating two data buses instead of one, and you’re splitting the work load in half, so you can get ostensibly nearly a 2x speedup. RAID 0 offers no data redundancy. Lose any bit of one drive or file, and the whole thing is kaput.

RAID 1 is mirroring. A file is written to two identical drives. Should one drive fail, you have the backup drive to recover. There is no additional performance gain, no additional storage capacity. The problem with mirroring, though, is that if one of your drives fails, you can bet the other drive isn’t far behind. Recovery is paramount, and it can fail: the strain can kill the backup drive, too.

You can combine RAID 1 and 0 to get RAID 10. This is striping across multiple drives for performance, and then mirroring that for redundancy. So if you stripe across 2 drives, now you use 4 drives to mirror the striping. Again, a RAID 10 suffers all the problems of a 0 and a 1.
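To see what the 4-drive case buys you, here is a small Python sketch that tries every possible two-drive failure. It assumes the common layout where drives 0 and 1 mirror each other, drives 2 and 3 mirror each other, and data is striped across the two pairs; the function name `raid10_survives` and the drive numbering are made up for illustration.

```python
from itertools import combinations


def raid10_survives(failed, mirror_pairs):
    """The array survives as long as no mirror pair has lost both of its drives."""
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


if __name__ == "__main__":
    mirror_pairs = [(0, 1), (2, 3)]  # assumed layout: two mirrored pairs
    for failed in combinations(range(4), 2):
        outcome = "survives" if raid10_survives(set(failed), mirror_pairs) else "array lost"
        print(f"drives {failed} fail: {outcome}")
```

So any single failure is fine, and a second failure is only fatal if it hits the partner of the drive that already died.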

Then there’s RAID 5. This uses at least 3 drives, and often a 4th is a “hot standby”. The data is stored across the drives in the array, and they’re not merely striped, but there is additional information encoded, called parity (an XOR of the data blocks), such that if any one disk fails, the hot standby goes live, and the data from the failed drive can be reconstituted from the remaining data on disk. The problem with RAID 5 is that again, if one drive fails, the others probably aren’t far behind. RAID 5 tends to fail during recovery. RAID 5 is the most famous, but most vendors today, HP and Dell both come to mind, push RAID 6.

RAID 6 is all the same, except you need at least 4 drives, and 2 can fail before you lose the array. Again, since recovery is the most taxing on a disk, that’s the critical moment when you may lose the array, so that’s what you need to guard against.

And of course, you can combine all the numbers, like there’s RAID 150, which is striped across redundant arrays, and then mirrored. This shit gets insane, and you can only imagine what the cost can be like.