RAID -- Redundant Array of Inexpensive Disks

The concept of RAID was developed to meet the needs of current multimedia and other data-hungry application programs, which also require fault tolerance to be built into the storage device. Further, parallel processing techniques are well suited to exploiting the benefits of such an arrangement of hard disks.

RAID technology offers some significant advantages as a storage medium:

Affordable alternative to mass storage
High throughput and reliability

The cost per megabyte of a disk has been constantly dropping, with smaller drives playing a larger role in this improvement. Although larger disks can store more data, it is generally more power-efficient to use small diameter disks (as less power is needed to spin the smaller disks). Also, as smaller drives have fewer cylinders, seek distances are correspondingly lower. Following this general trend, a new candidate for mass storage has appeared on the market, based on the same technology as magnetic disks, but with a new organisation. These are arrays of small and inexpensive disks placed together, based on the idea that disk throughput can be increased by having many disk drives with many heads operating in parallel. The distribution of data over multiple disks automatically forces access to several disks at one time, improving throughput. Disk arrays are therefore obtained by placing small disks together to obtain the performance of more expensive high end disks.

The key components of a RAID System are:

A set of disk drives (a disk array), viewed by the user as one or more logical drives.
Data may be distributed across drives
Redundancy added in order to allow for disk failure

Disk arrays can be used to store large amounts of data, have high I/O rates and take less power per megabyte (when compared to high end disks) due to their size, but they have very poor reliability.

What do you think is the reason for this low reliability?

As more devices are added, reliability deteriorates (N devices generally have $\frac{1}{N}$ the reliability of a single device).
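As a rough illustration of this rule of thumb, the sketch below assumes independent disks with a constant failure rate, so that the mean time to failure (MTTF) of an unprotected array is the single-disk MTTF divided by N. The function name and the 30,000 hour figure are illustrative assumptions, not values taken from these notes.

```python
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an array that is lost as soon as any one disk fails.

    Assumes independent disks with a constant failure rate.
    """
    if n_disks < 1:
        raise ValueError("an array needs at least one disk")
    return disk_mttf_hours / n_disks


single = 30_000.0  # hypothetical single-disk MTTF in hours
for n in (1, 10, 100):
    print(n, "disks:", array_mttf(single, n), "hours")
```

Under these assumptions a 100-disk array without redundancy would be expected to lose data roughly a hundred times sooner than a single disk, which is why redundancy is added.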

Files stored on arrays may be striped across multiple spindles. Since a high capacity is available due to the availability of more disks, it is possible to create redundancy within the system, so that if a disk fails the contents of a file may be reconstructed from the redundant information. This of course leads to a penalty in capacity (when storing redundant information) and in bandwidth (to update the disk). Four main techniques are available to overcome the lack of reliability of arrays:

Mirroring or shadowing of the contents of a disk, which can be a capacity-hungry approach to the problem. Each disk within the array is mirrored and a write operation performs a write on two disks, resulting in a 100% capacity overhead. Reads to disks may however be optimised. This solution is aimed at high bandwidth, high availability environments.
Horizontal Hamming Codes: A special means to reconstruct information using an error correction encoding technique. This may be overkill, as it is complex to compute over a number of disks.
Parity and Reed-Solomon Codes: Also an error correction coding mechanism. Parity may be computed in a number of ways, either horizontally across disks or through the use of an interleaved parity block. Parity information also has to be stored on disk, leading to a 33% capacity cost for parity. Use of wider arrays reduces the capacity cost, but leads to a decrease in the expected availability and increased reconstruction times. This approach is generally aimed at high bandwidth scientific applications (such as image processing).
Failure Prediction: There is no capacity overhead in this technique, though it is controversial in nature, as its use can only be justified if all errors or failures can be forecast correctly.
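The capacity costs quoted in the list above can be compared with a small calculation. This is a sketch only: it expresses overhead as redundant capacity divided by data capacity, and it assumes that the 33% figure corresponds to one parity disk protecting three data disks, which the notes do not state explicitly. The function names are hypothetical.

```python
def mirroring_overhead() -> float:
    """Every data disk has one mirror, so redundancy equals data capacity."""
    return 1.0


def parity_overhead(data_disks: int, parity_disks: int = 1) -> float:
    """Redundant capacity as a fraction of data capacity."""
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return parity_disks / data_disks


print("mirroring:", mirroring_overhead())
for width in (3, 7, 15):
    print(width, "data disks + 1 parity:", round(parity_overhead(width), 3))
```

The wider the array, the smaller the parity overhead, at the cost of more disks that must all be read to rebuild a failed one.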

Each disk within the array needs to have its own I/O controller, but interaction with a host computer may be mediated through an array controller as shown in figure 5.1.

A disk array link to the host processor

It may also be possible to combine the disks together to produce a collection of devices, where each vertical array is now the unit of data redundancy. Such an arrangement is called an orthogonal RAID and is shown in figure 5.2; other arrangements of disks are also possible.

Orthogonal RAID

Figure 5.3 identifies disk performance for a number of machines and operating systems. The Convex supercomputer seems to provide the best performance in terms of throughput (or megabytes transferred per second) based on a 32KB read operation due to the use of a RAID disk. The figure also illustrates transfer rate dependencies between hardware and the particular operating system being used.

There are now 8 levels of RAID technology, with each level generally providing a greater amount of resilience than the lower levels:

Level 0: Disk Striping

-- distributing data across multiple drives. Level 0 is an independent array without parity redundancy that accesses data across all drives in the array in a block format. To accomplish this, the first data block is read from or written to the first disk in the array, the second block from or to the second disk, and so on. RAID 0 only addresses improved data throughput, disk capacity and disk performance. Although striping can improve performance on request rates, it does not provide fault tolerance: if a single disk fails, the entire system fails. In terms of fault tolerance this is therefore no better than storing the complete set of data on a single drive, although the striped array offers a higher access rate.
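The round-robin placement described above can be sketched as a mapping from a logical block number to a disk and an offset on that disk. The function name and the choice of one block per stripe unit are assumptions made for illustration.

```python
from typing import Tuple


def locate_block(logical_block: int, n_disks: int) -> Tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block.

    Block 0 goes to disk 0, block 1 to disk 1, and so on, wrapping round.
    """
    if logical_block < 0 or n_disks < 1:
        raise ValueError("invalid block number or disk count")
    return logical_block % n_disks, logical_block // n_disks


for block in range(8):
    disk, offset = locate_block(block, 4)
    print("logical block", block, "-> disk", disk, "offset", offset)
```

Consecutive logical blocks land on different disks, so a large sequential transfer keeps all drives busy at once.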

Level 1: Disk Mirroring

-- Level 1 focusses on fault tolerance and involves a second duplicate write to a mirror disk each time a write request is made. The write is performed automatically and is transparent to the user, application and system. The mirror disk contains an exact replica of the data on the actual disk. Data is recoverable if a drive fails and may be recoverable if both drives fail. The biggest disadvantage is that only half of the disk capacity is available for storage. Also, capacity can only be expanded in pairs of drives.

Of the RAID levels, level 1 provides the highest data availability since two complete copies of all information are maintained. In addition, read performance may be enhanced if the array controller allows simultaneous reads from both members of a mirrored pair. During writes, there will be a minor performance penalty when compared to writing to a single disk. Higher availability will be achieved if both disks in a mirror pair are on separate I/O busses.
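A minimal sketch of a mirrored pair, assuming an in-memory dictionary stands in for each disk, may make the behaviour concrete. Writes go to both members, reads alternate between them, and a read still succeeds after one member fails. The class and method names are hypothetical.

```python
class MirroredPair:
    """Toy model of a Level 1 pair; each disk is a dict of block -> data."""

    def __init__(self):
        self.disks = [{}, {}]
        self.alive = [True, True]
        self._next = 0  # which member serves the next read

    def write(self, block, data):
        # Every write is duplicated on each surviving member.
        for i in (0, 1):
            if self.alive[i]:
                self.disks[i][block] = data

    def read(self, block):
        # Alternate between members so reads can be shared out.
        for _ in range(2):
            i = self._next
            self._next = 1 - self._next
            if self.alive[i]:
                return self.disks[i][block]
        raise IOError("both members of the mirror have failed")

    def fail(self, i):
        self.alive[i] = False


pair = MirroredPair()
pair.write(0, b"payload")
pair.fail(0)
print(pair.read(0))
```

A real controller would also rebuild a replacement disk from the survivor; that step is omitted here.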

Level 2: Bit Interleaving and HEC Parity

-- Level 2 stripes data to a group of disks using a bit stripe. A Hamming code symbol for each data stripe is stored on the check disk. This code can correct as well as detect data errors and permits data recovery without complete duplication of data. Some sources loosely refer to this level as Level 0+1, but that name properly describes a different arrangement, which combines striping with Level 1 mirroring to give both high availability and high performance, and which can be tuned for either a request rate intensive or transfer rate intensive environment. Level 2 arrays stripe data across groups of drives, with some drives being dedicated to storing Error Checking and Correction (ECC) information. However, since most disk drives today embed ECC information within each sector as standard, Level 2 offers no significant advantages over the Level 3 architecture. At the present time there are no manufacturers of Level 2 arrays.
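To illustrate how a Hamming code can both detect and correct an error, the sketch below uses the standard Hamming(7,4) code: four data bits plus three check bits. Treating each of the seven bit positions as if it lived on a separate disk is an assumption made for illustration; the notes do not say which Hamming code a Level 2 array would use.

```python
def hamming74_encode(d):
    """Encode 4 data bits as 7 bits laid out as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    bad = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if bad:
        c[bad - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate one bad bit
print(hamming74_correct(word))
```

The three check bits cost 75% extra capacity for four data bits, which hints at why this level is rarely worth building when drives already carry their own ECC.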

Level 3: Bit Interleaving with XOR Parity

-- Level 3 is a striped parallel array where data is distributed by bit or byte. One drive in the array provides data protection by storing a parity check byte for each data stripe.

Level 3 has the advantage over lower RAID levels in that the ratio of check disk capacity to data disk capacity decreases as the number of data drives increases. It has parallel data paths and therefore offers high transfer rate performance for applications that transfer large files. Array capacity can be expanded in single drive or group increments. With Level 3, data chunks are much smaller than the average I/O size and the disk spindles are synchronised to enhance throughput in transfer rate intensive environments. Level 3 is well suited for CAD/CAM or imaging type applications.
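The XOR parity used at this level can be sketched in a few lines. The example assumes each disk contributes an equal-length byte string to a stripe; the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Pretend the second data disk has failed.
rebuilt = reconstruct([data[0], data[2]], parity)
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine that computes the parity also recovers a lost block, whichever single disk fails.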

Level 4: Block Interleaving with XOR Parity

-- In Level 4 parity is interleaved at the sector or transfer level. As with Level 3, a single drive is used to store redundant data using a parity check byte for each data stripe.

Level 4 offers high read performance but relatively poor write performance. Level 4 is a general solution, especially where the ratio of reads to writes is high. This makes Level 4 a good choice for small block transfers, which are typical for transaction processing applications. Write performance is low because the parity drive has to be written for each data write. Thus the parity drive becomes a performance bottleneck when multiple parity write I/Os are required. In this instance, Level 5 is a better solution because parity information is spiralled across all available disk drives. Level 4 systems are almost never implemented, mainly because they offer no significant advantages over Level 5.
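The write bottleneck can be seen in a sketch of a small write using the usual read-modify-write parity update (new parity = old parity XOR old data XOR new data). This assumes dictionaries stand in for disks and that the controller uses this update rule, which the notes do not specify; the names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity_disk, disk, stripe, new_data):
    """Update one block and its parity: two reads and two writes."""
    old_data = data_disks[disk][stripe]   # read 1: the data disk
    old_parity = parity_disk[stripe]      # read 2: the parity disk
    data_disks[disk][stripe] = new_data   # write 1: the data disk
    # write 2: always the same parity disk, whichever data disk changed
    parity_disk[stripe] = xor_bytes(xor_bytes(old_parity, old_data), new_data)


data_disks = [{0: b"\x01\x01"}, {0: b"\x02\x02"}, {0: b"\x04\x04"}]
parity_disk = {0: b"\x07\x07"}
small_write(data_disks, parity_disk, 1, 0, b"\xf0\x0f")
print(parity_disk[0])
```

Every small write touches the single parity disk, so writes to different data disks cannot proceed in parallel.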

Level 5: Block Interleaving with Parity Distribution

-- Level 5 combines the throughput of block interleaved data striping of Level 0 with the parity reconstruction mechanism of Level 3 without requiring an extra parity drive. In Level 5, both parity and data are striped across a set of disks. Data chunks are much larger than the average I/O size. Disks are able to satisfy requests independently which provides high read performance in a request-rate intensive environment. Since parity information is used, a Level 5 stripe can withstand a single disk failure without losing data or access to data.

Level 5's strength lies in handling large numbers of small files. It allows improved I/O transfer performance because the parity drive bottleneck of Level 4 is eliminated. While Level 5 is more cost effective because a separate parity drive is not used, write performance suffers. In graphic art and imaging applications, the weakness of Level 5 versus Level 3 is the write penalty from the striped parity information. In Level 3 there is no write penalty. Level 5 is usually seen in applications with large numbers of small read/write calls. Level 5 offers higher capacity utilisation when the array has fewer than 7 drives. With a full array, utilisation is about equal between Level 3 and 5.
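One way to picture distributed parity is a layout in which the parity block moves to a different disk on each stripe. The rotation rule below is one common choice and is an assumption; the notes do not say which layout a given Level 5 array uses, and the function names are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk holding the parity block for a stripe (rotates right to left)."""
    return (n_disks - 1 - stripe) % n_disks


def stripe_layout(stripe: int, n_disks: int):
    """Labels for one stripe: 'P' for parity, 'D<k>' for logical data block k."""
    p = parity_disk(stripe, n_disks)
    next_block = stripe * (n_disks - 1)  # each stripe holds n_disks - 1 data blocks
    layout = []
    for disk in range(n_disks):
        if disk == p:
            layout.append("P")
        else:
            layout.append("D" + str(next_block))
            next_block += 1
    return layout


for s in range(4):
    print("stripe", s, stripe_layout(s, 4))
```

Since parity for successive stripes sits on different disks, parity updates for independent small writes are spread over the whole array instead of queueing on one drive.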

Level 6: Fault Tolerant System

-- additional error recovery. This is an improvement of Level 5. Disks are considered to be in a matrix formation and parity is generated for each row and column of the matrix. Multidimensional parity is then computed and distributed among the disks in the matrix.
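The row and column parity idea can be sketched on a small matrix, assuming each disk holds a single integer and XOR is the parity function. It shows how two failures in the same row, which row parity alone could not repair, are each recovered through their column. The names are hypothetical and the distribution of parity among the disks is not modelled.

```python
from functools import reduce
from operator import xor


def row_col_parity(grid):
    """Return (row parities, column parities) for a matrix of integers."""
    rows = [reduce(xor, row) for row in grid]
    cols = [reduce(xor, col) for col in zip(*grid)]
    return rows, cols


def recover_from_column(grid, col_parity, r, c):
    """Rebuild grid[r][c] from the other entries in column c and its parity."""
    others = [grid[i][c] for i in range(len(grid)) if i != r]
    return reduce(xor, others, col_parity[c])


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows, cols = row_col_parity(grid)

damaged = [row[:] for row in grid]
damaged[0][0] = damaged[0][1] = None  # two disks lost in the same row
print(recover_from_column(damaged, cols, 0, 0),
      recover_from_column(damaged, cols, 0, 1))
```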

Level 6 became a common feature in many systems but the advent of Level 7 has led to the abandonment of Level 6 in many cases.

Level 7: Heterogeneous System

-- Fast access across whole system. Level 7 allows each individual drive to access data as fast as possible by incorporating a few crucial features:

Each I/O drive is connected to a high speed data bus which possesses a central cache store capable of supporting multiple host I/O paths.
A real time process-oriented OS is embedded into the disk array architecture -- this frees up the drives, allowing independent drive head operation, which is a substantial improvement.
All parity checking and check/microprocessor/bus/cache control logic is embedded in this OS.
The OS is designed to support multiple host interfaces -- other RAID levels support only one.
The ability to reconstruct data in the event of dependent drive failure is increased by separate cache/device control and by secondary, tertiary and further parity calculation -- up to four simultaneous disk failures are supported.
Dynamic mapping is used. In conventional storage a block of data, once created, is written to a fixed memory location. All operations then rewrite data back to this location. With dynamic mapping this constraint is removed, and new write locations are logged and mapped. This avoids additional disk accesses and reduces the potential for a bottleneck.
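The dynamic mapping idea in the last point can be sketched as a table from logical block numbers to wherever the most recent copy was written. This is a toy model under stated assumptions: an append-only list stands in for free physical locations, and nothing here reflects how a real Level 7 controller is implemented. The class name is hypothetical.

```python
class DynamicMap:
    """Toy dynamic mapping: each write goes to a new location and is logged."""

    def __init__(self):
        self.log = []      # physical locations, in the order they were written
        self.mapping = {}  # logical block -> index into self.log

    def write(self, logical_block, data):
        # Write at the next free location instead of the block's old home.
        self.log.append(data)
        self.mapping[logical_block] = len(self.log) - 1

    def read(self, logical_block):
        # Follow the map to the most recent copy.
        return self.log[self.mapping[logical_block]]


store = DynamicMap()
store.write(7, b"old")
store.write(7, b"new")
print(store.read(7), store.mapping[7])
```

A practical design would also need to reclaim the superseded copies; that is left out of the sketch.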

The first six RAID levels are illustrated in Figure 5.4, where each circle represents a single disk drive, and arrows represent data flows.

Dave Marshall
10/4/2001