The concept of RAID was developed to meet the needs of current multimedia and other data-hungry applications that require fault tolerance to be built into the storage device. Parallel processing techniques are also well suited to exploiting the benefits of such an arrangement of hard disks.
RAID technology offers some significant advantages as a storage medium:
The cost per megabyte of disk storage has been dropping steadily, with smaller drives playing a larger role in this improvement. Although larger disks can store more data, it is generally more power efficient to use small-diameter disks, as less power is needed to spin them. Also, as smaller drives have fewer cylinders, seek distances are correspondingly shorter. Following this general trend, a new candidate for mass storage has appeared on the market, based on the same technology as magnetic disks but with a new organisation. These are arrays of small, inexpensive disks placed together, based on the idea that disk throughput can be increased by having many disk drives with many heads operating in parallel. Distributing data over multiple disks means that a single request accesses several disks at once, which improves throughput. Disk arrays are therefore built by combining small disks to obtain the performance of more expensive high-end disks.
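The way data is distributed over the disks can be illustrated with a minimal sketch. It assumes the simplest round-robin layout, in which consecutive logical blocks go to consecutive disks; the function name `locate_block` and the four-disk example are hypothetical and not taken from any particular product.

```python
from typing import List, Tuple


def locate_block(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes plain round-robin striping with one block per stripe unit.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive logical blocks spread over a hypothetical 4-disk array.
layout: List[Tuple[int, int]] = [locate_block(b, 4) for b in range(8)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Because any four consecutive blocks land on four different disks, a sequential request of that size can be serviced by all four drives in parallel.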
The key components of a RAID System are:
Disk arrays can store large amounts of data, sustain high I/O rates and, because of the small size of their disks, consume less power per megabyte than high-end disks, but they have very poor reliability.
What do you think is the reason for this low reliability?
As more devices are added, reliability deteriorates (N devices generally
have 1/N the reliability of a single device).
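A rough calculation shows how quickly this effect grows. The sketch below assumes that disks fail independently and at a constant rate, in which case the mean time to failure (MTTF) of an array without redundancy is the single-disk MTTF divided by the number of disks. The 30,000-hour figure is a hypothetical value chosen only for illustration.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of an array with no redundancy, where any disk failure loses data.

    Assumes independent disk failures with a constant failure rate.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


single_disk_mttf = 30_000.0  # hypothetical single-disk MTTF, in hours
for n in (1, 10, 100):
    print(f"{n:3d} disks: MTTF = {array_mttf_hours(single_disk_mttf, n):.0f} hours")
```

Under these assumptions, an array of 100 such disks would be expected to lose a disk after about 300 hours, which is why redundancy becomes necessary.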
Files stored on arrays may be striped across multiple spindles. Since the additional disks provide a high capacity, it is possible to build redundancy into the system, so that if a disk fails the contents of a file may be reconstructed from the redundant information. This of course incurs a penalty in capacity (to store the redundant information) and in bandwidth (to update it). Four main techniques are available to overcome the lack of reliability of arrays:
in the expected availability and increased reconstruction times. This approach is generally aimed at high bandwidth scientific applications (such as image processing).
Each disk within the array needs to have its own I/O controller, but interaction with a host computer may be mediated through an array controller, as shown in Figure 5.1.
A disk array link to the host processor
It may also be possible to combine the disks to produce a collection of devices in which each vertical array is the unit of data redundancy. Such an arrangement is called an orthogonal RAID and is shown in Figure 5.2; other arrangements of disks are also possible.
Orthogonal RAID
Figure 5.3 shows disk performance for a number of machines and operating systems. The Convex supercomputer appears to provide the best throughput (megabytes transferred per second) for a 32 KB read operation, owing to its use of a RAID disk. The figure also illustrates how transfer rates depend on both the hardware and the particular operating system in use.
There are now 8 levels of RAID technology, with each level providing a greater amount of resilience than the lower levels:
disk each time a write request is made. The write is performed
automatically and is transparent to the user, application and
system. The mirror disk contains an exact replica of the data
on the actual disk. Data is recoverable if a drive fails
and may be recoverable if both drives fail. The biggest
disadvantage is that only half of the disk capacity is
available for storage. Also, capacity can only be expanded in
pairs of drives.
Of the RAID levels, level 1 provides the highest data
availability since two complete copies of all information
are maintained. In addition, read performance may be
enhanced if the array controller allows simultaneous
reads from both members of a mirrored pair. During
writes, there will be a minor performance penalty when
compared to writing to a single disk. Higher availability
will be achieved if both disks in a mirror pair are on
separate I/O busses.
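The mirroring behaviour described above can be sketched as a small in-memory model. The class name `MirroredPair` and its methods are hypothetical, and the alternating read policy is only one possible way for a controller to spread reads across both members of the pair.

```python
class MirroredPair:
    """Toy model of a Level 1 mirrored pair; lists stand in for disks."""

    def __init__(self, num_blocks: int) -> None:
        self.disks = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]
        self._next_read = 0  # alternate reads between the two members

    def write(self, block: int, data: bytes) -> None:
        """Write the same data to every surviving member of the pair."""
        if all(self.failed):
            raise IOError("both members of the mirrored pair have failed")
        for i in (0, 1):
            if not self.failed[i]:
                self.disks[i][block] = data

    def read(self, block: int) -> bytes:
        """Read from either member, skipping a failed one."""
        for _ in range(2):
            i = self._next_read
            self._next_read = 1 - i
            if not self.failed[i]:
                return self.disks[i][block]
        raise IOError("both members of the mirrored pair have failed")


pair = MirroredPair(num_blocks=4)
pair.write(2, b"payload")
pair.failed[0] = True    # simulate the loss of one drive
print(pair.read(2))      # the data is still served by the surviving mirror
```

Every write costs two physical writes, and only one disk's worth of capacity is usable, which matches the trade-off noted in the text.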
striping of Level 0 with the parity reconstruction
mechanism of Level 3 without requiring an extra
parity drive. In Level 5, both parity and data are striped across
a set of disks. Data chunks are much larger than the
average I/O size. Disks are able to satisfy requests
independently which provides high read performance in
a request-rate intensive environment. Since parity
information is used, a Level 5 stripe can withstand a
single disk failure without losing data or access to
data.
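The reason a parity-protected stripe survives a single disk failure can be shown with byte-wise XOR parity. This is a minimal sketch: the helper names are hypothetical, chunks are assumed to be of equal length, and the parity rotation used here is just one possible placement scheme rather than the layout of any specific controller.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Disk holding the parity for a stripe (assumed rotation scheme)."""
    return (num_disks - 1 - stripe_index) % num_disks


def build_stripe(data_chunks: List[bytes], stripe_index: int) -> List[bytes]:
    """Return one chunk per disk, with the parity chunk in its rotated slot."""
    num_disks = len(data_chunks) + 1
    stripe = list(data_chunks)
    stripe.insert(parity_disk(stripe_index, num_disks), xor_blocks(data_chunks))
    return stripe


def reconstruct(stripe: List[bytes], failed_disk: int) -> bytes:
    """Rebuild the chunk of a failed disk from all surviving chunks."""
    survivors = [chunk for i, chunk in enumerate(stripe) if i != failed_disk]
    return xor_blocks(survivors)


chunks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
stripe = build_stripe(chunks, stripe_index=0)
print(reconstruct(stripe, failed_disk=1) == chunks[1])
```

Because the XOR of all chunks in a stripe, parity included, is zero, any one missing chunk equals the XOR of the others, whether it held data or parity.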
Level 5's strength lies in handling large numbers of small
files. It allows improved I/O transfer performance
because the parity drive bottleneck of Level 4 is
eliminated. While Level 5 is more cost effective
because a separate parity drive is not used, write
performance suffers. In graphic art and imaging
applications, the weakness
of Level 5 versus Level 3 is the write penalty from
the striped parity information. In Level 3 there is no
write penalty. Level 5 is usually seen in applications
with large numbers of small read/write calls. Level 5
offers higher capacity utilisation when the array has fewer
than 7 drives. With a full array, utilisation is about
equal between Level 3 and 5.
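The write penalty mentioned above comes from keeping parity consistent when only part of a stripe changes. A common way to do this, assumed here for illustration, is a read-modify-write cycle: the new parity is the old parity XOR the old data XOR the new data. The function names and the count of four disk operations describe this sketch, not a measured system.

```python
from typing import Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> Tuple[bytes, int]:
    """Return (new parity, number of disk operations) for a one-chunk update."""
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    # Read old data, read old parity, write new data, write new parity.
    io_count = 4
    return new_parity, io_count


d0, d1 = b"\x0f", b"\xf0"
parity = xor_bytes(d0, d1)
new_d0 = b"\xaa"
new_parity, ios = small_write(d0, parity, new_d0)
print(new_parity == xor_bytes(new_d0, d1), ios)
```

A single logical write therefore turns into several physical operations under this scheme, whereas a write that covers a whole stripe can compute the parity directly from the new data.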
Level 6 became a common feature in many systems, but the advent of Level 7 has led to the abandonment of Level 6 in many cases.
The first six RAID levels are illustrated in Figure 5.4, where each circle represents a single disk drive and arrows represent data flows.