Updated 11/6/2009
The purpose of this document is to provide guidelines that enable users to get the best performance from Enstore. The two main factors that affect tape performance are streaming and per-file overhead, both of which depend on file size.
There are a number of overheads that affect
the performance when reading or writing files on tape:
· Mount latency (including load time). This is on the order of one to two minutes at best, and can be much longer depending on how busy the system and the tape drives are.
· Seek time. For LTO-4 it takes about 100 s to traverse the full length of a tape. The average seek time from the beginning of the tape is half of this, or about 50 seconds. The average seek time from one randomly located file to another is one third of the full traversal time, or about 33 seconds.
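· File-mark overhead. Each file written incurs a fixed overhead to write a file mark; the observed value is about 3 seconds. Writing 30,000 files to a tape therefore costs about 25 hours (30,000 × 3 s = 90,000 s) in file marks alone. Small files are bad!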
· Back-hitch latency. If a tape stops streaming, it must back up (hitch) some distance so that it can ramp back up to read/write speed by the time the point where it stopped passes over the head again. To reduce back hitching, IBM drives typically stream at several different rates; IBM LTO-4 drives, for example, have 4 streaming rates between roughly 30 MB/s and 120 MB/s. Back hitching also wears both the drive and the tape.
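The two averages in the seek-time item follow from a simple model in which the head travels linearly along the tape and files sit at uniformly random positions: the expected distance from one end of the tape to a random point is 1/2 of the tape length, and the expected distance between two random points is 1/3. The minimal Python sketch below checks this by Monte Carlo simulation. The linear model is an assumption made for illustration; LTO tapes are actually written in serpentine wraps, so real seek times will differ.

```python
import random

# Full-length traversal time for LTO-4, from the text. The linear,
# constant-speed head model used here is an illustrative assumption.
TAPE_TRAVERSE_S = 100.0

def mean_seek_from_bot(trials: int, rng: random.Random) -> float:
    """Average seek from the beginning of tape to a uniformly random file."""
    total = 0.0
    for _ in range(trials):
        total += rng.random() * TAPE_TRAVERSE_S
    return total / trials

def mean_seek_random_to_random(trials: int, rng: random.Random) -> float:
    """Average seek between two independent, uniformly random file positions."""
    total = 0.0
    for _ in range(trials):
        a, b = rng.random(), rng.random()
        total += abs(a - b) * TAPE_TRAVERSE_S
    return total / trials

if __name__ == "__main__":
    rng = random.Random(42)
    # Expect values close to 50 s and 33 s respectively.
    print(f"from BOT:         {mean_seek_from_bot(200_000, rng):.1f} s")
    print(f"random to random: {mean_seek_random_to_random(200_000, rng):.1f} s")
```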
The overheads that dominate writes are the mount latency, the time to seek to the end of data on the tape, the file-mark overhead for each successive write, and back-hitch latency.
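The write overheads listed above can be combined into a rough time budget for a write session. The sketch below is a hypothetical model, not Enstore code: `WriteSession` and its default values are assumptions, namely a 90 s mount (within the one to two minutes quoted above), a 50 s average seek to end of data, and the observed 3 s file mark. Back hitching is ignored, so the estimate is optimistic.

```python
from dataclasses import dataclass

@dataclass
class WriteSession:
    """Hypothetical model of one write session on a single tape."""
    n_files: int
    file_size_mb: float
    drive_rate_mb_s: float    # streaming rate of the drive
    mount_s: float = 90.0     # assumed: text gives one to two minutes at best
    seek_eod_s: float = 50.0  # assumed: average LTO-4 seek from the text
    filemark_s: float = 3.0   # observed per-file file-mark overhead

    def total_seconds(self) -> float:
        """Mount + seek to end of data + (transfer + file mark) per file.
        Back-hitch time is not modelled, so this is optimistic."""
        per_file = self.file_size_mb / self.drive_rate_mb_s + self.filemark_s
        return self.mount_s + self.seek_eod_s + self.n_files * per_file

    def effective_rate(self) -> float:
        """Data written divided by wall-clock time, in MB/s."""
        return self.n_files * self.file_size_mb / self.total_seconds()

if __name__ == "__main__":
    for size in (100.0, 2000.0):
        s = WriteSession(n_files=100, file_size_mb=size, drive_rate_mb_s=30.0)
        print(f"{size:6.0f} MB files: {s.effective_rate():.1f} MB/s")
```

With these assumptions, writing 100 files of 100 MB to a 30 MB/s drive achieves an effective rate below 13 MB/s, while 100 files of 2 GB reach about 28 MB/s.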
The preferred mode of operation for writing is to supply data at no less than the minimum streaming rate (30 MB/s), to write enough files in one session that the mount and seek overheads become insignificant, and to use files large enough that the file-mark overhead is insignificant. To make the file-mark overhead insignificant, choose a file size such that the time to read or write the file on tape is much longer than the overhead, say by a factor of 10 (30 seconds of streaming for a 3 second file mark). For a 9940B drive at 30 MB/s this is about 1 GB (0.9 GB). For an LTO-3 streaming at 40 MB/s it is about 1.2 GB, and for an LTO-3 streaming at 80 MB/s it is about 2.4 GB.
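The factor-of-10 rule reduces to a one-line formula: minimum file size = 10 × 3 s × streaming rate. A minimal sketch, assuming 1 GB = 1000 MB; the function name is illustrative:

```python
FILEMARK_S = 3.0  # observed per-file file-mark overhead, from the text
FACTOR = 10.0     # streaming time should exceed the overhead by this factor

def min_file_size_gb(rate_mb_s: float, overhead_s: float = FILEMARK_S,
                     factor: float = FACTOR) -> float:
    """Smallest file (decimal GB) whose transfer time is `factor` x the overhead."""
    return factor * overhead_s * rate_mb_s / 1000.0

if __name__ == "__main__":
    for label, rate in [("9940B", 30.0), ("LTO-3 @ 40", 40.0), ("LTO-3 @ 80", 80.0)]:
        print(f"{label:11s} {min_file_size_gb(rate):.1f} GB")
```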
The overheads that dominate reads are the
mount latency and seek times.
The preferred mode for reading tapes is to read files in the order they were written, in batches large enough that the other overheads become insignificant. It is best to read successive files without skipping any. Note also that Enstore takes care of ordering reads from a tape: it sorts the queued requests for files on a tape into the order in which they were written. The experiment should therefore try to read as many files from one tape as it can at once, rather than individual files spread over many tapes.
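The benefit of reading in written order can be illustrated with a simple linear seek model. The sketch below is hypothetical: `ReadRequest` and its `location_s` field (position expressed as seconds of travel from the beginning of tape) are illustrative names, not Enstore's actual data structures. Sorting the queue by position turns an arbitrary sequence of back-and-forth seeks into a single forward pass.

```python
import random
from dataclasses import dataclass

@dataclass
class ReadRequest:
    # Hypothetical request record: a file name plus its position on tape,
    # expressed as seconds of travel from the beginning of tape.
    name: str
    location_s: float

def total_seek_s(requests: list[ReadRequest], start_s: float = 0.0) -> float:
    """Sum of head movements when requests are served in the given order.
    Transfer time and file lengths are ignored in this model."""
    pos, total = start_s, 0.0
    for req in requests:
        total += abs(req.location_s - pos)
        pos = req.location_s
    return total

def order_as_written(requests: list[ReadRequest]) -> list[ReadRequest]:
    """Serve requests in the order the files were written (ascending position)."""
    return sorted(requests, key=lambda r: r.location_s)

if __name__ == "__main__":
    rng = random.Random(7)
    queue = [ReadRequest(f"file{i}", rng.uniform(0.0, 100.0)) for i in range(20)]
    print(f"arrival order: {total_seek_s(queue):7.1f} s")
    print(f"tape order:    {total_seek_s(order_as_written(queue)):7.1f} s")
```

In the sorted case the total seek equals the position of the farthest requested file, which is the least any order can achieve when starting from the beginning of the tape.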
There is another very important reason not to write small files. Enstore administrators maintain the integrity of your data by copying a tape to new media when it has had too many mounts (nominally 2000), and they migrate tapes to denser media to save space and to move off media that is becoming obsolete. A tape with 30,000 files takes more than a day to copy (just writing the file marks)! This ties up costly resources: the tape drives. It is recommended that no more than 3000 files be written to a single tape. Following the file-size recommendations based on overhead considerations limits the number of files per tape to the hundreds.
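The file-count and copy-time figures above can be reproduced with a little arithmetic. The cartridge capacities in this sketch are not given in this document: they are the nominal native (uncompressed) capacities of each technology, used here as an assumption. The function names are illustrative.

```python
import math

FILEMARK_S = 3.0  # observed per-file file-mark overhead, from the text

# Assumed native (uncompressed) cartridge capacities in decimal GB;
# these values are not stated in this document.
NATIVE_CAPACITY_GB = {"9940B": 200, "LTO-2": 200, "LTO-3": 400, "LTO-4": 800}

def files_per_tape(technology: str, file_size_gb: float) -> int:
    """How many whole files of the given size fit on one cartridge."""
    return math.floor(NATIVE_CAPACITY_GB[technology] / file_size_gb)

def filemark_hours(n_files: int, overhead_s: float = FILEMARK_S) -> float:
    """Hours spent on file marks alone when writing (or copying) n_files."""
    return n_files * overhead_s / 3600.0

if __name__ == "__main__":
    print(files_per_tape("LTO-4", 3.0), files_per_tape("LTO-4", 6.0))
    print(filemark_hours(30_000), filemark_hours(3_000))
```

For example, `files_per_tape("LTO-4", 3.0)` gives 266 and `files_per_tape("LTO-4", 6.0)` gives 133, in the hundreds as stated; `filemark_hours(30_000)` gives 25.0 hours, while the recommended maximum of 3000 files costs 2.5 hours.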
Each tape drive has a streaming rate: as long as data is written (read) at this rate or faster, the drive keeps writing (reading) continuously without stopping and starting. A drive stops streaming to write a file mark. Keeping a drive streaming gives the best possible rate and wears the drive less than continual starting and stopping.
| Drive | Streaming rate(s) |
|---|---|
| 9940B | 30 MB/s |
| LTO-2 | 35 MB/s |
| LTO-3 | 40 MB/s and 80 MB/s |
| LTO-4 | 30-120 MB/s |

Table 1. Drive streaming rates
A drive will not stream for very long when small files are written to it, or when data is supplied more slowly than the drive's streaming rate(s). Enstore movers have a large memory buffer that generally ensures reads of large files stream, regardless of how fast the client reads the data out. With writes, the drive will not stream unless data arrives over the network fast enough: 30 MB/s for 9940B drives, and 40 MB/s or 80 MB/s for IBM LTO-3 drives. The usual culprit when these rates are not achieved is the disk on the client computer; network rates are usually not an issue.
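Because a drive only streams writes if data arrives at least as fast as its lowest streaming rate, a planned write can be checked against the slower of the client disk and the network. A minimal sketch using the Table 1 rates; the function names are hypothetical, and only the end points of the LTO-4 range are listed because the intermediate rates are not given here.

```python
# Streaming rates from Table 1, in MB/s. Only the end points of the LTO-4
# range are listed because its intermediate rates are not given here.
STREAMING_RATES_MB_S = {
    "9940B": (30.0,),
    "LTO-2": (35.0,),
    "LTO-3": (40.0, 80.0),
    "LTO-4": (30.0, 120.0),
}

def supplied_rate(disk_mb_s: float, network_mb_s: float) -> float:
    """Data reaches the mover no faster than the slower of disk and network."""
    return min(disk_mb_s, network_mb_s)

def write_streams(technology: str, disk_mb_s: float, network_mb_s: float) -> bool:
    """True if the supplied rate reaches the drive's lowest streaming rate."""
    lowest = min(STREAMING_RATES_MB_S[technology])
    return supplied_rate(disk_mb_s, network_mb_s) >= lowest

if __name__ == "__main__":
    print(write_streams("9940B", disk_mb_s=25.0, network_mb_s=100.0))
    print(write_streams("LTO-3", disk_mb_s=50.0, network_mb_s=100.0))
```

For example, `write_streams("9940B", disk_mb_s=25.0, network_mb_s=100.0)` is `False`: the 100 MB/s network is irrelevant because the 25 MB/s disk is the bottleneck. To use the higher LTO-3 rate, the supplied rate must reach 80 MB/s.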
The table below lists recommended file sizes and disk rates for getting the best performance from Enstore:
| Technology | Recommended file size | Disk rate to stream writes |
|---|---|---|
| 9940B | 1-2 GB | > 30 MB/s |
| LTO-2 | 1-2 GB | > 35 MB/s |
| LTO-3 | 2-5 GB | > 40-80 MB/s |
| LTO-4 | 3-6 GB | > 30-120 MB/s |
Table 2. Enstore recommended file sizes and rates
Note that these file sizes also keep the
drive streaming for a significant time (~1 minute) so that the wear on the
drives from stopping/starting is also minimized.
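The time one file keeps the drive streaming is simply its size divided by the streaming rate. The sketch below evaluates the extremes of each Table 2 row (smallest file at the fastest rate, largest file at the slowest rate), assuming 1 GB = 1000 MB; the names are illustrative.

```python
# Table 2 file-size ranges (decimal GB) paired with Table 1 rate ranges (MB/s).
RECOMMENDED = {
    "9940B": ((1.0, 2.0), (30.0, 30.0)),
    "LTO-2": ((1.0, 2.0), (35.0, 35.0)),
    "LTO-3": ((2.0, 5.0), (40.0, 80.0)),
    "LTO-4": ((3.0, 6.0), (30.0, 120.0)),
}

def streaming_seconds(size_gb: float, rate_mb_s: float) -> float:
    """Time one file keeps the drive streaming (1 GB taken as 1000 MB)."""
    return size_gb * 1000.0 / rate_mb_s

if __name__ == "__main__":
    for tech, ((lo_gb, hi_gb), (lo_rate, hi_rate)) in RECOMMENDED.items():
        shortest = streaming_seconds(lo_gb, hi_rate)  # smallest file, fastest rate
        longest = streaming_seconds(hi_gb, lo_rate)   # largest file, slowest rate
        print(f"{tech:6s} {shortest:5.0f} s to {longest:5.0f} s per file")
```

For the 9940B this gives roughly 33 s to 67 s per file; the multi-rate LTO-3 and LTO-4 rows span a wider range, from 25 s up to a few minutes.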