Best Practice for Performance to/from tape

Updated 11/6/2009

 

The purpose of this document is to provide guidelines that enable users to get the best performance from enstore. The two main factors that affect tape performance are streaming and per-file overhead, both of which are affected by file size.

 

There are a number of overheads that affect the performance when reading or writing files on tape:

 

·      Mount latency (including load time) – this is on the order of one to two minutes at best and can take much longer, depending on how busy the system and the tape drives are.

·      Seek time. For LTO-4 it takes about 100 seconds to traverse the length of a tape. The average seek time from the beginning of the tape is ½ of this, or about 50 seconds. The average seek time from one randomly located file to another randomly located file is 1/3 of this, or about 33 seconds.

·      There is a per-file overhead to write a file mark. The observed value is 3 seconds. At that rate it takes about a day (30,000 × 3 s = 25 hours) to write 30,000 files to a tape. Small files are bad!

·      Back hitch latency. If a tape stops streaming, then in order to be up to speed at the location where it stopped, the tape must back up (hitch) some distance so it can ramp up to read/write speed by the time the last location passes over the head. Typically IBM drives stream at 4 different rates to reduce back hitching. For example, there are 4 streaming rates between about 30 MB/s and 120 MB/s for IBM LTO-4 drives. Back hitching wears the drive and the tape.
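The seek-time averages in the list above can be checked with a short sketch. It assumes that seek time is proportional to the distance traveled along the tape and that file positions are uniformly distributed; neither assumption is stated in this document, and the function names are illustrative only.

```python
from fractions import Fraction

FULL_TRAVERSE_S = 100  # approximate LTO-4 end-to-end traverse time quoted above


def mean_seek_from_start(traverse_s):
    """Mean seek time from the beginning of tape to a uniformly random position."""
    return traverse_s * Fraction(1, 2)


def mean_seek_between_random(traverse_s):
    """Mean seek time between two independent, uniformly random positions.

    For X and Y uniform on [0, 1], the expected distance E|X - Y| is 1/3.
    """
    return traverse_s * Fraction(1, 3)


print(float(mean_seek_from_start(FULL_TRAVERSE_S)))
print(round(float(mean_seek_between_random(FULL_TRAVERSE_S)), 1))
```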

 

The overheads that dominate writes are the mount latency, the time to seek to the end of data on tape, the file mark overhead for each successive write, and back hitch latency.

 

The preferred mode of operation for writing files is to write at or above the minimum streaming rate (30 MB/s), with enough files that the other overheads become insignificant and with file sizes large enough that the file mark overhead is insignificant. To render the file mark overhead insignificant, the time it takes to read or write the file on tape should be significantly more than this overhead, say by a factor of 10. For a 30 MB/s 9940B drive, this is about 1 GB. For an LTO-3 streaming at 40 MB/s it is about 1.3 GB, and for an LTO-3 streaming at 80 MB/s it is about 2.6 GB.
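As a rough check of the factor-of-10 rule, the sketch below multiplies the streaming rate by ten times the 3-second file mark overhead. The function name and the 1 GB = 1000 MB convention are assumptions made for illustration. With exactly 3 seconds and a factor of exactly 10, the results come out slightly below the rounded figures quoted above.

```python
FILE_MARK_OVERHEAD_S = 3.0  # observed per-file overhead quoted in this document
FACTOR = 10                 # "say by a factor of 10"


def min_file_size_gb(rate_mb_s, overhead_s=FILE_MARK_OVERHEAD_S, factor=FACTOR):
    """File size (GB, taking 1 GB = 1000 MB) whose transfer time at rate_mb_s
    equals factor times the file mark overhead."""
    return rate_mb_s * overhead_s * factor / 1000.0


for rate in (30, 40, 80):
    print(f"{rate} MB/s -> {min_file_size_gb(rate):.1f} GB")
```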

 

The overheads that dominate reads are the mount latency and seek times.

 

The preferred mode for reading tapes is to read files successively in the order they were written, in batches large enough that the other overheads become insignificant. Note that it is best to read successive files without skipping any. Note also that enstore takes care of the ordering of reads from a tape by sorting the queued requests for files on that tape into the order in which they were written. The experiment should attempt to read as many files from a tape as it can at once, rather than individual files spread over many tapes.
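On the client side, the same idea can be used to submit requests in tape-friendly batches. The following is only an illustrative sketch, not enstore's implementation: the `ReadRequest` type, its fields, and the volume labels are hypothetical, and it assumes the client already knows which tape holds each file and where on the tape it sits.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadRequest(NamedTuple):
    tape: str       # hypothetical volume label
    position: int   # hypothetical index of the file on tape, in write order
    name: str       # file name


def batch_by_tape(requests):
    """Group requests per tape and sort each group by on-tape position."""
    batches = defaultdict(list)
    for req in requests:
        batches[req.tape].append(req)
    return {tape: sorted(reqs, key=lambda r: r.position)
            for tape, reqs in batches.items()}


requests = [
    ReadRequest("VOL001", 7, "c.dat"),
    ReadRequest("VOL002", 2, "x.dat"),
    ReadRequest("VOL001", 3, "a.dat"),
    ReadRequest("VOL001", 4, "b.dat"),
]
plan = batch_by_tape(requests)
print([r.name for r in plan["VOL001"]])  # ['a.dat', 'b.dat', 'c.dat']
```

Submitting all requests for one tape together gives the mover the chance to serve them in a single mount.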

 

There is another very important reason not to write small files. Enstore administrators maintain the integrity of your data by copying a tape to new media when the tape has had too many mounts (nominally 2000), and by migrating tapes to denser media to save space and to move off media that is becoming obsolete. A tape with 30,000 files takes more than a day to copy (just writing the file marks)! This ties up costly resources – the tape drives. It is recommended that no more than 3000 files be written to a single tape. Following the file size recommendations based on overhead considerations limits the number of files per tape to the hundreds.
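The arithmetic behind these numbers can be sketched as follows. The 3-second file mark overhead is the observed value quoted in this document; the 800 GB tape capacity is the native LTO-4 cartridge capacity, which is an assumption brought in for illustration and is not stated here. The function names are illustrative as well.

```python
FILE_MARK_OVERHEAD_S = 3.0  # observed per-file overhead quoted in this document
LTO4_CAPACITY_GB = 800      # assumption: native LTO-4 capacity, not from this document


def file_mark_hours(n_files, overhead_s=FILE_MARK_OVERHEAD_S):
    """Hours spent writing file marks alone when copying n_files."""
    return n_files * overhead_s / 3600.0


def files_per_tape(tape_capacity_gb, file_size_gb):
    """Whole number of files of a given size that fit on one tape."""
    return int(tape_capacity_gb // file_size_gb)


print(file_mark_hours(30000))                 # 25.0
print(file_mark_hours(3000))                  # 2.5
print(files_per_tape(LTO4_CAPACITY_GB, 3))    # 266
```

Under these assumptions, a tape filled with 3 GB files holds a few hundred files, and its file mark overhead during a copy is well under an hour.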

 

Tape drives have a rate at which they stream; that is, they continue writing (reading) at this rate without starting and stopping, as long as data is written (read) at this rate or faster. A drive will stop streaming to write a file mark. Keeping a drive streaming provides the best rate possible and wears the drive less than the continual starting and stopping that occurs when it is not streaming.

 

Technology    Streaming rate
9940B         30 MB/s
LTO-2         35 MB/s
LTO-3         40 MB/s and 80 MB/s
LTO-4         30 MB/s - 120 MB/s

 

Table 1. Drive streaming rates

 

A drive will not stream for very long when small files are written to it, or if the rate at which the file is provided is slower than the drive’s streaming rate(s). Enstore has a large buffer memory on its movers that virtually ensures that reads of large files stream, regardless of how fast they are read out. With writes, the drive will not stream unless the rate provided over the network is sufficient. For 9940B drives, this rate is 30 MB/s. LTO-3 (IBM) can stream at 40 MB/s or 80 MB/s. The usual culprit in not achieving these rates is the disk on the client computer; network rates are usually not an issue.


Below is a table of recommended file sizes and disk rates to get the best performance from enstore:

 

Technology    Recommended file size    Disk rate to stream writes
9940B         1-2 GB                   > 30 MB/s
LTO-2         1-2 GB                   > 35 MB/s
LTO-3         2-5 GB                   > 40 to 80 MB/s
LTO-4         3-6 GB                   > 30 to 120 MB/s

 

Table 2. Enstore recommended file sizes and rates

                                                                                                                                                                                                                

Note that these file sizes also keep the drive streaming for a significant time (~1 minute) so that the wear on the drives from stopping/starting is also minimized.
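The effect of file size on streaming time and on effective throughput can be estimated with the sketch below. It accounts only for the 3-second file mark overhead quoted in this document and ignores mount, seek, and back hitch latencies, so the results are optimistic. The function names and the 1 GB = 1000 MB convention are assumptions for illustration.

```python
FILE_MARK_OVERHEAD_S = 3.0  # observed per-file overhead quoted in this document


def streaming_seconds(file_size_gb, rate_mb_s):
    """Time the drive streams while transferring one file (1 GB = 1000 MB)."""
    return file_size_gb * 1000.0 / rate_mb_s


def effective_rate_mb_s(file_size_gb, rate_mb_s, overhead_s=FILE_MARK_OVERHEAD_S):
    """Average rate once the per-file file mark overhead is included."""
    size_mb = file_size_gb * 1000.0
    return size_mb / (size_mb / rate_mb_s + overhead_s)


# A 2 GB file on a 30 MB/s drive streams for roughly a minute.
print(round(streaming_seconds(2.0, 30), 1))
# A 10 MB file on the same drive spends most of its time on the file mark.
print(round(effective_rate_mb_s(0.01, 30), 1))
# A 2 GB file stays close to the full streaming rate.
print(round(effective_rate_mb_s(2.0, 30), 1))
```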