Compressing and archiving
The page Using available storage describes the use of the different kinds of disk space that's available at Calcul Québec.
Disk space on the supercomputers is precious, so please use it sparingly: clean it up periodically to remove files you no longer need. If you have computational results that you want to keep but don't need on a regular basis, we would prefer that you archive and compress them.
The program tar (from the term tape archive) is a standard program for archiving directories and files on UNIX systems. By combining your files into an archive, you will greatly improve the performance of parallel filesystems both for yourself and all users of the machine.
The principal commands are:
[name@server $] tar cf archive.tar directory/
[name@server $] tar tvf archive.tar
[name@server $] tar xf archive.tar
to create, list the contents and extract the contents of a tar file, respectively.
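As a minimal sketch of these three operations used together, the following round trip archives a small directory, lists the archive, and extracts it elsewhere for comparison. The directory results, its files and the temporary working directory are hypothetical names chosen for illustration.

```shell
# Work in a throwaway directory so that no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: a directory holding two small files.
mkdir -p results
echo "run 1" > results/a.txt
echo "run 2" > results/b.txt

# Create the archive, then list its contents.
tar cf results.tar results/
tar tvf results.tar

# Extract into a separate directory (-C) and compare with the original.
mkdir restored
tar xf results.tar -C restored
diff -r results restored/results && echo "archive round trip OK"
```

Comparing the extracted copy with the original before deleting anything is a cheap way to confirm that the archive is complete.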
The program gzip (short for GNU zip) is a free compression program first released in 1992. It enables you to significantly reduce the size of many files.
The basic commands for compressing and decompressing files are:
[name@server $] gzip file.txt
[name@server $] gunzip file.txt.gz
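To give a feel for the effect, here is a small sketch that compresses a deliberately repetitive text file and reports the sizes before and after. The file contents are made up for illustration, and real data will usually compress less dramatically.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: 10000 identical lines, which compress very well.
awk 'BEGIN { for (i = 0; i < 10000; i++) print "temperature=300 pressure=1.0" }' > file.txt
original_size=$(wc -c < file.txt)

# gzip replaces file.txt with file.txt.gz.
gzip file.txt
compressed_size=$(wc -c < file.txt.gz)
echo "original: $original_size bytes, compressed: $compressed_size bytes"

# gunzip restores file.txt and removes file.txt.gz.
gunzip file.txt.gz
restored_size=$(wc -c < file.txt)
```

Note that by default both commands replace their input file rather than leaving a copy next to it.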
Archiving and Compression
A common pattern is to archive and compress in a single step. For example, you can combine tar and gzip to archive and compress the contents of the directory dir1 into the file $ARCHIVE/arch1.tar.gz, where the environment variable $ARCHIVE is assumed to point to a directory or partition that you use for archiving your data. The commands to archive and compress, and then to decompress and extract, are respectively (the explicit f - makes tar write to standard output and read from standard input, whatever its default device is):
[name@server $] tar cf - dir1/ | gzip > $ARCHIVE/arch1.tar.gz
[name@server $] gzip -dc $ARCHIVE/arch1.tar.gz | tar xf -
Using GNU tar, you can equivalently use:
[name@server $] tar zcf $ARCHIVE/arch1.tar.gz dir1/
[name@server $] tar zxf $ARCHIVE/arch1.tar.gz
- Note: Sometimes the extension .tar.gz is shortened to .tgz but it's the same kind of file.
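The sketch below creates the same compressed archive in both ways, with an explicit pipeline and with the z option, and checks that the two archives list the same contents. It assumes GNU tar; ARCHIVE is pointed at a temporary directory here, and the names dir1/out.txt and arch1_pipe.tar.gz are hypothetical.

```shell
# Work in a throwaway directory; ARCHIVE stands in for your archiving space.
workdir=$(mktemp -d)
cd "$workdir"
ARCHIVE="$workdir/archive"
mkdir -p "$ARCHIVE" dir1
echo "sample output" > dir1/out.txt

# Pipeline form: tar writes the archive to stdout ("f -"), gzip compresses it.
tar cf - dir1/ | gzip > "$ARCHIVE/arch1_pipe.tar.gz"

# Single-command form, using the z option.
tar zcf "$ARCHIVE/arch1.tar.gz" dir1/

# Test the integrity of the compressed file without extracting it.
gzip -t "$ARCHIVE/arch1.tar.gz" && echo "gzip stream OK"

# Both archives should list the same members.
pipe_listing=$(gzip -dc "$ARCHIVE/arch1_pipe.tar.gz" | tar tf -)
flag_listing=$(tar ztf "$ARCHIVE/arch1.tar.gz")
echo "$flag_listing"
```

Running gzip -t and listing the archive are reasonable checks to make before removing the original directory.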
It's possible to choose another compression algorithm, such as bzip2 (.bz2). You simply have to replace the letter z by j. The commands then become:
[name@server $] tar jcf $ARCHIVE/arch1.tar.bz2 dir1/
[name@server $] tar jxf $ARCHIVE/arch1.tar.bz2
- Note: bzip2 generally provides a higher degree of compression compared to gzip.
Using Compressed Files without Decompressing Them
Several standard Linux commands have versions that can interact with files that have been compressed using gzip and bzip2. This is the case for instance with such basic commands as cat, grep, more, less, cmp and diff. The following table lists the corresponding command names:
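|Standard Command||Equivalent for a gz File||Equivalent for a bz2 File|
|cat||zcat||bzcat|
|grep||zgrep||bzgrep|
|more||zmore||bzmore|
|less||zless||bzless|
|cmp||zcmp||bzcmp|
|diff||zdiff||bzdiff|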
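As an illustration for gzip files, the following sketch reads and searches a compressed log with zcat and zgrep, the gzip counterparts of cat and grep, without ever writing an uncompressed copy to disk. The file name run.log and its contents are hypothetical.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical log file with three lines, then compressed to run.log.gz.
printf 'step 1 energy=-1.0\nstep 2 energy=-1.5\nstep 3 converged\n' > run.log
gzip run.log

# Print the contents without decompressing the file on disk.
zcat run.log.gz

# Count the lines that mention "energy" directly in the compressed file.
matches=$(zgrep -c 'energy' run.log.gz)
echo "lines mentioning energy: $matches"
```

The compressed file is left untouched, so this works equally well on files kept in an archiving space.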
Compatibility with Other Operating Systems
An archiving format that's popular on other operating systems such as Windows or OS X is .zip. It can be useful to be able to archive and extract files in this format, which can be done with the following commands:
[name@server $] zip -r $ARCHIVE/arch1.zip dir1/
[name@server $] unzip $ARCHIVE/arch1.zip
- Note: Among the various compression algorithms presented in this page, zip provides the worst performance.
.7z, .bzip2, .rar and Other Formats
The extensions .tar and .gz are well known across Unix/Linux systems, and the tools for manipulating such files are almost always available. We therefore strongly recommend that users employ these formats in the interests of compatibility. There are, however, several other compression formats. One that is particularly efficient in terms of compression is LZMA, used with the extension .7z.
To archive/compress, list and extract the contents of the directory dir3, we use:
[name@server $] 7z a arch3.7z dir3/
[name@server $] 7z l arch3.7z
[name@server $] 7z x arch3.7z
Here x extracts the files with their full paths, recreating dir3/, whereas the e command would place all the files in the current directory without their directory structure.
It's important to note that the .7z format doesn't save all of the properties and permissions of the files. It's better to combine 7z with the command tar in the following manner, for archiving with compression and for extracting with decompression respectively:
[name@server $] tar cf - dir4/ | 7z a -si arch4.tar.7z
[name@server $] 7z x -so arch4.tar.7z | tar xf -
The command 7z can also be used to decompress files with the extensions .zip, .bz2 and .rar. To decompress a file file.zip in zip format, common among Windows users, you could use for example:
[name@server $] 7z e -tzip file.zip
For text files, the option -m0=PPMd is strongly recommended; it allows much quicker compression than the default algorithm:
[name@server $] 7z a -t7z -m0=PPMd file.7z file.txt