Tarballing Files¶
Tarballing is the process of bundling multiple files into one file. It is Compute Canada’s recommendation that a collection of multiple small files be tarballed into large files. It is also the recommended way to store files in Nearline since smaller files are not written to tape. Systems like Cedar, which have a conservative number of files allowed in Nearline (5K) also warrant tarballing to keep the file count low.
A tarball can also be compressed using the gunzip command. This has its limitations, such as not being able to add files to an existing gunzipped tarball. It is also unlikely to reduce the size of certain file formats like videos and images.
Working with Tarballs (CLI)¶
Note
You can also use a GUI of your choice. The CLI is often the only choice for working over SSH
.
Tarballing/Compressing Files¶
To make a tarball
# tarball
$ tar -cvf <.tar file> /path/to/folder/to/tarball/
# e.g. $ tar -cvf example.tar videos/
# gunzipped tarball
$ tar -cvzf <.tar.gz file> /path/to/folder/to/gunzip/
# e.g. $ tar -cvzf example.tar.gz videos/
Adding files to existing archive (only .tar
files)
Extracting files¶
To extract a tarball
$ tar -xvf <.tar file> -C </directory/to/extract/to/>
# or to extract in current directory
$ tar -xvf <.tar file>
To extract a gunzipped tarball
$ tar -xvzf <.tar.gz file> -C </directory/to/extract/to/>
# e.g. $ tar -xvzf
# or to extract in current directory
$ tar -xvzf <.tar.gz file>
Extract particular file/folder from tarball/gunzipped tarball
$ tar --extract <.tar/.tar.gz file> <path/to/file>
# e.g. $ tar --extract example.tar example.txt
Extract multiple files
# tarball
$ tar -xvf <.tar file> "<path/to/file1>" "<path/to/file2>"
# gunzipped tarball
$ tar -xvzf <.tar.gz file> "<path/to/file1>" "<path/to/file2>"
Extract files by wildcard
# tarball
$ tar -xvf <.tar file> --wildcards "*.mp4"
# gunzipped tarball
$ tar -xvzf <.tar.gz file> --wildcards "*.mp4"
Viewing Contents of Archive¶
To view contents of a tarball/gunzipped tarball
$ tar -tvf <.tar/.tar.gz file>