
Create a tar backup: How the archiving works

The archiving program tar is based on an old data backup method, but it is still useful today. The name tar is short for “tape archive”, because the program was originally written to archive data on tape drives. Even though private users rarely work with it directly, tar remains one of the most widely used archiving tools on Unix systems. It can also be used to create regular, incremental backups of a server. Here we explain how tar works and which commands you use to run backups.

How does tar work?

Tar is an archiving program for Linux and related systems. Unusually for this kind of program, tar does not compress data by default. It is nevertheless very popular because it can merge entire directories into a single file. This is rooted in the program's history: a tape drive writes data sequentially to a magnetic tape. That explains the sequential, linear structure of the tar format, in which new files are appended to the end of the archive. Such an archive file is also called a tarball, since the files are practically “glued together.”

To compress the data as well, tar is usually combined with gzip. The two programs complement each other: gzip can only compress individual files, so tar first bundles the files into one archive, and gzip (or another compression program) then compresses that archive. The result is a file ending in .tar.gz or the shorter .tgz.
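To see the two steps separately, here is a minimal sketch. It assumes GNU tar and gzip are installed and works in a temporary directory with hypothetical example files. It bundles two files with tar, compresses the result with gzip, and then creates the same archive in a single step with -z:

```shell
#!/bin/bash
set -euo pipefail

# Temporary working directory with two hypothetical example files
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
echo "first"  > example_1.txt
echo "second" > example_2.txt

# Step 1: tar only bundles the files into one uncompressed archive
tar -cf two_step.tar example_1.txt example_2.txt
# Step 2: gzip compresses that single file and replaces it with two_step.tar.gz
gzip two_step.tar

# One step: with -z, tar runs the gzip compression itself
tar -czf one_step.tar.gz example_1.txt example_2.txt

# Both archives contain the same members
tar -tzf two_step.tar.gz
tar -tzf one_step.tar.gz
```

Both archives list the same members; the -z variant simply saves the intermediate step.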

Install tar

On Ubuntu, tar should already be preinstalled. If it is missing, install it on Ubuntu or another Debian-based distribution with the following command (on other systems, use the respective package manager):

sudo apt-get install tar tar-doc

The tar-doc package is optional: It contains documentation of the archiving program.

Use tar

To use tar, follow this basic syntax:

tar [options] [archive] [files and directories]

The most important tar options are:

Option | Description | Note
--- | --- | ---
--help | Displays all options |
--version | Outputs the version of tar in use |
-c | Creates a new archive (create) |
-d | Compares the files in the archive with those in the file system (diff) |
-f | Writes the archive to the given file or reads the data from the given file (file) | The archive name must follow -f directly, so in a bundle such as -czf, the f has to come last
-z | Compresses or decompresses the archive directly with gzip | gzip needs to be installed
-Z | Compresses or decompresses the archive directly with compress | compress needs to be installed; pay attention to capitalization
-j | Compresses or decompresses the archive directly with bzip2 | bzip2 needs to be installed
-J | Compresses or decompresses the archive directly with xz | xz needs to be installed; pay attention to capitalization
-k | Keeps existing files: extracted files do not overwrite files that already exist (keep) |
-p | Preserves access permissions when extracting |
-r | Appends files to the end of an existing archive (append) | Only works with uncompressed archives
-t | Lists the contents of an archive (list) |
-u | Only appends files that are newer than their copies in the archive (update) |
-v | Displays the steps involved in archiving (verbose) |
-vv | Displays detailed information about the archiving (very verbose) |
-w | Asks for confirmation before each action (interactive) |
-x | Extracts files from the archive (extract) | The files remain in the archive
-A | Appends the members of one archive to another archive | Pay attention to capitalization
-C | Changes to the given directory before the operation; when extracting, this is the target directory | Pay attention to capitalization
-M | Creates, lists, or extracts a multi-volume archive | Pay attention to capitalization
-L | Changes the medium after the given size has been written | The size is given in units of 1,024 bytes; implies -M; pay attention to capitalization
-W | Verifies the archive after writing it | Pay attention to capitalization
-P | Keeps absolute paths, i.e. does not strip the leading “/” from member names | Pay attention to capitalization
--exclude | Excludes files or folders | Given as --exclude=<file/folder> before the files to be archived
-X | Reads a list of files to exclude from a file | Requires a previously created list file: -X <list file>; pay attention to capitalization
-g | Creates an incremental backup; tar records the state of the archived directories in the snapshot file given after -g | The snapshot file is updated with every run
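The following sketch tries out a few options from the table: -c, -v and -f to create an archive, -t to list it, and -d to compare it with the file system. It assumes GNU tar, whose -d exits with status 1 when it finds differences; notes.txt and the temporary directory are hypothetical:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
echo "original" > notes.txt

# -c creates the archive, -v lists each file, -f names the archive (f comes last)
tar -cvf notes.tar notes.txt

# -t lists the archive's contents without extracting anything
tar -tf notes.tar

# Change the file so that it no longer matches the archived copy
echo "an extra line" >> notes.txt

# -d compares archive and file system; GNU tar exits with status 1 on differences
compare_status=0
tar -df notes.tar || compare_status=$?
echo "compare exit status: $compare_status"
```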

When creating tar archives, you can also use the asterisk as a wildcard. When you create a new archive, always specify the options first, then the file name of the archive you want to create, and finally the files and folders it should contain. The following example creates an archive (-c) from two text files, compresses it with gzip (-z), and writes it to the file archive.tar.gz (-f):

tar -czf archive.tar.gz example_1.txt example_2.txt

If you want to combine all text files in a directory into an archive, use a corresponding wildcard:

tar -cf text_archiv.tar *.txt

You can also combine complete directories and their subdirectories into an archive. The following example archives /directory1 with all its subdirectories and files, except for the subdirectory /directory1/subdirectory_x:

tar -cf archive.tar --exclude="/directory1/subdirectory_x" /directory1
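A minimal sketch of --exclude, assuming GNU tar. It rebuilds the example's directory layout with relative paths in a temporary directory, so the directory names are stand-ins for the real /directory1:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
# Relative stand-ins for /directory1 and /directory1/subdirectory_x
mkdir -p directory1/subdirectory_x directory1/keep
echo "skip me" > directory1/subdirectory_x/a.txt
echo "keep me" > directory1/keep/b.txt

# --exclude is given before the directory it should filter
tar -cf archive.tar --exclude="directory1/subdirectory_x" directory1

listing=$(tar -tf archive.tar)
echo "$listing"
```

The listing contains directory1/keep/b.txt but nothing from subdirectory_x.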

In the following example, you extract (-x) the compressed (-z) archive that we created in the first example into another directory (-C):

tar -xzf archive.tar.gz -C /home/directory1/archive_directory

To add another file to an archive (which has to be uncompressed), enter the following command:

tar -rf archive.tar example_extra.txt
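The following sketch, assuming GNU tar and hypothetical file names, appends a file with -r, checks the result with -t, and shows that the same attempt on a gzip-compressed archive is rejected:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
echo "one"   > example_1.txt
echo "extra" > example_extra.txt

# Create an uncompressed archive, then append a file to its end with -r
tar -cf archive.tar example_1.txt
tar -rf archive.tar example_extra.txt
listing=$(tar -tf archive.tar)
echo "$listing"

# Appending to a compressed archive fails: GNU tar refuses to update it
tar -czf archive.tar.gz example_1.txt
append_status=0
tar -rzf archive.tar.gz example_extra.txt || append_status=$?
echo "exit status when appending to archive.tar.gz: $append_status"
```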

How does a tar backup function?

Webmasters like to use tar for backups: it preserves the directory structure, and its many options allow for a lot of fine-tuning. In the following sections, we'll explain how to create a full backup with tar and how to create incremental backups with the program.

Create a simple backup with tar

For your backup strategy, it makes sense to write a backup script instead of archiving your system by hand. This way, you can automatically archive multiple directories, compress them, or transfer them to an external storage device. You need read permission for the directories you want to back up and write permission for the target directory. First, create a directory called bin in your home directory (if you don't have one already) and create the script there. You'll need to adapt the following example script to your needs and directory structure:

#!/bin/bash
DATE=$(date +%Y-%m-%d-%H%M%S)
BACKUP_DIR="/targetdirectory/backup"
SOURCE="$HOME/sourcedirectory"
tar -cvzpf "$BACKUP_DIR/backup-$DATE.tar.gz" $SOURCE

So that you understand exactly what the script does, we'll explain it line by line:

  1. The first line is the so-called Shebang, which informs the operating system which interpreter program it should use. In this case, it should use bash.
  2. The DATE variable stores a time stamp, which is added to the file name of each backup. This keeps several backups clearly separate from one another. The format is Year-Month-Day-HourMinuteSecond, for example 2017-09-07-152833.
  3. Here you specify the directory in which the backup should be created. Do not end the path with a “/”.
  4. In this line, you specify which directories should be included in the archive. You can also list several directories separated by spaces: SOURCE="$HOME/sourcedirectory1 $HOME/sourcedirectory2". Here, too, the paths do not end with a “/”. Because $SOURCE is deliberately left unquoted in the tar command, the shell splits it into separate directories at the spaces, so the paths themselves must not contain spaces.
  5. The last line of the script finally contains the tar command:
    • -cvzpf creates an archive (-c), displays the steps (-v), compresses it with gzip (-z), and writes everything to the file that follows (-f). The -p option only takes effect when extracting, since tar always stores access permissions in the archive. In most cases, -v and -p are optional, and you can add further options to customize your backup.
    • $BACKUP_DIR/backup-$DATE.tar.gz specifies the directory ($BACKUP_DIR) and the file in which the backup is saved. In our example, the file is named backup, followed by the current time stamp. The file name ends with the extension for the format in which the file is created. If you want to use a different compression method, remember to change both the file extension and the option in the command.
    • Finally, the variable $SOURCE tells tar what to archive. You can also use --exclude or -X to exclude directories or files that don't need to be included in the backup.
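To try the script without touching real data, you can run the same steps against temporary directories. In this sketch, the temporary paths are hypothetical stand-ins for /targetdirectory/backup and $HOME/sourcedirectory; -v is left out to keep the output short, and $SOURCE is quoted because it holds a single directory here. GNU tar is assumed:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-ins for /targetdirectory/backup and $HOME/sourcedirectory
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
BACKUP_DIR="$work/backup"
SOURCE="$work/sourcedirectory"
mkdir -p "$BACKUP_DIR" "$SOURCE"
echo "important" > "$SOURCE/data.txt"

# Same time stamp and tar call as in the script (without -v)
DATE=$(date +%Y-%m-%d-%H%M%S)
archive="$BACKUP_DIR/backup-$DATE.tar.gz"
tar -czpf "$archive" "$SOURCE"

# Show the new archive and its contents
ls -l "$BACKUP_DIR"
listing=$(tar -tzf "$archive")
echo "$listing"
```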

Tip

In principle, Linux and Unix don't care which file extension you give the script. To run it, the system reads the shebang in the first line, and tools such as the file command identify a file's type by comparing its contents with a database of known file signatures (the magic file, for example /etc/magic). Still, file extensions have become common because they make it easier for you as a user to keep track of your files.

Now save the file under the name backup in the bin directory and add this directory to the PATH variable. To keep the setting after your next login, add the same line to your ~/.bashrc:

export PATH="$PATH:$HOME/bin"

You still need to make the backup script that you created executable:

chmod u+x $HOME/bin/backup

This grants execute permission to you as the file's owner (u). You can also assign permissions to the group (g), to others (o), or to all (a). You're now finished and can run the script:

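sudo "$HOME/bin/backup"

Use the full path here: on many distributions, including Ubuntu, sudo replaces your PATH with a fixed secure_path, so it would not find the script in $HOME/bin.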

To restore the backup, extract the archive into the root directory. Replace the time stamp in the file name with that of the backup you want to restore:

tar -xzpf /targetdirectory/backup/backup-2017-09-07-152833.tar.gz -C /
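Because extracting into / overwrites files without asking, it can be worth restoring into a scratch directory first and comparing the result. This minimal sketch assumes GNU tar and diff; all paths are temporary, hypothetical stand-ins:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
SOURCE="$work/sourcedirectory"
mkdir -p "$SOURCE/sub"
echo "a" > "$SOURCE/a.txt"
echo "b" > "$SOURCE/sub/b.txt"

# Back up with an absolute path (tar strips the leading "/")
tar -czpf "$work/backup.tar.gz" "$SOURCE"

# Restore into a scratch directory instead of /
scratch="$work/restore"
mkdir "$scratch"
tar -xzpf "$work/backup.tar.gz" -C "$scratch"

# The original path reappears below the scratch directory; diff -r exits 0 if identical
diff -r "$SOURCE" "$scratch$SOURCE"
echo "restored copy matches the source"
```

Once the comparison looks right, you can repeat the extraction with -C /.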

The script creates a full backup. However, a full backup isn't always the right choice when backing up a complete server, so consider whether an incremental backup with tar makes more sense for your purposes.

Note

When you create an archive with absolute paths, GNU tar displays the message “tar: Removing leading `/' from member names”. This is not an error but a safety precaution for the restore process: tar stores /home/subdirectory as home/subdirectory. If you extract the archive while you are not in the root directory, tar therefore recreates the directory structure below the current directory, for example /home/subdirectory/home/subdirectory if you extract in /home/subdirectory. This reduces the risk of accidentally overwriting your entire system. Remember: Unix doesn't ask before overwriting. If you really want to replace existing content, you must first change to the root directory (or use -C /). You can switch off this behavior with the -P option.
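The following sketch, assuming GNU tar, makes the difference visible by archiving the same temporary directory once normally and once with -P and printing the first member name of each archive (the directory names are hypothetical):

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/home/subdirectory"
echo "x" > "$work/home/subdirectory/file.txt"

# Default behavior: tar strips the leading "/" and prints a note on stderr
tar -cf "$work/relative.tar" "$work/home/subdirectory"
listing=$(tar -tf "$work/relative.tar")
relative_first=${listing%%$'\n'*}

# With -P, the absolute path is kept (-P is also passed when listing)
tar -cPf "$work/absolute.tar" "$work/home/subdirectory"
listing=$(tar -tPf "$work/absolute.tar")
absolute_first=${listing%%$'\n'*}

echo "without -P: $relative_first"
echo "with -P:    $absolute_first"
```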

What is an incremental backup?

Webmasters create regular backups to avoid data loss. Should the live system fail, be compromised, or be deleted, you can restore a working version from the backup. The more often you create backups, the less data you'll lose in an emergency. If you save a complete backup every time and archive all of the system's data, this takes a long time and requires a lot of storage space. Instead, you can create incremental backups.

An incremental backup always requires a full backup. You first archive the entire system once (or at least the part you want to back up). After that, an incremental backup saves only new or modified files. This results in much less data but requires more effort when restoring: you need the last full backup as well as every incremental backup made since then. If one of these archive files is lost (which is less likely today than in the days of magnetic tapes), the restore will be incomplete.

Create an incremental backup with tar

With tar, you can also create regular incremental backups using your own script. For example, you can specify that a full backup is created once a month and an incremental backup every day. The following script also makes sure that old backups are regularly moved into folders sorted by date. In addition to tar, you need cron. This daemon (a program that runs in the background) runs other processes at scheduled times and is included with Ubuntu. First, open a text editor and create this script:

#!/bin/bash
BACKUP_DIR="/targetdirectory/backup"
ROTATE_DIR="/targetdirectory/backup/rotate"
TIMESTAMP="timestamp.dat"
SOURCE="$HOME/sourcedirectory"
DATE=$(date +%Y-%m-%d-%H%M%S)
EXCLUDE=(--exclude='/mnt/*' --exclude='/proc/*' --exclude='/sys/*' --exclude='/tmp/*')
cd /
mkdir -p "${BACKUP_DIR}"
set -- "${BACKUP_DIR}"/backup-??.tar.gz
lastname=${!#}
backupnr=${lastname##*backup-}
backupnr=${backupnr%%.*}
backupnr=${backupnr//\?/0}
backupnr=$((10#${backupnr}))
if (( backupnr++ >= 30 )); then
  mkdir -p "${ROTATE_DIR}/${DATE}"
  mv "${BACKUP_DIR}"/b* "${ROTATE_DIR}/${DATE}"
  mv "${BACKUP_DIR}"/t* "${ROTATE_DIR}/${DATE}"
  backupnr=1
fi

backupnr=0${backupnr}
backupnr=${backupnr: -2}
filename=backup-${backupnr}.tar.gz
tar -cpzf "${BACKUP_DIR}/${filename}" -g "${BACKUP_DIR}/${TIMESTAMP}" "${EXCLUDE[@]}" ${SOURCE}

For this backup script as well, we’ll explain step by step what is happening:

  • First, define the interpreter again.
  • Then, set the variables. New additions are a directory for the rotations of the backups (a type of backup archive), and a file for a timestamp.
  • Our example also shows that it doesn't always make sense to include every directory in the backup. Here, we exclude the contents of the folders mnt, proc, sys, and tmp (but not the folders themselves, hence the “*”). /proc and /sys are virtual file systems that the kernel creates afresh at every system start, /tmp only holds temporary files, and /mnt usually just serves as the mount point for other drives.
  • To make sure that all paths are interpreted correctly, the script switches to the root directory with cd /.
  • Set up the backup directory with mkdir, in case it doesn’t exist yet.
  • Since you want to number your backups sequentially, the following block determines the number of the most recent backup. set -- expands the pattern backup-??.tar.gz into the list of existing backups, and ${!#} picks the last (highest-numbered) one. The script then strips everything except the two-digit number from the file name. If no backup exists yet, the pattern stays unexpanded and the question marks are replaced with zeros. The prefix 10# makes bash read the number in base 10, so that values with a leading zero such as 08 are not misinterpreted as octal numbers.
  • The script keeps at most 30 backups at a time. Once 30 exist, it creates a rotation folder named after the current date and moves all files starting with b or t into it. Restricting the pattern to these letters works because the backup directory should only contain the backup archives and the timestamp file. Finally, the script resets the backup number to 1. Since the timestamp file has been moved as well, the next archive is a new full backup. If fewer than 30 backups exist, the script simply increases the number by 1 (++).
  • Now the script reassembles the file name: it pads the new number to two digits with a leading zero (for example 07) and adds the prefix and extension again.
  • Finally, the script runs the actual tar command. Compared with the simple full backup, it uses additional options: -g enables the incremental backup. tar records the time of the backup and the state of the archived directories in the snapshot file timestamp.dat. On the first run, this file doesn't exist yet, so tar creates a full backup. On later runs, tar compares the files with the state recorded in timestamp.dat and archives only those that have changed since the last backup.

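To watch -g at work, here is a minimal sketch assuming GNU tar; the temporary directory, data/ and the file names are hypothetical. It creates a full backup, changes one file, adds another, and then runs a second backup with the same snapshot file:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
echo "unchanged" > "$work/data/unchanged.txt"
echo "v1" > "$work/data/changed.txt"

# First run: timestamp.dat does not exist yet, so tar makes a full backup and creates it
tar -czf "$work/backup-01.tar.gz" -g "$work/timestamp.dat" -C "$work" data

sleep 1   # make sure the following changes get a later time stamp
echo "v2"  > "$work/data/changed.txt"
echo "new" > "$work/data/new.txt"

# Second run: tar compares with timestamp.dat and archives only what changed
tar -czf "$work/backup-02.tar.gz" -g "$work/timestamp.dat" -C "$work" data

full=$(tar -tzf "$work/backup-01.tar.gz")
incremental=$(tar -tzf "$work/backup-02.tar.gz")
printf 'full backup:\n%s\n\nincremental backup:\n%s\n' "$full" "$incremental"
```

The second archive contains the data/ directory entry plus changed.txt and new.txt, but not unchanged.txt.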
Note

With daily backups, the script therefore moves the archives into a new rotation folder roughly once a month, so that the actual backup directory only contains current data. There is no built-in limit on the number of rotation folders, though, so you have to delete old ones manually.
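One way to automate the cleanup is a find command at the end of the backup script. This is a sketch under assumptions: the 90-day retention period is hypothetical, GNU find and GNU touch are assumed, and the touch -d line only simulates an old rotation folder for the demonstration:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
ROTATE_DIR="$work/backup/rotate"   # stand-in for /targetdirectory/backup/rotate
KEEP_DAYS=90                       # hypothetical retention period

# Two rotation folders named like the script's $DATE values
mkdir -p "$ROTATE_DIR/2017-06-01-073000" "$ROTATE_DIR/2017-09-01-073000"
# Demo only: make the first folder look 120 days old (GNU touch syntax)
touch -d "120 days ago" "$ROTATE_DIR/2017-06-01-073000"

# Delete rotation folders last modified more than KEEP_DAYS days ago
find "$ROTATE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$KEEP_DAYS" -exec rm -rf {} +

ls "$ROTATE_DIR"
```

Only 2017-09-01-073000 remains. Because rm -rf deletes without asking, test the find expression with -print instead of -exec before using it on real backups.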

This completes the script for incremental backups with tar. Save the file as backup in the bin directory. Here, too, you need to add the directory to the PATH variable and make the script executable:

export PATH="$PATH:$HOME/bin"
chmod u+x $HOME/bin/backup

In theory, you can now start your backup script with sudo "$HOME/bin/backup". The idea behind incremental backups, however, is that the process runs automatically every day. For this, you use cron and edit the so-called crontab, the table that defines when cron runs which tasks. Each entry has six fields:

Minute (0-59) | Hour (0-23) | Day of month (1-31) | Month (1-12) | Day of week (0-7) | Command

In each field, you enter either a number from the range given in parentheses or an asterisk (*), which stands for every possible value. The day-of-week field is special: here you can specify, for example, that a task runs every Monday (1) or only on weekdays (1-5). Sunday can be given as either 0 or 7, since for some people the week begins on Sunday and for others it ends with it.

In the command line, open the editor mode of cron with:

sudo crontab -e

Here, enter the following line:

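30 7 * * * /home/username/bin/backup

This means that the backup runs at 7:30 a.m. every day of every month, regardless of the day of the week. Because the crontab belongs to root, specify the script with its full path and replace username with your user name. In root's crontab, $HOME also refers to /root, so write out the SOURCE path in the script in full as well. Save your changes, and the daily incremental backup is ready to use.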

Note

Cron only works while your system is running. On web servers, this should be the case anyway. If you want to use the script to back up your PC or laptop, however, the device must also be running at 7:30 every day; otherwise, the backup simply doesn't happen. One way to avoid this is anacron, which runs missed jobs once the device is active again.

Restore system from a backup

Nobody wishes for it, but sometimes the worst happens and your system needs to be completely restored. With tar, this is relatively easy and requires no additional script. A single tar command isn't enough, though: with incremental backups, several archive files always have to be unpacked. In the console, enter these command lines:

BACKUP_DIR=/targetdirectory/backup
cd /
for archive in "${BACKUP_DIR}"/backup-*.tar.gz; do
  tar -xpzf "$archive" -C /
done

Note

When you restore a system from a backup, existing files and directories with the same names are overwritten without confirmation.

So that each archive file doesn’t have to be extracted individually, use a for loop:

  1. In the first step, define the directory that contains the backups.
  2. With cd / switch to the root directory to ensure that the archive is extracted at the correct location.
  3. Now start a for loop, which repeats the commands between do and done once for every file that matches the pattern. Give the path of your backups again, with an asterisk as a wildcard, since you want to unpack all archive files in this directory.
  4. The tar command is specified as follows: You extract (-x), while maintaining the access rights (-p), and decompress (-z) the archive (-f $archive) in the root directory (-C /).
  5. With done, set the end of the loop.

Since the archives are numbered consecutively with two digits, the shell sorts them correctly and the loop extracts them one after another, starting with the oldest. This is important: the archives created after the full backup contain newer versions of the files. During the loop, the old version is therefore extracted first and then overwritten by the newer version in a later pass. In the end, the backed-up directories are completely restored, each file in its newest archived version. For incremental archives, the GNU tar manual also recommends adding --listed-incremental=/dev/null (short: -g /dev/null) when extracting; tar then also deletes files that had already been removed when a later backup was created.
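A minimal sketch of the complete restore cycle, assuming GNU tar. The temporary directory and file names are hypothetical, and the restore goes into a scratch directory instead of /. Between the two backups, b.txt is deleted and c.txt is added:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
echo "a" > "$work/data/a.txt"
echo "b" > "$work/data/b.txt"

# Full backup, then an incremental backup after deleting one file and adding another
tar -czf "$work/backup-01.tar.gz" -g "$work/timestamp.dat" -C "$work" data
sleep 1
rm "$work/data/b.txt"
echo "c" > "$work/data/c.txt"
tar -czf "$work/backup-02.tar.gz" -g "$work/timestamp.dat" -C "$work" data

# Restore all archives in order into a scratch directory instead of /
restore="$work/restore"
mkdir "$restore"
for archive in "$work"/backup-*.tar.gz; do
  # -g /dev/null: extract as incremental archive, removing files absent at backup time
  tar -xzf "$archive" -g /dev/null -C "$restore"
done

ls "$restore/data"
```

After the loop, the scratch copy contains a.txt and c.txt; b.txt was extracted from the full backup and then removed again when the incremental archive was extracted.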

This is the main purpose of an incremental backup: restoring the complete system. With a small detour, however, you can also recover a single file in its most recently archived version. This takes two steps:

BACKUP_DIR=/targetdirectory/backup
ls -l "${BACKUP_DIR}"
for archive in "${BACKUP_DIR}"/backup-*.tar.gz; do
  tar -tzf "$archive" | grep searched-file
done

In this first step, you also rely on a for loop, which is used for searching and not for extracting:

  1. Define your backup directory again.
  2. Use the command ls to display all files and folders in the backup directory. The option -l enables detailed information.
  3. Start a loop, as for restoring the complete system.
  4. The important change is in the options of the tar command: instead of creating (c) or extracting (x) an archive, you list its contents (t). Instead of searching through the listing yourself, pass it to the grep command with a pipe (|), which filters the output (that is, the contents of the archive) for the file you're looking for.
  5. End the loop.
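If a file appears in several archives, it helps to see which archive each match comes from. This sketch wraps the search loop in a small helper function (search is a hypothetical name) that prefixes every match with the archive name; it assumes GNU tar and builds three small test archives in a temporary directory:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
BACKUP_DIR="$work/backup"
mkdir -p "$BACKUP_DIR" "$work/data"

# Three small test archives; searched-file is in the first and the third
echo "v1" > "$work/data/searched-file"
tar -czf "$BACKUP_DIR/backup-01.tar.gz" -C "$work" data
echo "other" > "$work/data/other.txt"
tar -czf "$BACKUP_DIR/backup-02.tar.gz" -C "$work" data/other.txt
echo "v2" > "$work/data/searched-file"
tar -czf "$BACKUP_DIR/backup-03.tar.gz" -C "$work" data/searched-file

# Hypothetical helper: print "archive: member" for every member matching $1
search() {
  local archive member
  for archive in "$BACKUP_DIR"/backup-*.tar.gz; do
    # grep exits with 1 if an archive has no match; "|| true" keeps the loop going
    tar -tzf "$archive" | grep -- "$1" | while IFS= read -r member; do
      echo "$(basename "$archive"): $member"
    done || true
  done
}

result=$(search 'searched-file')
echo "$result"
```

The last line of the output names the newest archive that contains the file.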

Now the terminal displays only the file you're looking for, possibly several times if you've edited it regularly and it appears in multiple incremental backups. Note the path shown for the file and build another loop that restores the most recently saved version:

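for archive in "${BACKUP_DIR}"/backup-*.tar.gz; do
  tar -xzf "$archive" -C / home/username/sourcedirectory/searched-file
done

Replace home/username/sourcedirectory/searched-file with the path from the listing, without the leading “/”. For archives that don't contain the file, tar reports that it was not found in the archive; you can ignore these messages. Because the loop runs from the oldest to the newest archive, the last extraction wins: the file is restored to its original location in its most recently archived version and overwrites any newer version on disk.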
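If you first want to look at an archived version without overwriting the current file, GNU tar's -O option writes the extracted member to standard output instead of the file system. A minimal sketch with hypothetical paths in a temporary directory:

```shell
#!/bin/bash
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/data"
echo "archived version" > "$work/data/searched-file"
tar -czf "$work/backup-01.tar.gz" -C "$work" data
echo "newer version" > "$work/data/searched-file"

# -O writes the extracted member to standard output instead of the file system
archived=$(tar -xzOf "$work/backup-01.tar.gz" data/searched-file)
echo "archived: $archived"

# The file on disk has not been touched
echo "on disk:  $(cat "$work/data/searched-file")"
```

To keep the archived copy next to the current one, redirect the output to a new file, for example > searched-file.restored.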
