Easy Backup

You just lost an important file. Looking into it, you find a whole directory tree is gone. Whether you erased it or the system killed it, can you get it from your backup? Easily?

Regular Backups

Backing up Linux and Windows systems regularly reduces the horror of a system failure. While all sorts of backup software will get the job done, anticipating single file loss is just as important as preparing for full system failure. After a full system failure, go back to your operating system installation disk and reinstall. That will reformat your drive and lay your OS on it, assuming that your drive had a software-only failure that did not require you to replace the hardware. Restore from your full backup. Then run a system update to bring your software current in case something changed since your loss, and reconnect to any subscription services. But if only a few files or a directory tree need restoring, not the entire system, can you get them quickly without going through the entire restore process?

Unfortunately, most backup software seems more intent on saving space in the backup and less intent on accessibility to its contents. It isn’t often that you need to restore an entire file system. More often you need one file or directory.

Now.

Not hours from now. If you can’t search your backup contents and restore portions from it, the backup is getting in your way.

Rotating Backups

When all you can do is back up everything, such as on Qubes, a security-oriented Linux variant, the entire system backup with all the virtual machines and the primary VM (known as dom0) goes to an archive, possibly compressed, possibly encrypted. Such a file requires rotation because there is no incremental update option. I make a new one every Sunday. (To learn more about backup on Qubes, see the Qubes documentation.)

For example, if the external drive has room for four distinct archives, delete the oldest archive to reclaim space before each new backup, then run the current backup. Repeat this rotation next time. When a crash happens, restore from the most recent archive. If something goes wrong with the most recent one, three other archives are available.
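The rotation described above can be sketched as a small shell function. This is only an illustration, not the Qubes tool's own behavior: the function name rotate_archives, the *.backup suffix, and the assumption that archive names sort chronologically (for example because they embed a date) are all hypothetical.

```shell
# Hypothetical helper: make room for one new archive by deleting the oldest.
# Assumes archive names sort chronologically and contain no newlines.
rotate_archives() (
  dir=$1                    # directory holding the archives
  keep=${2:-4}              # how many archives fit on the drive
  pattern=${3:-*.backup}    # assumed archive naming pattern

  # Count the archives already stored.
  count=$(find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l)

  # After the new backup runs there should be at most "keep" archives.
  excess=$(( count - keep + 1 ))
  [ "$excess" -gt 0 ] || exit 0

  # Oldest names sort first, so remove from the top of the sorted list.
  find "$dir" -maxdepth 1 -type f -name "$pattern" | sort | head -n "$excess" |
    while IFS= read -r oldest; do
      rm -f -- "$oldest"
    done
)
```

Call it just before the new backup, for example rotate_archives /PathToYourBackupDrive 4, then run the backup itself.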

Some archives allow you to extract from the archive and grab one or more individual files or directory trees. Qubes does. While the manual extraction technique is not as straightforward as a full restore, with multiple rotating archives a file or tree is not likely to fully disappear, lost forever.

Incrementals

A company, with many people potentially making many separate mistakes, benefits from multiple snapshots because of the time between errors made and repairs processed. Keeping daily snapshots makes sense for multiple systems and servers.

For an individual, waiting for a full backup every time is a PITA. Just back up the files that changed since the last time. No need for a full archive every time, no matter how little space it takes up, unless you really want multiple snapshots. When a single file or directory fails, plug in the backup drive and restore just the one file or the directory tree.

Incremental backups solve this problem. But what if each incremental is stored separately from the previous one, all distinct from the primary backup? You’d have to search through multiple archives before finding the right file or directory tree to restore.

Instead, use a fast and simple file structure duplicating your drive’s file layout. External disk storage is cheap, typically using a fast USB connection to your system. Plug it in, run one command, and walk away. Back up this way regularly so a recent snapshot of your files always exists. Got a drive with a faster interface than USB? Use it.

Beware. USB sticks come off the shelf formatted with the old FAT (FAT32) file system. For the manufacturer, FAT makes the stick compatible with any OS: Windows, Mac, or Linux. But for the user, FAT32 limits a single file to just under 4 GiB. No single file can exceed that size on the FAT format. Not much of a problem until you want to store a large video file or a large archive.
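To see whether the limit would bite before copying anything, a short check along these lines can list the files that are too large. It is a sketch: the function name find_oversized is made up for illustration, and the default limit assumes FAT32's maximum file size of 2^32 - 1 bytes.

```shell
# Hypothetical helper: list files too large for a FAT32 volume.
find_oversized() (
  dir=$1
  limit_bytes=${2:-4294967295}   # FAT32 maximum file size, 2^32 - 1 bytes

  # "-size +Nc" matches files strictly larger than N bytes.
  find "$dir" -type f -size "+${limit_bytes}c" -print
)
```

Running find_oversized ~ prints nothing when every file in your home directory would fit.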

Recent file systems do not have this file size limit. Reformat your stick or drive right out of the box to use your native file system type: NTFS (Windows), EXT3 (Linux), HFS+ (Mac OS X; APFS is not supported yet), or some other type more appropriate to your system. If the external drive is big enough to store backups for multiple systems, consider multiple partitions, each with a different file system type. Linux can format, read, and write any of those types. (A later article on formatting a USB reviews how to use it to format those types.)

Linux and Mac

The most convenient way to fully back up on Linux and still get to any single file is rsync. Rsync does incrementals by comparing each file it is about to copy with the destination. If the destination already contains that file in the same directory path, rsync checks the size and modification time. If either differs, rsync assumes the current file should replace the destination. Same size and timestamp? It won’t copy.

Maybe that’s not good enough. Timestamps aren’t always reliable. Rsync has an option, --checksum (-c), to compare the data contents of the two files. That takes a bit longer, but if the files’ contents are identical, it won’t copy the file. Different contents? Rsync will replace the destination file.
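A dry run is a low-risk way to see what a content comparison would change before committing to it. The sketch below combines three standard rsync options: --checksum, --dry-run, and --itemize-changes. The function name preview_changes is hypothetical.

```shell
# Hypothetical helper: preview what rsync would copy when comparing contents.
#   --checksum         compare file data instead of size and timestamp
#   --dry-run          report only; nothing is copied or deleted
#   --itemize-changes  print one line per item that would change
preview_changes() (
  src=$1
  dst=$2
  rsync -a --checksum --dry-run --itemize-changes "$src/" "$dst/"
)
```

For example, preview_changes ~ /PathToYourBackupDrive/${HOSTNAME} lists the files whose contents differ from the backup without touching either side.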

On Linux and Mac, back up only your home directory. If the entire file system fails, it will be easier to reinstall your Linux distribution from scratch. Let reinstallation reformat your drive, and then restore your home directory from the backup rather than waste immense time trying to figure out which files to restore. But what about all your special software?

Installed Software List

In your preparations, make sure your home directory holds a list of all the software you installed on the system.

On Fedora Linux, the following command line will make the list:

time idle rpm -qa | sort >rpmlist.${HOSTNAME}.txt
  1. The optional time prefix tells how long the command took when it’s done.
  2. The optional idle alias acts as a prefix to run the command following it only during idle CPU moments. This lets your other, more important, time-consuming commands already running get most of the system’s usage.
  3. The rpm -qa command queries (-q) all (-a) the installed packages and outputs their names including their versions.
  4. Pipe the output to sort by package name.
  5. Redirect the sorted output to a file using your hostname, taken from the shell variable HOSTNAME, in case you keep multiple hosts.

NOTE: To create the idle alias, put the following command in your ~/.bashrc file. (Remember that “~” refers to your home directory.)

alias idle='chrt --idle 0 ionice -t -c3'

If you have lots of aliases, put them into a ~/.bash_aliases file and load them from your ~/.bashrc script using the code:

if [ -f ~/.bash_aliases ]
then
  . ~/.bash_aliases
fi

That makes your idle alias a prefix to your command lines, but use it after the built-in time prefix.

Making that list in one of my Qubes Fedora Application VMs (AppVMs) took about 1.26 seconds for 2,205 names.

On Debian and Ubuntu Linux, use apt list --installed instead of rpm -qa to do the same thing. (Plain apt list shows every available package, not just the installed ones.) Of course, use aptlist in the file name instead of rpmlist. Running that command took 2.09 seconds in one of my Debian (jessie) AppVMs.
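If you keep both Fedora and Debian systems, one function can cover both. This is a sketch under a few assumptions: the name save_package_list is made up, it picks the package manager simply by which command exists, and on Debian it uses dpkg-query (the lower-level tool behind apt) because its output format is meant for scripts.

```shell
# Hypothetical helper: write a sorted list of installed packages to a file.
save_package_list() (
  outdir=${1:-$HOME}
  host=${HOSTNAME:-$(uname -n)}   # fall back to uname when HOSTNAME is unset

  if command -v rpm >/dev/null 2>&1; then
    # Fedora-style systems.
    rpm -qa | sort > "$outdir/rpmlist.$host.txt"
  elif command -v dpkg-query >/dev/null 2>&1; then
    # Debian-style systems: keep only fully installed packages.
    dpkg-query -W -f='${Status}\t${Package}\t${Version}\n' |
      awk -F'\t' '$1 == "install ok installed" { print $2, $3 }' |
      sort > "$outdir/aptlist.$host.txt"
  else
    echo "no supported package manager found" >&2
    exit 1
  fi
)
```

Run save_package_list with no argument to drop the list into your home directory, where the home backup will pick it up.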

Backup Your Home Directory

To back up your home directory, use the command:

time idle rsync -av --delete ~/ /PathToYourBackupDrive/${HOSTNAME}/

where:

  1. The -a option (archive mode) copies the origin directory recursively and preserves permissions, timestamps, ownership, and symbolic links.
  2. The -v option (verbose) shows rsync’s progress with file names.
  3. The --delete option removes any destination files that no longer exist in the origin directory. This cleans out older files in the destination that you erased from your origin directory. To keep those older files, don’t use this option, but beware detritus build-up.
  4. After all options comes the origin directory. Use a trailing slash so rsync won’t create a subdirectory in the destination named after your origin directory. Omit the slash only if you want your home directory’s name there.
  5. Next comes the destination directory. This is the path to the mounted backup drive. If you’re not sure where that drive is mounted, use df -h to look for it. The left column of the df output shows each file system’s device name, typically an “sd” name with a drive letter and a partition digit, such as /dev/sdb1 or /dev/sdc1. The mount point path appears in the right column. Use that mount path as “PathToYourBackupDrive” for rsync’s destination. Adding the ${HOSTNAME} tells rsync to use your system’s unique host name as the top directory to copy your home directory tree into.
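One thing worth guarding against is running the command when the backup drive is not actually mounted. In that case rsync would quietly fill the empty mount directory on your internal disk. The wrapper below is a sketch of such a guard. The name backup_home is hypothetical, and it assumes the mountpoint utility from util-linux, which is common on Linux but not available on a Mac.

```shell
# Hypothetical wrapper: back up the home directory only if the drive is mounted.
backup_home() (
  mount_dir=$1                    # the mount path shown in the right column of df -h
  host=${HOSTNAME:-$(uname -n)}   # fall back to uname when HOSTNAME is unset

  # mountpoint -q succeeds only when the directory is a mounted file system.
  if ! mountpoint -q "$mount_dir"; then
    echo "backup drive is not mounted at $mount_dir" >&2
    exit 1
  fi

  rsync -av --delete "$HOME/" "$mount_dir/$host/"
)
```

Invoke it as backup_home /PathToYourBackupDrive, optionally behind the time and idle prefixes.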

If you fear a power failure or other connectivity problem, add the P option (such as -avP). This keeps partially transferred files on the destination. If a failure happens mid-backup, rerunning the same command line, including the P, will pick up where it left off. The P option also shows the progress when very large files transfer.

Rotating backups instead of doing incrementals requires augmenting the hostname, such as adding the current timestamp. For example, consider using

${HOSTNAME}-$(date '+%Y%m%dT%H%M%S.%N')

on the command line after the path to your backup drive. This adds a hyphen and the date and time to the nanosecond when the backup began, separating the date from the time with a “T”. If you prefer hyphens and colons instead, use

${HOSTNAME}-$(date '+%FT%T.%N')

Remember that if you do this, each backup will be full, not incremental, and will require you to eventually rotate the oldest out to make space for the newest.
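Because the compact timestamp sorts chronologically as plain text, finding the newest full backup later takes only a sort. The two helpers below sketch that idea. The names new_backup_name and latest_backup are hypothetical, and the nanosecond field is left out to keep the example portable to date implementations that lack %N.

```shell
# Hypothetical helper: build a destination name such as myhost-20240128T093000.
new_backup_name() {
  printf '%s-%s\n' "${HOSTNAME:-$(uname -n)}" "$(date '+%Y%m%dT%H%M%S')"
}

# Hypothetical helper: print the newest timestamped backup directory for this host.
latest_backup() (
  backup_root=$1
  host=${HOSTNAME:-$(uname -n)}

  # Names sort oldest-first, so the last line is the most recent backup.
  find "$backup_root" -maxdepth 1 -type d -name "$host-*" | sort | tail -n 1
)
```

For example, rsync -av ~/ "/PathToYourBackupDrive/$(new_backup_name)/" writes a fresh full copy, and latest_backup /PathToYourBackupDrive tells you where to restore from.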

For higher security, write your backups to a fully encrypted drive, especially if your backup drive is a removable device, such as a USB stick or a USB connected external drive. (See my later article on formatting a USB.)

Windows

Windows does not have rsync to do incremental backups. One desktop app took over 20 minutes to build some sort of file index for the backup and hadn’t even started the backup. Ridiculous! Refusing to wait any longer, I stopped it while still building its list. Instead, backup software should qualify a file only when it will actually use the file, deliver it as needed, then look at the next file. Stop wasting time!

I’ve seen that kind of time wasting in other programs, such as desktop background changing programs, including on Linux desktop. (A future article will address background changing.)

For Windows, xxcopy has a free version and behaves similarly to rsync. Use it on the PowerShell command line.

Run PowerShell as administrator. If you’ve never used PowerShell, type powershell in the Windows search box to get a list. Select the plain name, not one of its variations, from the list by right-clicking on it to get a menu. Click on the menu item, “Run as Administrator”, to open the command line window in the system32 directory. Check with dir xx* that the xxcopy program is installed in the system32 directory.

Then, run the following command line:

$d=[system.net.dns]::gethostname(); Measure-Command {xxcopy $home i:\$d\ /clone /pb /pz0}

where:

  1. $d=[system.net.dns]::gethostname() stores the hostname
  2. $home is the origin directory known to the system
  3. i:\$d\ is the destination directory using the hostname (replace “i:” with your mounted drive letter)
  4. /clone option sends only files not already there
  5. /pb option shows a Progress Bar
  6. /pz0 option suppresses the destination approval prompt

The Measure-Command runs xxcopy inside the braces. When done, it shows the time it took for the whole run. This timing info (hours, minutes, seconds, and milliseconds) and the use of system access variables make using PowerShell more interesting than cmd.exe.

The destination location uses drive letter “i:” for the backup drive in this example, but the drive letter for your external storage may differ. Check your drive letter in the file listing. Unless you want to keep several separate copies, use the same destination directory name to allow recent files to overwrite older files. New files are added, and xxcopy deletes files from the destination directory that are no longer in the origin directory. If you prefer multiple dated copies, add a date after the $d designation. For example, use

get-date -format filedatetime

and store that as a variable. Then use that variable with the $d, such as:

$d=[system.net.dns]::gethostname(); $t=get-date -format filedatetime; Measure-Command {xxcopy $home i:\$d-$t\ /clone /pb /pz0}

This date format will augment the hostname using the date and time in “yyyymmddThhmmssxxxx” format: 4-digit year, 2-digit month, 2-digit day, a “T” between the date and time akin to ISO 8601 and RFC 3339, then the local time as 2-digit hour, 2-digit minute, 2-digit second, and 4-digit fraction of a second. Microsoft characterizes this 4-digit fraction of a second as a representation of milliseconds, but that would be three digits, not four. Going out to the fourth place gives ten-thousandths of a second, or tenths of a millisecond. I’d argue they should have used 3-digit milliseconds, or 6-digit microseconds, or 9-digit nanoseconds, but Microsoft does what it wants. Because each run using the $d-$t naming scheme delivers a whole new directory, the total backup time will be long. After enough of them, the oldest will have to be deleted (rotated out of use) to make way for the newest one.

Using the /clone option will do the backup every time, even the first time, but you could use the /backup option the first time and /clone thereafter. The /clone option will delete files in the destination directory that are not in the origin directory. Thus /clone is equivalent to the rsync --delete option.

The /pb option shows the file names and sizes as they go to the destination.

Finally, the /pz0 option suppresses the extra safety prompt that asks for approval of the destination directory.

Fully encrypted external drives are not readily available on Windows unless you sign in using a Microsoft account. That puts your keys on the Windows server. Other software supporting full drive encryption for Windows has come and gone. LUKS encryption support software for Windows does exist. (See my formatting a USB article.)

Conclusion

Next time you run rsync (or xxcopy) with the same destination directory name it will update that directory to use your most recent file state. Recover files fast from the external drive. By the way, the shell (both Linux bash and Windows PowerShell) will remember your last use of the command. Just use the up-arrow key to go backward in the command history to find it.
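Getting a single file or directory tree back is the same rsync command pointed the other way. The function below is a sketch of that restore. The name restore_path is hypothetical, and it assumes the backup mirrors your home directory layout as described earlier. The --relative option recreates the parent directories of the restored item.

```shell
# Hypothetical helper: copy one file or directory tree back from the backup.
restore_path() (
  backup_dir=$1             # for example /PathToYourBackupDrive/$HOSTNAME
  rel_path=$2               # path relative to home, such as Documents/report.odt
  target_home=${3:-$HOME}   # where to restore; defaults to the home directory

  cd "$backup_dir" || exit 1

  # --relative rebuilds rel_path's parent directories under target_home.
  rsync -av --relative "./$rel_path" "$target_home/"
)
```

For example, restore_path /PathToYourBackupDrive/${HOSTNAME} Documents/report.odt puts that one file back where it was.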
