Simple, reliable, rate-limited backups

Wow, once again it’s been a long time since I’ve written anything. The series on marrying Active Directory with *nix systems is by no means dead; I’ve just been forced to spend all of my research time tinkering with aspects of AD integration that don’t really fit in with the series, so it’ll remain on hold until I get past that.

In the meantime, I’ve been meaning to write up a brief post about a backup script that I’ve been using to back up a number of servers that I manage. Briefly, the script archives a local copy of the previous backup, creates tarballs of specific directories for the current backup, then transfers those to a remote backup server. That gives me at least one good local backup at any given time (two when the script is not actually running) and as many remote backups as I choose to keep — I’ve long considered implementing a `find` cron job to clean up old backups, but disk space is cheap, so I usually let them sit as long as possible and manually clean them up a few times a year.
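For anyone who would rather automate the cleanup, the `find` job mentioned above could look something like the sketch below, run from cron on the backup server itself. The function name, directory, and retention period are assumptions rather than part of the original setup, and `-maxdepth` and `-delete` are GNU/BSD find extensions, not POSIX.

```bash
#!/bin/bash
# Sketch: remove backup archives older than a given number of days.
# Meant for the remote backup server (e.g. called from a cron job there);
# the directory and retention period in the example are assumptions.
prune_old_backups() {
  local dir=$1 days=$2
  # -mtime +N matches files whose age, in whole days, is greater than N;
  # -print lists each file just before -delete removes it
  find "$dir" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$days" -print -delete
}

# Example (hypothetical retention of roughly six months):
#   prune_old_backups /export/backup/myfiles 180
```

Keeping `-print` in the expression means the cron job’s mail (or log) records exactly which archives were removed.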

The script doesn’t rely on much that you wouldn’t expect to have on a typical *nix server: tar, gzip, and scp are the minimum requirements, and screen is highly recommended if you choose to run the script manually. So, without further ado… actually, a disclaimer is in order. I came across the fundamental concepts of this script while lending a hand to a friend of mine who needed someone to keep an eye on his server while he was away on an extended trip around the world. I’ve polished it up, added some sanity checks, and combined a two-step process into a single script, but the script below is definitely a derivative work, so credit is due to Trevor for his original work.

backupscript.txt

So, let’s break it down:

#!/bin/bash

Bash is definitely a requirement, as we’ll use its string manipulation abilities to munge multilevel paths into sane filenames.
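Since bash, tar, gzip, and scp all need to be present, a couple of optional start-up checks could catch a missing piece before anything is moved or deleted. This is a sketch with hypothetical function names, not part of the original script.

```bash
#!/bin/bash
# Sketch of optional start-up checks; the tool list mirrors the stated
# requirements (tar, gzip, scp) and is meant to be adjusted.

# Returns non-zero when not running under bash (e.g. started as "sh script").
# Uses only POSIX syntax so it still parses under a plain sh.
require_bash() {
  if [ -z "$BASH_VERSION" ]; then
    echo "ERROR: this script needs bash" >&2
    return 1
  fi
}

# Reports every missing command, then returns non-zero if any were missing.
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "ERROR: required command '$cmd' not found" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, right after the shebang:
#   require_bash || exit 1
#   check_deps tar gzip scp || exit 1
```

Reporting every missing command in one pass, rather than stopping at the first, saves a few round trips when setting the script up on a fresh server.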

backup_dirs="/etc/backup_dirs"
backup_path="/export/backup"
backup_location="backup@myserver.com:/export/backup/myfiles/"
scp_opts="-l 4096"

The variables set here are:

  • $backup_dirs is a text file containing a list of directories to back up, one per line. Trailing slashes are best avoided (with the naming scheme used below, /var/www/ would produce an archive named var_www_-YYYYMMDD.tar.gz), though adding a line to make them irrelevant would be trivial.
  • $backup_path is the directory in which the local backups will be kept. It shouldn’t initially contain anything other than a directory called old.
  • $backup_location is the remote path that the backups will be transferred to via scp. This definitely works best if you have SSH keys set up.
  • $scp_opts is a list of additional options that can be passed to scp for the remote transfer. In this example, it’s used to rate-limit the connection: scp’s -l option takes a value in Kbit/s, so 4096 works out to roughly 4Mbps.
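The “trivial” extra line for trailing slashes could be a single parameter expansion. Here is a minimal sketch that reads a list in the same one-directory-per-line format as $backup_dirs; the normalize_dirs name and the temporary example file are illustrative only.

```bash
#!/bin/bash
# Sketch: read a backup_dirs-style list (one directory per line) and
# normalize each entry so that /var/www/ and /var/www are treated the same.
normalize_dirs() {
  local list=$1 dir
  while IFS= read -r dir; do
    [[ -z $dir ]] && continue           # ignore blank lines
    [[ $dir != / ]] && dir=${dir%/}     # drop one trailing slash, but keep a bare "/"
    printf '%s\n' "$dir"
  done < "$list"
}

# Example using a throwaway file in place of /etc/backup_dirs
list=$(mktemp)
printf '%s\n' /etc /var/www/ /home/ > "$list"
normalize_dirs "$list"    # prints /etc, /var/www and /home, one per line
rm -f "$list"
```

In the script itself, the equivalent would be a single `dir=${dir%/}` at the top of each loop over $backup_dirs.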

# check for screen (screen* also matches values like screen-256color)
if [[ $TERM != screen* ]]; then
  echo "ERROR: seriously, run me inside screen"
  exit 1
fi

This is a completely optional check to make sure the script isn’t being run from a terminal session that you might not want to rely on keeping open — starting a screen session first will spare you some headaches if you’re running the script manually. If you plan to use the script both from cron and for manual backups, prepending `TERM=screen` to the cron job will allow you to keep the check in place but still easily trigger backups via cron.
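To make the cron route concrete, here is a small sketch of why the prefix works. The schedule and script path in the crontab line are hypothetical. Also note that under cron the interactive ssh-add step has no terminal to prompt on, so unattended runs assume something like keychain or an agent that already holds the key.

```bash
#!/bin/bash
# Sketch: how a TERM=screen prefix gets a cron-launched run past the screen check.
# A crontab entry along these lines (weekly, Sunday 03:00; the path is hypothetical):
#   0 3 * * 0 TERM=screen /usr/local/sbin/backupscript.sh >> /var/log/backup.log 2>&1

# Stand-in for the check at the top of the backup script.
screen_check='if [[ $TERM != screen ]]; then echo "ERROR: seriously, run me inside screen"; exit 1; fi; echo "check passed"'

# A VAR=value prefix sets TERM for that single command only.
TERM=dumb   bash -c "$screen_check" || true   # prints the error (the check exits 1)
TERM=screen bash -c "$screen_check"           # prints "check passed"
```

Cron hands the command field to /bin/sh, so the same VAR=value prefix applies there exactly as it does at an interactive prompt.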

# make sure we have a list of directories to back up
if [[ ! -f $backup_dirs ]]; then
  echo "ERROR: $backup_dirs not found; nothing to back up!"
  exit 1
fi
while IFS= read -r dir; do
  [[ -z $dir ]] && continue   # skip blank lines
  if [[ ! -d $dir ]]; then
    echo "ERROR: $dir is not a valid directory!"
    exit 1
  fi
done < "$backup_dirs"

This goes through the file $backup_dirs and makes sure that the listed items are in fact directories that exist on the filesystem. If either the file or any of its entries are not found, the backup will not continue.

# get passphrase
eval "$(ssh-agent -s)"
trap 'ssh-agent -k > /dev/null' EXIT   # stop this agent when the script exits
ssh-add

Another optional element. I highly recommend passphrases on keys, and if you don’t use a utility such as keychain to manage them, this snippet lets you enter the passphrase at the beginning of the script, so that the agent is already holding the unlocked key by the time the script is ready to start the upload.
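If you do use keychain (or any agent that’s already running), starting a fresh agent on every run isn’t necessary. Below is a hedged sketch of a hypothetical ensure_agent helper that checks first. It relies on ssh-add’s documented exit statuses for -l (0 when the agent holds identities, 1 when it holds none, 2 when no agent can be reached), and it assumes SSH_AUTH_SOCK already points at the existing agent, for example after sourcing keychain’s environment file.

```bash
#!/bin/bash
# Sketch: reuse an existing ssh-agent when it already holds keys, and only
# start a new agent when none is reachable.
# ssh-add -l exits 0 if the agent has identities, 1 if it has none,
# and 2 if no agent can be contacted.
ensure_agent() {
  ssh-add -l >/dev/null 2>&1
  case $? in
    0) echo "agent already has keys loaded" ;;
    1) ssh-add ;;                                # agent is up but empty: load keys into it
    *) eval "$(ssh-agent -s)" >/dev/null         # no agent: start one for this script
       trap 'ssh-agent -k >/dev/null' EXIT       # and shut it down when the script exits
       ssh-add ;;                                # prompts for the key passphrase
  esac
}

# Usage, in place of the unconditional agent start:
#   ensure_agent || exit 1
```

Only the agent the script starts itself gets a kill trap, so an agent managed by keychain is left running for the next run.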

# run local backup
echo "backup started: $(date)"
now=$(date +%Y%m%d)
nice rm -f "$backup_path"/old/*.gz
nice mv "$backup_path"/*.gz "$backup_path"/old/
while IFS= read -r dir; do
  [[ -z $dir ]] && continue
  echo "$dir"
  dirname=${dir:1}                                     # /var/www -> var/www
  archive="$backup_path/${dirname//\//_}-$now.tar.gz"  # var/www -> var_www-YYYYMMDD.tar.gz
  nice tar -zcf "$archive" "$dir"
  chmod 0600 "$archive"
done < "$backup_dirs"
ls -l "$backup_path"/*.tar.gz
echo "backup complete: $(date)"

Here we get the current date and save it in a variable, which keeps the backup set together even if the tar processing runs past midnight. If you expect to take multiple backups in a single day (I choose to do weekly backups, but at times I’ve also set this script up for daily or monthly backups), you’ll want to tweak the `date` command to include a more detailed timestamp. Next we remove the oldest backup from $backup_path/old, move the most recent backup into $backup_path/old, and start a new backup. The ${dir:1} expansion strips away the leading slash, and ${dirname//\//_} replaces any remaining slashes with underscores, so a backup of /var/www taken on 31-Oct-09 is saved as var_www-20091031.tar.gz.
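To see the two expansions in isolation, here is a small sketch that wraps them in a hypothetical archive_name function, along with one possible finer-grained timestamp for multiple runs per day (the YYYYMMDD-HHMM format is just one choice).

```bash
#!/bin/bash
# Sketch: the two bash parameter expansions used to turn a directory path
# into an archive name.
archive_name() {
  local dir=$1 stamp=$2
  local name=${dir:1}                   # drop the leading slash: /var/www -> var/www
  echo "${name//\//_}-$stamp.tar.gz"    # slashes to underscores: var/www -> var_www
}

archive_name /var/www 20091031          # var_www-20091031.tar.gz
archive_name /usr/local/etc 20091031    # usr_local_etc-20091031.tar.gz

# For more than one backup per day, a finer-grained stamp keeps
# same-day sets apart (this format is an assumption, not the script's).
archive_name /var/www "$(date +%Y%m%d-%H%M)"
```

The `//` form replaces every match rather than just the first, which is what makes deeper paths like /usr/local/etc come out flat.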

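# upload ($scp_opts stays unquoted so "-l 4096" splits into two arguments)
echo "upload started: $(date)"
if scp $scp_opts "$backup_path"/*.tar.gz "$backup_location"; then
  echo "upload complete: $(date)"
else
  echo "ERROR: upload failed: $(date)"; exit 1
fi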

Lastly, we start the upload. This is almost anti-climactic. If you get charged for straight bandwidth usage (or don’t have to pay for bandwidth at all) then rate-limiting via the $scp_opts variable is a concern only if you need to reserve bandwidth or processing time (encryption can be expensive at higher speeds) for other tasks. But if you pay for a burstable connection, you’ll want to make sure that the rate limiting options are tuned to keep the bandwidth usage (of the backup plus any baseline/expected traffic) below your trigger rate.
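As a rough aid for that tuning, the arithmetic can be sketched as two small helpers. The trigger rate, baseline traffic, headroom, and archive size are all made-up figures. The conversion follows the convention that -l 4096 is about 4Mbps (1 Mbps taken as 1024 Kbit/s), and integer shell arithmetic rounds down throughout.

```bash
#!/bin/bash
# Sketch: pick an scp -l value (Kbit/s) that keeps backup traffic plus
# expected baseline traffic under a burstable plan's trigger rate.
scp_limit_kbps() {
  local trigger_mbps=$1 baseline_mbps=$2 headroom_pct=$3
  local spare_kbps=$(( (trigger_mbps - baseline_mbps) * 1024 ))
  echo $(( spare_kbps * (100 - headroom_pct) / 100 ))
}

# Rough transfer time in minutes for an archive set of size_mb at that limit.
transfer_minutes() {
  local size_mb=$1 limit_kbps=$2
  echo $(( size_mb * 8 * 1024 / limit_kbps / 60 ))
}

# Hypothetical plan: 10 Mbps trigger, 4 Mbps baseline, 10% headroom
limit=$(scp_limit_kbps 10 4 10)
echo "scp_opts=\"-l $limit\""                            # scp_opts="-l 5529"
echo "~$(transfer_minutes 2048 "$limit") minutes for 2 GB"  # ~50 minutes for 2 GB
```

The headroom percentage leaves some slack for traffic spikes that the baseline estimate misses; the transfer estimate ignores protocol overhead, so real runs will take somewhat longer.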

In conclusion — this is definitely not the only way to do backups. It’s certainly not the most network or space efficient. It is, however, quite reliable and simple enough to be easily grasped, an important factor if you’re concerned about making sure that you actually know how to recover from your backups — and you definitely should be.

Categories: Random.
