Sunday, February 8, 2004

Snapshot backups

Disclaimer: I haven't used this technique in over 4 years, so I have no idea whether it is still possible or works as advertised. It should, but I haven't kept up with Linux, and some of the underlying technologies may have changed...

Mike Rubel found that rsync on Linux makes a powerful snapshot-style backup tool. With a script or two run from cron, you can keep hourly, daily, weekly and monthly snapshots of your files.


For background information, check out his page. Another utility that looks very complete (and is also based on rsync) is rsnapshot.

Quick introduction

We'll set up some drives, two scripts, and a few files listing the directories to be backed up. The time you spend on this last step is proportional to how much you want to trim the size of your backups.

We'll set up a destination hard drive to hold the snapshots. They'll look something like this:

drwxr-xr-x    5 root     root         4.0K Aug 15 22:00 daily.0/
drwxr-xr-x    5 root     root         4.0K Aug 14 22:00 daily.1/
drwxr-xr-x    5 root     root         4.0K Aug 13 22:00 daily.2/
drwxr-xr-x    5 root     root         4.0K Aug 12 22:00 daily.3/
drwxr-xr-x    5 root     root         4.0K Aug 16 02:00 hourly.0/
drwxr-xr-x    5 root     root         4.0K Aug 16 00:00 hourly.1/
drwxr-xr-x    5 root     root         4.0K Aug 15 22:00 hourly.2/
drwxr-xr-x    5 root     root         4.0K Aug 15 20:00 hourly.3/
drwxr-xr-x    5 root     root         4.0K Jul 31 22:00 monthly.0/
drwxr-xr-x    5 root     root         4.0K Jun 30 22:00 monthly.1/
drwxr-xr-x    5 root     root         4.0K Jun  5 22:00 monthly.2/
drwxr-xr-x    5 root     root         4.0K Aug  9 22:00 weekly.0/
drwxr-xr-x    5 root     root         4.0K Aug  2 22:00 weekly.1/
drwxr-xr-x    5 root     root         4.0K Jul 26 22:00 weekly.2/

As you can see, we'll have access to snapshots from various points in time over the past three months. More importantly, if you need to restore a crashed system, the most recent snapshot (hourly.0) is a complete image: there is no need to restore a full backup followed by several partial ones.

The secret that makes this work is the way file systems store files, and links to those files, on a hard drive. A file that hasn't changed is stored once, with a hard link to it from each snapshot directory. A file that does change is copied and given a new link in hourly.0, while all previous snapshots continue to link to the old version. See the background links above for more information.
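To see the hard-link idea in isolation, here is a small experiment that runs entirely in a throwaway directory. It assumes GNU cp (for `cp -al`, which copies a tree as hard links) and imitates rsync's default behaviour of writing a changed file under a temporary name and renaming it into place. It illustrates the principle only; it is not the logic of make_snapshot.bash.

```shell
#!/bin/bash
# Show why unchanged files cost no extra space in a hard-link snapshot scheme.
# Everything happens inside a temporary directory.
demo=$(mktemp -d)
mkdir "$demo/hourly.0"
echo "unchanged" > "$demo/hourly.0/a.txt"
echo "version 1" > "$demo/hourly.0/b.txt"

# Rotation step: hourly.1 becomes a copy of hourly.0 made of hard links (GNU cp -a -l).
cp -al "$demo/hourly.0" "$demo/hourly.1"

# Update b.txt the way rsync does by default: write a new file, then rename it over the old name.
# The new hourly.0/b.txt is a different inode; hourly.1/b.txt still points at the old data.
echo "version 2" > "$demo/hourly.0/.b.txt.tmp"
mv "$demo/hourly.0/.b.txt.tmp" "$demo/hourly.0/b.txt"

# -ef is true when both names refer to the same file (same device and inode).
if [ "$demo/hourly.0/a.txt" -ef "$demo/hourly.1/a.txt" ]; then
    echo "a.txt is stored once and shared by both snapshots"
fi
echo "hourly.0/b.txt: $(cat "$demo/hourly.0/b.txt")"
echo "hourly.1/b.txt: $(cat "$demo/hourly.1/b.txt")"
# Clean up afterwards with: rm -rf "$demo"
```

Writing into b.txt in place instead (for example `echo ... > "$demo/hourly.0/b.txt"`) would change both snapshots at once, because they share one inode; the write-then-rename step is what keeps the older snapshot intact.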

A new hard drive

For the best protection, the backups should be stored on their own hard drive. I'll quickly step you through installing a new IDE hard drive, creating a read/write partition for root access, and creating a read-only NFS share for everyone to see.

Installing new drive

For a quick overview (assuming IDE, no RAID):

Install the drive physically. Note whether it is on the primary or secondary channel, and whether it is the master or the slave.

As root, use the command fdisk /dev/hd[abcd] to create a Linux partition (id 83) and write the table to disk. ([abcd] should be just one letter: a is primary-master, b is primary-slave, c is secondary-master, d is secondary-slave. I'll use b for the rest of this example.)

mke2fs -j /dev/hdb1 will create a journaled (ext3) file system on the first partition of the primary-slave drive. A journaled file system is preferred because it recovers from errors much more easily.
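The device-letter convention above is easy to get wrong, and fdisk or mke2fs on the wrong disk destroys data. As a sketch, the helper below (the name ide_device is made up for this example) encodes the a/b/c/d mapping and only prints the commands for you to run by hand.

```shell
#!/bin/bash
# Map an IDE channel and position to the old-style device name, following the
# convention above: a = primary master, b = primary slave, c = secondary master,
# d = secondary slave. It only prints commands; it runs nothing destructive.
ide_device() {
    case "$1-$2" in
        primary-master)   echo /dev/hda ;;
        primary-slave)    echo /dev/hdb ;;
        secondary-master) echo /dev/hdc ;;
        secondary-slave)  echo /dev/hdd ;;
        *) echo "usage: ide_device primary|secondary master|slave" >&2; return 1 ;;
    esac
}

disk=$(ide_device primary slave)   # the drive used in this walkthrough
part="${disk}1"                    # its first partition
echo "as root: fdisk $disk   (create one partition, id 83, then write the table)"
echo "then:    mke2fs -j $part"
```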

Mounting for root access

Create a directory on which to mount the backup snapshots, somewhere only root has access. I suggest /root/mounts/backups. To mount the drive, add the following line to /etc/fstab:

/dev/hdb1               /root/mounts/backups  ext3    ro             0 0

And mount it:

mount /root/mounts/backups

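And test that you can read files from the mount point. Since the drive is mounted read-only, writing to it requires remounting it read/write, as shown further down.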

Mounting readonly for everyone else

Based on advice from Mike Rubel (see links above), we'll make a read-only NFS share (for example, /var/backups) through which users can see the snapshots.

Start by adding the following to /etc/exports:
/root/mounts/backups localhost(secure,ro,no_root_squash)

Next, make sure nfs and portmap are both installed and running. (Either check the scripts in /etc/rc.d/rc3.d/ and /etc/rc.d/rc5.d/ to ensure that they are installed, or enable them with the Red Hat GUI under "Server settings | Services".) NFS has to be restarted to pick up the changes to /etc/exports. (You can do this from the GUI under Server settings | Services | NFS | Restart, or with the command /etc/init.d/nfs restart.)

Unfortunately, we cannot mount /var/backups with a simple entry in fstab, since the NFS share is not yet available when fstab is read during bootup. Instead, I suggest adding the following line to /etc/rc.d/rc.local, since it is the last file run during bootup; by then NFS, which is a prerequisite, will be running.

mount -o ro localhost:/root/mounts/backups /var/backups

Check that everything works. Make sure that you cannot add or edit files in the /var/backups directory. Also try remounting the /root/mounts/backups directory read/write with the following:

% mount -o remount,rw /root/mounts/backups
% mount
(lists all mount points; make sure /root/mounts/backups is read/write)
% mount -o remount,ro /root/mounts/backups
(to put it back to read-only)
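To check the mount state from a script rather than by eye, one option is to read /proc/mounts, whose fourth field holds the mount options. The helper below (its name, mount_mode, is invented for this example) prints ro or rw for a given mount point; the optional second argument only exists so it can be tried against a sample file.

```shell
#!/bin/bash
# Print "ro" or "rw" for a mount point; return 1 if it is not mounted.
# Fields in /proc/mounts: device, mount point, type, options, dump, pass.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode == "") exit 1; print mode }
    ' "${2:-/proc/mounts}"
}
```

For example, `[ "$(mount_mode /var/backups)" = ro ] || echo "warning: /var/backups is writable"` could be run after the remount test above, since the NFS view should stay read-only even while the underlying drive is read/write.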

Setting up the scripts

Download and install

The next step is to download and install the scripts. I installed mine at /etc/backups.

After downloading and unzipping, you should have two new directories, /etc/backups/scripts and /etc/backups/excludes. I'll describe each of these later.

Windows sharing

Decide which directories you'd like to have backed up. I'm assuming you'll want to back up files from both a Windows PC and your Linux box. To back up your Windows PC, you'll first have to share out the necessary directories or drives and mount them with Samba (alternatively, you can use ssh and rsync directly, but I think that takes a bit more work). I made a new Windows user (winuser below) with a simple password, gave this user the most minimal permissions, and then shared out my C drive read-only for just this user. If you are comfortable with this, add the following line to /etc/fstab. The only security problem I see is that anybody who has access to your Linux box now has read-only access to your Windows shares. If you have a better idea for Samba sharing by machine (similar to NFS), leave a comment.

//windows/c_share /root/mounts/c smbfs username=winuser,password=pass,uid=username,gid=group 0 0

You can leave out the password, but then you will be prompted for it on a reboot, which is annoying.

Customize scripts

We'll set up the scripts first, then the excludes.

There are two scripts. The first is make_snapshot.bash, written by Elio Pizzottelli based on original work by Mike Rubel. The second is run_backups.bash, which I wrote as a wrapper. For both scripts, you'll have to set up the paths to the required executables. For run_backups.bash, it would be as follows:
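The listing that belongs here did not survive in this copy of the post. As a hedged illustration only, a path block of this kind usually looks something like the following; the variable names and paths are my assumptions, not the contents of run_backups.bash, and check_executables is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical path settings; check each with `command -v <tool>` and adjust.
# Absolute paths keep the script from depending on the minimal PATH that cron provides.
MOUNT=/bin/mount
RM=/bin/rm
MV=/bin/mv
CP=/bin/cp
TOUCH=/bin/touch
RSYNC=/usr/bin/rsync

# Warn about every listed variable whose path is not executable; return 1 if any is missing.
check_executables() {
    local name status=0
    for name in "$@"; do
        if [ ! -x "${!name}" ]; then    # ${!name}: the value of the variable named by $name
            echo "missing: $name=${!name}" >&2
            status=1
        fi
    done
    return "$status"
}

# Abort before touching any snapshot if a tool is missing:
# check_executables MOUNT RM MV CP TOUCH RSYNC || exit 1
```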


We'll look at the remaining steps for run_backups.bash:

  1. Set up the backup device and mount point. The following assumes the second IDE hard drive, mounted on /root/mounts/backups.
  2. Set up the number of hourly, daily, weekly, and monthly snapshots, along with the day of the week and day of the month on which to take the weekly and monthly snapshots. A few notes: because of a requirement of make_snapshot.bash, each NUMBER_OF value must be 3 or greater for that category (hourly, daily, weekly or monthly) to be valid. In particular, NUMBER_OF_HOURLY must be 3 or greater, or no backup will ever occur. The name "hourly" is somewhat misleading, because the "hourly" snapshot can actually be taken every 2 hours, every day, or at whatever frequency you set cron to run the script. (I use every 2 hours.)
    # Number of each backup
    # To be valid, the NUMBER_OF values
    # must be greater than or equal to 3.
    # Setting to 0, 1, or 2 will prevent that backup from occurring.
    # HOURLY must be run in order to get anything.
    # Note, however, that you can have this entire script run only once
    # a day by cron, in which case "HOURLY" becomes "DAILY"
    # Pick the day of week and day of month for those backups
    # 0 disables, 1=Monday, 7=Sunday
    # 0 disables, otherwise choose a number
    # 01-31, although better not to go above 28 if you want
    # backups in February
  3. Now set up the shares to be backed up. There are four arrays to set up. For each backup, we specify the source directory, the destination directory (which will be under the BACKUP_MOUNT_POINT specified earlier), an EXCLUDES file (which we'll discuss next) and any other options for the rsync program. Here is an example of a backup of the entire Linux box (the source directory is /). The destination is a directory named linux (which will be /root/mounts/backups/linux). Note: we'll use excludes later to avoid backing up directories and files that shouldn't be backed up, including the /var/backups and /root/mounts directories. We point to a file that holds this information, /etc/backups/excludes/linuxroot_exclude.
    #set up backup 1 - linux box
  4. Let's look at another example, for a windows share:
    #set up backup 2 - windows box
  5. Finally, make sure to set NUMBER_BACKUPS to the number of backups defined (e.g. 2 for one Linux and one Windows backup).
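The code for these steps is also missing from this copy of the post, so here is a hedged reconstruction of what such a configuration section might look like. Only BACKUP_MOUNT_POINT, NUMBER_OF_HOURLY, NUMBER_BACKUPS, EXCLUDES and the linuxroot_exclude path come from the text. The other variable names, the array names and all values are my assumptions, not the real run_backups.bash. The enabled_levels helper is also hypothetical: it applies the two rules stated above (a count below 3 disables a level, and a day of 0 disables weekly or monthly snapshots).

```shell
#!/bin/bash
# Hypothetical configuration section; names other than those in the text are assumptions.
BACKUP_MOUNT_POINT=/root/mounts/backups

NUMBER_OF_HOURLY=4    # each count must be >= 3 to be valid
NUMBER_OF_DAILY=3
NUMBER_OF_WEEKLY=3
NUMBER_OF_MONTHLY=3
DAY_OF_WEEK=6         # 0 disables, 1=Monday ... 7=Sunday
DAY_OF_MONTH=1        # 0 disables; 28 or less runs every month

# Four parallel arrays indexed by backup number: source, destination, excludes file, rsync options.
SOURCE[1]=/
DESTINATION[1]=linux
EXCLUDES[1]=/etc/backups/excludes/linuxroot_exclude
RSYNC_OPTS[1]=""

SOURCE[2]=/root/mounts/c
DESTINATION[2]=windows
EXCLUDES[2]=/etc/backups/excludes/windows_c_exclude   # hypothetical file name
RSYNC_OPTS[2]=""

NUMBER_BACKUPS=2

# List the snapshot levels this configuration enables.
enabled_levels() {
    if [ "$NUMBER_OF_HOURLY" -lt 3 ]; then
        echo "none"    # without hourly snapshots nothing else runs
        return 1
    fi
    local levels="hourly"
    if [ "$NUMBER_OF_DAILY" -ge 3 ]; then levels="$levels daily"; fi
    if [ "$NUMBER_OF_WEEKLY" -ge 3 ] && [ "$DAY_OF_WEEK" -ne 0 ]; then levels="$levels weekly"; fi
    if [ "$NUMBER_OF_MONTHLY" -ge 3 ] && [ "$DAY_OF_MONTH" -ne 0 ]; then levels="$levels monthly"; fi
    echo "$levels"
}

# Catch the easy mistake from step 5: NUMBER_BACKUPS not matching the arrays.
if [ "${#SOURCE[@]}" -ne "$NUMBER_BACKUPS" ]; then
    echo "NUMBER_BACKUPS=$NUMBER_BACKUPS but ${#SOURCE[@]} sources are defined" >&2
fi

echo "enabled levels: $(enabled_levels)"
for ((i = 1; i <= NUMBER_BACKUPS; i++)); do
    echo "backup $i: ${SOURCE[i]} -> $BACKUP_MOUNT_POINT/${DESTINATION[i]} (excludes ${EXCLUDES[i]})"
done
```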

Customize excludes

As mentioned earlier, the excludes files let you specify which directories and files are included in or excluded from a backup. Their syntax can be somewhat tricky, so you'll need to consult the rsync documentation.
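One way to tame the syntax is to test an excludes file with a dry run before trusting it. The sketch below (the name preview_excludes is invented here) runs rsync with -a, -n (--dry-run) and -i (--itemize-changes) into an empty scratch directory, so it lists every file the rules would let through without copying anything.

```shell
#!/bin/bash
# List the files an excludes file would let through, without copying anything.
# Usage: preview_excludes SOURCE_DIR EXCLUDES_FILE
preview_excludes() {
    local src=$1 excludes=$2 scratch status
    scratch=$(mktemp -d) || return 1
    # The trailing slash makes rsync transfer the contents of SOURCE_DIR, so anchored
    # rules such as "/var/" are matched relative to SOURCE_DIR. ${src%/} avoids "//".
    rsync -a -n -i --exclude-from="$excludes" "${src%/}/" "$scratch/"
    status=$?
    rmdir "$scratch"
    return "$status"
}
```

For example, `preview_excludes / /etc/backups/excludes/linuxroot_exclude | less`, run as root, shows what backup 1 would pick up.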

Backup 1 - Linux

The idea is simple. In backup number 1 above, we specified that we wanted the entire Linux file system backed up (by using / as the path). However, only a few directories are important to me: /home, /var/www, /etc, and /root, to name a few. This is the excludes file, /etc/backups/excludes/linuxroot_exclude, that I use to get those directories. (The notes in parentheses are not really in the file; they are here as explanation.)

+ /var/         (include the directory /var
               the trailing / means it will only match a directory, not a file
               it includes all subdirectories as well)
+ /var/www/     (include the directory /var/www)
- /var/*/       (now exclude all other directories under /var
               the order was important of the above statements.  Excluding all
               var subdirectories first would invalidate the include of /var/www)
- /var/*        (and exclude all files under /var)
- Pictures/     (exclude any directory Pictures)
- manual/       (exclude any directory manual)
+ /root/        (include /root directory)
- /root/mounts/ (exclude /root/mounts directory -
               since we don't want to backup the backups!)
+ /etc/         (include /etc/ directory)

- *~            (exclude any file ending with ~ )
- /*/           (exclude all other directories under / )
- /*            (exclude all other files under / )
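Because rsync reads each whole line as a rule, the notes (and the padding spaces before them) must not end up in the real file. A minimal sketch for writing the clean version is below; the function name write_linux_excludes is hypothetical, and lines starting with `#` are ignored in an --exclude-from file. Note that, as listed, the rules contain no `+ /home/`, so under rsync's first-match-wins rule /home falls through to `- /*/` and is skipped; if you want it, add `+ /home/` anywhere before that line.

```shell
#!/bin/bash
# Write the Linux excludes file in the form rsync reads: one rule per line, no notes.
# Usage: write_linux_excludes [FILE]   (defaults to the path used in this walkthrough)
write_linux_excludes() {
    local file=${1:-/etc/backups/excludes/linuxroot_exclude}
    cat > "$file" <<'EOF'
# The first matching rule wins, so each include comes before the broader exclude.
+ /var/
+ /var/www/
- /var/*/
- /var/*
- Pictures/
- manual/
+ /root/
- /root/mounts/
+ /etc/
- *~
- /*/
- /*
EOF
}
```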

Backup 2 - Windows

Here's the excludes file for the C drive of my PC:

+ /IISRoot/                   (include a directory)
+ /projects/                  (include a directory)
- tivo.bak                    (exclude a really big file)
+ /Documents and Settings/    (include a directory)
- Application Data/           (now exclude Application Data directory)
- Local Settings/             (and exclude Local Settings directory)
- My Music/                   (and a few other directories that don't fit)
- My Pictures/
- My Videos/
- My Slide Shows/
- nobackup/
- *.mp3                       (and don't back up any mp3 file)
- /*/                         (exclude any other directory at the root level)
- /*                          (exclude any other file at the root level)

Backup 3 - Outlook Express

I made another backup in addition to the two shown above, specifically for Outlook Express mailboxes. It seems OE updates the timestamps on all its folders every few minutes, which causes a lot of problems for backups. In fact, just having a really big inbox that gets a few new messages every hour can consume quite a bit of space on the backup drive, since an entirely new copy is kept for each of 4 hourlies + 3 dailies + 3 weeklies + 3 monthlies. So it might be best to back up just the address book, or else to save only one extra copy of the inbox every day or so. But here's my setup:
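The setup listing is missing from this copy of the post. A hedged guess at the entry follows, assuming run_backups.bash keeps each backup in parallel arrays indexed by backup number; the array names SOURCE, DESTINATION, EXCLUDES and RSYNC_OPTS, the destination name and the excludes file name are all my assumptions. `--size-only` is a real rsync option, and size_changed is a hypothetical helper that mimics the comparison it makes.

```shell
#!/bin/bash
# Hypothetical entry for backup 3; array and file names are assumptions.
SOURCE[3]=/root/mounts/c                                  # the Windows C drive mounted with smbfs
DESTINATION[3]=outlook_express                            # hypothetical directory name
EXCLUDES[3]=/etc/backups/excludes/outlook_express_exclude # hypothetical file name
RSYNC_OPTS[3]="--size-only"                               # compare by size only, ignore modification time
NUMBER_BACKUPS=3

# The decision rsync makes under --size-only, in miniature:
# a file is treated as changed only when its length differs.
size_changed() {
    [ "$(wc -c < "$1")" -ne "$(wc -c < "$2")" ]
}
```

The trade-off is that an edit which leaves a mailbox exactly the same size will not be picked up until its size changes.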


Note the size-only option here. It is passed to rsync, so that rsync judges whether a file has changed by its size alone (not by its date, ownership, etc.). And here is the corresponding excludes file:

+ /Documents and Settings/
+ /Documents and Settings/username/
- /Documents and Settings/*/
+ /Documents and Settings/username/Application Data/
- /Documents and Settings/username/*/
- Deleted Items.dbx
- microsoft.public*     (exclude some newsgroups I'm on)
- /*/
- /*


To get all this to back up regularly, simply add an entry to root's crontab:

su - root
crontab -e
(then in editor)
0 0-23/2 * * * /etc/backups/scripts/run_backups.bash
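The hour field `0-23/2` means "every second hour from 0 to 23", and the leading `0` is the minute. To confirm what a range-with-step field expands to, here is a small hypothetical helper; it handles only the START-END and START-END/STEP forms, not `*`, lists or names.

```shell
#!/bin/bash
# Expand a cron field written as START-END or START-END/STEP into the values it matches.
expand_cron_range() {
    local field=$1 range step start end i out=""
    range=${field%/*}
    if [ "$range" = "$field" ]; then
        step=1                 # no "/STEP" part: every value in the range
    else
        step=${field#*/}
    fi
    start=${range%-*}
    end=${range#*-}
    for ((i = start; i <= end; i += step)); do
        out="$out $i"
    done
    echo "${out# }"
}

# The hour field of "0 0-23/2 * * *": the script runs at minute 0 of these hours.
expand_cron_range 0-23/2    # prints: 0 2 4 6 8 10 12 14 16 18 20 22
```

That is twelve runs a day, which is why the four hourly snapshots in the listing at the top span six hours (20:00 to 02:00).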

Note: you'll want to ensure that the above scripts can be changed only by root (owner root:root, permissions 755 or similar), since you don't want somebody else changing them to do arbitrary things! While I'm on the subject of security, it is also a good idea to change the permissions of the files in the /root/mounts/backups folders for your Linux backup. You may not want somebody having read access to all of /etc or /root or anything else backed up.
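A sketch of that hardening follows, assuming the paths used in this walkthrough and that the Linux snapshots live under /root/mounts/backups/linux. The helper others_cannot_write is hypothetical; it reads the ls -ld mode string to confirm that neither group nor others can modify a file. Because the backup drive is normally mounted read-only, the chmod on it has to happen between the remount commands shown earlier.

```shell
#!/bin/bash
# Return 0 if neither group nor others can write to the file or directory,
# 1 if they can, 2 if it does not exist.
# Mode string from ls -ld, e.g. -rwxr-xr-x: char 6 is group write, char 9 is other write.
others_cannot_write() {
    local mode
    [ -e "$1" ] || return 2
    mode=$(ls -ld "$1" | cut -c1-10)
    case "$mode" in
        ?????w????|????????w?) return 1 ;;
        *) return 0 ;;
    esac
}

# Suggested hardening, run as root:
#   chown root:root /etc/backups/scripts/*.bash
#   chmod 755 /etc/backups/scripts/*.bash
#   for f in /etc/backups/scripts/*.bash; do others_cannot_write "$f" || echo "check $f"; done
#   mount -o remount,rw /root/mounts/backups
#   chmod 700 /root/mounts/backups/linux
#   mount -o remount,ro /root/mounts/backups
```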


I've been using this for months without a hitch. Please don't use any information I've posted against me or my systems. Finally, I can't guarantee it will work for you. I've had some issues with PC files being copied over even when they haven't changed, so it's not perfect. But it has also come in quite handy on more than one occasion when I needed to undo unintended changes to code and other files. And if the really unfortunate were to happen, such as a system crash, it's good to know there is a second copy.

Good luck, and let me know of any comments on the above.
