Hourly Snapshot Style Backup

From BubbaWiki
Revision as of 15:22, 14 March 2011 by A1n (talk | contribs)

General

How to set up hourly snapshot-style backups on a B3 with an external disk.

On a B3, the scripts in this howto create an hourly backup (snapshot) of the data directories of all users on the system and of the storage directory. The backup is kept on an external (eSATA) disk.

These scripts create what appears to the user to be a full backup every hour, when in fact each backup only stores the files that changed since the previous backup; unchanged files are shared with the earlier backups through hard links. The result is an incremental backup that is very fast, typically a few minutes, and the total volume of around 30 backups is not much bigger than that of a single backup. The backup is accessible to all users, so they can retrieve their missing files themselves. It is only accessible as a read-only file system, so users cannot mess up the backups. All permissions of the original files are maintained on the backup, so if a user has something private in his directories, other users cannot access it through the backup.
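The hard-link idea behind this can be tried safely in a temporary directory. The following is only a sketch and not part of the scripts in this howto: the directory and file names are made up, GNU cp is assumed, and the rename step imitates how rsync replaces a changed file by default.

```shell
#!/bin/bash
# Illustration only: why a hard-linked "copy" costs almost no extra space.
# Everything happens in a throwaway temp directory; /home is not touched.
demo=$(mktemp -d)
mkdir "$demo/00"
echo "unchanged" > "$demo/00/a.txt"
echo "version 1" > "$demo/00/b.txt"

# "Snapshot" for the next hour: hard links only, no file data is copied.
cp -al "$demo/00" "$demo/01"

# Replace b.txt the way rsync does by default: write a new file, then rename
# it over the old name. Only the link in 01 changes; 00 keeps the old content.
echo "version 2" > "$demo/01/b.txt.tmp"
mv "$demo/01/b.txt.tmp" "$demo/01/b.txt"

# The same inode number means both names share one copy of the data on disk.
inode() { ls -i "$1" | awk '{print $1}'; }
echo "a.txt inodes: $(inode "$demo/00/a.txt") $(inode "$demo/01/a.txt")"
echo "b.txt inodes: $(inode "$demo/00/b.txt") $(inode "$demo/01/b.txt")"
cat "$demo/00/b.txt" "$demo/01/b.txt"
```

The unchanged file shows the same inode number in both directories, while the changed file gets a new inode in the newer directory only.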

This is what you see in the snapshot directory:

louis@b3:/home/storage/snapshot$ ls -l
total 140
drwxr-xr-x 9 root root 4096 Mar 9 00:00 00
drwxr-xr-x 9 root root 4096 Mar 9 01:00 01
drwxr-xr-x 9 root root 4096 Mar 9 02:00 02
drwxr-xr-x 9 root root 4096 Mar 9 03:00 03
drwxr-xr-x 9 root root 4096 Mar 9 04:00 04
drwxr-xr-x 9 root root 4096 Mar 9 05:00 05
drwxr-xr-x 9 root root 4096 Mar 9 06:00 06
drwxr-xr-x 9 root root 4096 Mar 9 07:00 07
drwxr-xr-x 9 root root 4096 Mar 9 08:00 08
drwxr-xr-x 9 root root 4096 Mar 9 09:00 09
drwxr-xr-x 9 root root 4096 Mar 9 10:34 10
drwxr-xr-x 9 root root 4096 Mar 9 11:00 11
drwxr-xr-x 9 root root 4096 Mar 9 12:00 12
drwxr-xr-x 9 root root 4096 Mar 9 13:00 13
drwxr-xr-x 9 root root 4096 Mar 8 14:00 14
drwxr-xr-x 9 root root 4096 Mar 8 15:00 15
drwxr-xr-x 9 root root 4096 Mar 8 16:00 16
drwxr-xr-x 9 root root 4096 Mar 8 17:00 17
drwxr-xr-x 9 root root 4096 Mar 8 18:00 18
drwxr-xr-x 9 root root 4096 Mar 8 19:00 19
drwxr-xr-x 9 root root 4096 Mar 8 20:00 20
drwxr-xr-x 9 root root 4096 Mar 8 21:00 21
drwxr-xr-x 9 root root 4096 Mar 8 22:00 22
drwxr-xr-x 9 root root 4096 Mar 8 23:00 23
drwxr-xr-x 9 root root 4096 Mar 8 00:00 dinsdag
drwxr-xr-x 7 root root 4096 Mar 3 00:00 donderdag
drwxr-xr-x 9 root root 4096 Mar 7 00:00 maandag
drwxr-xr-x 9 root root 4096 Mar 4 00:00 vrijdag
drwxr-xr-x 7 root root 4096 Mar 2 00:00 woensdag
drwxr-xr-x 9 root root 4096 Mar 5 00:00 zaterdag
drwxr-xr-x 9 root root 4096 Mar 6 00:00 zondag

Each directory contains a full image of your directories at the time of the backup. The funny names are the days of the week in Dutch; the scripts in this howto will yield the names in English.


Setup.

I use a B3 with a permanently mounted external disk as a primary backup, using rsync to make incremental backups. An automated backup is made each hour of the day and kept for twenty-four hours, and the last one of each day is kept for a week. So the disk holds each file as it was one hour, two hours, up to twenty-four hours ago, and then one day ago up to seven days ago. Whenever I change a file but don't like the change, or delete a file but need it back, I can simply pick it off the backup for that hour or day. I use rsync to make these incremental backups so that each run copies only the files that changed since the previous run. This way, although backups are made frequently, the total backup size (30 backups) is not much bigger than the size of the original data. The B3 uses a portion of your primary disk for the system files, so if you have two disks of equal size, the secondary disk has some spare capacity anyway. The longer you keep the backups and the more you edit your files, the bigger the backup will get.

This is a good setup for home users, but not if you have lots of data traffic to your disks. If the content of your disks changes by more than 20 gigabytes per hour, this backup method is too slow. For such data-intensive applications you may consider RAID instead.

What you will need

You will need a B3 (it will probably work on a B2, but I have not tested it); an external eSATA disk of about the same size as your internal disk; this howto and the scripts in it; and rsync, which you will have to install (this howto tells you how). You will also need SSH access to your B3; if you don't know how to get it, there is a howto on the Excito website.

Warnings!

The setup is complex and you will need to know your way around the system. You will need to use an SSH connection into your system and you will have to work as root. There is a risk of destroying your data or your system files. If you get your path names wrong you may lose valuable data.

The disk is mounted in two places (using mount --bind). This can be confusing at first, so you will need to keep track of where you are at any time.

In the event of hardware failure you will lose any changes made since the last backup. For home or small office users, where backups are made daily or even hourly as described in this document, that's probably fine, but in situations where any data loss at all would be a serious problem (such as where financial transactions are concerned), a RAID system might be more appropriate. If you have a B3, as I do, you have to choose between RAID and the rsync snapshot, because it has only one eSATA connection. I would think that a USB disk is too slow for the rsync application proposed here (though I have not tested it on a USB disk).

You will still need to make regular backups (say weekly) and store them in some safe place. In a serious incident, such as sudden power loss or more severe incidents, both disks may become compromised.

Credits

The scripts are a modified version of Henry Laxen's script, which you can find here: http://www.mikerubel.org/computers/rsync_snapshots/contributed/henry_laxen

The rsync technique that I use and describe here was described in an article by Mike Rubel in 2004, and I adapted it for my own purposes. You can find the original article here: http://www.mikerubel.org/computers/rsync_snapshots/

Caveats

RAID gives you only the last version of your files but has the advantage that the copy is always in sync, which is not the case with the rsync snapshot-style backup: the newest snapshot can be up to one hour old. In return, this scripting technique gives you not only the last version of your files but also several older versions.

The rsnapshot package may be a good alternative; it is available as a Debian package. It uses NFS to export the backup as read only, which is then mounted for the users. An NFS server is not installed on the B3 and, although not difficult to set up, it is another service and will use resources. The setup I use with mount does not need a new service. I have not found how to exclude directories from the backup in the rsnapshot package. Excluding directories is necessary because the snapshot is mounted under /home and /home is what gets backed up. The backup would otherwise end up including itself and quickly balloon.

There is another howto on this wiki (Tutorials_and_How-tos/Use_Bubba_with_Time_Machine). That setup gives the users more control over their backup, and it does a weekly backup instead of an hourly one, although that can easily be changed.

The setup in this howto has your backup disk mounted on the same machine as the primary disk, and that is a drawback: it makes the backup more vulnerable. Optionally you can consider mounting it before the backup and unmounting it afterwards, but then the disk is not accessible to the users. I use another disk (USB) that I keep unmounted in a different place to make a weekly backup of the same data. This is how I keep the data safe in case the system and both disks are compromised. You could do the same rsync between two different machines in two locations, but that is more costly in terms of hardware and power use. If you are interested, look up Mike Rubel's original article (see credits). There is also a howto on this wiki that uses rsync backup between any two Linux machines (Backup with rsync and rsync.net). That is a more secure setup, because the probability of two machines failing simultaneously is simply lower. However, I do not have two machines, and if you keep an extra backup off site, as I do, you can avoid catastrophic data loss.

The scripts that do the actual backup are written such that if a file is deleted from the source it will also be deleted from the newest backup, which is probably what you want. This means that if you delete a file today, you have at most seven days to get it back from the oldest backup. If you do not want this, you can edit the snaphrl.pl script and take out the --delete option in the rsync command. Taking out this option means that your deleted files will be kept permanently on the backup.
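To see how long a deleted file is still available, you can list the snapshots that contain it. This is a minimal sketch and not part of the howto's scripts: the function name snapshots_containing and the example file name are made up, and the path is given relative to a snapshot directory (so relative to /home).

```shell
# List every snapshot directory under BASE that still holds RELPATH.
snapshots_containing() {
    local base=$1 relpath=$2 dir
    for dir in "$base"/*/; do
        # Each $dir ends in a slash, e.g. /home/storage/snapshot/03/
        [ -e "${dir}${relpath}" ] && echo "${dir%/}"
    done
    return 0
}

# Hypothetical example: which snapshots still have louis/report.odt?
snapshots_containing /home/storage/snapshot louis/report.odt
```

Once the file no longer shows up in any hourly or daily directory, it can no longer be restored from this backup.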

Because unchanged files are hard links to the same data, a file takes on the ownership and permissions of the most recent backup in all the backup directories that share it. It is possible to change this behavior by editing the script; there is some information in Mike Rubel's article (see credits). I have not tested the approach described there that uses rsync's --link-dest flag. I also think that if you change the ownership or permissions on a file, you probably intend to change them on the backup too, as this backup is accessible to users.

When the backup is approached from Windows file explorer (through the Samba file server), at least in XP, a user may appear to be able to delete a file. However, the files never actually get deleted. I think the apparent ability to delete them is due to the Samba server export settings. I have not worried about it much, but if it bothers you I advise you to look at the Samba configuration files; it may be possible to change this for the /home/storage/snapshot directory.

This setup mounts the external disk in two places. The first mount is in the home partition (/home/.sdb1mntpnt), so that the system partition of the primary disk does not fill up if everything goes wrong, which would in turn freeze your system. A mount in /home would be visible to users by default, because a user on a B3 can see all of the home directory. You therefore hide the mount point by using the dot notation. On a standard B3, dot files are hidden in Samba and in the web GUI. In the shell, ls hides dot directories by default; you can see them by listing with the -A option (ls -A).

The scripts.

The scripts are written in Perl, which is available on the B3. There are two scripts: snaphrl.pl and snapscript.pl. You can use these scripts as provided; the text describes some changes you can make, but they are not necessary.

Snaphrl.pl

#!/usr/bin/perl
# Change this to wherever you want your backups to go
my $base = '/home/.sdb1mntpnt';
# List anything you want to backup
my @rsync=
qw(
   /home/
);

# And whatever you want to exclude, if anything
my @rsync_exclude =
qw(
   /dvd
   /media
   /floppy
   lost+found/ 
   snapshot/
   extern/
   .sdb1mntpnt/
   *~
);

#######################################################################
# 
#No user serviceable parts below this line
########################################################################
my @hours = qw(   00 01 02 03 04 05
            06 07 08 09 10 11 
            12 13 14 15 16 17
            18 19 20 21 22 23);

my $thishour = (localtime(time))[2];

$thishour = $hours[$thishour];   # two-digit directory name for this hour
my $lasthour = $hours[ $thishour == 0 ? 23 : $thishour - 1 ];

sub run($) {
   my $command = shift;
   print "$command\n";
   system($command) == 0 or die "Command failed (status $?): $command\n";
}

#######################################################################
# 
#Remove this hour's directory, since it is now a day old
run "rm -rf $base/$thishour" if -e "$base/$thishour";
########################################################################

#######################################################################
# 
#Then copy last hour's directory to this hour, using hard links
run "cp -al $base/$lasthour $base/$thishour" if -e "$base/$lasthour";
#######################################################################
#
#######################################################################
# 
#Update this hour's directory using rsync

foreach (@rsync) {
   my $rsync_command = "rsync -a --delete --link-dest=$base/$thishour \\\n";
   foreach (@rsync_exclude) {
      $rsync_command .= " --exclude=$_ \\\n";
   }
   $rsync_command .= " $_ $base/$thishour";
   run $rsync_command;
}

########################################################################
#And finally update the time stamp on the directory
run "touch $base/$thishour";
########################################################################

__END__

=head1 NAME
snaphrl.pl
-  use rsync with Mike Rubel's idea to make rotating backups
=head1 SYNOPSIS
In a cron job put:
  0 * * * * perl snaphrl.pl
Be sure you have sufficient permissions! Make the cron job as root for example.
=head1 DESCRIPTION
Makes a set of rotating backups using rsync and subdirectories named
for the hours of the day.
=head1 AUTHOR
Louis Chaillet
=head1 SEE ALSO
Perl(1).
=cut

snaphrl.pl rotates the hourly backups and runs a new rsync. The directory to back up is /home/; the trailing slash matters. With the trailing slash the script produces /home/storage/snapshot/..., without it you get /home/storage/snapshot/home/.....

You can exclude files and directories by adding another line to the exclude list in the script. Do not use spaces in the names: qw() splits the list on whitespace, and quote marks around a name do not prevent that. The exclude *~ skips editor backup files (names ending in ~), which you probably do not want to back up. You can change this exclude list. For example, if you do not want to back up your music files you could add the line music/ or maybe *.mp3.
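To check what an exclude list turns into, the following sketch builds the rsync argument list and prints it, one argument per line, without running rsync. It is a bash illustration, not the Perl code of snaphrl.pl; the function name build_rsync_cmd and the hour directory 18 are made up for the example.

```shell
# Build (but do not run) an rsync argument list from an exclude list.
build_rsync_cmd() {
    local src=$1 dest=$2
    shift 2
    local cmd=(rsync -a --delete)
    local pattern
    for pattern in "$@"; do
        # One --exclude option per pattern, as in the Perl loop.
        cmd+=("--exclude=$pattern")
    done
    cmd+=("$src" "$dest")
    # One argument per line makes it easy to see what rsync would receive.
    printf '%s\n' "${cmd[@]}"
}

# The patterns are quoted here so that the shell does not expand *~ itself.
build_rsync_cmd /home/ /home/.sdb1mntpnt/18 'snapshot/' '.sdb1mntpnt/' '*~' 'music/'
```

Printing the arguments first is a cheap way to catch a pattern that was split or expanded before rsync ever sees it.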

You can change the names of the hourly backup directories; I would advise you not to use spaces in these names.
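The rotation depends on finding the directory of the previous hour, with hour 00 wrapping around to 23. As a sketch of that arithmetic in bash (the function name prev_hour is made up; the Perl script does the same lookup with its @hours array):

```shell
# Print the two-digit hour that precedes HH (00 wraps around to 23).
prev_hour() {
    local hh=$1
    # 10# forces base 10; otherwise bash would reject 08 and 09 as bad octal.
    printf '%02d\n' $(( (10#$hh + 23) % 24 ))
}

this_hour=$(date +%H)
last_hour=$(prev_hour "$this_hour")
echo "this hour: $this_hour, previous hour: $last_hour"
```

If you rename the hourly directories, the names must still map to consecutive hours in this way, or the script will not find the previous backup to link against.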

Snapscript.pl

#!/usr/bin/perl
# Change this to wherever you want your backups to go
my $base = '/home/.sdb1mntpnt';
#######################################################################
# 
#No user serviceable parts below this line
########################################################################
my @hours = qw(   00 01 02 03 04 05 
                  06 07 08 09 10 11 
                  12 13 14 15 16 17 
                  18 19 20 21 22 23);

my $thishour = (localtime(time))[2];

$thishour = $hours[$thishour];   # two-digit directory name for this hour

my @days = qw(sunday monday tuesday wednesday thursday friday saturday);
my $dtoday = (localtime(time))[6];
my $today = $days[$dtoday];
my $yesterday = $days[ $dtoday == 0 ? 6 : $dtoday - 1 ];
sub run($) {
   my $command = shift;
   print "$command\n";
   system($command) == 0 or die "Command failed (status $?): $command\n";
}

#######################################################################
# 
#Remove yesterday's directory, since it is now a week old
run "rm -rf $base/$yesterday" if -e "$base/$yesterday";
#######################################################################
# 
#Then copy the current hour's directory to yesterday, using hard links
run "cp -al $base/$thishour $base/$yesterday" if -e "$base/$thishour";

__END__

=head1 NAME
snapscript.pl
-  use rsync with Mike Rubel's idea to make rotating backups, this script only
updates the daily directories, to be used together with snaphrl.pl
=head1 SYNOPSIS
In a cron job put:
  30 23 * * * perl snapscript.pl
Be sure you have sufficient permissions!
=head1 DESCRIPTION
Makes a set of rotating backups using rsync and subdirectories named
for the days of the week.
=head1 AUTHOR
Louis Chaillet
=head1 SEE ALSO
Perl(1).
=cut


snapscript.pl rotates the daily backups, using copy only. You can change the names of the days of the week here to whatever you want to name your directories.

It is possible to rewrite the scripts so that you need just one instead of two. I have not worked on this.

How to proceed.

Overview.

This overview will help you understand where you are as you go through setting up the Hourly Snapshot Style Backup. 

  1. Make an external backup before you proceed.
  2. Mount the secondary disk and then bind it to another mount point as a read only file system.
  3. Download the scripts, edit them if necessary and move or copy to the correct directory.
  4. Edit crontab to automate running the scripts.

1. Make an external backup

It is essential that you do first make a backup of your data and your settings. 

2. Mount the secondary disk.

Find the device name in the web GUI under Administration/Disk; in my case it is /dev/sdb1. You will need to know the file system the disk is formatted with. If the disk is not formatted, or if you want to change the formatting, you can use the web GUI: attach the disk, format it and detach it. Watch out: formatting will destroy any data on the disk.

If you don't know the formatting, here is a trick from “Howto: Backup data from Bubba to external drive using Rsync” on the Excito web site. Connect your external drive to the Bubba and go to Disc Information in the Bubba GUI to find the appropriate disk name and mount point. Click connect, go to your shell and run the mount command. The last line of the output gives you the file system type. Now disconnect the drive from the web GUI again.

If you know the name of the external disk and the formatting, make sure the disk is not mounted in the GUI, then go to the shell and become superuser.

su

Make the directory where the disk will be mounted.

mkdir /home/.sdb1mntpnt

Then mount the disk using the proper type (which may be different from ext3).

mount -t ext3 /dev/sdb1 /home/.sdb1mntpnt

Because you mount it as root, root has full access. Check the rights with ls -lA. This mount point is not visible in the shell unless you specify the -A option, nor in the web GUI or through Samba. Next create the mount point where the users will see the backup.

mkdir /home/storage/snapshot

Then mount it with bind so it is mounted in two places.

mount --bind /home/.sdb1mntpnt/ /home/storage/snapshot
mount -o remount,ro /home/storage/snapshot


The second step is necessary: it is not possible to mount --bind and make read only in one step.
The result is that the disk is accessible read and write by root and read only by all users on /home/storage/snapshot.
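One way to confirm both mounts is to look them up in the kernel's mount table. The sketch below is an illustration only; the function names mount_options and is_read_only are made up, and the table file is a parameter (defaulting to /proc/mounts) so that the functions can also be tried on sample data.

```shell
# Print the mount options recorded for MOUNTPOINT in a /proc/mounts style table.
mount_options() {
    local mountpoint=$1 table=${2:-/proc/mounts}
    # Fields: device, mount point, type, options, dump, pass.
    # Keep the last match, which is the most recent mount on that path.
    awk -v mp="$mountpoint" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "$table"
}

# Succeed if the options for MOUNTPOINT include "ro".
is_read_only() {
    case ",$(mount_options "$@")," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

for mp in /home/.sdb1mntpnt /home/storage/snapshot; do
    if is_read_only "$mp"; then
        echo "$mp: read only"
    else
        echo "$mp: options '$(mount_options "$mp")'"
    fi
done
```

After step 2 you would expect /home/.sdb1mntpnt to show rw among its options and /home/storage/snapshot to be reported as read only. An empty option string means the path is not mounted at all.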

Go to your web gui and check you can find the snapshot. Also check the rights on the filesystems.

Go back to your shell and install rsync if you don't have it (it is not installed by default).

apt-get update
apt-get install rsync

3. Download the scripts

Download the scripts from the script listings above. Using your web browser copy the script to a text editor and save them with their respective names (snaphrl.pl or snapscript.pl) in your home directory.

Then go back to the shell, find the scripts in your home directory and copy (or move) them to /usr/local/etc.

$ mv /home/_user/snaphrl.pl /usr/local/etc/
$ mv /home/_user/snapscript.pl /usr/local/etc/

Change _user in these commands to your user name on the B3.

You can put snaphrl.pl and snapscript.pl anywhere they are accessible, but /usr/local/etc/ is a good option.

Make root the owner and make the scripts readable by all (there is no harm in being able to read the scripts).

$ chown root /usr/local/etc/snapscript.pl /usr/local/etc/snaphrl.pl
$ chmod 744 /usr/local/etc/snapscript.pl /usr/local/etc/snaphrl.pl

Now run snaphrl.pl by hand, by issuing

$ perl /usr/local/etc/snaphrl.pl

Wait for this to finish. If it takes long, the hour may have changed by the time it is done, and you have to rename the backup to the current hour before starting the cron jobs. For example, if the time is 18:34 but your backup is called 17, rename it to 18. The snapshot mount is read only, so do the rename through the writable mount point:

$ mv /home/.sdb1mntpnt/17 /home/.sdb1mntpnt/18

The first run takes a long time; the following runs are much shorter (a few minutes). That is why you have to check after the first run that the backup has the right name: the script finds the previous hour's backup from the time of day.
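This check can be sketched in a few lines of shell. It is an illustration only: the function name newest_snapshot is made up, it assumes directory names without spaces (as recommended above), and it compares against hourly names only.

```shell
# Print the name of the most recently modified snapshot directory under BASE.
newest_snapshot() {
    local base=$1 newest
    # ls -t sorts by modification time, newest first; -d lists the
    # directories themselves instead of their contents.
    newest=$(cd "$base" && ls -1dt -- */ 2>/dev/null | head -n 1)
    echo "${newest%/}"
}

base=/home/.sdb1mntpnt   # writable mount point used in this howto
if [ -d "$base" ]; then
    newest=$(newest_snapshot "$base")
    now=$(date +%H)
    if [ "$newest" = "$now" ]; then
        echo "newest snapshot $newest matches the current hour"
    else
        echo "newest snapshot is '$newest' but the hour is $now: rename it first"
    fi
fi
```

Run it once after the first manual backup has finished and before you add the cron jobs.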

4. Automate your runs with crontab.

Still as root set up the cron jobs. One cronjob to do the hourly backups and one to rotate the dailies. The latter starts half an hour before midnight, to give the hourly job time to finish first.

crontab -e

and add

0 * * * * perl /usr/local/etc/snaphrl.pl > /var/log/snaphrl.log 2>&1
30 23 * * * perl /usr/local/etc/snapscript.pl > /var/log/snapscript.log 2>&1

I send the output to a log file in the crontab, but there are other ways. If you leave out the redirection (>), the executing user (root in this case) gets a mail for every run of the cron job. If you tire of this mail every hour, you could use >/dev/null 2>&1 instead, which disables the mail, or you could use >/dev/null, which should give root a mail only when there are errors.
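The difference between these redirections can be tried in a shell without touching cron. In this sketch the function job is a made-up stand-in for a backup script that prints one line to standard output and one to standard error; whatever output is left over after the redirection is what cron would mail.

```shell
# Stand-in for a backup job: one line on stdout, one on stderr.
job() {
    echo "backup finished"
    echo "backup: some error" >&2
}

# The outer 2>&1 only serves to capture the leftover output in a variable.
mailed_none=$( { job > /dev/null 2>&1; } 2>&1 )   # nothing left: no mail
mailed_errors=$( { job > /dev/null; } 2>&1 )      # stderr left: mail on errors
mailed_all=$( { job; } 2>&1 )                     # all output left: mail each run

echo "with >/dev/null 2>&1: '$mailed_none'"
echo "with >/dev/null:      '$mailed_errors'"
echo "without redirection:  '$mailed_all'"
```

With a log file as in the crontab above, both streams go to the file, so no mail is sent and each run overwrites the previous log.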

That's it. If there are no errors, the directories will be created and updated every hour.

Restoring files

Restoring is really easy. Use either the web gui or file explorer (in windows) to go to the backup you need (e.g. /home/storage/snapshot/03/...) and copy the files or directories you want to wherever you want.
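From the shell the same can be done with cp. The following is a minimal sketch, not part of the howto's scripts: the function name restore_file and the example paths are made up. It copies one file or directory out of a chosen snapshot and refuses to continue if the path is not in that snapshot.

```shell
# Copy RELPATH from snapshot SNAPSHOT under SNAPSHOT_ROOT into TARGET_DIR.
restore_file() {
    local snapshot_root=$1 snapshot=$2 relpath=$3 target_dir=$4
    local src="$snapshot_root/$snapshot/$relpath"
    if [ ! -e "$src" ]; then
        echo "not in snapshot $snapshot: $relpath" >&2
        return 1
    fi
    # -a copies directories recursively and keeps mode and timestamps
    # (and ownership, when run as root).
    cp -a -- "$src" "$target_dir/"
}

# Hypothetical example: fetch the 03:00 version of a file into a scratch directory.
# restore_file /home/storage/snapshot 03 louis/report.odt /home/louis/restored
```

Restoring into a separate directory first lets you compare the old version with the current one before overwriting anything.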

In case of loss of the primary drive the process is more complex. You would set up your B3 again and create the users. Then mount the second drive as described under step 2 (create the directories and issue all three mount commands in the order given). As a last action, but BEFORE you run the scripts or set up the cron jobs for the rsync backup, you would copy the latest (newest) snapshot to your home directory. The easiest way is to use SSH and, as root, copy the last backup with:

$ cp -a /home/storage/snapshot/03/* /home/

(or whatever the newest snapshot is).

The snapshots themselves do not include a storage/snapshot directory so the snapshot is not overwritten (and it is mounted read only anyway). Then install the scripts (as is described in step 3) and make the cron jobs (as in step 4) and continue where you left off. The scripts will start to rotate again where they left off.