For a side project I’m working on, I recently set up a virtual private server to host the code. To make sure I could easily set up a new machine if something went wrong, I wanted a solid backup and restore process for it.
Little did I know that there’s a ton of information out there and, if you ask me, not many good recommendations. This post is my attempt at boiling all of it down to one single recommendation that has worked out really well for me, while also walking you through how I got there. Keep in mind that I’ve only been running one virtual private server, but this strategy has served me well so far.
If you just want to know what I settled on, scroll down to the section ‘Final approach’.
What’s important to know before we start is that the virtual private server runs Ubuntu 18.04 and my host machine is a MacBook.
I started by doing a backup using plain rsync. Rsync can use SSH to log in to a remote machine and copy files from there, so that sounded like the perfect solution.
The exact command that I used was this:
sudo rsync root@example.host.com:/ -avzP --numeric-ids \
--exclude={"/dev/*","/proc/*","/sys/*","/tmp/*", \
"/run/*","/mnt/*","/media/*","/lost+found","swapfile","/var/cache/apt"} \
~/backup
This command copies all of the files, starting from the root of the file system, to the ~/backup directory, excluding the files/folders listed in the --exclude argument.
The flags that I’m using here are -avzP --numeric-ids. -a stands for archive, which does a number of things: it recurses into directories and preserves permissions, ownership, timestamps, symlinks and more. -v means verbose, so I can actually see what is happening; -z tells rsync to compress the data in transit, saving bandwidth; -P shows progress and keeps partially transferred files so an interrupted run can resume; and finally --numeric-ids tells rsync to use numeric IDs when setting ownership, which is important as I’ll explain below.
One important thing is to run this command with sudo. Among the things -a preserves are ownership and permissions: the user and group of each file are made equal to what they are on the remote machine. --numeric-ids makes rsync use numeric identifiers for the user and group, so this works even if the corresponding users/groups don’t exist on the target machine. If you don’t run this command with sudo, rsync can’t chown the files, so ownership won’t match, which is a big problem if you want to restore the files later.
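You can see the permission-preserving part of -a in a small local sketch (temporary paths only, no remote machine involved; with sudo the same mechanism also preserves numeric ownership):

```shell
# Local sketch: rsync -a preserves the file mode on the copy.
# (stat -c is the GNU/Linux form; on macOS use `stat -f '%Lp'`.)
src=$(mktemp -d); dst=$(mktemp -d)
touch "$src/secret"
chmod 600 "$src/secret"
rsync -a "$src/" "$dst/"
stat -c '%a' "$dst/secret"   # prints 600: the restrictive mode survived
rm -r "$src" "$dst"
```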
This command will probably run for a little while, but in the end you will have a full backup of the virtual private server which is ready to be restored. To restore I used this command, which is exactly the same apart from the source and destination being swapped:
sudo rsync ~/backup/* root@example.host.com:/ -avzP --numeric-ids \
--exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*", \
"/mnt/*","/media/*","swapfile","/lost+found","/var/cache/apt"}
The problem with this first iteration is that it doesn’t delete files that are no longer relevant. That’s simple enough to fix with rsync’s --delete flag, but then those files are gone from the backup too. Let’s say you wanted to get such a file back? That would be impossible.
A solution to this is to make a separate folder per day that you do a backup, so that you keep a record of all files per day and can get deleted files back as well. However, this would take up a lot of space: if the backup is 5 gigabytes and you keep 10 days of backups, that amounts to 50 gigabytes, which isn’t great of course.
If you start googling about these things, you will very quickly find out about snapshot-based backups. These involve doing a backup every day/week/month (usually all three combined) and using hard links to keep a record of all of the files without actually costing a lot of space: if the same file appears in 10 snapshots, it only takes up the space once. The last important piece of these setups is cleaning up old backups over time; if you ran this for 10 years, you most likely wouldn’t want 10 years of daily backups. Doing all of this manually is possible, but it’s quite a lot of work.
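The hard-link trick is easy to verify locally; this sketch (using a throwaway temp directory) shows two directory entries sharing a single copy of the data on disk:

```shell
# Two names, one inode: a hard link adds a directory entry but no data.
tmp=$(mktemp -d)
echo "some backed up content" > "$tmp/daily.1-file"
ln "$tmp/daily.1-file" "$tmp/daily.0-file"   # hard link, not a copy
ls -li "$tmp"        # both entries show the same inode number
[ "$tmp/daily.0-file" -ef "$tmp/daily.1-file" ] && echo "same file on disk"
rm -r "$tmp"
```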
There are two tools that I’ve tried out to do these types of backups, since I didn’t want to set them up manually: rdiff-backup and Rsnapshot.
Both of these essentially do the same thing but achieve it in different ways: both support snapshot-based backups that only store incremental changes once a full backup has been made, and both take care of automatically rotating old backups.
rdiff-backup saves the most recent version as plain files on disk and uses a special directory called rdiff-backup-data to store deltas of the changes. These deltas can only (easily) be inspected and restored with the rdiff-backup tool itself.
Rsnapshot is just a layer on top of rsync: it saves each version as a plain directory on the file system. An example ls of the Rsnapshot backup directory looks like this:
> ls ~/rsnapshots
daily.0 daily.1 daily.2 daily.3 daily.4 daily.5 daily.6 daily.7 daily.8 daily.9 daily.10
These are the daily backups that have been made, each of which is just a full copy of the file system of the remote machine at that point in time. Restoring is as simple as rsyncing those files over, or using scp, for example.
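Restoring the newest snapshot could look like this (a sketch; the snapshot root matches the listing above, and the example.host/ directory name comes from the rsnapshot backup target configured below, so both will differ on your setup):

```
sudo rsync -avzP --numeric-ids \
  ~/rsnapshots/daily.0/example.host/ \
  root@example.host.com:/
```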
That was the reason I chose Rsnapshot over rdiff-backup.
To achieve those backups with Rsnapshot, a few things have to be configured. When you install Rsnapshot through brew install rsnapshot, it installs a default config file at /usr/local/Cellar/rsnapshot/$rsnapshot_version/etc/rsnapshot.conf.default.
You need to copy that file to the same path without the .default suffix, like so:
cp /usr/local/Cellar/rsnapshot/$rsnapshot_version/etc/rsnapshot.conf.default /usr/local/Cellar/rsnapshot/$rsnapshot_version/etc/rsnapshot.conf
Now, the settings that you have to modify are the following (note that fields in rsnapshot.conf must be separated by tabs, not spaces):
snapshot_root -> point this to the folder where you want to store your snapshots.
Backup levels -> I’ve set these as follows:
retain daily 14
retain weekly 4
retain monthly 6
This will retain 14 daily backups, 4 weekly backups and 6 monthly backups. When Rsnapshot is about to create a 15th daily backup, it removes the oldest one to make room for the most recent.
Another thing to modify is the list of files to exclude. The ones that I have are these:
exclude "/dev/*"
exclude "/proc/*"
exclude "/sys/*"
exclude "/tmp/*"
exclude "/run/*"
exclude "/mnt/*"
exclude "/media/*"
exclude "/lost+found"
exclude "/var/cache/apt"
exclude "/boot/*"
exclude "/swapfile"
exclude "/root/.cache"
exclude "/root/.composer"
exclude "/var/lib/apt"
exclude "/var/www/**/node_modules"
exclude "/var/www/**/vendor"
exclude "/usr/local/share/.cache"
exclude "/var/lock"
exclude "/var/tmp"
exclude "/var/run"
exclude "/var/spool/postfix/dev/*"
The last thing to configure is the target that you actually want to back up. You do this by adding a line like this one to the config file:
backup root@example.host.com:/ example.host/
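Putting it all together, the relevant lines of my rsnapshot.conf look roughly like this (a sketch; the snapshot_root path is a placeholder for your own, and remember that the fields are tab-separated):

```
snapshot_root	/Users/you/rsnapshots/
retain	daily	14
retain	weekly	4
retain	monthly	6
exclude	"/dev/*"
backup	root@example.host.com:/	example.host/
```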
The only thing left to do now is to invoke the rsnapshot binary (with sudo, since otherwise it can’t preserve ownership and permissions):
sudo rsnapshot daily -> runs a daily backup
sudo rsnapshot weekly -> runs a weekly backup
sudo rsnapshot monthly -> runs a monthly backup
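Before relying on any of this, rsnapshot can check your config file for you:

```
sudo rsnapshot configtest   # prints "Syntax OK" when the config parses
```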
You can use cron to run these at the correct moments, or pick some other way to schedule which backup runs when. As mentioned before, restoring is as simple as going into a directory that rsnapshot has created (e.g. daily.0) and rsyncing/scp’ing those files over.
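A crontab (edited with sudo crontab -e) could look like this; the times are just a suggestion, with the larger intervals scheduled slightly before the smaller ones so the runs don’t overlap, and the binary path assumes a Homebrew install:

```
# m  h  dom mon dow  command
30 3  *   *   *      /usr/local/bin/rsnapshot daily
0  3  *   *   1      /usr/local/bin/rsnapshot weekly
30 2  1   *   *      /usr/local/bin/rsnapshot monthly
```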
There you have it, a pragmatic guide to backing up and restoring a VPS.