Encrypt backups at an untrusted remote location

In a previous blog post I argued that a good backup solution includes backups at different geographical locations to compensate for local desasters. If you don’t fully trust the location, the only solution is to keep an encrypted backup.

In this tutorial we’re going to set up an encrypted, mountable backup image which allows us to use regular file system operations like rsync.

First, on any kind of permanent medium available, create a large enough file which will hold the encrypted file system. You can later grow the file system (with dd and resize2fs) if needed. We will use dd to create this file and fill this file with zeros. This may take a couple of minutes, depending on the write speed of the hard drive. Here, we create a 500GB file:

We will use LUKS to set up a virtual mapping device node for us:

First, we generate a key/secret which will be used to generate the longer symmetric encryption key which in turn protects the actual data. We tap into the entropy pool of the Linux kernel and convert 32 bytes of random data into base64 format (this may take a long time; consider installing haveged as an additional entropy source):

Store the Base64-encoded key in a secure location and create backups! If this key/secret is lost, you will lose the backup. You have been warned!

Next, we will write the LUKS header into the backup image:

Next, we “open” the encrypted drive with the label “backup_crypt”:

This will create a device node /dev/mapper/backup_crypt which can be mounted like any other hard drive. Next, create an Ext4 file system on this raw device (“formatting”):

Now, the formatted device can be mounted like any other file system:

You can inspect the mount status by typing mount. If data is written to this mount point, it will be transparently encrypted to the underlying physical device.

If you are done writing data to it, you can unmount it as follows:

To re-mount it:

Note that we always specify the Base64-encoded key on the command line and pipe it into cryptsetup. This is better than creating a file somewhere on the hard drive, because it only resides in the RAM. If the machine is powered off, the decrypted mount point is lost and only the encrypted image remains.

If you are really security-conscientious, you need to read the manual of cryptsetup to optimize parameters. You may want to use a key/secret longer than the 32 bytes mentioned here.

Before data loss: How to make correct backups

Why should you regularly make backups? Because if you don’t, then this mistake will bite you, sooner or later. Why? Because of Murphy’s Law:

Anything that can go wrong, will go wrong.

And a variation of it, Finagle’s law, even says:

Anything that can go wrong, will—at the worst possible moment.

So, let’s prepare right now and look at ways to back up data correctly.

RAID data mirroring is not enough

Realtime data mirroring (no matter if it is software or hardware RAID) is good, but not enough. What if your location is hit by lightning, fire or water? What if your entire system gets stolen? Then RAID would have been exactly useless.

Threats to local backups on external media

Say that you have an external USB hard drive for you backups. This is good, but as long as it is connected to your computer, it still may be subject to destruction due to lightning.

In addition, if you leave your external USB hard drive mounted in your host OS, and if you make a mistake as an administrator, or have faulty software or malware, you may fully erase your main hard drive and the backup at the same time. This is not too unlikely!

It happened to me once. A simple mistyped rm -rf ./ as root user somewhere deep in the filesystem did exactly that (I accidentally typed a space between the dot and the slash). Yes, I erased my main hard drive and the backup (mounted under  /mnt) at the same time. The data loss was desastrous.

Independent of the above, local backps are still susceptible to fire, water, or theft.

The dangers of deleted or changed files

rsync is especially good if you transfer the data to your backup location via public networks, because it only transfers changes. It also supports the --delete flag which deletes remote files when they are no longer locally present. This is generally a good idea if you want your backup to be an exact copy, otherwise your backup will become messy by accumulating many deleted files, which will make restoration not very fun.

But the --delete flag is also a danger. Say you delete an important file locally. Two days later you discover this fact, and decide to restore it from your backup. But guess what, it will be gone there too if you have synced in the meantime.

This problem is also present when changing files. The only solution to this problem is to have rolling backups (backups of your backups) in regular or increasing intervals (weekly, monthly, yearly). This will multiply the storage requirements, but you really cannot get around it.

Restoration is as important as backing up

Let’s say you have 10 perfectly done backups. But if you can’t access them any more, or not quickly enough (e.g. due to low bandwidth, etc.) they will be useless for your purposes. You need to put as much thought and effort into an effective restoration method as you put into the backups in the first place.

What works?

In general, a good backup solution depends on the specific circumstances and needs. Backups can never be perfect (100.0% reliable), there will always be a small but nevertheless existing possibility of total data loss. But you can make that possibility very, very small. As a rough guideline, the following principles seem to minimize the risk:

  • You have more than one backup.
  • You have backups of your backups (“rolling backups“).
  • You do not leave local backup media connected or mounted.
  • Your backups are at geographically different locations to compensate for local desasters.
  • If your backup is at a remote location, you fully trust the location, or use proper encryption.
  • Restoration is effective.
  • Backup and restoration is automated and tested.
  • After each backup cycle, the backups are verified. If there was a failure, the administrator is notified.

It is your responsibility!

If you should lose data, don’t blame it on ‘evil’ external circumstances, because:

Never attribute to malice that which is adequately explained by stupidity.

What if all data are still lost? Well, in this case I only can say:

Every misfortune is a blessing in disguise.

Start working on your backup solution now!