Big Backup Bonanza

EP 9: Tuesday, Aug 8, 2023
Episode Banner

Transcript

Alan Pope 0:00
There’s a change to our regularly scheduled programming: I thought we’d all talk about the same topic this week, but from different angles. Okay? Yeah. Yeah, cool. Backups. You should do them, you really should do them; that’s the overriding message you should take from this. I want to talk about how I do backups, and then you guys can talk about yours. So I’ll start with mine. Most of my systems are Linux: laptops, desktops, servers, they’re all Linux. They pretty much all reside in my house, or are remotely accessible via SSH; I’ve got a few servers out there in the cloud. So I only have one location; I’m not backing up from lots of places other than out there on the internet. For many years I’ve used the same tool for doing backups, and it’s called rsnapshot. It’s been around a long time, it’s a little bit quirky, and it’s a little bit weird to set up. But once it’s set up, it’s pretty much fire and forget. I have a server upstairs in the loft at home, and that’s running rsnapshot. It has a regular cron job to run rsnapshot; in fact, it has four cron jobs to run rsnapshot. The configuration for rsnapshot tells it which machines to back up remotely, and it does that over SSH as root. So it has the SSH keys all set up, so that machine can SSH to any system as root and access the file system of that remote system. It then copies all the files from that remote system onto the server at home, and it does that on a regular basis for multiple machines. It wakes up every few hours and backs up my laptop, my desktop, my web server, a game server I’m running, and all kinds of other machines. The neat thing about rsnapshot is that it uses rsync over SSH, and it only backs up changed or new files, so it’s not doing a full backup every single time. The first backup takes a long time, because it backs up everything that you specify.
And you don’t have to back up the whole system; you can just back up certain folders, so I only back up certain important folders. But the important part is that it runs on a regular basis, backs up all the files on all the systems, and it works.
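The setup Alan describes could be sketched as an rsnapshot configuration; the hostnames, paths, and schedule below are illustrative placeholders, not his actual config:

```shell
# /etc/rsnapshot.conf (fields must be TAB-separated in the real file)
snapshot_root   /mnt/backups/rsnapshot/

# Pull selected folders from remote machines over SSH as root
backup  root@laptop.lan:/home/alan/        laptop/
backup  root@web.example.com:/etc/         webserver/
backup  root@web.example.com:/var/www/     webserver/

# /etc/cron.d/rsnapshot -- the smallest interval runs every six hours
# 0 */6 * * *   root   /usr/bin/rsnapshot alpha
```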

Mark Johnson 2:36
Do you just get one backup of whatever the latest state is? Or does it do some sort of time window, where you can go back and say, "I need to know what it was yesterday", or something like that?

Alan Pope 2:48
Yeah, it’s configurable, right. The way I have it set up is that it runs every six hours, and it keeps a certain number of those backups. When it gets to the limit of however many you set (let’s say it’s six), it renames the oldest one to another name. The levels are called alpha, beta, gamma and delta. So I’ve got six alpha backups, and when it starts the next one and sees there are already six that have been done over the last six runs, it takes the oldest one and renames it beta.0, and then away we go again and it does another backup. The next time it runs, beta.0 gets renamed beta.1, until we have enough beta backups. Then the oldest beta backup gets renamed gamma, and the oldest gamma one gets renamed delta. The net result is that I have backups going quite far back, but I only keep seven of the most recent ones. The beta ones are daily, then the older ones are weekly, and the older ones still are monthly. So I effectively have hourly, daily, weekly, monthly, roughly; but because I don’t back up every hour it’s not actually hourly, it’s every six hours, then every day, then every week, then every month. So I go back like six months, maybe a year of backups, but not every single backup. It’s not like ZFS snapshots; each one is a point-in-time snapshot from some point over the last year. I’ve been doing this for years on all my systems, and it has saved my bacon a bunch of times, where I’ve aggressively deleted files off a laptop and then wanted to go and find them, or needed to find some configuration file from a year ago because I’ve had to redeploy something. And there are certainly better ways to do it. It’s not the fastest backup in the world, because of the way it copies the files from alpha.0 to alpha.1, and then alpha.1 to alpha.2. It takes a lot of time to do that.
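The rotation described here corresponds to rsnapshot’s retain levels; a sketch with made-up counts (Alan’s actual numbers differ):

```shell
# rsnapshot.conf: how many snapshots to keep at each level.
# Each cron-driven run of a level rotates alpha.0 -> alpha.1, and so on;
# when a level is full, the oldest snapshot is promoted to the next level.
retain  alpha   6     # six-hourly
retain  beta    7     # daily
retain  gamma   4     # weekly
retain  delta   12    # monthly
```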
The copying is basically cp’ing one folder to another folder. It does save space by using hard links, so you don’t end up with lots of duplicate copies of the same files; but if you modify a lot of files, then a lot of files get changed and a lot of files get backed up. I basically don’t look at it very often, because whenever I look at the log file it’s just working, and it has done for years. So it’s good old, reliable rsnapshot, and I’m quite happy with it. Whenever I’ve needed to get files from it, they’ve been there, and that’s kept me very happy.
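The hard-link trick Alan mentions can be seen with plain `cp -al`, which is essentially what rsnapshot does when it rotates snapshots; a self-contained sketch:

```shell
# Make two "snapshots" where unchanged files share one inode,
# so the second snapshot takes almost no extra disk space.
set -e
cd "$(mktemp -d)"
mkdir alpha.0
echo "important data" > alpha.0/file.txt
cp -al alpha.0 alpha.1                 # hard-link the tree, don't copy it
ino0=$(stat -c %i alpha.0/file.txt)    # GNU stat; the flag differs on macOS
ino1=$(stat -c %i alpha.1/file.txt)
[ "$ino0" = "$ino1" ] && echo "same inode: snapshots share the data"
```

Only files that actually change between runs get a fresh inode, which is why unchanged files cost no extra space but heavily modified trees do.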

Martin Wimpress 5:34
And I think that’s a critical piece, right? Because people talk about making backups, and I’ve spoken to people about the rigour with which they take backups. And I ask, "What’s the restore process like?" And they say, "Oh, I’ve never really tested it."

Well, you should probably do that, because you may just have a load of garbage.

Alan Pope 5:56
Yeah. And I’ve restored entire folders full of music, and, you know, folders full of PDFs and stuff. It’s not just individual files; it’s all there, going back over a year. I’ve recently been getting rid of some of my old backups, because I just don’t need them anymore. I’m on my fourth generation of laptop that I’ve been using rsnapshot with on this server, and it’s keeping all those old laptop snapshots there. I call them snapshots; they’re backups, really. So every time it copies alpha.0 to alpha.1, and alpha.1 to alpha.2, it’s copying not only my current laptop but the previous laptop, and the laptop before that, because all those backups are still sat on the server. They’re all in one big folder, which means it’s super time-consuming, copying a whole bunch of files that are never going to change, because those laptops don’t exist anymore. So I think there are ways I could improve this, but I’m too far down the line; I’d have to just replace it completely, I think.

Martin Wimpress 6:57
Right. And you said you do your incrementals, what, every six hours? Yeah. Do you have any idea how long it takes your incremental backup to run?

Alan Pope 7:07
That’s a good question. The most recent one, which started at four o’clock today, took an hour to rm the oldest alpha backup, then less than a minute to move all of the alpha backups up (it moves, not copies). Then copying alpha.0 to alpha.1 took an hour and ten minutes, and actually doing the backups took 15 minutes; not long at all. So the actual backup, the diff of what has changed from midday to four o’clock on all of those systems, is 15 minutes’ worth of copying files over the network, but the prep took over an hour. It’s horribly inefficient. Now bear in mind, this is running on a Gen8 HP MicroServer with four ageing four-terabyte drives in it. I could probably buy a Synology with one 12-terabyte drive and it would be insanely fast, but that’s where we are; that’s the system I’ve got, and I don’t really want to go buying a whole load more infrastructure for backups. I should note, I probably didn’t explain it particularly well. I have got a blog post; if you just do a search for "popey rsnapshot" you’ll find my blog post from a couple of years ago, where I explain this in much more detail: how it all works and how it fits together. And I’ll link it in the show notes.

Martin Wimpress 8:30
I too have used rsnapshot very happily for many years. But I’m here to tell you: resistance is futile; Borg Backup is here to replace rsnapshot. Okay. I have recently done the very thing that you were positing: I have completely replaced my backup solution with Borg Backup. And I suppose it’s important to summarise why I decided to replace rsnapshot after so many years of being happy with it.

Alan Pope 9:01
Is it anything to do with the reasons I gave about it being slow?

Martin Wimpress 9:05
Yes, yeah. So I use rsnapshot to do two types of backups. One: backups of my home directory, to protect me from when I do a silly and go and delete a bunch of stuff accidentally that I really didn’t mean to delete and need to recover. And I also use it to back up my servers, which is mostly static content: videos and music and things of that nature, and some other stuff I’ll get into a bit later. But for those workstation backups of my home directory, I’ve got terabytes of data and millions of files, and rsnapshot takes a long time to run, even on very, very fast all-NVMe storage for both the source, where the data resides, and the target, where it’s being backed up to. So spinning rust is not the only limiting factor here; it’s just an inefficient process. I wanted to be able to take snapshots of my data at much more frequent intervals, to give me tighter restore points.

Alan Pope 10:18
The silly factor.

Martin Wimpress 10:21
For when I inevitably rm something in the wrong place and destroy hours of work. So I was thinking I would switch my file systems to ZFS and get into ZFS replication. And you know what? Life’s too short for that, for your laptops and your trivial stuff at home; getting into ZFS and doing storage architectures and all of that sort of thing. And I don’t trust Btrfs (send your hate mail to joe@latenightlinux.com); it has bitten me, and I’m not going back there. XFS I love, but XFS is missing some features that would be good for a backup solution: there’s no native compression on the file system, for example, something that you can do with Btrfs. So I wanted to find a solution that enabled me to keep XFS, which I trust and I like, but that was also faster than rsnapshot and more space-efficient than rsnapshot. But you talked about how convenient the restore is from rsnapshot, because you can just go to any point-in-time version and all of your stuff is there. And I love that. And I also love, as you say, that when rsnapshot is configured, it just comes along and does its job. I wanted that. Borg Backup is all of these things. It is very fast, much faster than rsnapshot, and I’ll get to that in just a moment. But the space efficiency is interesting, because it does deduplication as well as compression. When you create a repository for your backups, you can say what compression you want to use on that repository. So I’ve created two repositories: one which is a target for stuff that is probably going to compress well, like my documents and general data; and another with no compression applied, which is where the music backups and the video backups and the photo backups go, because you get little to no benefit from re-compressing those things, and it just adds to the time the backups take. In addition, it has a very simple means of encrypting your data.
At some point in the future, I want to get to sending another copy of my backups to some off-site place. I’ve currently got the office in town and the server here, so I have two places, but I’d like something that’s completely separate as well. And the restore process for Borg is very elegant. It has what they call archives, but an archive is basically a point-in-time backup, and you can just say "go and mount that". It uses FUSE mounting, and it suddenly just appears like a directory anywhere on your system, and you can then use rsync to copy a bunch of files, or the whole thing. It’s not like the old days of backup software, where you have some way of saying "right, extract this whole backup"; you just mount the point-in-time backup that you’re interested in, and then you can restore any files that you want out of there, back over the target, just like I would do with rsnapshot. So I love that. For all of these reasons, it has replaced rsnapshot for me, and I’m using it for my workstation backups, to do regular snapshots of my home directory. That also includes backing up my Keybase folder. I have data in Keybase for projects that I work on with other people, where we keep secrets that we want to share with members of the project, but I also want backups of that stuff in case Keybase ever disappears, right? So Borg is able to peek inside there, but the backup is encrypted, so I now have a great way to back that stuff up. My workstation backups are many terabytes and many millions of files, and my incremental backups now take 30 seconds, so I am now doing hourly backups. Mark was asking about retention earlier: my retention for my home directory is 24 hourly backups, seven dailies, four weeklies, six monthlies, and one annual. Now, I’ve only been using this for about four weeks at the moment, so I don’t have that history of stuff yet.
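A sketch of the two-repository approach Martin describes; the repo paths and folder names here are invented, and note that in Borg itself compression is chosen per `borg create` invocation (front ends like Vorta set it per profile):

```shell
# One repo for compressible data, one for already-compressed media.
borg init --encryption=repokey /backups/docs.borg
borg init --encryption=repokey /backups/media.borg

# zstd for documents; no compression for music/video/photos.
borg create --compression zstd,3 /backups/docs.borg::docs-{now}   ~/Documents
borg create --compression none   /backups/media.borg::media-{now} ~/Music ~/Videos

# Restore: FUSE-mount a point-in-time archive, copy files out with rsync.
# (Archive name and paths below are hypothetical.)
mkdir -p /tmp/restore
borg mount /backups/docs.borg::docs-2023-08-08T12:00:00 /tmp/restore
rsync -av /tmp/restore/home/alan/Documents/ ~/Documents/
borg umount /tmp/restore
```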
And then I’m using it on the server as well, and the server is doing, you know, backups of the music and the videos. I also have a headless version of Dropbox running on the server, and I back up our Dropbox folder as well. So if Dropbox ever goes away, or something disastrous happens, I now have our Dropbox stuff backed up. And I think what’s worth pointing out here is that you can completely automate Borg Backup itself from the command line, in much the same way you would with rsnapshot, but there are some front ends that make it even more palatable and straightforward to use. On my workstations, I’m using a bit of software called Vorta, which is a graphical application that is dead simple to use. You can create backup archives very, very quickly and say what you want to back up. It has a simple mechanism for excluding things: I just create a hidden file called .nobackup in any folders that I don’t want to back up, and Borg sees that and goes, "oh, well, I don’t back this one up". So if I’ve got things I want to exclude, it’s easy to do. So I’m using Vorta on the workstation. It sits in the indicator area with all of the other things, and it glows red when it’s doing its backups. You can click on it, and if you want to restore, you can just click in the list of backups and say "mount it", and it just appears as a FUSE file system in your file browser, and you can go and poke around and look at things.

Alan Pope 16:46
That is super neat. Vorta does look good; the video on the homepage is on a Mac, so I assume the Linux version looks exactly the same?
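The marker-file exclusion Martin describes maps to Borg’s `--exclude-if-present` option (Vorta exposes the same setting in its GUI); the marker name and paths are just examples:

```shell
# Any directory containing a .nobackup marker file is skipped entirely.
touch ~/Videos/scratch/.nobackup
borg create --exclude-if-present .nobackup \
    /backups/docs.borg::home-{now} ~/
```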

Martin Wimpress 16:54
Exactly the same, right. And as you point out, this will work on Mac, Linux and Windows. Like you, I only care about Linux, but if there are people out there who have got a mixture of machines in their home, and they want a solution that can work everywhere, then Borg plus Vorta will give you that. And then on the server I’m using a utility called borgmatic, which basically presents as a YAML file where you configure what you want to back up, and it’s just a much simpler interface to drive Borg itself. So that’s what I’m using. I don’t have ten years of experience with it, like I do with rsnapshot, but so far it’s giving me the rsnapshot experience that I’ve enjoyed, but quicker, faster, and using less disk space as well. Nice. And when you have Borg installed on remote endpoints over SSH, it can accelerate the backups by virtue of having Borg on both ends of the connection. It’s not just doing a file-system transaction; it actually has a protocol between the client and the server. So it’s very, very fast remotely as well.
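Driving Borg through borgmatic looks roughly like this; the YAML schema shown is simplified, varies between borgmatic versions, and the paths are invented:

```shell
# Write a minimal borgmatic config, then run everything with one command.
mkdir -p ~/.config/borgmatic
cat > ~/.config/borgmatic/config.yaml <<'EOF'
source_directories:
    - /srv/music
    - /srv/dropbox
repositories:
    - path: /backups/server.borg
keep_hourly: 24
keep_daily: 7
keep_weekly: 4
keep_monthly: 6
EOF

borgmatic --verbosity 1   # create, prune and check in one go
```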

Alan Pope 18:14
Is Borg installable as, like, a deb on Linux? Are there packages for everything?

Martin Wimpress 18:20
I’m going to say yes. I’m obviously using it on NixOS, and it’s working fine here. I have seen that the guides talk about the many Linuxes that it’s available for as a tool. And it’s sponsored by BorgBase.com, who are basically a hosting company that offer you remote storage, and Hetzner now offer Storage Box solutions which are Borg-enabled. So if you want an off-site solution, there are hosting providers that support the project by providing you with cloud-based storage that is accelerated for Borg.

Alan Pope 18:57
Okay, sold. Linux Matters is part of the Late Night Linux family. If you enjoy the show, please consider supporting us and the rest of the Late Night Linux team using the PayPal or Patreon links at linuxmatters.sh/support. For $5 a month on Patreon you can enjoy an ad-free feed of our show, or for $10 get access to all the Late Night Linux shows ad-free. You can get in touch with us via email, show@linuxmatters.sh, or chat with other listeners in our Telegram group. All the details are at linuxmatters.sh/contact.

Mark Johnson 19:37
My backup solutions are a bit different, because I think the main use case I’m thinking of is a bit different. I’m not thinking about backing up entire machines, or my entire home directory all the time in case I mess something up. I’m more thinking about the things that are crucial for my family to make sure we never lose, which are things like our family photos. Currently, we have quite a good system within the house, whereby our photos are instantly uploaded from our phones to our Nextcloud server, and synced between all the devices which talk to that. So we’ve got plenty of copies of them around the house; if one machine goes pop, nothing’s getting lost. But were there to be a complete disaster, and all the machines get lost somehow, I wanted to have some sort of further off-site backup that I could do. And I’d be thinking about the same sorts of things that we’ve already been talking about: not just how do I back it up, do it easily, and make sure it’s encrypted, but also, what’s the recovery process like? How easy is it to test the recovery process after just doing your first backup? I recently discovered a tool called rclone. rclone is a similar sort of tool; it will synchronise files from machine to machine, but the way it does it is quite clever and really flexible, and gives you a lot of interesting options to cover a lot of different cases. The way it works is it gives you a load of options for pluggable backends. That could be somewhere on a file system; it could be somewhere over SSH or SFTP, or any other sort of network protocol. But it could also be any other way there is of storing files remotely, and that could include something like S3 object storage, even something like WebDAV or IMAP; anything which could feasibly store a file, you could create an rclone backend to do it. In my case, I’ve gone with an S3-compatible option. I’ve actually gone with iDrive e2 storage, because they sponsor rclone, and I thought if I’m going to give money to someone, I may as well give it to people who support the project. So that’s good. Yeah, you can just use the generic S3 backend and point it at any provider that supports that API, or there are a few, like Backblaze, which support a version of the API and have their own backend. But because this is all private family stuff, I wanted to make sure it was encrypted. The way that rclone handles things like that is, as well as these backends which actually store files somewhere, there is a set of meta-backends for doing other things. There’s one called union, which will take a load of backends and stick them together as though they were one backend. So you might have a load of small accounts across various providers, but you want to use that storage as one blob, or have multiple accounts and back your stuff up to all of those at once; there are backends for doing that, by putting them in front of the real backends. And similarly with encryption: you create your backend for where you’re actually storing the files, then you create an encryption backend which points to the real backend, and then you back up to the encryption backend, which transparently does all of that, basically. And the process is exactly the same: you’ve got all of these commands in rclone for doing various things, and they all work the same whichever backend you’re using. Nice.
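The layered-backend idea Mark describes looks something like this in rclone; the remote names, endpoint, and credentials are all placeholders:

```shell
# A real backend pointing at any S3-compatible provider.
rclone config create s3remote s3 \
    provider Other endpoint https://s3.example.com \
    access_key_id PLACEHOLDER_KEY secret_access_key PLACEHOLDER_SECRET

# A crypt meta-backend layered on top; rclone encrypts file names too.
rclone config create photos crypt \
    remote s3remote:family-photos \
    password "$(rclone obscure 'a-long-passphrase')"

# Back up to the crypt backend; encryption happens transparently.
rclone sync ~/Pictures photos:
```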

Martin Wimpress 23:01
I’ve been aware of rclone for some time, and it’s been on my list of things to take a proper look at, but I’ve never got around to it. The way that I’ve understood it is that it’s like rsync for cloud storage.

Mark Johnson 23:14
Yes. I mean, that’s basically how it operates, but it’s cloud storage and any other storage, right? You could equally use it with another server on your LAN, or just files locally if you wanted to. So you could use it like rsync, like you said. Yeah, exactly.

Alan Pope 23:30
So the question is, what happens if you need to restore a file? How do you get stuff back?

Mark Johnson 23:36
So the backing up is simple. I’ve got a cron job which says "take this directory and clone it to this backend", and that’s all I need to tell rclone; it does that however often I want it to. And if you look at the bucket in the object storage, you just see a bunch of encrypted file names. But here’s the clever bit: like Borg, you have the option, on the machine where rclone is running, to say "mount this backend", and you get it mounted as a directory. Nice. So you can do that. You can also say "take this backend and expose it over whatever remote protocol". So you could have rclone running on your server, but say "actually, give me that as a WebDAV mount", and then from somewhere else on your network you could do the same thing. You could also take the same config file from the server that’s doing the backups, put it on another machine like your laptop, and then run the same command to mount that backend, and it will do that on your local machine. But aside from doing all of this on the command line, you also get the option of running a web GUI, which is actually what I use to do the configuration. So you have an interactive CLI for adding remotes, and then the CLI commands for running it, but you can also do it all through quite a decent web front end (although it describes itself as experimental). You run rclone rcd --rc-web-gui on whichever machine has the config file that you want. And then, as well as setting it all up like that, you get the option to browse any of the remotes. So I can go in there and say "show me my backup destination", and it’ll show me the real file names and the real files, and then I can just click on one and download it. Or I can manage all of the mounts that I’ve got; from there, I can say "mount this", and then that appears in my file manager.
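The restore options Mark lists, sketched as commands; this assumes a crypt remote named photos already exists in the rclone config (the name is hypothetical):

```shell
# 1) Mount the decrypted view as a local directory.
#    (Runs in the foreground; in practice you'd background it or use systemd.)
mkdir -p ~/restore
rclone mount photos: ~/restore --read-only

# 2) Or serve it over WebDAV to other machines on the network.
rclone serve webdav photos: --addr :8080

# 3) Or run the (experimental) web GUI for browsing and managing remotes.
rclone rcd --rc-web-gui
```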
And I can browse it. At the moment, in terms of historical backups, the way I’ve approached that is using S3 versioning. So if I have files that I delete, they will be retained as a version, or if they get updated, they’ll get a new version in the bucket, which I can then expose through rclone: the old versions as well as the current version. This is still evolving; I’ve only put this in place recently, and I’m currently only doing our family photos at the moment, to see how it goes and how much it’s going to cost. But it’s certainly something I’m considering expanding to the other important things.
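rclone can surface the S3 object versions Mark relies on; these are real rclone S3 flags, but the remote and bucket names are made up:

```shell
# Show old versions of objects alongside the current ones.
rclone ls s3remote:family-photos --s3-versions

# Or view the whole bucket as it was at a point in time.
rclone copy s3remote:family-photos ~/photos-as-of-july --s3-version-at "2023-07-01"
```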

Alan Pope 26:06
I find it interesting that you don’t care so much about, like, non-family photos, or your home directory or anything like that.

Mark Johnson 26:15
If my home directory blows up, I can reinstall and start again. I don’t, you know, handcraft my dotfiles like Martin does.

Martin Wimpress 26:23
I used to do that, but now I don’t have to, because I have everything powered by Nix. So when I come to restore or recover a machine, which I did recently when I reinstalled my laptop, the install process gives me a fully configured install, including all of the directories in my home directory where my data lives. Then I just imported my Vorta backup profile and ran a restore, and I was back up and running. That whole process, from install to fully restored system, was under an hour, which is really nice. But like you, Mark, the server backup is backing up our important stuff. We have Syncthing on all of our laptops, with our documents and our photos, so like you, everything’s in multiple places. But because we’ve now got this magical internet LAN, if I delete something in Syncthing, it almost instantaneously gets deleted from everyone else’s machine, which is why I wanted this bulletproof backup sitting on top of it.

Alan Pope 27:25
It’s interesting that you say that about installing your system from scratch and then restoring from the most recent backup that you have in Borg. Now I think about it, if I get a new laptop, I do a clean install and I cherry-pick a few things out of the backups of my previous laptop, like my music collection, for example, and maybe a couple of other folders, and everything else I use Syncthing for. I’m thinking I actually back up a tremendous amount of stuff that I am never, ever going to restore. There’s a whole truckload of stuff, like the cache folder and .local in my home; I am almost guaranteed never to want to restore an old browser profile cache folder, that’s just not going to happen. So I think I need to optimise my backups a little bit better.

Martin Wimpress 28:15
So I used to do exactly what you’re describing with rsnapshot: I would just say back up my home directory, dotfiles, the whole thing, and then, as you say, be selective in the restore process; I only need the Documents folder and the Downloads folder and this and that. But now with Borg, I’ve changed my strategy to say: these are the directories that have data in them that I want to back up. And it really is data, you know; it’s user data, not configuration anymore, and definitely not caches. So when I do a restore, I know that’s all of the stuff that I want.

Alan Pope 28:46
I don’t trust myself. I don’t trust myself to pick all the right folders, and then somewhere down the line I would create a folder in my home and it’s never been backed up; that’s one problem. I don’t trust myself to put things in the right place. Maybe I need more discipline with my backups.

Martin Wimpress 29:02
So that’s how we’re all doing our backups. Is anyone else out there doing it differently? We’d be interested to hear what your solutions are. In particular, I would like to hear what people are doing for backing up and viewing their family photos; I have a suboptimal solution at the moment, and your help finding a new one would be most welcome.

Show Notes

In this episode:

You can send your feedback via show@linuxmatters.sh or the Contact Form. If you’d like to hang out with other listeners and share your feedback with the community, you can join:

If you enjoy the show, please consider supporting us using Patreon or PayPal. For $5 a month on Patreon, you can enjoy an ad-free feed of Linux Matters, or for $10, get access to all the Late Night Linux family of podcasts ad-free.

Hosts

Alan Pope
Mark Johnson
Martin Wimpress