Ludovic Frank - Freelance developer

Data backup is important, so here's how to go about it.

Ludovic Frank Mar 23, 2021
Level: Beginner

Following the fire in a data center in France, I saw reactions on social networks along the lines of "my whole company is on your servers", which gave me the idea of writing this article. Of course, the owner of the data center isn't the one to blame here, but accidents like this do happen, so you need to be prepared, and that means having a data backup strategy. Can you imagine losing your accounting data? I've already touched on this subject in my article "How to protect yourself from ransomware".

The disaster recovery plan (a.k.a. disaster plan)

In IT, the disaster recovery plan, often overlooked by companies, is the plan that describes how to get everything up and running again quickly if a disaster strikes. It may seem like nothing, but it's very important, because we don't know what tomorrow will bring. So I invite you to ask yourself the question today: "If a disaster happens, what do I do?" The following points should help you answer it.

Test backups

In my opinion, this is the most important point. Waiting until the day you need your backups to try them out is dangerous, very dangerous. Is all your data really backed up as planned?
Yes, I can see your point: "testing your backups every week is tedious". True... but it can be automated. Why not develop a little script that fetches the backed-up data, tries to restore it on another site, then tests that everything is fine and sends a little report by e-mail saying whether everything is OK or not? Someone just needs to check that this e-mail arrives every week (if it doesn't, something has gone wrong).
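To give you an idea, here's a minimal sketch of such a script in Python. The paths, expected file names and e-mail addresses are placeholders to adapt to your own setup:

```python
# Minimal sketch of an automated backup test (hypothetical paths and addresses).
# It restores the latest backup into a scratch directory, checks that the
# expected files are present, and e-mails a short report.
import smtplib
import tarfile
import tempfile
from email.message import EmailMessage
from pathlib import Path

BACKUP_DIR = Path("/backups")            # where the backup archives land
EXPECTED = ["database.sql", "uploads"]   # names that must exist after a restore
REPORT_TO = "admin@example.com"          # who receives the weekly report

def test_latest_backup() -> str:
    archives = sorted(BACKUP_DIR.glob("*.tar.gz"))
    if not archives:
        return "FAILED: no backup archive found"
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archives[-1]) as tar:
            tar.extractall(scratch)  # restore into a throwaway directory
        missing = [n for n in EXPECTED if not (Path(scratch) / n).exists()]
    return f"FAILED: missing {missing}" if missing else "OK: restore test passed"

def send_report(body: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = "Weekly backup test"
    msg["From"] = "backup-test@example.com"
    msg["To"] = REPORT_TO
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # assumes a local mail relay
        smtp.send_message(msg)

if __name__ == "__main__":
    send_report(test_latest_backup())
```

Run it from cron once a week, and the absence of the e-mail itself becomes your alarm.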

RAID

I'm not going to dwell on this technology, as it's everywhere: RAID is the use of several hard disks seen as "a single disk". There are several configurations, some designed for performance (RAID 0, for example) while others are designed for data redundancy (like RAID 1, which mirrors the same data onto every disk).

It's the backup service that has to fetch the data to be backed up

Imagine your production servers were compromised by malware. If production had access to the backups, so would the malware. The best way to protect against this is for your backup system to connect to production itself and pull whatever it has to back up. That way, if the production server is compromised, the attacker has no way to reach the backups from it.
There are alternatives: for example, if the production server pushes a copy of the data to the backup service, configure the backup system so that it only accepts new data, making it impossible to modify or delete data that has already been backed up.
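To illustrate the "only accepts new data" idea, here's a tiny Python sketch of an append-only store (the directory is just an example): a backup can be written once, but never overwritten afterwards through this interface.

```python
# Sketch of an "append-only" backup store: new files are accepted,
# but existing ones can never be modified or deleted through this interface.
from pathlib import Path

STORE = Path("/var/backups/incoming")  # example location on the backup host
STORE.mkdir(parents=True, exist_ok=True)

def store_backup(name: str, data: bytes) -> None:
    # Mode "xb" creates the file only if it does not already exist;
    # a second write with the same name raises FileExistsError, so
    # already-backed-up data cannot be altered by a compromised sender.
    with open(STORE / name, "xb") as f:
        f.write(data)

store_backup("db-2021-03-23.sql.gz", b"...")       # accepted: new file
try:
    store_backup("db-2021-03-23.sql.gz", b"!!!")   # second attempt is refused
except FileExistsError:
    print("refused: that backup already exists")
```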

Several geographical zones

In entrepreneurship, you never put all your eggs in one basket, and backup is no different. I'd say your data should be located in at least three distinct geographical zones, as natural disasters are not impossible. In my case, all my data and my customers' data are copied to these locations: Paris, Nancy, Strasbourg (yes, I had servers lost in the fire) and North America. Every night, I have servers whose sole job is to check that everything is backed up in each geographical zone.
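As an illustration, here's a simplified Python sketch of that kind of nightly check. The zone names and mount points are made up; in reality each zone would be reached over the network or an object-storage API:

```python
# Sketch of a nightly check that every geographical zone holds a fresh backup.
# Zone names and mount points are examples only.
import datetime
from pathlib import Path

ZONES = {
    "paris": Path("/mnt/zone-paris/backups"),
    "nancy": Path("/mnt/zone-nancy/backups"),
    "north-america": Path("/mnt/zone-na/backups"),
}
MAX_AGE = datetime.timedelta(hours=26)  # a nightly backup should be younger than this

def check_zones() -> list:
    problems = []
    now = datetime.datetime.now()
    for zone, path in ZONES.items():
        archives = list(path.glob("*.tar.gz"))
        if not archives:
            problems.append(f"{zone}: no backup found at all")
            continue
        newest = max(a.stat().st_mtime for a in archives)
        age = now - datetime.datetime.fromtimestamp(newest)
        if age > MAX_AGE:
            problems.append(f"{zone}: last backup is {age} old")
    return problems

if __name__ == "__main__":
    for line in check_zones() or ["all zones OK"]:
        print(line)
```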

Incremental backup

"Backing up" doesn't mean "copying all the files" with each backup, which would consume a lot of storage space.If, for example, part of the data has been lost as a result of mishandling, don't panic, just go back in time to before the incident and voilà, the data's back.

A file system adapted to backup

Okay, this is getting a bit technical, I admit. First of all, let's explain what a file system is: a file system is the software that organizes the data on the physical medium (hard disk, SSD, magnetic tape, etc.). For example, if you're running Windows, your file system is called "NTFS" for "New Technology File System", on macOS it's "APFS" (Apple File System) and on Linux it's often EXT4. File systems are designed for specific needs, such as performance or reliability; NTFS, for example, is reputed to be far more reliable than "FAT" (the system that existed before NTFS on Windows).
What we're interested in here, and what's good to know, is that there are file systems well suited to backup: the one I personally use is Btrfs, which enables me to take a snapshot (an "image") of my backups every night and keep them for 90 days.
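To make this concrete, here's a simplified sketch of how such nightly snapshots could be rotated, assuming the backups sit on a Btrfs subvolume (the paths are examples, and in practice this would run as root from cron):

```python
# Sketch of nightly read-only snapshots with 90-day retention, assuming a
# Btrfs subvolume (paths are examples; must run with root privileges).
import datetime
import subprocess
from pathlib import Path

DATA = "/srv/backups/data"                 # the Btrfs subvolume holding the backups
SNAPDIR = Path("/srv/backups/.snapshots")  # where dated snapshots are kept
KEEP_DAYS = 90

def nightly_snapshot() -> None:
    SNAPDIR.mkdir(parents=True, exist_ok=True)
    today = datetime.date.today()
    name = SNAPDIR / today.isoformat()
    # "-r" makes the snapshot read-only, so it can't be altered afterwards
    subprocess.run(["btrfs", "subvolume", "snapshot", "-r", DATA, str(name)],
                   check=True)
    # Delete snapshots that are older than the retention window
    cutoff = today - datetime.timedelta(days=KEEP_DAYS)
    for snap in SNAPDIR.iterdir():
        try:
            snap_date = datetime.date.fromisoformat(snap.name)
        except ValueError:
            continue  # not one of our dated snapshots
        if snap_date < cutoff:
            subprocess.run(["btrfs", "subvolume", "delete", str(snap)],
                           check=True)

nightly_snapshot()
```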

The bonus: A fault-tolerant infrastructure

This is my favorite subject, and now we're starting to have some fun. I talk about it in my article on the Maitrise Orthopédique website. The idea is very simple: NO server should be irreplaceable. If a server goes offline for any reason, the company's services continue to operate as if nothing had happened.
To achieve this, your applications must be designed to support it. For example, when you connect to a website, there's often what's called a "session": each time your terminal communicates with the site, it sends a unique identifier, which the server recognizes, enabling the link to be made with your account.
Often, in applications, the data needed for the session to work is stored on the hard disk of the server that responds to you. This poses a problem: if you navigate to another page and, to display it, your terminal connects to another server, that other server won't know who you are... and you'll be disconnected from your account.
It is, of course, possible to solve this problem and all the others, so as to have an infrastructure capable of withstanding the elements and staying online even when a data center is wiped off the map. It's an exciting thing to build, requiring a lot of network and development skills, but the end result is really great.
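To give just one example, here's a simplified Python sketch of a classic fix for the session problem above: keep the sessions in a shared Redis store (using the redis-py client; the host name is a placeholder), so that whichever server answers, it finds the session in the same place.

```python
# Sketch of server-side sessions kept in a shared Redis instance, so that any
# application server can recognize the user (host name is a placeholder).
# Requires the "redis" package: pip install redis
import secrets
from typing import Optional

import redis

r = redis.Redis(host="sessions.internal", port=6379, decode_responses=True)
SESSION_TTL = 3600  # session lifetime in seconds

def create_session(user_id: str) -> str:
    # The unique identifier the terminal will send back on every request
    session_id = secrets.token_urlsafe(32)
    r.setex(f"session:{session_id}", SESSION_TTL, user_id)
    return session_id

def current_user(session_id: str) -> Optional[str]:
    # Works no matter which server handles the request, because the
    # session lives in Redis, not on one server's local disk.
    return r.get(f"session:{session_id}")
```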

Conclusion

Here are a few ideas for a data backup strategy worthy of the name. As for my favorite subject, the bonus, it's really the best of the best (though without the points above it's still just as dangerous... it doesn't protect against possible security breaches, for example).