This is the first in a series of articles on how to keep your data safe. By "safe" I don't mean secure against unauthorized persons. I mean making sure you don't lose data that you want to keep.
What I'd like from you, gentle reader, is any comments or questions you might have on the articles as you read them -- I will incorporate your concerns into later articles or go back and revise this one.
What I'd like to end up with is a good HOWTO on data protection. I want to cover both the "just get it done" and the "what am I doing here?" aspects.
First of all, my biases:
I am biased towards Linux as a file server. The reason is that I have two types of systems easily available, Windows and Linux, and I've lost too much data on Windows to be happy with it. Windows also tends to be very opaque and I want to understand what's going on here. Linux, on the other hand, while somewhat obtuse and finicky, tends to be relatively stable, and it is documented somewhere. If nothing else you can read the source.
What I'm calling "Data Security" is the science and art of keeping your data warm and cozy and, above all, not losing it. You might lose data for a variety of reasons: computer upgrades, hardware failures, operator error. None of these are acceptable to me. We are accumulating too much irreplaceable data. Family pictures are no longer kept in Grandma's closet -- they're now on my hard drive(s). I want to keep them safe.
Electrons are fickle things -- often they seem to be happy and content until the very moment you need something from them and then they're nowhere to be found. This is all very bad.
First of all, not all data is created equal. There is some data that is worth a great deal to you to protect. Other data is worth very little.
Where you get into trouble is when you choose the wrong amount of protection for your data. Too little and you're screwed when you lose data. Too much and you're constantly paying for it, and you may end up under-protecting the really critical data because you can't afford to protect all of it that well.
The major failure of data protection schemes is that they don't take this into account.
OK, let's look at some reasons your data might not be safe.
I'll call these the Four Horsemen of Data Destruction.
In order of impact (from my personal experience), I've lost data because...
There are various solutions to keeping your data safe. This section enumerates some of them and discusses their pros and cons.
You can roughly divide the solutions into two camps: those that try to prevent a failure from affecting data and those that try to make recovery easier using redundancy.
There Ain't No Such Thing As A Free Lunch.
Sorry, but you're always going to pay something extra for data protection. The trick is to control how much you pay for how much protection.
Different strategies protect against different things and make different trade-offs to do it.
Here's probably a surprise for you: Many methods of data protection increase the likelihood of suffering a failure of some kind. What you get in trade for that is a reduction in the impact of that failure. To make it concrete: If you have a couple of mirrored drives, you're roughly twice as likely to see a drive fail, but a single failure doesn't really hurt.
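To put rough numbers on that trade, here is a minimal sketch in Python. It assumes each drive fails independently with the same probability over whatever period you care about. The 5% figure is made up for illustration and is not a measured failure rate.

```python
def any_drive_fails(p: float, drives: int) -> float:
    """Chance that at least one of `drives` drives fails,
    assuming independent failures with probability p each."""
    return 1.0 - (1.0 - p) ** drives


def all_drives_fail(p: float, drives: int) -> float:
    """Chance that every drive fails, which is what it takes
    to lose data on a simple mirror (same assumptions)."""
    return p ** drives


p = 0.05  # hypothetical per-drive failure probability

print(f"single drive, some failure: {any_drive_fails(p, 1):.4f}")
print(f"mirror, some failure:       {any_drive_fails(p, 2):.4f}")
print(f"single drive, data lost:    {all_drives_fail(p, 1):.4f}")
print(f"mirror, data lost:          {all_drives_fail(p, 2):.4f}")
```

With those made-up numbers, the chance of having to deal with some failure nearly doubles (5% to about 9.75%), while the chance of actually losing data drops from 5% to 0.25%. Real drives in the same box don't fail entirely independently, so treat this as the shape of the bet rather than a prediction.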
Everything you do for data protection is a bet -- I could think of a very unlikely scenario that will destroy your data no matter what you do. The key is how "unlikely" I'd have to make it -- you pay your money and take your choice.
I'm listing backups first for several reasons:
People dislike backups for several reasons, but they all boil down to this: backups take a lot of time and/or a lot of money, and you only see the benefit very occasionally. I will at some point discuss what makes a good backup system.
When done right, backups protect you from all of the Horsemen. You can choose how much risk you wish to mitigate with different kinds of backups. In other words, you can pick your cost/benefit.
Backups trade off space, time, money and failure rate in pretty much any combination you want, but there's a relatively high overhead to keeping good backups.
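As a taste of the time-versus-confidence trade, here is a minimal sketch of the smallest possible backup: copy one file and confirm the copy matches. The function names are hypothetical, and the verification step is my own assumption about what "done right" includes. It is nowhere near a full backup system.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy `source` into `backup_dir` and verify the copy by checksum.

    Raises OSError if the copy does not match the original.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} failed verification")
    return target
```

Reading every file twice to verify it roughly doubles the I/O. That extra time is exactly the kind of overhead you are choosing to pay, or not, when you pick a backup scheme.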
RAID, or Redundant Array of Independent Disks, protects against the failure of some number (you pick how many) of disks.
You sacrifice part of your storage for redundancy, and RAID in turn guarantees that if that much of your storage suffers a major failure you can recover -- most of the time.
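To show what "sacrificing storage for redundancy" buys you, here is a minimal sketch of XOR parity, the idea behind single-parity RAID levels such as RAID 5. The "disks" are just short byte strings and the layout is invented for illustration. Real RAID implementations stripe and place parity quite differently.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data "disks" plus one parity "disk".
data = [b"fami", b"ly p", b"ics!"]
parity = xor_blocks(data)

# Pretend disk 1 died: XOR the survivors with the parity to rebuild it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

One disk's worth of space goes to parity, and in exchange any single lost block can be rebuilt from the rest. Lose two at once and this scheme has nothing left to rebuild from, which is the "you pick how many" part of the bet.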
RAID doesn't do anything for your own screw-ups.
Surprisingly, RAID often doesn't do anything for minor failures -- and there are plenty of horror stories where people thought they were protected only to find out that they weren't when they actually needed it. This is fixable, so if you like RAID, be sure to pay attention when we get into the details there.
Drive monitoring means keeping a finger on the pulse of a drive. That way if the pulse starts to get weak you can get your data elsewhere. The problem is that your drive might have a stroke while you're watching the pulse, and all your data is flushed anyway.
If drives fail, just buy a better drive that won't fail. Ummm, yeah. I've got this nifty bridge I can sell you, too...
Seriously, though, "buying quality" reduces the chance of a failure but I've yet to see sufficient "quality" to entirely eliminate it. Buying quality can be part of a larger strategy but doesn't keep you safe by itself.
The place where this strategy really works is when you can buy enough quality that something else is going to die first.
The concept here is that you save your data somewhere else. The bet you're making is that "somewhere else" is going to do a better job of protecting your data than you do. This might be true.