Monday, November 19, 2012

Cloud Backups vs. RAID

A friend had an unfortunate but predictable mishap today, and I wanted to do a bit of math on the economics of data backups.  I'll start with two stories about hard drive crashes with very different outcomes.

Some time ago, I fixed a laptop for a friend and in the process had to reformat the hard drive.  So, I backed up all of her files to my RAID array and then copied them back to the hard drive afterwards.  Months later her laptop was stolen, and with it all of her irreplaceable documents and pictures from years ago.  Some time later when I was clearing up space on my backup array, I found that I had forgotten to delete these files after the repair and was able to send them back to her intact.  A fortunate circumstance, after which I did what I felt was responsible and used srm to scrub the files from my archive:  it was not right that I should have all of someone else's personal files.  Today I got a text message asking if I still had those--her hard drive failed, and sadly it seems that this time years of photos and documents may be gone for good.

Keeping all of your personal files on a single drive is fine only if nothing on your computer is important or irreplaceable, so that you would not care if it disappeared forever; for most people this is not the case.  The average lifetime of a hard drive varies, but in my experience typical use means 3 to 4 years, and I get some warning of impending failure, through the "click of death" or system errors, only about half the time.  Recovery services are loath to quote a fixed price, which is understandable given the number of ways a drive can fail, but $350 is a typical cost when recovery succeeds.  Obviously, if your drive is stolen, even a data recovery service can do you no good.  You may not think about it every day, but the value of years of pictures, documents and code may be so high that it is impossible to put a number on it.

Now, a story with a happy ending.  For years, I had been using a RAID 1 array as a backup system.  Whenever one drive failed, I would just buy two new drives and copy everything over.  This has worked well for me and I have data from well over a decade ago including some of the earliest programs I ever wrote, 8 years of code from a past career in contract software development, and homework assignments from 6 years of grad school.  I also have hundreds of megabytes of pictures from places that no longer exist and of people that are no longer alive.  Recently one of the two drives in my RAID array failed, and I was left with a choice:  to buy two new drives, or to switch over to a cloud backup service.  I used CrashPlan at work, and was pleased that it was cross-platform (Linux, Windows or Mac) and worked seamlessly and silently in the background.  After examining the price tag, I decided to step into the 21st century and use CrashPlan instead of ordering new drives.  Three days later, the lone surviving drive from my RAID array failed.  If I had ordered new drives via standard shipping, I would not even have received my new drives yet much less had a chance to set them up.  If I hadn't switched to CrashPlan, that could well have been the end of over a decade of irreplaceable data.  Instead, my data was back in my hands within a day or two as I restored it from the cloud.  The cloud backup service paid for itself in the first week of ownership.

Now my question:  is it cheaper to run a 1TB RAID 1 array than to pay for a cloud backup service, assuming 1 terabyte of data, a 4-year drive lifespan, and a computer that is always on?  For prices, I will use the cheapest workable option available from NewEgg, the lowest rate on my electric bill, and an unlimited-data internet plan, which may not be a realistic option for everyone.


Criteria       1TB Cloud                            1TB RAID 1                                  Advantage
-------------  -----------------------------------  ------------------------------------------  ---------
Direct Costs   1TB HD: $80.00                       1TB HD: $80.00                              RAID 1
               4yr CrashPlan+ Unlimited: $139.99    1TB HD: $80.00
               Total: $219.99                       Enclosure/RAID controller: $10.00
                                                    5W×2HD×4yrs×$0.05/kWh: $17.52
                                                    Total: $187.52
Bandwidth      At least 1TB over 4 years            None                                        RAID 1
Recovery Time  Download speeds (Mbps)               Data transfer speeds (Gbps)                 RAID 1
Reliability    Secure datacenter                    Vulnerable to theft or accident such as     Cloud
                                                    fire or flood.
Maintenance    Install cross-platform software.     Relatively advanced RAID setup required.    Cloud

To my surprise, if you use the absolute cheapest options available and software RAID, RAID 1 is actually about $30 cheaper than CrashPlan over a period of 4 years.  Clearly, CrashPlan is taking advantage of an economy of scale in order to provide this service: lots of cheap drives, a data center in an area with affordable power, and mass-produced servers.  RAID 1 also provides faster access to your backup in the event of a loss, and does not consume your internet bandwidth.  I will argue, however, that for most people the cloud will still be the better option, for one reason:  ultimately, losing one's data entirely is a disaster whose cost may be incalculable, and as such reliability is a more important metric than price for most people.  Running RAID 1, you are still vulnerable to theft; to a disaster such as a fire or flood, or just knocking the enclosure off of a shelf, that destroys both drives at once; and to a virus or software malfunction that corrupts or deletes your data.  Further, a cloud backup service is much easier for most users to set up and maintain than a RAID array.  For myself and most home users, the superior reliability of a cloud backup service more than justifies the trivial additional cost over RAID 1.
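For anyone who wants to plug in their own prices, here is a minimal Python sketch of the arithmetic behind the two totals.  The dollar figures, wattage and electricity rate come straight from the table above; the function and variable names are my own illustrative choices, and a year is assumed to be 365 days of always-on use.

```python
# Rough 4-year cost comparison using the figures from the table.
# All names here are illustrative; swap in your own prices and rates.

HOURS_PER_YEAR = 24 * 365  # assumes an always-on machine, ignoring leap days


def electricity_cost(watts_per_drive, drives, years, dollars_per_kwh):
    """Cost of powering the given number of drives continuously."""
    kwh = watts_per_drive * drives * years * HOURS_PER_YEAR / 1000
    return kwh * dollars_per_kwh


def cloud_total(drive_price=80.00, subscription=139.99):
    """One local drive plus a 4-year cloud backup subscription."""
    return drive_price + subscription


def raid1_total(drive_price=80.00, enclosure=10.00, watts_per_drive=5,
                years=4, dollars_per_kwh=0.05):
    """Two mirrored drives, an enclosure/controller, and their power."""
    power = electricity_cost(watts_per_drive, 2, years, dollars_per_kwh)
    return 2 * drive_price + enclosure + power


cloud = cloud_total()
raid = raid1_total()
difference = cloud - raid

print(f"Cloud total:  ${cloud:.2f}")
print(f"RAID 1 total: ${raid:.2f}")
print(f"Difference:   ${difference:.2f} over 4 years")
```

With the table's numbers the gap comes to a little over $32 across the four years, or roughly $8 per year.  Raising the electricity rate or the drive price narrows it further.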

Overall, I would recommend each backup system under the following circumstances:

  • RAID 1:  Justifiable in situations where bandwidth is at a premium or internet is not available, or where backups must be available immediately (as opposed to hours later after a download completes) upon failure of a drive.  The price advantage of roughly $8 per year ($32.47 over 4 years) is so small as to be insignificant as a deciding factor.
  • Cloud:  For a typical user such as myself with about 1TB of data and an unlimited internet connection, a cloud service is only slightly more expensive than running RAID 1 but provides much greater reliability.  This option represents the future of computing and of data in general.
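
To put rough numbers on the recovery-time difference mentioned above, here is a small Python sketch that estimates how long it takes to move 1TB at a given speed.  The two speeds are hypothetical examples that I picked for illustration (a 50 Mbps download and a 1 Gbps local transfer), not measurements, and the estimate ignores protocol overhead and the drives' own throughput limits.

```python
# Idealized transfer-time estimate: size divided by link speed.
# The speeds below are hypothetical examples, not measured values.

TERABYTE = 1e12  # bytes, using the decimal definition drive makers use


def transfer_hours(size_bytes, bits_per_second):
    """Hours needed to move size_bytes at a sustained bits_per_second."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 3600


cloud_hours = transfer_hours(TERABYTE, 50e6)  # assumed 50 Mbps download
local_hours = transfer_hours(TERABYTE, 1e9)   # assumed 1 Gbps local copy

print(f"Cloud restore: about {cloud_hours:.1f} hours")
print(f"Local restore: about {local_hours:.1f} hours")
```

Under these assumptions a full cloud restore takes on the order of two days, while a local copy finishes in a couple of hours.  A slower connection stretches the cloud figure proportionally.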
Whatever you do, please be responsible and use an always-on backup system.  Remember that we live in an era where much of your life exists only in data, and that many of these seemingly generic zeroes and ones are actually quite dear to you.  I know that if I had waited even another day to subscribe to a cloud backup service, an irreplaceable record of the last 16 years of my life would be gone forever.  


