Today is Friday which, as Doug Chinney would say, makes it #backupfriday. Like Doug, I’m quite obsessive about backups. I currently have 4 copies of my images plus one in the cloud (using Zoolz; check Doug’s review of this here). You may think this madness, and I'm not totally dismissive of that point of view, but when - not if - something goes wrong I’ll be feeling pretty smug that I have a robust backup system. Or will I…?
Back Story
With the move to Australia I was concerned about how I would bring my image catalog with me. I use a MacBook Pro with a 500GB SSD, which only has room for my most recent images. Everything else is stored on an external hard drive, which is then synced with another 2 external hard drives and my NAS.
The introduction of fresh regulations in the UK regarding electronic devices in hand luggage aboard flights (read here) meant that I couldn’t take my mains powered external HDs onboard with me; they would have to go in the hold (a call to Manchester Airport confirmed this). Having heard horror stories about this I was a bit paranoid.
So I bought a 2TB portable hard drive to store my images on, and used 2 smaller 500GB portable hard drives for my most important images. These came with me in my hand luggage. The external hard drives went in the hold.
Back in the UK I had my NAS with my images on, as well as 4 spare 1TB drives that I copied portions of my catalog onto. These will follow me in the shipping container with the rest of our worldly belongings. So in actual fact, I have 6 full backups across the world, a couple of partial ones, and everything in the cloud too. Paranoid, moi?
The good news is that everything arrived off the flight safely and all appears to be operating ok. However my paranoia kicked in. What if they’d zapped the external drives with some X-ray and wiped some data (which doesn’t actually affect them I believe, but still!), or a heavy handed luggage handler had somehow caused a malfunction or corruption on one of the drives? For my own peace of mind I had to make sure that all of the drives and the image copies were 100% ok.
Image Verifier
Researching the best way of doing this I came across the aptly named Image Verifier. All you do with IV is point it at your images and it will work through the folder hierarchy checking the images. How does it do that? There are two ways.
Hash Checking
The first is called Hash Checking. This will generate a checksum (also known as a hash) which is a bit like a fingerprint for a file. It puts the file’s data into the checksum generator and it gives you a magic number that, for all practical purposes, uniquely represents that file. Put the same data into the checksum generator and the checksum will be the same.
If the file has changed the checksum that is generated will be different. IV will flag this difference up as something to investigate. The reason for the change could be completely benign. It may be that you re-edited a TIFF file and the different checksums are because you did actually change the file.
However, RAW files are usually stored in their original form and never changed. When using Lightroom any image edits and metadata you add are stored within the Lightroom database rather than saved back into the RAW file. If you use Bridge to add ratings, keywords, etc then these edits are stored in a ‘sidecar’ XMP file - a wee file that sits next to the RAW file; the RAW itself is unchanged.
Knowing this you would expect the checksum generated for each RAW to be absolutely the same every time you run IV on your images. Any changes in the checksum point to something happening to the RAW file. This needs investigating.
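To make the idea concrete, here is a minimal Python sketch of hash checking. It is not how Image Verifier itself works internally (I don't know which hash it uses); the choice of SHA-256 and the function names `file_checksum`, `build_manifest` and `find_changed` are my own assumptions for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_checksum(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_changed(root, manifest):
    """Return relative paths that are missing or whose checksum no longer matches."""
    root = Path(root)
    changed = []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file() or file_checksum(path) != expected:
            changed.append(relative)
    return changed
```

You would call `build_manifest` once on a folder you trust, keep the result somewhere safe, and later pass it to `find_changed` against the same folder or a backup copy. Anything it returns is a file to investigate, not necessarily a corrupt one.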
Structure Checking
The issue with checksumming is that the checksum generator doesn’t know anything about the data in the file. It treats all numbers in the file equally and just hashes them all together without trying to understand them. It can’t tell you if the numbers in the file are the correct numbers.
This is where the second form of verification comes in, Structure Checking. Structure Checking tries to understand the contents of the file and ensure that it is correct and usable.
IV has built-in libraries to open and check the contents of JPEG and TIFF files so that it can confirm that files are in the right format for those file types. For RAW files, IV delegates the job to Adobe’s DNG Converter utility. The DNG Converter has wide compatibility - usually equivalent to Adobe Camera RAW/Lightroom - in terms of the RAW files it can process. IV passes the RAW file to DNG Converter which then attempts to convert it to DNG.
If it succeeds the file is assumed to be fine. The converted DNG is thrown away; it doesn’t save it to your image library or anything so don’t worry about that. However if DNG Converter returns an error then there is likely to be an error in the RAW file that needs investigating.
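As a rough illustration of what "understanding the contents" means, here is a very crude Python sketch of a structure check. It is far simpler than what IV does: it only looks at the JPEG start and end markers and at the TIFF header, and it doesn't touch RAW files at all. The function names are hypothetical, and it reads the whole file into memory, which is fine for a sketch but not ideal for huge files.

```python
import struct
from pathlib import Path


def looks_like_jpeg(data):
    """Crude check: begins with the SOI marker and ends with the EOI marker."""
    return data.startswith(b"\xff\xd8") and data.endswith(b"\xff\xd9")


def looks_like_tiff(data):
    """Crude check: valid byte-order header and a first-IFD offset inside the file."""
    if len(data) < 8:
        return False
    if data[:4] == b"II*\x00":
        byte_order = "<"  # little-endian
    elif data[:4] == b"MM\x00*":
        byte_order = ">"  # big-endian
    else:
        return False
    (ifd_offset,) = struct.unpack(byte_order + "I", data[4:8])
    # The IFD starts with a 2-byte entry count, so it must fit within the file.
    return ifd_offset >= 8 and ifd_offset + 2 <= len(data)


def check_structure(path):
    """Return 'ok', 'invalid', or 'unknown' (for file types this sketch ignores)."""
    path = Path(path)
    suffix = path.suffix.lower()
    data = path.read_bytes()
    if suffix in (".jpg", ".jpeg"):
        return "ok" if looks_like_jpeg(data) else "invalid"
    if suffix in (".tif", ".tiff"):
        return "ok" if looks_like_tiff(data) else "invalid"
    return "unknown"
```

A check this shallow will catch a truncated file but will miss damage in the middle of the image data, and it can wrongly flag a valid JPEG that has extra bytes after the end marker. A real verifier has to decode the whole file to be sure.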
It seems to me that Structure Checking is most important when you first get started, to verify that the images in your catalog are good and can be opened successfully. Once you know that they are ok then hash checking can be used to detect any change in the file going forward. In other words, there’s no point starting with hash checking - it will generate a checksum ok, but if the file has already been corrupted then the checksum is just that of the corrupted file rather than known to be a ‘good’ checksum.
My Story
So I set IV off and left it running overnight and then some. In total it took just over 30 hours to go through my full 75k image library. Considering that in effect it was converting 75k RAW files to DNG then this probably isn’t too bad. But the results shocked me.
In total it confirmed that 74887 image files were good, but that 163 were invalid. 163 invalid, say what?! But but but, I backup!
Oh no! What Happened to my Honeymoon Images?
When I saw that a number of corrupt files were in a folder called ‘Honeymoon’ I started to panic a little bit (and checked that my wife wasn’t looking over my shoulder).
I worked through the errors, selecting each file and clicking ‘Reveal in Finder’. Sure enough, the image preview in Finder doesn’t look good. Certainly I don’t remember taking photos of a white rectangle on a black background.
If I try to open the file into Photoshop, ACR gives me an error telling me the file can’t be opened. If I open the file in Iridient Developer (ID) then it manages to open the file but the image presented confirms that the file is indeed corrupt. Uh-oh!
My next step is to have a look in Lightroom and see what is going on there. Weirdly I can’t find the image in there. I then realise that the whole folder ‘2007/Honeymoon’ where the corrupt files are living isn’t in Lightroom. Instead I have another folder ‘2008/2008-02-12-Honeymoon’ which makes more sense given that my honeymoon was in February 2008.
I check to see if IV reports any errors in the "2008-02-12-Honeymoon" folder, but fortunately it doesn't, and the images in Lightroom look fine too. Thank goodness my honeymoon images appear to be safe! I live to fight another day.
It’s a bit of a mystery as to how I have a duplicate and corrupted folder of Honeymoon pictures. These were made in the days before I got on top of my image organisation, so it could be the result of a number of things. It’s just good to be aware of them - and in this case know that I can delete them.
False Positive Results with TIFFs
Looking further down the list of errors there are various TIFF files that it flags up as invalid. I open each of these up in Photoshop and double check in LR. Fortunately these appear to be fine and are what we call 'false positives'; the verification has flagged them as invalid but manual checking confirms they are valid. I'm not sure if there may be something awry that's not immediately obvious here, or if IV struggles with certain aspects of TIFF files and that is why these were flagged up.
By the way when checking images in Lightroom, make sure to go into the Develop module. In the Library module the LR preview file is used and so won’t confirm any problems in the source file.
Corrupted DNG Files
Looking further down the list I start to see some other DNG files being flagged as invalid. One from 2009 appears to open fine into Photoshop until Adobe Camera RAW pops up to tell me the file is damaged. In what way I don’t know. I can continue and actually open the file into PS and so the nature of the damage remains unknown. Though the file looks fine I don’t know if I can fully trust it.
The most recent example of a corrupt RAW file is from June 2012, which is worryingly recent for my liking. The RAWs either side of this file are fine; it’s just this one that is damaged. What has caused the corruption is again a mystery. I don’t know if my old PC (I moved to Mac in May 2013) had a faulty drive that quietly corrupted this and perhaps the other files.
Once corrupted, when I’ve copied it across to my new Mac’s external drive and then replicated against a new set of external backup drives I’ve unwittingly copied and replicated the corrupt file. It may be that I have intact copies of these files on my old backup drive still installed in my PC but I can’t confirm this until my shipment arrives.
I’m reassured that it’s not a problem with the memory card, card reader or import process as the file was originally converted to DNG successfully - as noted earlier, the DNG conversion would fail if the original was corrupt - so something has happened to it once on the computer.
A Narrow Escape
Having checked each of the TIFFs manually I’m reassured that these files are ok. Of the 115 invalid RAW files I’m perhaps lucky that these are mostly from that strange duplicate Honeymoon folder, but there are a few more recent ones that give me pause for thought. These aren’t images that I intend to carry forward and so will delete, but imagine if the number included some of my favourite images?
From now on I’ll be running Image Verifier on images as I import and back them up, first using structure checking and then using hash checking to ensure they don’t change later.
Going back to my initial point, backups are not a guarantee of image safety. Perhaps the worst kind of hard drive failure isn’t the ‘big bang’ type where you lose everything on the drive, but the kind where errors are silently written into files. If that happens to your primary source file then backups won’t save you; the corruption is just mirrored onto the other drives. Checking through my backups for the corrupt files here confirms that the backups are also corrupt.
This is one potential problem with automated backups. They’re great at creating regular mirrored copies of your images, but the automation also risks pulling over corrupt files, as well as any ‘manual error’ changes and deletions. That means that, yes, you have multiple copies, but all of the copies are damaged.
Of course the purpose of multiple backups is that if corruption happens on one drive it is incredibly unlikely to happen in the same place on another drive. This means that problems such as this can be detected and avoided by restoring a corrupt file from a backup. So mostly I expect this is an issue with images being corrupted prior to backup, or with a backup being used on the assumption that it is good when in fact there may be some corruption.
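The "unlikely to happen in the same place on another drive" argument can be sketched in Python as a simple majority vote over the checksums of one file across several drives. This is my own illustration, not a feature of Image Verifier; the function name `find_outliers` and the drive names in the example are hypothetical.

```python
from collections import Counter


def find_outliers(checksums_by_drive):
    """Given {drive_name: checksum} for ONE file, return (majority_checksum, outlier_drives).

    If no checksum is held by more than half of the drives, there is no clear
    majority, so return (None, all_drives) and leave the decision to a human.
    """
    counts = Counter(checksums_by_drive.values())
    majority, votes = counts.most_common(1)[0]
    if votes <= len(checksums_by_drive) / 2:
        return None, sorted(checksums_by_drive)
    outliers = sorted(
        drive for drive, checksum in checksums_by_drive.items() if checksum != majority
    )
    return majority, outliers


# Example: three copies of the same RAW file, one of which has changed.
copies = {"external-1": "ab12", "external-2": "ab12", "nas": "ff00"}
majority, suspect_drives = find_outliers(copies)
```

In this example `suspect_drives` is `["nas"]`, so that copy would be the one to restore from the others. Bear in mind that agreement only tells you the copies match each other. If the file was corrupted before it was backed up, as happened to me, every drive will happily agree on the corrupt version.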
Nonetheless my experience with IV has made me realise that it’s important not to just delegate the task of backup (and verify) and hope it all works out, but that you have to check things yourself to fully trust them. I think running Image Verifier on an occasional basis will help give me that extra peace of mind that my backup routine is truly robust.