Image Preservation – The Early Days
After viewing the above image from fellow street photographer Paul Watson, I wanted to update an issue I’ve addressed previously: the major challenge that digital storage presents in terms of long-term archival endurance and accessibility. Back in my analog days, when still photography was a smelly endeavor in the darkroom for both developing and printing, I slowly learned about careful washing and fixing of negatives, how to make ‘museum’ archival prints (B&W), and the intricacies of dye-transfer color printing (at the time the only color print technology that offered substantial lifetimes). Prints still needed carefully restricted environments for both display and storage, but if all was done properly, a lifetime of 100 years could be expected for monochrome prints and even longer for carefully preserved negatives. Color negatives and prints were much more fragile, particularly color positive film. The emulsions were unstable, and many of the early Ektachrome slides (and motion picture films) faded rapidly after only a decade or so. A well-preserved dye-transfer print could be expected to last for almost 50 years if stored in the dark.
I served for a number of years as a consultant to the Los Angeles County Museum of Art, advising them on photographic archival practices, particularly relating to motion picture films. The Bing Theatre for many years hosted a fantastic set of screenings that offered a rare tapestry of great movies from the past – and helped many current directors and others in the industry become better at their craft. In particular, Ron Haver (the film historian, preservationist and LACMA director with whom I worked during that time) was instrumental in supervising the restoration, screening and preservation of many films that would now be in the dustbin of history without his efforts. I learned much from him, and the principles last to this day, even in a digital world that he never experienced.
One project in particular was interesting: bringing the projection room (and associated film storage facilities) up to Los Angeles County Fire Code so we could store and screen early nitrate films from the 1920s. [For those who don’t know, nitrate film is highly flammable, and once on fire will quite happily burn under water until all the film is consumed. It makes its own oxygen while burning…] Fire departments were not great fans of this stuff… Due to both the large (and expensive) challenges in projecting this type of film, as well as the continual degradation of the film stock, almost all nitrate film left has since been digitally scanned for preservation and safety. I also designed the telecine transfer bay for the only approved nitrate scanning facility in Los Angeles at that time.
What this all underscored was the considerable effort, expense and planning that is required for long-term image preservation. Now, while we may think that once digitized, all our image preservation problems are over – the exact opposite is true! We have ample evidence (glass plate negatives from the 1880s, B&W sheet film negatives from the early 1900s) that properly stored monochrome film can easily last 100 years or more, and is as readable today as it was the day the film was exposed, with no extra knowledge or specialized machinery. B&W movie film is just as stable, as long as it is printed onto safety film base. Due to the inherent fading of so many early color emulsions, the only sure method for preservation (in the analog era) was to ‘color separate’ the negative film and print the three layers (cyan, magenta and yellow) onto three individual B&W films – the so-called “Technicolor 3-strip process”.
Digital Image Preservation
The problem with digital image preservation is not due to the inherent technology of digital conversion – if done well that can yield a perfect reproduction of the original after theoretically an infinite time period. The challenge is how we store, read and write the “0s and 1s” that make up the digital image. Our computer storage and processing capability has moved so quickly over the last 40 years that almost all digital storage from more than 25 years ago is somewhere between difficult and impossible to recover today. This problem is growing worse, not better, in every succeeding year…
As can be seen from the above examples, digital storage has changed remarkably over the last few decades. Even though today we look at multi-terabyte hard drives and SSDs (solid-state drives) as ‘cutting edge’, will we chuckle 20 years from now when we look back at something as archaic as spinning disks or NAND flash memory? With quantum memory, holographic storage and other technologies already showing promise in the labs, it’s highly likely that even the 60TB SSDs that Samsung just announced will take their place alongside 8-inch floppy disks in a decade or so…
And these issues are actually the least of the problem (the physical storage medium). Yes, if you put your ‘digital negatives’ on a floppy disk 15 years ago and now want to read them you have a challenge at hand… but with patience and some time on eBay you could probably assemble the appropriate hardware to retrieve the data into a modern computer. The bigger issue is that of the data format: both of the drives themselves and the actual image files. The file systems – the method that was used to catalog and find the individual images stored on whatever kind of physical storage device, whether ancient hard drive or floppy disk – have changed rapidly over the years. Most early file systems are no longer supported by current OS (Operating Systems), so hooking up an old drive to a modern computer won’t work.
Even if one could find a translator from an older file system to a current one (there is a very limited capability in this regard; many older file systems can literally only be read by a computer as old as the drive), that doesn’t solve the next issue: the image format itself. The issue of ‘backwards compatibility’ is one of the great Achilles’ heels of the entire IT industry. The huge push by all vendors to keep all their users relentlessly updating to the latest software, firmware and hardware is largely to avoid these same companies having to support older versions of hardware and software. This is not totally a self-serving issue (although there are significant costs and time involved in doing so) – frequently certain changes in technology just can’t support an older paradigm any longer. The earliest versions of Photoshop files, PICT, etc. are not easily opened with current applications. Anyone remember Corel Draw?? Even ‘common interchange’ formats such as TIFF and JPEG have evolved, and not every version is supported by every current image processing application.
The more proprietary and specific the image format is, the more fragile it is – in terms of archival longevity. For instance, it may seem that the best archival format would be the Camera Raw format – essentially the full original capture directly from the camera. File types such as RAW, NEF, CR2 and so on are typical. However, each of these is proprietary and typically has about a 5 year life span, in terms of active application support by the vendor. As camera models keep changing – more or less on a yearly cycle – the Raw formats change as well. Third-party vendors, such as Adobe, are under no obligation to support earlier Raw formats forever… and as previously discussed the challenge of maintaining backwards compatibility grows more complex with each passing year. There will always come a time when such formats will no longer be supported by currently active image retrieval, viewing or processing software.
Challenges of Long-Term Digital Image Preservation
Therefore two major challenges must be resolved in order to achieve long term storage and future accessibility of digital images. The first is the physical storage medium itself, whether that is tape (such as LTO-6), hard disk, SSD, optical, etc. The second is the actual image format. Both must be usable and able to transfer images back to the operating system, device and software that is current at the time of retrieval in order for the entire exercise of archival digital storage to be successful. Unfortunately, this is highly problematic at this time. As the pace of technological advance is exponentially increasing, the continual challenge of obsolescence becomes greater every year.
Currently there is no perfect answer for this dilemma – the only solution is one of proactivity on the part of the user. One must accommodate the continuing obsolescence of physical storage mediums, file systems, operating systems and file formats by moving the image files on a regular and continual basis to current versions of all of the above. Typically this is an exercise that must be repeated every five years – at current rates of technological development. For uncompressed images, other than the cost of the move/update there is no impact on the digital image – that is one of the plus sides of digital imagery. However, many images (almost all if you are other than a professional photographer or filmmaker) are stored in a compressed format (JPG, TIFF-LZW/ZIP, MPG, MOV, WMV, etc.). Images and movies in lossy formats such as JPG and MPG will experience a small degradation in quality each time they are re-encoded, for example when converted to a newer file format; lossless compression such as TIFF-LZW/ZIP does not discard image data. The amount and type of artifacts introduced are highly variable, depending on the level of compression and many other factors. The bottom line is that after a number of re-encoding cycles of a lossy compressed file (say 10) it is quite likely that a visible difference from the original file can be seen.
Therefore, particularly for compressed files, a balance must be struck between updating often enough to avoid technical obsolescence and making the fewest number of copies over time in order to avoid image degradation. [It should be noted that potential image degradation will typically only be due to changing/updating the image file format, not moving a bit-perfect copy from one type of storage medium to another].
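To illustrate the point about bit-perfect copies, here is a minimal Python sketch that copies a file to new storage and compares SHA-256 checksums of the source and the copy. The function names `sha256_of` and `copy_and_verify` are hypothetical, chosen only for this example; SHA-256 is one reasonable choice of checksum, not a requirement.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination and report whether the bytes match."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the file contents and, where possible, the timestamps.
    shutil.copy2(source, destination)
    return sha256_of(source) == sha256_of(destination)
```

If `copy_and_verify` returns `True`, the copy holds the same bytes as the original, so moving a JPG this way involves no re-encoding and no loss in quality.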
This process, while a bit tedious, can be automated with scripts or other similar tools, and for the casual photographer or filmmaker will not be too arduous if undertaken every five years or so. It’s another matter entirely for professionals with large libraries, or for museums, archives and anyone else with thousands or millions of image files. A lot of effort, research and thought has been applied to this problem by these professionals, as this is a large cost of both time and money – and no solution other than what’s been described above has been discovered to date. Some useful practices have been developed, both to preserve the integrity of the original images as well as reduce the time and complexity of the upgrade process.
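As one example of the kind of script meant here, the following Python sketch takes an inventory of an image library by file extension and lists the files that may deserve attention at the next migration. The watch list `AT_RISK_EXTENSIONS` is an assumption made for illustration (it simply collects extensions for formats mentioned in this post), and the function names are hypothetical; adjust both to your own library.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list: extensions for formats this post calls fragile.
AT_RISK_EXTENSIONS = {".nef", ".cr2", ".raw", ".pict", ".pct", ".psd", ".cdr"}


def inventory_by_extension(root: Path) -> Counter:
    """Count the files under root, grouped by lower-cased extension."""
    return Counter(
        path.suffix.lower() for path in root.rglob("*") if path.is_file()
    )


def files_needing_review(root: Path) -> list[Path]:
    """Return a sorted list of files whose extension is on the watch list."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AT_RISK_EXTENSIONS
    )
```

Running `inventory_by_extension(Path("Pictures"))` on a real folder gives a quick picture of which formats dominate the library, which helps when deciding how much conversion work the next five-year update will involve.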
Methods for Successful Digital Image Archiving
A few of those processes are shared below to serve as a guide for those who are interested. Further searching will yield a large number of sites and a great deal of information that addresses this challenge in detail.
- The most important aspect of ensuring a long-term archival process that will result in the ability to retrieve your images in the future is planning. Know what you want, and how much effort you are willing to put in to achieve that.
- While this may be a significant undertaking for professionals with very large libraries, even a few simple steps will benefit the casual user and can protect family albums for decades.
- In addition to the steps discussed above (updating storage media, OS and file systems, and image formats) another very important aspect is “Where do I store the backup media?” Making just one copy and having it on the hard drive of your computer is not sufficient. (Think about fire, theft, complete breakdown of the computer, etc.)
- The current ‘best practices’ recommendation is the “3-2-1” approach: Make 3 copies of the archival backup. Store them on at least 2 different storage devices or media types. Place at least 1 copy off-site. A simple but practical example (for a home user) would be: one copy of your image library in your computer. A 2nd copy on a backup drive that is only used for archival image storage. A 3rd copy either on another hard drive that is stored in a vault environment (fireproof data storage or equivalent) or in cloud storage.
- A note on cloud storage: while this can be convenient, be sure to check the fine print on liability, access, etc. of the cloud provider. This solution is typically feasible for up to a few terabytes; beyond that the cost can become significant, particularly when you consider storage for 10-20 years. Also, will the cloud provider be around in 20 years? What insurance do they provide in terms of buyout, bankruptcy, etc.? While storage media and file systems are not your concern with cloud storage (it is incumbent on the cloud provider to keep those updated), you are still personally responsible for the image format issue: the cloud vendor is only storing a set of binary files, and cannot guarantee that these files will be readable in 20 years.
- Unless you have a fairly small image library, current optical media (DVD, etc.) is impractical: even dual-layer DVDs only hold about 8GB of formatted data. In addition, since you would need to burn these DVDs in your computer, their longevity is not great (burned DVDs do not last as long as the pressed DVDs you get when you buy a movie). With DVD usage falling off noticeably this is most likely not a good long-term archival format.
- The best current solution for off-premise archival storage is to physically store external hard drives (or SSDs) with a well known data vaulting vendor (Iron Mountain is one example). The cost is low, and since you only need access every 5 years or so the extra cost for retrieval and re-storage (after updating the storage media) is acceptable even for the casual user.
- Another vitally important aspect of image preservation is metadata. This is the information about the images. If you don’t know what you have then future retrieval can be difficult and frustrating. In addition to the very basic metadata (file name, simple description, and a master catalog of all your images) it is highly desirable to put in place a metadata schema that can store keywords and a multitude of other information about the images. This can be invaluable to yourself or others who may want to access these images decades in the future. A full discussion of image metadata is beyond the scope of this post, but there is a wealth of information available. One notable challenge is that the most basic (and therefore future-proof) still image formats in use today [JPG and TIFF] carry embedded metadata only in standard blocks such as EXIF, IPTC and XMP, which not every application reads or preserves – so it is prudent to also store metadata externally and cross-reference it somehow. Photoshop files also store metadata and the image within the same file – but as discussed above this is not the best format for archival storage. There are techniques to cross-reference information to images: from purpose-built archival image software to a simple spreadsheet that uses the filename of the image as a key to the metadata.
- An important reminder: the whole purpose of an archival exercise is to be able to recover the images at a future date. So test this. Don’t just assume. After putting it all in place, pull up some images from your local offline storage every 3-6 months and see that everything works. Pull one of your archival drives from off-site storage once a year and test it to be sure you can still read everything. Set up reminders in your calendar – it’s so easy to forget until you need a set of images that was accidentally deleted from your computer – and then find out your backup did not work as expected.
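Two of the points above, the simple spreadsheet keyed by filename and the periodic test of your backups, can be combined in one small Python sketch. It writes a CSV catalog with one row per image (filename, checksum, and empty description and keywords columns to fill in by hand), and later checks a drive against that catalog. The column names in `FIELDS` and the function names are assumptions made for this example, not an established schema; keep the catalog file outside the image folder so it does not list itself.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical column layout for the catalog spreadsheet.
FIELDS = ["filename", "sha256", "description", "keywords"]


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_catalog(root: Path, catalog: Path) -> int:
    """Write one CSV row per file under root and return the row count."""
    rows = 0
    with catalog.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            writer.writerow({
                "filename": path.relative_to(root).as_posix(),
                "sha256": file_digest(path),
                "description": "",  # filled in by hand later
                "keywords": "",     # filled in by hand later
            })
            rows += 1
    return rows


def verify_against_catalog(root: Path, catalog: Path) -> dict[str, list[str]]:
    """Report catalog entries that are missing or changed under root."""
    problems: dict[str, list[str]] = {"missing": [], "changed": []}
    with catalog.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            path = root / row["filename"]
            if not path.is_file():
                problems["missing"].append(row["filename"])
            elif file_digest(path) != row["sha256"]:
                problems["changed"].append(row["filename"])
    return problems
```

A yearly check of an off-site drive then amounts to calling `verify_against_catalog` with the drive's image folder and your catalog, and confirming that both lists come back empty. This only proves the bytes are intact; you still need to open a few files to confirm that current software can read the format.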
A final note: if you look at entities that store valuable images as their sole activity (Library of Congress, The National Archives, etc.) you will find [for still images] that the two most popular image formats are low-compression JPG and uncompressed TIFF. It’s a good place to start…