Archiving Your Digital Data

You have spent long hard hours collecting precious photons from marvelous objects in the nighttime sky. These photons have spent thousands, and in some cases millions, of years traveling across space to land on the sensor of your DSLR camera. This data is precious indeed. Don't throw it all away by failing to archive and back up your data!

An archive is a collection of your primary source data. An archival collection is one that is intended to last a long time.

The good thing about digital data is that it is very easy to make a perfect backup copy. But sometimes this can make us complacent. We think we can just copy it to our hard drives and it will be safe there. But danger lurks on those spinning magnetic disks. They can sometimes crash, and we can lose all of the data on them.

This leads us to the first commandment of data storage: Thou must back up thy data!

Now, I know, this is not exactly news. And it's nothing that you haven't heard before. However, I'll bet it is not something that you do religiously. Accidents can happen, so you need to get that religion!

Let's consider for a moment what we do with our original digital files. You can't really store them on the memory cards you shoot them on in the camera. That would be cost-prohibitive. So you will need to copy them to another form of storage media. The two most popular are optical disks, such as CD-ROMs and DVDs, and magnetic media, such as hard drives.

Once you copy your original file from the camera's memory card to your hard drive or CD-ROM, and you re-format your memory card, the copy becomes the original. This new original needs to be backed up in case something happens to it. You can make any number of perfect copies of your original, so you have no reason not to make a backup.
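Before re-formatting a memory card, it can be reassuring to confirm that the copy really is a perfect one. The following is a minimal sketch of one way to do that, assuming Python is available; it compares SHA-256 checksums of the two files. The function names are made up for this illustration and are not part of any particular backup program.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large raw files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_perfect_copy(original, copy):
    """Return True when both files have byte-for-byte identical contents."""
    return sha256_of(original) == sha256_of(copy)
```

If is_perfect_copy returns False for any file, the card should not be re-formatted until the copy has been made again and checked.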

Storing your originals on your computer's hard drive without a permanent backup is not a good idea because hard drives can crash due to mechanical failure. A virus can also corrupt data or even wipe a hard drive clean of the information stored on it. Lightning strikes can fry a hard drive if the energy gets into your home's electrical system and ultimately into your computer. Simple mistakes such as operator error can accidentally erase or overwrite the original file.

Optical media are supposed to last 100 years, but there are many reports of less expensive media going bad after just a couple of years. And DVDs, which can hold more data, do not seem to be as archival as CD-ROMs.

So both of these data archiving methods have potential long-term storage problems. What is the solution to the problem of the long-term backup of your data?


A Two-Pronged Backup Approach

I use a double redundant backup approach (from the department of redundancy department). I use both hard drives and optical media to store my vital digital information.

Basically, I keep one set of my data live on my computer as working data. I keep separate, complete copies of that data on two external hard drives. I make another copy of my original image files on CD-ROM or DVD immediately after I shoot them. So I actually have 4 copies of my data (double redundancy).

One of the backup hard drives is stored unplugged in my closet in case of a lightning strike, and the other backup hard drive is stored in my safety deposit box in my bank in case my house burns down.

Now, I know what you are thinking - this guy is nuts! He is paranoid and obsessive-compulsive. However, the logic for this follows the simple law of nature that says if you take an umbrella, it won't rain. If you make backups, you will never need them. Although, I have to admit, on more than one occasion, I have needed them when I have done something stupid, like erasing a folder with my images in it. Luckily I had them backed up and didn't need to depend on superstitious mumbo jumbo about umbrellas and rain!


My Backup Procedure

In my computer, I have two hard drives. One small fast hard drive (a Western Digital Raptor 10,000 RPM 75 GB SATA), holds the operating system and application programs. One larger, slower hard drive (a Maxtor 7,200 RPM 500GB SATA) holds all of my data: document files, original image files, working image files, final enhanced image files, web site files, music and all of my other stuff.

I make a Ghost backup of the operating system hard drive so I can easily restore my OS and applications in case I get some nasty virus, or if my OS hard drive crashes. Ghost is a software program that makes an exact copy of a disk or partition by cloning. It works very fast. It usually takes about 10 to 15 minutes to copy a 10-gigabyte operating system partition. It literally copies everything sector by sector (except the swap file which is not needed). A Ghost backup will re-install everything in about 15 minutes compared to about 15 hours of manually re-installing the OS and applications with all of their serial numbers and preferences. And once you re-install from a Ghost backup, everything is exactly the same as when you made the backup.

After an astrophotography expedition, I first copy the original image files from their compact flash cards to the data hard drive on my computer. Then I burn two CDs or DVDs on two different manufacturers' media. This is just in case I get a bad lot of one type. Then I re-format the compact flash cards. Then I copy the entire data hard drive to an external hard drive overnight when I go to bed, because it can take a couple of hours to copy 250 gigabytes worth of data. That external hard drive goes to the safety deposit box in my bank the next day, and the one currently in the safety deposit box comes home where it is re-formatted and another copy of the data hard drive is made to it. This hard drive is then stored in my closet.

If I don't get out to shoot any more new astrophotos for a while, I make another data backup onto the hard drive in the closet, and then swap it with the hard drive in the safety deposit box about once a week. Even though I may not be shooting new images, I am working on correcting and enhancing the ones I have shot recently, and this creates new files that need to be backed up.

It sounds like a lot of work, but it's really not because the hard drive backups are made at night when I'm sleeping. I just start the copy and go to bed. I don't even use Ghost for this, just a straight file copy in Windows Explorer.
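For readers who would rather script the overnight copy than drag folders by hand, here is a rough sketch in Python. It is not the procedure described above, which is a plain drag-and-drop copy; it simply does a straight copy of a folder tree and then compares every file against its copy. The function name is hypothetical, and any drive letters you pass in are your own.

```python
import filecmp
import shutil
from pathlib import Path


def back_up_folder(source, destination):
    """Copy everything under source into destination, then verify it.

    Returns a list of relative paths whose copies are missing or differ.
    An empty list means every file was copied intact.
    """
    source, destination = Path(source), Path(destination)
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(source, destination, dirs_exist_ok=True)

    problems = []
    for item in source.rglob("*"):
        if item.is_file():
            target = destination / item.relative_to(source)
            # shallow=False compares file contents, not just size and date.
            if not target.is_file() or not filecmp.cmp(item, target, shallow=False):
                problems.append(str(item.relative_to(source)))
    return problems
```

A call such as back_up_folder("D:/", "F:/backup"), with drive letters that are only placeholders here, could be started at bedtime, and the returned list checked in the morning before the drive goes to the bank.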

Burning the original image files to CD or DVD immediately after I shoot them is not that hard because I only get out a couple of times a month at the most.

High quality DVD media costs about 25 cents apiece, and will store 4.7 gigabytes of data. Large capacity hard drives cost about 1 dollar per 10 gigabytes ($100 for a 1 terabyte drive). Per gigabyte, hard drives cost roughly twice as much as DVDs, but their convenience and speed make the extra cost more than worth it.
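To make the cost comparison concrete, the arithmetic can be worked out in a few lines of Python. This is only a sketch using the prices quoted above, treating 1 terabyte as 1,000 gigabytes; actual prices will differ.

```python
# Prices and capacities quoted in the text (1 terabyte taken as 1,000 GB).
dvd_price, dvd_capacity_gb = 0.25, 4.7
drive_price, drive_capacity_gb = 100.0, 1000.0

dvd_cost_per_gb = dvd_price / dvd_capacity_gb
drive_cost_per_gb = drive_price / drive_capacity_gb
ratio = drive_cost_per_gb / dvd_cost_per_gb

print(f"DVD:        ${dvd_cost_per_gb:.3f} per gigabyte")
print(f"Hard drive: ${drive_cost_per_gb:.3f} per gigabyte")
print(f"Hard drive costs about {ratio:.1f} times as much per gigabyte")
```

At these prices the DVD works out to a little over 5 cents per gigabyte and the hard drive to 10 cents per gigabyte.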


The Question of Longevity

CD-ROM media will last quite a while (50 - 100 years allegedly), if you use really good media like Mitsui and Taiyo Yuden, and if you store it correctly in a cool, dry, dark environment. If you store them on the dashboard of your car in the Sun, don't expect them to last very long.

The archival qualities of DVD media do not seem to be quite as good as CD-ROM.

With the amount of data being generated by my DSLR, it is increasingly more difficult to fit everything on a CD-ROM, so I'm starting to use DVDs, but I don't particularly trust them. This is why I believe in a double backup approach on two different types of media (optical disk and hard drive).

Hard drives, if used in a computer that is left on all the time, will last about 5-7 years. If not running constantly and stored in a closet, they should last considerably longer. But if you have two copies, and one goes bad, it is very easy to just buy another one and copy all of the data to it. It's not like having to copy all of the data from several hundred DVDs.


The Problem of Technology

The archival qualities of the media are not the only concern we have though. The problem with really long-term storage (50 - 100 years) will more likely be readers and operating system hardware support down the road, say more than 20 years from now. This is because about every 10 years a new storage medium comes along, and new operating systems don't support really old hardware technology. For example, most new computers don't have floppy drives anymore.

Buried in a storage closet, I just found some big 5¼-inch floppy disks from a Commodore 64 computer from about 25 years ago. When I found them, I realized that one of them held the only copy of a story I had written long ago that I would like to read again. The disks may still be good, but I really don't know, because the disk drive died and was thrown away a long time ago, so I cannot access the data anymore. Maybe, if I searched really hard, I might be able to find someone, somewhere, with a working Commodore disk drive. But what about in another 50 years? That is seriously doubtful.


The Problem of Transfer

This is the major problem with digital data: you can make perfect copies of it, but its long-term archival storage is problematic. If you really want to be able to access it for more than one generation of media, you have to transfer all of it every time a new generation of storage media comes along. I used to have a shelf full of floppies that stored my backup data. Now I have a shelf full of CDs and DVDs with backups on them.

Just as CDs have replaced floppies for computer storage, DVDs are now replacing CDs. The next generation may be holographic optical storage, who knows? Whatever it is, you will need to migrate your entire data collection to this new media, or run the risk of not being able to access it in subsequent media generations.

In terms of backing up large amounts of data from a hard drive to optical media, it is a problem that the optical backup media lags behind in size by about two orders of magnitude. For example, when 100 to 150 megabyte hard drives came out, it took about 100 1.44 MB floppies to back one up. That was crazy. Now, we have 1 terabyte hard drives, and a DVD holds about 4.7 gigabytes, so it would take more than 200 DVDs to back up a 1 terabyte hard drive. This is even crazier.
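The disc counts above are easy to check. Here is a small sketch in Python, assuming nominal capacities of 1.44 megabytes per floppy and 4.7 gigabytes per DVD, and 1 terabyte taken as 1,000 gigabytes; the function name is just for illustration.

```python
import math


def discs_needed(drive_size, disc_size):
    """Number of whole discs needed to hold drive_size (same units for both)."""
    return math.ceil(drive_size / disc_size)


floppies = discs_needed(150, 1.44)  # 150 MB drive, 1.44 MB floppies
dvds = discs_needed(1000, 4.7)      # 1,000 GB drive, 4.7 GB DVDs

print(f"Floppies for a 150 MB drive: {floppies}")
print(f"DVDs for a 1 TB drive:       {dvds}")
```

This gives 105 floppies for the 150 megabyte drive and 213 DVDs for the 1 terabyte drive, in line with the rough figures in the text.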

It has simply reached the point where the only practical way to back up a 1 terabyte hard drive is with another 1 terabyte hard drive.

Since hard drives are so inexpensive these days, it is a practical and easy solution to copy your entire hard drive and store one in the closet and one in your safety deposit box or elsewhere. But you have to create multiple backups because, being mechanical devices, they will eventually fail. You can have a RAID (Redundant Array of Independent/Inexpensive Disks) as backup to solve the failure problem, but that won't help if your house burns down.

Some companies are also offering on-line storage for backups, but would you really trust your data stored on somebody else's computer on the Internet? I wouldn't.

What to Back Up

If you shoot raw files, it is recommended to store the originals in the camera manufacturer's raw file format. It is possible that in the future the manufacturer or a third party may come out with a better algorithm for decoding the raw data that might make for better pictures.

In addition to the light frames you have shot of various celestial objects, you should also archive the support frames such as darks, flats and bias frames. You should also archive a simple text file that contains information such as when and where the images were taken, exposures, f/stops, ISO, ambient temperature, etc.
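The notes file can be typed by hand in any text editor, but it can also be generated. The sketch below, in Python, writes one "name: value" line per item into a plain text file stored alongside the images. The function name, the file name session_notes.txt, and the field names are all hypothetical choices for this example.

```python
from pathlib import Path


def write_session_notes(folder, notes):
    """Write a plain text file with one 'name: value' line per entry.

    notes is a dictionary such as {"object": "M31", "iso": 800}; the
    keys are whatever you want to record about the session.
    """
    lines = [f"{name}: {value}" for name, value in notes.items()]
    path = Path(folder) / "session_notes.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Plain text is a deliberate choice here: it needs no special program to read, which matters for something meant to be opened decades from now.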

It is also a good idea to archive the camera manufacturer's software program that converts the raw files, and even the astronomical image-processing program that you used to calibrate the data.


Additional Image Files to Back Up

After calibration, I also archive the calibrated and stacked, but unadjusted and unenhanced files. These are a second generation of originals. These should be stored as uncompressed 16-bit TIFF files.

When I have completed calibrating, processing, correcting and enhancing an image, I make another copy of everything I have created along the way. I usually save the working files after each major processing step. That way, if I don't like, say, the amount of noise reduction I have applied to an image, I can just go back a week, or a month, or even years later, to the working copy of the image in the previous step. I don't have to go through the laborious process of calibrating, stacking, flat-fielding, white and black point setting, background neutralization, color correction, and image enhancement steps. I can simply pick the image file for whatever step in the process I want to start over with.

Generally all of these image file steps will easily fit on one 4.7 GB DVD, so it only costs me 25 cents per final image to do this, and it gives me another backup!

I don't keep all of these working files live on my hard drive after I am finished working on an image because they take up too much space. I only keep the raw originals, the calibrated and stacked original, and the final corrected and enhanced image live on my computer's hard drive. If I need to access any of the working files later, I pull them off the backup DVD.


File Formats

A digital image file is a series of numbers describing the brightness and location of the pixels in the image. There are dozens of different types of file formats that are used for storing this information. We will primarily be interested in six different types.

  • Raw - Raw file formats are proprietary to the particular camera manufacturer, although more third-party programs, such as Adobe's Photoshop, can open them. Canon's raw file formats are called .CRW and .CR2. CRW stands for Canon Raw, and CR2 is the second generation. Nikon's .NEF stands for Nikon Electronic Format.

    Raw files can be opened by many image-editing programs, but once the data is changed, the image cannot be saved back to the original raw format because each is proprietary. Raw files that have been manipulated must be saved in another format such as TIFF or PSD.

  • TIFF - Tagged Image File Format - TIFF is an image format that has become the de-facto imaging-industry standard for uncompressed image file storage. It is supported by nearly every photo-editing and page-layout program.

  • PSD Photoshop - The Photoshop file format is Adobe's native file format specifically made to optimize the use of images in Photoshop. It supports layers, transparency, alpha channels, etc, and these may be saved and preserved in the file.

    Although the .PSD format is proprietary, many other programs, such as ThumbsPlus and IrfanView, can work with these .PSD files.

  • FITS - Flexible Image Transport System - The FITS format is used primarily by professional and amateur astronomers for storage and exchange of scientific data and astronomical images taken with CCD cameras. Many astronomical image processing programs work with FITS files, but if they use compression, there can be problems opening them in other programs.

  • JPEG - Joint Photographic Experts Group - JPEG has become the industry standard for compressed files. It supports 24-bit true color and is excellent for photographic and continuous-tone images. JPEG is supported by nearly all web browsers.

    JPEG compression works by discarding data to make the file size smaller, a method called LOSSY compression because information is lost. JPEG offers different compression ratios that give different quality depending on the amount of compression. A given file can be saved at a high quality setting that produces excellent quality but with a relatively large file size. The same file can be saved at a low quality but high compression ratio that produces 10:1 or more compression.

    For images that must be transmitted over phone lines, or the internet, JPEG compression can produce acceptable trade-offs between file size and quality. However, JPEG compression can also produce artifacts that are cumulative, so JPEG files should not be repeatedly opened, modified, and then re-compressed.

  • GIF - Graphics Interchange Format - GIF is a compressed file format that supports only 256 colors. GIF is a good format for storing line art or illustrations that contain type, such as screen shots of dialog boxes. GIF is a terrible format for storing full color photographic images, and should not be used for this. GIF is supported by most web browsers and image editing programs.


File Extensions

Once a file is saved in a particular format on a PC, the file name is appended with a three-letter extension after a period or dot that describes the file type. For example, the file might be called M31 and saved as a TIFF file, and the file name would be M31.TIF. On a Mac the extension is not necessary because the file type is stored as metadata with the file, but if a Mac file is transferred to a PC, the file may not be recognized because it does not have an extension. If you are a Mac user and you are going to be sharing your files with PC users, it is recommended that extensions be used in the file names.
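As an illustration of how a program relies on the extension, here is a short Python sketch that looks up the format from a file name. The table covers only the formats discussed in this article, plus a few common spelling variants (.tiff, .jpeg, .fit) that are assumptions added here; the function name is hypothetical.

```python
from pathlib import Path

# Extensions for the formats discussed in this article (lowercase).
FORMATS = {
    ".crw": "Canon raw",
    ".cr2": "Canon raw (second generation)",
    ".nef": "Nikon raw",
    ".tif": "TIFF",
    ".tiff": "TIFF",
    ".psd": "Photoshop",
    ".fits": "FITS",
    ".fit": "FITS",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".gif": "GIF",
}


def describe_format(file_name):
    """Guess the file format from the extension alone."""
    suffix = Path(file_name).suffix.lower()  # "M31.TIF" gives ".tif"
    if not suffix:
        return "no extension"
    return FORMATS.get(suffix, "unknown")
```

With this table, describe_format("M31.TIF") returns "TIFF", while describe_format("M31") returns "no extension", which is the situation a PC faces with a Mac file that was saved without one.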

Recommended Archiving

At an absolute minimum, store one copy of your original image files on a permanent optical medium such as CD-R or DVD, and not just on the hard drive in your computer. To be really safe, keep two copies, with one off-site. Keep the discs in a cool, dry, dark environment.

If you have really large amounts of data, your only solution may be to back it up on another hard drive.

Recommended File Formats

  • Archive original untouched files in raw format in the camera manufacturer's original file format.

  • If images are in working stages with layers, they should be stored in Photoshop's .PSD file format.

  • Once raw files are converted to high-bit depth linear files, they should be stored in 16-bit TIFF files.

  • Final calibrated, corrected and enhanced images should also be archived as 16-bit TIFF files.

  • Images for display on the web, or for email transmission should be resized and resampled down to appropriate sizes and saved as moderate-quality JPEG images (a compression ratio of about 8 is a good compromise between file size and quality).


The Ultimate Long Term Storage Medium

If you seriously want your work to survive you, and last 100 years or more (that is about as ultimate as you are going to get), you should output your enhanced digital images to transparency slide film (while they still make it). We won't address the question of whether our work deserves to survive us!

Properly stored, transparency film is more archival than photographic prints. Compared to digital data, there are no problems in accessing the data. Here is why - you don't need any technology at all to access it. You simply hold it up to the light and look at it!

One hundred years from now, your great-great-grandson will be able to do the same. Do you think your children, grandchildren, great-grandchildren, and great-great-grandchildren are going to go to the trouble of converting your entire digital archive to the next generation of digital storage media every 10 years or so? I don't think so!

My father happened to have shot some Kodachrome slides of me when I was born more than 50 years ago. Today, they look like they were shot yesterday. I can still scan them with a scanner, or copy them with a digital camera if I want to digitize them.

Film probably isn't going anywhere for a while and we can still output our best digital work to slides now and archive those. The latest transparency films should easily last 100 years if stored properly. If your great-great-grandkids want to digitize them, they can always use a macro lens on whatever digital cameras they have in the future. If they find one of your CD-ROMs or DVDs in a forgotten box of your memorabilia, the chances are they are not going to have a reader for it anymore.

You can have your digital files output to slides at Colorslide.com, and other vendors you can find on the Internet.

There is a certain irony that in this age of digital wonders, the analog medium of film is still the most archival storage medium we have.



