Paper Media

From Digitize the Planet
Revision as of 15:24, 29 April 2016 by Jscott (Talk | contribs)


Digitizing paper media — books, magazines, flyers, data sheets, pamphlets, and so on — can range from extremely easy and straightforward to complex, depending on the type of original media.

Scanners for Paper

There are three main types of scanners for paper media. The one you choose will depend on whether the original media is bound or loose, the weight and quality of the paper, and, for bound materials like books and magazines, whether you are willing to remove the binding (destroying the original) or want it to remain intact.

Flatbed scanners

A flatbed scanner

A flatbed scanner (connected to a computer) allows you to place one page at a time on a glass platen. The scanner scans one side of that page and sends the image data to software on the computer. Flatbed scanners are flexible, in that they can be used to scan almost any printed material. For instance, you can scan a magazine or book a page at a time, if you're willing to open the book wide enough to place each page flat on the platen. Flatbeds are great for scanning paper that can be difficult for other scanners, like very thin pages, or pages that are unusually small or large. They are often the least expensive scanners to buy.

On the downside, flatbed scanners require the constant attention of the person using them. With a sheet-feed scanner, you could drop in a stack of pages and walk away, but with a flatbed, you'll be placing a page, scanning, placing a new page, scanning, and so on. This is fine for a few sheets, but can be time-consuming for big projects.

Sheet-feed scanners

A sheet-feed scanner. The papers at the top are waiting to be scanned; the ones at the bottom have been scanned.

When you have stacks of loose-leaf pages to scan, sheet-feed scanners are a great way to go. A sheet-feed scanner (connected to a computer) allows you to place many pages (perhaps 50 sheets or more, depending on the model) in a hopper. An automatic document feeder (ADF) feeds the pages through one by one as it scans them. This can be a speedy way to scan many pages when they are uniform in size. A sheet-feed scanner requires less operator intervention than a flatbed scanner. Barring the occasional jam, the ADF should feed its stack of pages through without constant babysitting.

Some models of sheet-feed scanners can scan both sides of the page at the same time, a feature called duplexing. This is a huge timesaver when you're scanning two-sided documents, cutting the scanning time in half compared with single-sided (non-duplexing) scanners.
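To put rough numbers on the duplexing time savings, here is a minimal sketch. The function name and the throughput figure in the usage note are illustrative assumptions, not specifications of any particular scanner, and the estimate ignores jams and the time spent restacking pages.

```python
def scan_minutes(sheets, sheets_per_minute, two_sided=True, duplex=False):
    """Rough feed time for a stack of sheets, in minutes.

    Assumption: a non-duplexing scanner needs two passes through the
    feeder to capture both sides of a two-sided document.
    """
    passes = 2 if (two_sided and not duplex) else 1
    return sheets * passes / sheets_per_minute


if __name__ == "__main__":
    # Hypothetical job: 200 two-sided sheets at 25 sheets per minute.
    print("single-sided scanner:", scan_minutes(200, 25), "minutes")
    print("duplexing scanner:   ", scan_minutes(200, 25, duplex=True), "minutes")
```

For the hypothetical 200-sheet job, the estimate is 16 minutes on a single-sided scanner and 8 minutes on a duplexing one.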

The obvious downside to sheet-feed scanners is that they can only scan loose-leaf pages. If you want to scan a book or magazine, you would need to cut the binding off first, destroying the book.

Non-destructive scanners

A non-destructive scanner

A non-destructive scanner allows you to scan pages of bound material like a book or magazine with the binding still intact. This type of scanner is the best choice for rare or valuable bound material.

However, non-destructive scanners are relatively hard to find: few commercial models exist, and those that do are typically very expensive (more than $10,000), the kind of thing a well-funded library or institution might invest in. Two of the few consumer-level models are the Fujitsu ScanSnap SV600 and Plustek 4800. For intrepid do-it-yourselfers, a web site called DIY Book Scanner offers plans for building your own non-destructive book scanner, and hosts an active forum about book scanners.

These scanners work by letting you place the source material face up. The pages may be held flat by a transparent, V-shaped holder. The "scanning" is actually done by a digital camera (often two, one for each facing page) that takes a picture of the flattened pages. These scanners require constant operator attention to flip pages and activate the scanner, though the high-end commercial models can scan hundreds of pages per hour.

Even when held by the V-shaped glass, pages may not be perfectly flat when scanned. So it's often necessary to post-process this type of scan with specialized software that corrects for the curvature. Choices include Scan Tailor and Book Scan Wizard (both are free).


Multi-function devices

Some scanners offer both flatbed and sheet-feed capabilities, which can be helpful if you need to do both types of scanning.

In addition, many home and small office "multifunction devices" work as a printer, copy machine, and scanner. These aren't recommended for anything but the smallest scanning jobs: their scanners tend to be slow, inflexible, and relatively low resolution compared to dedicated scanners.

Which scanner to use for your project?

If you'll primarily be scanning loose sheets of paper of uniform size, a sheet-feed scanner is the way to go.

If you'll primarily be scanning books and magazines, a non-destructive scanner is the optimal choice, but as mentioned, they are expensive or difficult to build.

If you are willing to cut off the bindings, you can use a sheet-feed scanner instead. Otherwise, a flatbed scanner will get the job done.

Cutting off the bindings: your local copy shop may have a tool that easily chops bindings off of even thick books. As an alternative, you can use a single-edge razor blade to cut the pages away from the bindings. This can be time-consuming and must be done with care.
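The guidance in this section can be summarized as a small decision function. This is only a sketch of the rules of thumb above: the function name, parameter names, and returned labels are made up for illustration, and real projects will involve judgment calls the code doesn't capture.

```python
def recommend_scanner(bound, uniform_pages=True,
                      has_non_destructive=False, will_cut_binding=False):
    """Return a scanner type following this page's rules of thumb.

    All names here are hypothetical; the logic mirrors the prose only.
    """
    if bound:
        if has_non_destructive:
            # Optimal for books and magazines, if you can get one.
            return "non-destructive"
        if not will_cut_binding:
            # Binding stays intact: scan one page at a time.
            return "flatbed"
        # Binding removed: the pages are now loose sheets.
    # Loose sheets of uniform size suit a sheet feeder; odd sizes or
    # very thin pages are better handled on a flatbed.
    return "sheet-feed" if uniform_pages else "flatbed"


if __name__ == "__main__":
    print(recommend_scanner(bound=False))                         # loose, uniform pages
    print(recommend_scanner(bound=True))                          # book, binding kept
    print(recommend_scanner(bound=True, will_cut_binding=True))   # book, binding cut
```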

Scanner settings

For most projects digitizing or archiving paper media, we recommend scanning at 600 ppi or higher. Scan in color or grayscale as appropriate, and save in TIFF format.

That will produce large, detailed scans that take up a lot of disk space. But that's OK. There are always tradeoffs between scan format/resolution, image quality, and file size. Our philosophy is that disk space and bandwidth are cheap, but doing the work of scanning is expensive (in terms of your time) so get the best scans that you can now, so some schlub doesn't feel the need to do the scans again in the future. (That schlub may even be you.)
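To get a feel for how large these scans are, the following sketch estimates the uncompressed image data for one side of a page. The page size (US letter, 8.5 x 11 inches) and the bit depths (24-bit color, 8-bit grayscale, 1-bit black and white) are assumptions chosen for illustration. Actual TIFF files will differ somewhat because of headers, row padding, and any compression your software applies.

```python
# Bits stored per pixel for common scan modes (assumed, uncompressed).
BITS_PER_PIXEL = {"color": 24, "grayscale": 8, "bitonal": 1}


def uncompressed_scan_bytes(width_in, height_in, ppi, mode):
    """Estimate raw image data size in bytes for one scanned page side."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for ppi in (300, 600):
        for mode in ("bitonal", "grayscale", "color"):
            size = uncompressed_scan_bytes(8.5, 11, ppi, mode)
            print(f"{ppi} ppi {mode:9s} {size / 1_000_000:7.1f} MB")
```

Under these assumptions, a letter-size page at 600 ppi in 24-bit color comes to roughly 101 MB before compression, four times the 300 ppi figure, since doubling the resolution quadruples the pixel count.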

The resulting scans may be too high-resolution or too big in filesize for your current needs — for instance, you may want to throw them up on your web site. If that's the case, you can use a utility like GraphicConverter to create versions in JPG format with lower resolution — but you'll still have the high-resolution scans when needed. On the other hand, if you upload those fat, hi-resolution scans to the Internet Archive, their system will ingest them easily, make them available for people who want them, and automatically create smaller versions for viewing online.
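When making a smaller copy for the web, the pixel dimensions follow directly from the ratio of the two resolutions. Here is a minimal sketch of that arithmetic. The function name and the 150 ppi target are illustrative assumptions, and the actual resampling would still be done in your image tool of choice.

```python
def derivative_size(width_px, height_px, master_ppi, target_ppi):
    """Pixel dimensions of a lower-resolution copy of a master scan."""
    if target_ppi > master_ppi:
        # Upsampling adds no real detail, so treat it as a mistake.
        raise ValueError("target_ppi should not exceed master_ppi")
    scale = target_ppi / master_ppi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


if __name__ == "__main__":
    # A letter-size page scanned at 600 ppi is 5100 x 6600 pixels.
    print(derivative_size(5100, 6600, master_ppi=600, target_ppi=150))
```

The high-resolution master stays untouched. Only the copy is reduced.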

When scanning, you might be tempted to save space by scanning to black and white, or to JPG format, or at only 300 ppi. (That's pixels per inch, labeled dots per inch, or dpi, in some scanning software.) We recommend not doing any of that, because all of those choices throw away details of the scanned document in various ways. And while your black and white, 300 ppi, JPG scan may seem good enough for now, given constant improvements in screen resolution, printing resolution, and new tech like 3-D printing, a "good enough for now" scan does a disservice to the future.

The only exception to this is when scanning something that you know will never have long-term value to anyone. When you scan your business receipts for the tax man, no one will fault you for doing it at "good enough" settings.

