Digital Conversion Methodology
Production Process
Items will be scanned on one of several scanners, chosen according to the format of the primary source material. An HP flatbed will be used to scan color and most grayscale items. Minolta's overhead scanners will be used for bound print materials: the PS3000 for bitonal, and the PS7000 (out in January), which is planned for bound grayscale. Microfilm materials will be scanned with ScreenScan, mounted on a Minolta reader. The scanner operator will key basic bibliographic and capture metadata into the OhioLINK Bulldog cataloging software. The Project Manager will carefully check the scanned image against the original. After the image passes inspection, the scanner operator will catalog it with Dublin Core elements, and the Project Manager will again check data and image. Please see the attached project personnel duty chart for more details. To produce textual output, image files of good-quality printed text will be run through Textbridge software. Textual material that is not OCRable will be keyed in, edited by another worker, and quality checked by the Project Specialist. As material passes inspection, it will be backed up on tape, resulting in an on-site and an off-site backup, and the images will be FTPed to OhioLINK's storage server at the Ohio Supercomputer Center in Columbus.
Format
All material will be scanned at 400 dpi. Archival copies will be stored with lossless compression as follows: bitonal (1 bit) images will be stored as TIFF with ITU Group IV compression; grayscale (8 bit) and color (32 bit) copies will be stored as TIFF with LZW compression. Derivatives, in the form of low-resolution thumbnails and medium-resolution service images, will be created from the archival masters. Thumbnails will be in GIF format, while the service images will be formatted according to content: GIF for printed and handwritten textual documents, and JPEG for maps and continuous-tone images such as photos. The archival TIFFs will also be available for download for those users who require a high-resolution image. All textual material, with the exception of graphs, charts, and figures, will be output as searchable ASCII with HTML formatting. Textual material that is not OCRable (such as poor-quality print and handwriting) will be rekeyed. Oversize maps will be scanned in "tiled" sections. We would also be pleased to work with LC to investigate whether it would be feasible for LC to create and mount compressed versions of the oversize maps using MrSID, a very promising proprietary technology. Please see the chart in the appendix, Categories of Material and Formats for Digital Reproduction, for a further breakdown and totals.
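The format rules above can be summarized as a small lookup. The following Python sketch is illustrative only and is not part of the project's tooling: the function names, the content-type labels (such as "printed_text"), and the format strings are assumptions made for the example, while the 400 dpi figure, the bit depths, and the format choices come from the text above.

```python
from dataclasses import dataclass

ARCHIVAL_DPI = 400  # scanning resolution stated in the proposal


@dataclass(frozen=True)
class FormatPlan:
    archival: str   # lossless archival master
    thumbnail: str  # low-resolution derivative
    service: str    # medium-resolution derivative


def archival_format(bit_depth: int) -> str:
    """Pick the archival TIFF compression from the image bit depth."""
    if bit_depth == 1:           # bitonal
        return "TIFF/Group4"
    if bit_depth in (8, 32):     # grayscale and color, as given in the text
        return "TIFF/LZW"
    raise ValueError(f"bit depth not covered by the proposal: {bit_depth}")


def service_format(content: str) -> str:
    """Pick the service-image format from the content type.

    The content labels are hypothetical names chosen for this sketch.
    """
    if content in ("printed_text", "handwritten_text"):
        return "GIF"
    if content in ("map", "continuous_tone"):
        return "JPEG"
    raise ValueError(f"content type not covered by the proposal: {content}")


def plan_formats(bit_depth: int, content: str) -> FormatPlan:
    """Combine the rules; thumbnails are always GIF."""
    return FormatPlan(archival_format(bit_depth), "GIF", service_format(content))


def pixel_dimensions(width_in: float, height_in: float, dpi: int = ARCHIVAL_DPI):
    """Pixel size of a scan for an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)
```

For instance, plan_formats(8, "continuous_tone") pairs an LZW-compressed TIFF master with a GIF thumbnail and a JPEG service image, and pixel_dimensions(8.5, 11) gives the pixel size of a letter-size page scanned at 400 dpi.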
Consultants and Outside Vendors
None will be used.
In-house Production
Items will be scanned on one of several in-house scanners, chosen according to the format of the primary source material. An HP flatbed will be used to scan color and most grayscale items. Minolta's overhead scanners, which do not require material to be laid flat and so avoid straining bindings or brittle pages, will be used for bound print materials: the PS3000 for bitonal, and the PS7000 (available January 1999), which is planned for bound grayscale and oversize. Microfilm materials will be captured with a ScreenScan device on a Minolta microform reader. For software, TMSSequoia ScanFix will deskew and despeckle, Adobe Photoshop will be used for image cropping and correction, and Equilibrium DeBabelizer will be used to create the derivatives. Xerox Textbridge will be used for OCR and text correction. The Project Manager has used the same or very similar hardware and software on a previous LC/Ameritech award project, and is currently using them for an ongoing reformatting project.