Thursday, April 7, 2011

Digitizing Historic African American Education Collections: Methods

Yesterday, Kerrie Cotten Williams and I met with Sheila McAlister and Mary Willoughby of the Digital Library of Georgia (DLG), who reviewed the first batch of scans, close to 4,000, for quality control and feedback. We are relieved to report that all is well and on track, thanks in part to the meticulous work of Colleen Carrington, who struck a balance between speed and excellence early in the project. Though we have eight more collections to digitize, about 70,000 scans, we've settled into a comfortable production pace. In an earlier post, we provided a summary of the scope and goals of the grant. (See here.) Today's entry focuses in depth on the project's methods.

For the digitization grant, collections and series were selected on three criteria: subject, format, and copyright. We identified 11 collections with folder- and item-level EAD-encoded finding aids, totaling 50 linear feet. We selected files at random from the boxes and sampled them, arriving at an estimate of 74,000 pages. AARL owns the physical property of its collections through deeds of gift or sale. Of the material proposed for this project, 97% is either in the public domain or under copyright held by AARL/A-FPLS. Material whose copyright remains with the original creators will not be digitized and remains available for use onsite. AARL and DLG abide by current laws and regulations regarding copyright and fair use. Patrons must obtain permission from both institutions and the copyright holders, if any, to publish, broadcast, perform, or exhibit materials held in either repository. AARL and DLG provide robust reprographic services for patrons, though securing permission from copyright holders is the patron’s responsibility. Both institutions help patrons reach copyright holders when that information is known.
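The sampling step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not a record of how the 74,000-page figure was actually calculated: the function names, the folder identifiers, and the page counts in the usage lines are all hypothetical.

```python
import random
from statistics import mean


def pick_sample(folder_ids, k, seed=0):
    """Choose k folders at random (seeded so the draw can be repeated)."""
    return random.Random(seed).sample(folder_ids, k)


def estimate_total_pages(sampled_page_counts, total_folders):
    """Scale the average page count of the sampled folders to all folders."""
    return round(mean(sampled_page_counts) * total_folders)


# Hypothetical usage: 100 folders, three of them pulled and counted by hand.
folders = [f"folder-{n}" for n in range(1, 101)]
sample = pick_sample(folders, 3)
estimate = estimate_total_pages([10, 20, 30], total_folders=len(folders))
```

A larger sample narrows the margin of error, at the cost of more hand counting.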

When the project is complete, the digitized materials will be accessible through the AARL finding aids database, the DLG Web site, and Web search engines. Since 2004, the DLG has presented folder- and item-level EAD-encoded finding aids of AARL’s holdings as part of a project of the Georgia HomePLACE initiative, supported with federal LSTA funds administered by the Institute of Museum and Library Services through the Georgia Public Library Service. AARL’s finding aids are discoverable through any major internet search engine. Furthermore, AARL creates MARC21 bibliographic records for archival collections in WorldCat, which are downloaded into the institution’s local online catalog. DLG personnel will add digital archival object links to the EAD inventories for each digitized folder. When viewing the container list for a given collection, users will see which folders are available online and which are available only at AARL. The finding aids also communicate the content, context, and structure of AARL’s collections.

For two years, AARL grant staff will scan, crop, and deskew content using flatbed scanner workstations and Adobe Photoshop. During this period, AARL and DLG personnel will review images for quality control. From the EAD files, the DLG will automatically generate folder-level Dublin Core records using a Perl script and a metadata mapping and subject heading assignment protocol developed and tested in the 2007 NHPRC-funded Troup County (GA) Archives digitization project. (See project results here, specifically Series IV and V.) The DLG will load the Dublin Core records into its union metadata catalog (META), which will support access to the material via both the DLG site and Web search engines.
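To give a sense of what generating folder-level Dublin Core records from an EAD file involves, here is a minimal sketch in Python. It is not the DLG's Perl script, and the field mapping is our own assumption rather than the DLG protocol. The sample finding aid is invented, and the sketch assumes an EAD 2002 file without an XML namespace in which folders are marked `level="file"`.

```python
import xml.etree.ElementTree as ET

# Invented, heavily trimmed finding aid used only for illustration.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Education Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <container type="box">1</container>
            <container type="folder">3</container>
            <unittitle>Letters to trustees</unittitle>
            <unitdate>1950-1955</unitdate>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def folder_records(ead_xml):
    """Return one Dublin Core-style dict per folder-level component."""
    root = ET.fromstring(ead_xml)
    collection = (root.findtext("archdesc/did/unittitle") or "").strip()
    records = []
    for component in root.iter():
        # Only folder-level components become records (assumed mapping).
        if component.get("level") != "file":
            continue
        did = component.find("did")
        if did is None:
            continue
        containers = {
            c.get("type"): (c.text or "").strip()
            for c in did.findall("container")
        }
        records.append({
            "dc:title": (did.findtext("unittitle") or "").strip(),
            "dc:date": (did.findtext("unitdate") or "").strip(),
            "dc:source": collection,
            "dc:identifier": "box{}-folder{}".format(
                containers.get("box", "?"), containers.get("folder", "?")
            ),
        })
    return records


records = folder_records(SAMPLE_EAD)
```

A production mapping would also carry subject headings, rights statements, and the other elements the protocol calls for. This sketch covers only the title, date, source, and a box-and-folder identifier.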

The project will proceed according to national standards and best practices for digital imaging and description. The finding aids are encoded using EAD version 2002 and adhere to best practices developed by the Research Libraries Group. The folder-level metadata records will use Dublin Core as the data structure. The DLG will employ the following content standards: AACR2, DACS, LCSH, LCNAF, AAT, and the local DLG name authority database. Master images will be 400 dpi, uncompressed TIFF 6.0 images in 24-bit RGB color at 100% size. File-level access versions will be layered PDF, with DjVu as an alternate choice. DLG personnel will generate derivatives (PDF and DjVu files) for each folder of content for Web display; load master images to the DLG archival storage system for migration, disaster recovery, and other uses; load PDF and DjVu files to the DLG public Web server; and implement format selector pages.
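The master image specification has practical consequences for storage, which a little arithmetic makes concrete. The sketch below assumes every page is letter size (8.5 by 11 inches). That is our simplification for illustration, since archival materials vary widely in size. It counts pixel data only and ignores file headers.

```python
def uncompressed_bytes(width_in, height_in, dpi=400, bits_per_pixel=24):
    """Pixel-data size of one uncompressed scan at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumption: a letter-size page; 74,000 is the project's page estimate.
page_bytes = uncompressed_bytes(8.5, 11)
total_bytes = page_bytes * 74_000

print(f"{page_bytes / 1e6:.1f} MB per page, {total_bytes / 1e12:.2f} TB overall")
# prints "44.9 MB per page, 3.32 TB overall"
```

Under that assumption, a single master image holds 3,400 by 4,400 pixels at three bytes each, which is why the much smaller PDF and DjVu derivatives serve as the access copies.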

Patron demand for AARL's archival collections has increased steadily since 2004, when we partnered with the DLG to create and host EAD records for our processed collections. Within the Archives Division, usage statistics capture the number of archives patrons inside and outside the library, as well as the collections requested and used. Between 2005 and 2009, 4,448 patrons used AARL's Archives Division. During the same period there were 5,817 requests for archives and special collections. Staff have also identified which collections are requested and used most often. In particular, requests for and use of materials that document African American education, civil rights, and African American business have increased. In March 2009, the Archives Division began using Google Analytics to track Web site use by remote visitors, hoping to better identify the specific archival finding aids these users view. Since then, staff have tracked use of the finding aids that represent the collections in this grant. Analysis of these statistics allows us to make better-informed decisions about collection processing priorities and collection delivery methods, including digitization.

In future posts, we will summarize the collections and discuss lessons learned and project outcomes. Keep checking in, and please comment.
Posted by Wesley Chenault, Library Research Associate