American Red Cross: Conversion to XML

The Challenge:
The American Red Cross has published the Journal of Immunohematology quarterly since 1983. The Red Cross recently began digitizing the journal and making it available online. Abstracts of its articles are uploaded regularly to the National Center for Biotechnology Information’s (NCBI) database so that researchers around the world can search for specific articles on blood disorders. The challenge: every issue published before 1993 was available only in hard copy. The Red Cross wanted to convert its archive of hard-copy issues into searchable PDFs, index the articles and issues in XML, and upload them to NCBI.

The Solution:
Catapult Consultants' approach to converting the hard copies was a three-step process. The first step was to meticulously scan each hard copy into a digital image format suitable for our Optical Character Recognition (OCR) software. Catapult Consultants then used the OCR software to convert each digital image into a character-based, searchable PDF. This step required a great deal of manual effort to correct the OCR software's interpretation of chemical formulas and scientific notation. The last step was to create XML files indexing the volume, issue number, article title, abstract, and authors. These XML files had to adhere to the schema and standards established by NCBI. This proved to be a challenge, as the chemical formulas and scientific notation had to be changed again, sometimes using special character codes, to meet NCBI's specifications. As a final check, the XML files were processed through NCBI's XML validator to ensure adherence to those standards.
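As a rough illustration of the last step, the sketch below builds one index record and serializes it so that special characters become numeric character codes. It is only a sketch under stated assumptions: the element names (Article, Volume, Issue, ArticleTitle, Abstract, AuthorList, Author) and the helper functions are hypothetical and do not reproduce the NCBI schema, and the sample title, abstract, and author names are placeholders rather than real journal data. The final parse checks only that the XML is well formed; it is not a substitute for the NCBI validator.

```python
import xml.etree.ElementTree as ET


def build_article_record(volume, issue, title, abstract, authors):
    """Build one index record. Element names are illustrative, not NCBI's."""
    article = ET.Element("Article")
    ET.SubElement(article, "Volume").text = str(volume)
    ET.SubElement(article, "Issue").text = str(issue)
    ET.SubElement(article, "ArticleTitle").text = title
    ET.SubElement(article, "Abstract").text = abstract
    author_list = ET.SubElement(article, "AuthorList")
    for name in authors:
        ET.SubElement(author_list, "Author").text = name
    return article


def serialize_ascii(element):
    """Serialize to ASCII bytes; non-ASCII characters are written as
    numeric character references such as &#945;."""
    return ET.tostring(element, encoding="us-ascii")


# Placeholder values only (not taken from the journal).
sample_title = "Example title with \u03b1 and H\u2082O"
record = build_article_record(
    volume=1,
    issue=1,
    title=sample_title,
    abstract="Placeholder abstract.",
    authors=["A. Author", "B. Author"],
)

xml_bytes = serialize_ascii(record)
xml_text = xml_bytes.decode("ascii")
print(xml_text)

# Well-formedness check: parsing raises ParseError if the XML is broken.
parsed = ET.fromstring(xml_bytes)
```

Serializing to ASCII is one simple way to guarantee that symbols such as Greek letters and subscripts survive as character codes regardless of how the receiving system handles encodings; a real submission would follow whatever encoding and entity rules the target schema specifies.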

The Benefit:
The Red Cross was able to convert its entire library of Immunohematology journals from hard copy to a digital format. The community it serves can now access the journal on demand via the web. The search functionality lets users retrieve only the information they need, saving them time and effort in their research. The end result of this project is that the Red Cross has opened a treasure trove of knowledge to researchers and doctors seeking information on blood disorders.

1750 Tysons Blvd., Suite 240, McLean, VA 22102 | Phone: 703.849.0960 | Fax: 703.852.2843 | Email: info@catapultconsultants.com
Copyright © 2001 Catapult Consultants. All rights reserved.