
The Challenge:
The American Red Cross has published the Journal of Immunohematology
quarterly since 1983. The Red Cross recently began digitizing the journal and
making its issues available online. Abstracts of articles from the journal have
been uploaded to the National Center for Biotechnology Information’s (NCBI)
database on a regular basis so that researchers all over the world can search
for specific articles relating to blood disorders. The challenge: every issue
published before 1993 was available only in hard copy. The Red Cross wanted to
convert its archive of hard-copy issues into searchable PDFs, index the
articles and issues with XML, and upload them to NCBI.
The Solution:
Catapult Consultants’ approach to converting the hard copies was a three-step
process. The first step was to meticulously scan each hard copy into a digital
image format suitable for our Optical Character Recognition (OCR) software.
Catapult Consultants then used the OCR software to convert each digital image
into a character-based, searchable PDF. A great deal of effort went into
manually correcting how the OCR software interpreted chemical formulas and
scientific notation. The last step was to create XML files indexing the volume,
issue number, article title, abstract, and authors. These XML files had to
adhere to the schema and standards established by NCBI. This proved to be a
challenge, as the chemical formulas and scientific notation had to be changed
again, sometimes using special character codes, to meet NCBI’s specifications.
As a final check, the XML files were processed through NCBI’s XML validator to
ensure adherence to its standards.
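To illustrate the indexing step, the following is a minimal Python sketch of how
one article record could be written as XML, with non-ASCII symbols emitted as
numeric character codes. It is not the code used on the project. The element
names (Article, Volume, Issue, ArticleTitle, Abstract, AuthorList, Author) and
the function name are hypothetical and do not represent NCBI’s actual schema,
and the sample values are placeholders rather than real journal data.

```python
import xml.etree.ElementTree as ET


def build_article_record(volume, issue, title, abstract, authors):
    """Return one article's index record as an ASCII-only XML string.

    Element names are hypothetical placeholders, not NCBI's real schema.
    """
    article = ET.Element("Article")
    ET.SubElement(article, "Volume").text = str(volume)
    ET.SubElement(article, "Issue").text = str(issue)
    ET.SubElement(article, "ArticleTitle").text = title
    ET.SubElement(article, "Abstract").text = abstract

    author_list = ET.SubElement(article, "AuthorList")
    for name in authors:
        ET.SubElement(author_list, "Author").text = name

    # Serializing as us-ascii makes ElementTree write any non-ASCII character
    # (Greek letters, subscripts, etc.) as a numeric character reference,
    # and it escapes markup characters such as "<" and "&" automatically.
    return ET.tostring(article, encoding="us-ascii").decode("ascii")


# Placeholder values for illustration only.
record = build_article_record(
    volume=1,
    issue=2,
    title="Example title with \u03b1 and \u03b2 symbols",
    abstract="Placeholder abstract where a < b.",
    authors=["A. Author", "B. Author"],
)

# Re-parsing confirms the record is well-formed XML. This is only a basic
# check and does not replace validation against the target schema.
parsed = ET.fromstring(record)
print(record)
```

Letting the serializer produce the character codes avoids hand-editing each
formula, although symbols that the target specification names differently
would still need manual mapping.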
The Benefit:
The Red Cross was able to convert its entire library of Immunohematology
journals from hard copy to a digital format. The community it serves can now
access the journal on demand via the web. The search functionality lets users
retrieve only the information they need, which saves them time and effort in
their research. The end result of this project is that the Red Cross has
opened a treasure trove of knowledge for researchers and doctors seeking
information on blood disorders.