Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution

Igor V. Filippov* and Marc C. Nicklaus
Laboratory of Medicinal Chemistry, SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21702, and Laboratory of Medicinal Chemistry, NCI, NIH, DHHS, NCI-Frederick, Frederick, Maryland 21702
J. Chem. Inf. Model., 2009, 49 (3), pp 740–743
DOI: 10.1021/ci800067r
Publication Date (Web): February 17, 2009
Copyright © 2009 American Chemical Society
* To whom correspondence should be addressed. E-mail: igorf@helix.nih.gov.
† SAIC-Frederick, Inc.
‡ NCI-Frederick.

Abstract

Until recently, most scientific and patent documents dealing with chemistry have described molecular structures either with systematic names or with graphical images of Kekulé structures. The latter method poses inherent problems for the automated processing that is needed when the number of documents ranges in the hundreds of thousands or even millions, since graphical representations cannot be directly interpreted by a computer. To recover this structural information, which is otherwise all but lost, we have built OSRA, an optical structure recognition application based on modern advances in image processing implemented in open source tools. OSRA reads documents in over 90 graphical formats, including GIF, JPEG, PNG, TIFF, PDF, and PS; automatically recognizes and extracts the graphical information representing chemical structures in such documents; and generates the SMILES or SD representation of the encountered molecular structure images.
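
The workflow described in the abstract (image in, SMILES out) might be driven from a script roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' code: it assumes an `osra` executable is installed on the PATH, that it accepts the input file as its sole argument, and that it prints one recognized structure per line. The helper names `parse_smiles_lines` and `extract_smiles` are hypothetical and introduced purely for illustration.

```python
import subprocess
from typing import Callable, List


def parse_smiles_lines(output: str) -> List[str]:
    """Collect one SMILES string per non-empty line of program output.

    Only the first whitespace-separated token is kept, in case extra
    columns follow the SMILES on a line (an assumption, not a documented format).
    """
    smiles = []
    for line in output.splitlines():
        tokens = line.split()
        if tokens:
            smiles.append(tokens[0])
    return smiles


def extract_smiles(image_path: str, run: Callable = subprocess.run) -> List[str]:
    """Run the recognizer on one document and return the SMILES it prints.

    Assumption: an `osra` executable is on the PATH and takes the file path
    as its only argument. `run` is injectable so the parsing logic can be
    exercised without the program installed.
    """
    result = run(["osra", image_path], capture_output=True, text=True, check=True)
    return parse_smiles_lines(result.stdout)
```

With the program installed, a call such as `extract_smiles("scanned_page.png")` would return a list of SMILES strings, one per structure recognized in the image; the file name here is a placeholder.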

Citing Articles

This article has been cited by 4 ACS Journal articles (4 most recent appear below).

  • AsteriX: A Web Server To Automatically Extract Ligand Coordinates from Figures in PDF Articles

    V. Lounnas and G. Vriend
    Journal of Chemical Information and Modeling, 2012, Article ASAP

    Coordinates describing the chemical structures of small molecules that are potential ligands for pharmaceutical targets are used at many stages of the drug design process. The coordinates of the vast majority of ligands can be obtained from either ...

  • Chemical−Text Hybrid Search Engines

    Yingyao Zhou, Bin Zhou, Shumei Jiang and Frederick J. King
    Journal of Chemical Information and Modeling, 2010, 50 (1), 47-54

    As the amount of chemical literature increases, it is critical that researchers be enabled to accurately locate documents related to a particular aspect of a given compound. Existing solutions, based on text and chemical search engines alone, suffer from ...

  • Tunable Machine Vision-Based Strategy for Automated Annotation of Chemical Databases

    Jungkap Park, Gus R. Rosania and Kazuhiro Saitou
    Journal of Chemical Information and Modeling, 2009, 49 (8), 1993-2001

    We present a tunable, machine vision-based strategy for automated annotation of virtual small molecule databases. The proposed strategy is based on the use of a machine vision-based tool for extracting structure diagrams in research articles and ...

  • CLiDE Pro: The Latest Generation of CLiDE, a Tool for Optical Chemical Structure Recognition

    Aniko T. Valko and A. Peter Johnson
    Journal of Chemical Information and Modeling, 2009, 49 (4), 780-787

    We present CLiDE Pro, the latest version of the output of the long-term CLiDE project for the development of tools for automatic extraction of chemical information from the literature. CLiDE Pro is concerned with the extraction of chemical structure and ...

History

  • Published In Issue: March 23, 2009
  • Article ASAP: February 17, 2009
  • Received: February 22, 2008
