May 2000
Modern Drug Discovery, 2000, 3(5) 52–54.
© 2000 American Chemical Society.


Spot checks


Automation in microarray image processing and gene expression analysis speeds up drug discovery.

BY ALEXANDER KUKLIN

Molecular biology has revolutionized the pharmaceutical industry. Many researchers now rely on tools from molecular biology to understand the mechanisms of disease, giving them a key to develop better therapies to treat illness. Gene expression patterns (or “signatures”)—up- or down-regulation of genes of interest—can help illuminate pathophysiological processes and how they are affected by drug treatments. These signatures are easily obtained for cultured cells and animal or patient samples before, during, or after a drug treatment course by using DNA array technology.

DNA arrays are microscope slides (microarrays) or membrane filters containing a large number of immobilized DNA samples, often called spots. Researchers probe these spots with two dye-tagged or radioactively labeled cDNAs. These cDNAs are made by reacting reverse transcriptase with RNA from biological samples of interest, such as cells from patients. Following the hybridization step, the DNA array is scanned to generate two images, each corresponding to one of the dye “colors” or to the control and sample sources. In an experiment aiming to study physiological reactions to drug treatments, researchers generate numerous images over time for comparison.

The sensitivity of this technique exceeds that of histopathology and provides more useful information for drug development and usage. For this reason, DNA arrays are becoming indispensable tools for investigating the mechanism of drug action (1).

A bottleneck in DNA array technology is management of abundant information associated with slide and high-density membrane microarray design, image processing, and data mining. Image processing has progressed during the past year, however, providing better and more reliable tools to deal with the explosion of information.

Image analysis
The fundamental goal of array image processing is to measure the intensity of the arrayed spots and convert those measurements into quantified expression values. The task of translating an image of spots of varying intensities into a table linking intensity values to each gene has been impeded by several microarray technology challenges.

Hybridization spots have to be identified and further analyzed. However, shape and size of the spots fluctuate significantly across the array. These fluctuations are caused by printing, hybridization, and slide-surface chemistry factors, which can significantly affect the interpretation of the microarray image. Spot position variation is caused by inherent mechanical limitations of the spotting process, for example, vibration of the printing pins within the pinhead and vibrations of the platform.

Microarray images consist of arrays of spots arranged in grids. All grids have the same numbers of rows and columns of spots. A microarray may have several grids, called sub-grids, which are spaced roughly equally from one another to form a meta-array. The meta-array structure is an artifact of array production and is caused by using a “print-head” with multiple spotting pins. Typically, each pin of the robotic arrayer deposits DNA material in a single sub-grid.
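
The meta-array layout can be sketched in a few lines of Python. This is only an illustration of the geometry described above: the function name, the pitch parameters, and the assumption of perfectly regular spacing are hypothetical, and real arrays deviate from these nominal positions.

```python
def nominal_spot_centers(meta_rows, meta_cols, rows, cols,
                         spot_pitch, grid_pitch_x, grid_pitch_y,
                         origin=(0.0, 0.0)):
    """Return nominal (x, y) centers keyed by (grid_row, grid_col, row, col).

    Assumes every sub-grid has the same number of rows and columns and
    that sub-grids and spots are evenly spaced (an idealization).
    """
    ox, oy = origin
    centers = {}
    for gr in range(meta_rows):          # sub-grid row within the meta-array
        for gc in range(meta_cols):      # sub-grid column within the meta-array
            for r in range(rows):        # spot row within the sub-grid
                for c in range(cols):    # spot column within the sub-grid
                    x = ox + gc * grid_pitch_x + c * spot_pitch
                    y = oy + gr * grid_pitch_y + r * spot_pitch
                    centers[(gr, gc, r, c)] = (x, y)
    return centers


# A 2 x 2 meta-array of 3 x 4 sub-grids, as a four-pin print-head might produce.
layout = nominal_spot_centers(2, 2, 3, 4, spot_pitch=10.0,
                              grid_pitch_x=50.0, grid_pitch_y=40.0)
print(len(layout))  # 48 spots in total
```

Each key records which sub-grid (and therefore which pin) a spot belongs to, which is useful when a printing defect affects one pin only.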

A computer “perceives” an image as a two-dimensional array (or matrix) of numbers. Each array element is called a pixel, or picture element, and is represented in the computer as an integer value. Frequently, the pixel is represented as an unsigned 8-bit integer in the range [0, 255], with 0 corresponding to black, 255 corresponding to white, and shades of gray distributed over the middle values. However, most microarray scanners have higher than 8-bit sensitivity (closer to 12–13 bits, typically) and thus use a 16-bit TIFF format to store the images. A higher sensitivity allows for finer differentiation in the range of intensity values. A 16-bit representation produces up to 65,536 different shades of gray. Precise identification of signal pixels for microarray spots is crucial for obtaining accurate data in expression analysis.
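
The arithmetic behind these bit depths is easy to check. The short Python sketch below uses hypothetical helper names and one simple way of reducing 16-bit values to 8 bits (discarding the low byte); scanner software may use other mappings.

```python
def gray_levels(bits):
    """Number of distinct intensity values an unsigned integer of this width can hold."""
    return 2 ** bits


def to_8bit(value16):
    """Reduce a 16-bit pixel value to 8 bits by discarding the low byte."""
    return value16 >> 8


print(gray_levels(8))    # 256
print(gray_levels(16))   # 65536

# Two spots that are clearly different at 16 bits become identical at 8 bits.
print(to_8bit(300), to_8bit(500))  # 1 1
```

The last line shows why the extra depth matters: faint spots that differ in a 16-bit image can collapse to the same gray level once the image is reduced to 8 bits.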

Sophisticated machine vision algorithms have been implemented in various software packages to help users select grids with high precision in a matter of seconds. In most cases, the user will identify the bounding area of a sub-grid by selecting corner spots (2). The number of columns and rows enclosed in the rectangle should match the expected number of rows and columns of spots in the array, which is known a priori.

The next step is to identify the location of the corner spots of the bounding sub-grids in the image. Then the spot-finding algorithm uses that information to create the grid in seconds. It adjusts the location of the grid points and lines to locate the arrayed spots in the image. The software should allow for additional, quick manual adjustment of the grid points if the automatic spot finding method has not correctly identified certain spot positions.
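
One simple way to create an initial grid from the corner spots is bilinear interpolation, sketched below in Python. This is an assumption made for illustration, not the method of any particular package; the function name is hypothetical, and a real spot finder would go on to adjust each point to the nearby spot.

```python
def grid_from_corners(top_left, top_right, bottom_left, bottom_right, rows, cols):
    """Interpolate nominal spot centers from the four corner spots of a sub-grid.

    Returns a list of rows, each a list of (x, y) tuples. Bilinear
    interpolation tolerates a slightly rotated or skewed sub-grid.
    """
    def lerp(p, q, t):
        # Point a fraction t of the way from p to q.
        return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

    grid = []
    for r in range(rows):
        v = r / (rows - 1) if rows > 1 else 0.0
        left = lerp(top_left, bottom_left, v)     # point on the left edge
        right = lerp(top_right, bottom_right, v)  # point on the right edge
        row = []
        for c in range(cols):
            u = c / (cols - 1) if cols > 1 else 0.0
            row.append(lerp(left, right, u))
        grid.append(row)
    return grid


# A 3 x 4 sub-grid whose corner spots were picked by the user (pixel units).
grid = grid_from_corners((0, 0), (30, 0), (0, 20), (30, 20), rows=3, cols=4)
for row in grid:
    print(row)
```

Because the row and column counts are known a priori, the four corners are enough to place every nominal grid point.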

After the positioning of the grids and identification of the spot location and size, the software will process both control and sample images. Image segmentation algorithms are used to appropriately identify and segregate the pixels associated with each spot signal area from its local background and possible other contaminations—even if the contamination has landed on the spot.
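
As a deliberately simplified illustration of segmentation, the Python sketch below estimates the local background from the border pixels of a small patch around one spot and treats pixels well above that level as signal. The function name, the border-median background estimate, and the fixed margin are all assumptions; production algorithms are considerably more elaborate, particularly in handling contamination.

```python
from statistics import mean, median


def segment_spot(patch, margin):
    """Split a rectangular pixel patch into signal and local background.

    patch  -- list of rows of pixel intensities around one spot
    margin -- how far above background a pixel must be to count as signal
    Returns (background, net_signal, signal_pixel_count).
    """
    rows, cols = len(patch), len(patch[0])
    # Border pixels are assumed to lie outside the spot.
    border = [patch[r][c] for r in range(rows) for c in range(cols)
              if r in (0, rows - 1) or c in (0, cols - 1)]
    background = median(border)  # median resists a few bright outliers
    signal = [value for row in patch for value in row
              if value > background + margin]
    if not signal:
        return background, 0.0, 0
    return background, mean(signal) - background, len(signal)


patch = [
    [100, 102,   98, 101, 100],
    [ 99, 900,  950, 920, 101],
    [100, 940, 1000, 930,  99],
    [101, 910,  960, 925, 100],
    [100,  99,  102, 100, 101],
]
background, net, count = segment_spot(patch, margin=50)
print(background, round(net, 1), count)
```

The number of signal pixels is itself informative: a count far from what the expected spot size would give can flag a misprinted or contaminated spot.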

This approach involves human intervention in the process of spot location. Despite drastic savings in time compared with the initial stages of microarray image processing, the procedure becomes cumbersome when hundreds of microarray images need to be processed, as is the case in high-throughput screening. A single lab could produce from a few to thousands of array images per week. Each image contains several million pixels. Any approach involving even the smallest human effort per spot is certainly impractical at this scale.

Because of continuing advances in hardware technology and the need to integrate information for disparate laboratories, image analysis software should not be dependent on the type of scanner or microarray matrices used. Also, users may work with both glass and high-density membrane DNA arrays—robust software is needed to process the images efficiently and with high accuracy for both types of arrays. Therefore, accurate and powerful software is required to handle this processing in a high-throughput environment.

Better data faster
The goals of complete automation in microarray image processing are to provide high accuracy in spot location, eliminate noise signals from the data analysis process, and minimize operator involvement in the procedure. This approach reduces time for personnel training and operator involvement. Automation ensures consistent, high quality control of data extraction, which is of paramount importance in drug development.

An automated system needs only input of the microarray configuration (e.g., number of rows and columns of spots) and a list of image files to process, after which analysis should be performed automatically. This system should be able to search the image for grid position, identify the layout of the array, localize the spots, and perform measurements without the need of user intervention.
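
The shape of such a system can be outlined in Python. Everything named here (ArrayConfig, process_batch, the analyze callback) is hypothetical and stands in for the real grid-finding and measurement steps; the point is only that the operator supplies a configuration and a file list, and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayConfig:
    """Microarray layout supplied once by the operator."""
    meta_rows: int   # rows of sub-grids
    meta_cols: int   # columns of sub-grids
    rows: int        # spot rows per sub-grid
    cols: int        # spot columns per sub-grid

    @property
    def spots_per_image(self):
        return self.meta_rows * self.meta_cols * self.rows * self.cols


def process_batch(config, image_paths, analyze):
    """Run analyze(path, config) on every image without operator input.

    A failure on one image is recorded and the batch continues, so an
    overnight run is not lost to a single unreadable file.
    """
    results, failures = {}, {}
    for path in image_paths:
        try:
            results[path] = analyze(path, config)
        except Exception as exc:
            failures[path] = str(exc)
    return results, failures


def count_spots(path, config):
    # Placeholder for real image analysis: just reports the expected spot count.
    return config.spots_per_image


config = ArrayConfig(meta_rows=4, meta_cols=4, rows=12, cols=12)
results, failures = process_batch(config, ["slide_001.tif", "slide_002.tif"], count_spots)
print(results, failures)
```

Recording failures separately lets the operator review only the problem images the next morning.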

For many laboratories, workers must be able to access data from multiple locations. One way to efficiently handle this situation is to send images to a server. A system should be able to process the images and archive the results, which would be accessible via an intranet in an institution or a secure server through the Internet. With this setup, multiple users in the laboratory can access system reports, view results, and work on data analysis and interpretation.

An example of software that fits the high-performance requirements of high-throughput screening is AutoGene from BioDiscovery, Inc. (Mountain View, CA). This system provides autonomous operation and offers batch-mode (overnight) processing of multiple images. The operator specifies image characteristics, and the images are then loaded in a batch file. The system runs by itself without any operator intervention. Spots are quantified and assessed for artifacts individually through several computer vision algorithms, which increases the reliability of the data. Robust statistical algorithms handle irregularities in spot size and shape caused by spot-printing hardware errors. Multiple quality measures are obtained per spot, which allows a confidence value to be associated with each measurement. Visual presentation of the results allows for manual inspection of output at any time. Several users may access the server (available as a workstation) and inspect the data from processed images.

Analysis and visualization
Microarray data analysis can involve identifying statistically significant up- or down-regulated genes, finding functional groupings of genes by discovering similarity or dissimilarity among gene expression profiles, or predicting the biochemical and physiological pathways of previously uncharacterized genes. Simple microarray experiments need comparison of just two samples. Under such circumstances, genes can be ranked by their relative induction. More complicated experimental designs involve developmental time courses in cell lines or patients. Such studies generate enormous amounts of data to be analyzed.
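
For the simple two-sample case, ranking by relative induction might look like the Python sketch below. The function name, the use of a log2 ratio, and the intensity floor that guards against division by near-zero values are assumptions chosen for illustration.

```python
from math import log2


def rank_by_induction(control, sample, floor=1.0):
    """Rank genes by the magnitude of their log2(sample / control) ratio.

    control, sample -- dicts mapping gene name to background-corrected intensity
    floor           -- minimum intensity, to avoid dividing by (near) zero
    Returns a list of (gene, log2_ratio), largest change first.
    """
    ranked = []
    for gene in control:
        ratio = max(sample[gene], floor) / max(control[gene], floor)
        ranked.append((gene, log2(ratio)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked


control = {"geneA": 200, "geneB": 800, "geneC": 500}
sample = {"geneA": 1600, "geneB": 200, "geneC": 500}
for gene, change in rank_by_induction(control, sample):
    print(gene, change)
# geneA 3.0   (8-fold up)
# geneB -2.0  (4-fold down)
# geneC 0.0   (unchanged)
```

Working in log2 units puts up- and down-regulation on a symmetric scale, so an 8-fold increase and an 8-fold decrease rank equally.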

Figure 1. Self-organizing maps cluster data in GeneSight. Genes are plotted horizontally, and experiments/files are plotted vertically in the display. The color-coded expression level of all genes in all experiments is shown.

The first step in analyzing data from microarray experiments is to categorize expression patterns according to their similarity and existing functional annotations. Many mathematical techniques have been developed for identifying underlying patterns in complex data. Cluster analysis methods have been used to systematically group related gene expression patterns. Hierarchical clustering is a common computational approach to microarray data analysis. Another clustering technique used for microarray data is the self-organizing map (SOM, Figure 1).
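
A bare-bones version of hierarchical clustering can be written in a few lines of Python. The sketch below assumes Euclidean distance and average linkage, one common combination among several, and uses a hypothetical function name; it favors clarity over speed and would be too slow for thousands of genes.

```python
from itertools import combinations
from math import dist


def hierarchical_clusters(profiles):
    """Agglomerative clustering of expression profiles.

    profiles -- dict mapping gene name to a list of expression values
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of gene names.
    """
    def linkage(a, b):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in profiles]  # start with one gene per cluster
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


profiles = {
    "g1": [1.0, 2.0, 3.0],   # rising over the time course
    "g2": [1.1, 2.1, 3.1],
    "g3": [3.0, 2.0, 1.0],   # falling over the time course
    "g4": [3.2, 1.8, 0.8],
}
for a, b, d in hierarchical_clusters(profiles):
    print(a, b, round(d, 3))
```

The sequence of merges is what a dendrogram draws: the two rising genes join first, then the two falling genes, and the two groups join last at a much larger distance.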

SOMs, a type of neural network, are computationally intensive algorithmic procedures for transforming inputs into desired outputs by using highly connected networks of relatively simple processing units (neurons or nodes). A SOM has a set of nodes with a simple topology and a distance function on the nodes. Nodes are iteratively mapped into k-dimensional “gene expression” space, in which the ith coordinate represents the expression level in the ith sample.
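
The iterative mapping can be made concrete with a minimal one-dimensional SOM in Python. This is a sketch under stated assumptions, not the implementation used in any commercial package: the nodes form a simple chain, the learning rate and neighborhood radius shrink linearly, and all names and default values are hypothetical.

```python
import random
from math import dist, exp


def train_som(data, n_nodes, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Fit a chain of n_nodes nodes to k-dimensional expression profiles.

    data -- list of profiles; coordinate i is the expression level in sample i
    Returns the node positions in expression space.
    """
    rng = random.Random(seed)
    k = len(data[0])
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]  # start on data points
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac                       # learning rate decays toward zero
        radius = max(radius0 * frac, 1e-3)    # neighborhood shrinks over time
        for x in rng.sample(data, len(data)):
            # The winner is the node closest to this profile.
            winner = min(range(n_nodes), key=lambda j: dist(nodes[j], x))
            for j in range(n_nodes):
                # Neighbors along the chain move too, but less than the winner.
                h = exp(-((j - winner) ** 2) / (2 * radius ** 2))
                for i in range(k):
                    nodes[j][i] += lr * h * (x[i] - nodes[j][i])
    return nodes


def assign(data, nodes):
    """Index of the nearest node for each profile, i.e. its cluster."""
    return [min(range(len(nodes)), key=lambda j: dist(nodes[j], x)) for x in data]


data = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [3.0, 2.0, 1.0], [3.2, 1.8, 0.8]]
nodes = train_som(data, n_nodes=2)
print(assign(data, nodes))
```

Pulling a winner's neighbors along with it is what gives a SOM its partial structure: nodes that are adjacent in the topology tend to end up representing similar expression patterns.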

SOMs have several features that make them particularly well suited to clustering and analysis of gene expression patterns. They

  • allow imposing of partial structure on the cluster in contrast to the rigid structure of hierarchical clustering;
  • facilitate easy visualization and interpretation; and
  • are superior in both robustness and accuracy to other clustering techniques.

The availability of sophisticated multivariate analysis techniques, such as SOMs, hierarchical clustering, and principal component analysis, makes software packages like GeneSight (BioDiscovery, Inc.) a useful asset for rapid analysis of data (3).

Conclusions
Automation and integration of microarray data management into a high-throughput environment will lead to a fast and error-proof information analysis methodology. Combining advanced tools for microarray data analysis with the progress in hardware design of microarray spotting robots and scanning devices will enhance our understanding of biological processes at the genomic level.

References

  1. Debouck, C.; Goodfellow, P. N. DNA microarrays in drug discovery and development. Nature Genet. 1999, 21 (1 Suppl), 48–50.
  2. Kuklin, A. Using array image analysis to combat HTS bottlenecks. Genetic Eng. News 1999, 19 (19), 32.
  3. Kalocsai, P.; Shams, S. Visualization and analysis of gene expression data. J. Assoc. Lab. Automation 1999, 4 (5), 58–61.


Alexander Kuklin is marketing and applications manager for BioDiscovery, Inc., in Mountain View, CA.
Comments and questions for the author may be addressed to the Editorial Office by e-mail at mdd@acs.org, by fax at 202-776-8166 or by post at 1155 16th Street, NW; Washington, DC 20036.
