
Molecular biology has revolutionized the pharmaceutical industry. Many researchers now rely on tools from molecular biology to understand the mechanisms of disease, giving them a key to developing better therapies. Gene expression patterns (or signatures), that is, the up- or down-regulation of genes of interest, can help illuminate pathophysiological processes and how they are affected by drug treatments. These signatures are easily obtained for cultured cells and for animal or patient samples before, during, or after a course of drug treatment by using DNA array technology.
DNA arrays are microscope slides (microarrays) or membrane filters containing a large number of immobilized DNA samples, often called spots. Researchers probe these spots with two dye-tagged or radioactively labeled cDNAs. These cDNAs are made by reverse transcription of RNA from biological samples of interest, such as cells from patients. Following the hybridization step, the DNA array is scanned to generate two images, each corresponding to one of the dye colors or to the control and sample sources. In an experiment that studies physiological reactions to drug treatments, researchers generate numerous images over time for comparison. The sensitivity of this technique exceeds that of histopathology, and the technique provides more useful information for drug development and usage. For this reason, DNA arrays are becoming indispensable tools for investigating the mechanism of drug action (1).
A bottleneck in DNA array technology is the management of the abundant information associated with slide and high-density membrane microarray design, image processing, and data mining. Image processing has progressed during the past year, however, providing better and more reliable tools to deal with the explosion of information.
Image analysis
Hybridization spots have to be identified and further analyzed. However, the shape and size of the spots fluctuate significantly across the array.
These fluctuations are caused by printing, hybridization, and slide-surface chemistry factors, and they can significantly affect the interpretation of the microarray image. Spot position variation is caused by inherent mechanical limitations of the spotting process, for example, vibration of the printing pins within the pinhead and vibration of the platform.
Microarray images consist of arrays of spots arranged in grids. All grids have the same numbers of rows and columns of spots. A microarray may have several grids, called sub-grids, which are arranged at roughly equal spacing from each other to form a meta-array. The meta-array structure is an artifact of array production and is caused by using a print-head with multiple spotting pins. Typically, each pin of the robotic arrayer deposits DNA material in a single sub-grid.
A computer perceives an image as a two-dimensional array (or matrix) of numbers. Each array element is called a pixel, or picture element, and is represented in the computer as an integer value. Frequently, the pixel is represented as an unsigned 8-bit integer in the range [0, 255], with 0 corresponding to black, 255 corresponding to white, and shades of gray distributed over the middle values. However, most microarray scanners have higher than 8-bit sensitivity (typically closer to 12-13 bits) and thus use a 16-bit TIFF format to store the images. A higher sensitivity allows for finer differentiation in the range of intensity values. A 16-bit representation produces up to 65,536 different shades of gray.
Precise identification of signal pixels for microarray spots is crucial for obtaining accurate data in expression analysis. Sophisticated machine vision algorithms have been implemented in various software packages to help select grids with high precision in a matter of seconds. In most cases, the user will identify the bounding area of a sub-grid by selecting corner spots (2).
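As a rough illustration of how corner spots can seed a grid, the sketch below interpolates the expected spot centers of one sub-grid from its four corner spots. It is a minimal example under stated assumptions, not the method of any particular software package: the function name `grid_from_corners`, the pixel-coordinate convention (x, y), and the use of bilinear interpolation are all choices made here for illustration. A real spot finder would then adjust each position to the spot actually observed in the image.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates; an assumed convention


def grid_from_corners(top_left: Point, top_right: Point,
                      bottom_left: Point, bottom_right: Point,
                      rows: int, cols: int) -> List[List[Point]]:
    """Estimate spot centers for one sub-grid by bilinear interpolation
    between four user-selected corner spots (hypothetical helper)."""
    if rows < 2 or cols < 2:
        raise ValueError("a sub-grid needs at least 2 rows and 2 columns")

    def lerp(a: Point, b: Point, t: float) -> Point:
        # Point a fraction t of the way from a to b.
        return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

    grid: List[List[Point]] = []
    for r in range(rows):
        v = r / (rows - 1)                      # 0 at the top row, 1 at the bottom
        left = lerp(top_left, bottom_left, v)   # point on the left edge
        right = lerp(top_right, bottom_right, v)  # point on the right edge
        grid.append([lerp(left, right, c / (cols - 1)) for c in range(cols)])
    return grid
```

Because the four corners are interpolated independently, the sketch tolerates a slightly rotated or skewed sub-grid; it does not model the spot-to-spot position jitter described above, which is why a refinement step is still needed.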
The number of columns and rows enclosed in the rectangle should match the expected number of rows and columns of spots in the array, which is known a priori. The next step is to identify the location of the corner spots of the bounding sub-grids in the image. The spot-finding algorithm then uses that information to create the grid in seconds, adjusting the location of the grid points and lines to locate the arrayed spots in the image. The software should allow for additional, quick manual adjustment of the grid points if the automatic spot-finding method has not correctly identified certain spot positions. After the grids have been positioned and the spot locations and sizes identified, the software processes both control and sample images. Image segmentation algorithms are used to identify the pixels associated with each spot signal area and to separate them from the local background and from possible contamination, even if the contamination has landed on the spot.
This approach involves human intervention in the process of spot location. Despite drastic savings in time compared with the initial stages of microarray image processing, the procedure becomes cumbersome when hundreds of microarray images need to be processed, as is the case in high-throughput screening. A single lab could produce from a few to thousands of array images per week. Each image contains several million pixels. Any approach involving even the smallest human effort per spot is impractical at this scale.
Because of continuing advances in hardware technology and the need to integrate information from disparate laboratories, image analysis software should not depend on the type of scanner or microarray matrices used. Also, users may work with both glass and high-density membrane DNA arrays; robust software is needed to process the images efficiently and with high accuracy for both types of arrays.
Therefore, accurate and powerful software is required to handle this processing in a high-throughput environment.
Better data faster
An automated system needs only the microarray configuration (e.g., number of rows and columns of spots) and a list of image files to process as input, after which analysis should be performed automatically. This system should be able to search the image for the grid position, identify the layout of the array, localize the spots, and perform measurements without the need for user intervention.
In many laboratories, workers must be able to access data from multiple locations. One way to handle this situation efficiently is to send images to a server. The system should be able to process the images and archive the results, which would be accessible via an institutional intranet or a secure server on the Internet. With this setup, multiple users in the laboratory can access system reports, view results, and work on data analysis and interpretation.
An example of software that fits the high-performance requirements of high-throughput screening is AutoGene from BioDiscovery, Inc. (Mountain View, CA). This system provides autonomous operation and offers batch-mode (overnight) processing of multiple images. The operator specifies the image characteristics, and the images are then loaded in a batch file. The system runs by itself without any operator intervention. Spots are quantified and assessed for artifacts individually through several computer vision algorithms, which increases the reliability of the data. Robust statistical algorithms handle irregularities in spot size and shape caused by spot-printing hardware errors. Multiple quality measures are obtained per spot, which allows a confidence value to be associated with each measurement. Visual presentation of the results allows for manual inspection of the output at any time. Several users may access the server (available as a workstation) and inspect the data from processed images.
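To make the idea of per-spot quantification with a quality measure concrete, here is a minimal sketch. It does not reproduce the algorithms of AutoGene or any other product. It assumes that segmentation has already produced a boolean mask marking signal pixels within a small window around one spot, uses the median as a simple robust estimator, and reports the coefficient of variation of the signal pixels as one illustrative quality measure. The function name `quantify_spot` and the returned field names are hypothetical.

```python
from statistics import mean, median, pstdev
from typing import Dict, List


def quantify_spot(window: List[List[int]],
                  spot_mask: List[List[bool]]) -> Dict[str, float]:
    """Quantify one spot from a pixel window and a signal mask (assumed inputs).

    window    : pixel intensities around the spot (e.g., 16-bit values)
    spot_mask : True where segmentation labeled the pixel as spot signal
    """
    signal = [p for row, mrow in zip(window, spot_mask)
              for p, m in zip(row, mrow) if m]
    background = [p for row, mrow in zip(window, spot_mask)
                  for p, m in zip(row, mrow) if not m]
    if not signal or not background:
        raise ValueError("need both signal and background pixels")

    sig = median(signal)         # median resists a few contaminated pixels
    bg = median(background)      # local background estimate
    avg = mean(signal)
    # Coefficient of variation: large values hint at an uneven or damaged spot.
    cv = pstdev(signal) / avg if avg > 0 else float("inf")
    return {"signal": sig, "background": bg,
            "net": max(sig - bg, 0), "cv": cv}
```

In a batch setting, a function like this would be called once per spot and per image, and the quality field could be used to flag measurements for later inspection rather than to discard them outright.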
Analysis and visualization
Self-organizing maps (SOMs), a type of neural network, are computationally intensive algorithmic procedures for transforming inputs into desired outputs by using highly connected networks of relatively simple processing units (neurons or nodes). A SOM has a set of nodes with a simple topology and a distance function on the nodes. Nodes are iteratively mapped into k-dimensional gene expression space, in which the ith coordinate represents the expression level in the ith sample. SOMs have several features that make them particularly well suited to clustering and analysis of gene expression patterns. They
The availability of sophisticated multivariate analysis techniques, such as SOMs, hierarchical clustering, and principal component analysis, makes software packages like GeneSight (BioDiscovery, Inc.) a useful asset for rapid analysis of data (3).
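The SOM procedure described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation used by GeneSight or any other package: the nodes are arranged on a one-dimensional line (so the distance between nodes j and b is |j - b|), the neighborhood function is Gaussian, and the learning rate and neighborhood width shrink linearly over the training epochs. The function names and default parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def nearest_node(nodes: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the node closest to profile x (squared Euclidean distance)."""
    return min(range(len(nodes)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], x)))


def train_som(profiles: List[List[float]], n_nodes: int = 3,
              epochs: int = 200, lr0: float = 0.5,
              seed: int = 0) -> List[List[float]]:
    """Fit a 1-D SOM to expression profiles (one list of k values per gene)."""
    rng = random.Random(seed)
    k = len(profiles[0])
    # Start each node at a randomly chosen profile in k-dimensional space.
    nodes = [list(rng.choice(profiles)) for _ in range(n_nodes)]
    sigma0 = max(n_nodes / 2.0, 1.0)  # initial neighborhood width (assumed)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.01)   # shrinking neighborhood
        for x in rng.sample(profiles, len(profiles)):  # shuffled pass
            best = nearest_node(nodes, x)
            for j, node in enumerate(nodes):
                d = abs(j - best)                  # distance on the node line
                h = math.exp(-(d * d) / (2.0 * sigma * sigma))
                for i in range(k):
                    # Pull the node toward x, scaled by rate and neighborhood.
                    node[i] += lr * h * (x[i] - node[i])
    return nodes
```

After training, calling `nearest_node(nodes, profile)` for each gene assigns it to a cluster; genes with similar expression patterns across samples tend to land on the same or neighboring nodes.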
Alexander Kuklin is marketing and applications manager for BioDiscovery, Inc., in Mountain View, CA.