Between missing chunks of chromosomes and single nucleotide polymorphisms (SNPs) lies a vast middle ground of genomic alterations. Among these are copy-number variations (CNVs), the differences between individuals in the number of copies of a genomic region. "The total nucleotide content that is encompassed by CNVs most certainly exceeds that of SNPs," says Stephen Scherer of The Hospital for Sick Children in Toronto.
The recent surge of interest in CNVs has spurred a proliferation of technologies designed to detect them in normal DNA, in congenital diseases, and in cancer cells, in which copy-number changes may drive unruly cell division.
Scientists have largely turned to comparative genomic hybridization (CGH) arrays, which involve hybridizing two genomes, one as a reference and one to be tested, that fluoresce in different colors. The color that dominates at a given region indicates whether the test genome contains an insertion or deletion at that site. Bacterial artificial chromosome (BAC) arrays allow researchers to mine the entire genome for CNVs. Two companies, NimbleGen Systems and Agilent Technologies, now make oligonucleotide probes specifically for CNV detection. Alternatively, genotyping arrays, originally designed to identify SNPs, have been modified by users and by companies, mainly Affymetrix and Illumina, to uncover CNVs.
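The two-color comparison behind CGH can be illustrated with a minimal sketch. It assumes each probe yields one fluorescence intensity for the test genome and one for the reference, and that a call is made from the log2 ratio of the two. The function name and the 0.3 threshold are hypothetical choices for illustration, not values taken from any vendor's software.

```python
import math

def classify_probe(test_signal, ref_signal, threshold=0.3):
    """Call a probe 'gain', 'loss', or 'neutral' from two intensities.

    threshold is an illustrative cutoff on the log2 ratio (an assumption).
    """
    if test_signal <= 0 or ref_signal <= 0:
        raise ValueError("intensities must be positive")
    # Equal signal gives a log2 ratio of 0; more test signal gives > 0.
    log_ratio = math.log2(test_signal / ref_signal)
    if log_ratio > threshold:
        return "gain"
    if log_ratio < -threshold:
        return "loss"
    return "neutral"
```

With these assumptions, `classify_probe(1500, 1000)` returns "gain" and `classify_probe(500, 1000)` returns "loss". Real pipelines typically also normalize the signals and smooth across neighboring probes before making a call.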
Researchers are systematically comparing the pros and cons of different platforms, says Charles Lee of Harvard Medical School in Boston, but so far, "there's no consensus on which is the best platform to use." SNP arrays are usually more cost-effective for genome-wide association studies that match known CNVs with gene expression or a particular disease, while oligonucleotide arrays are a better choice for detecting novel variants.
According to Evan Eichler of the University of Washington in Seattle, "the problem is that current prices are all too high." Regardless of platform, one array runs from $300 to $1,000, he says, but "we need prices at about $100 to $150 to screen a large number of samples genome-wide" in order to detect rare variants. Cost is rapidly becoming less of a barrier, Scherer says, with prices of all the major platforms, especially SNP arrays, dropping substantially in the past year.
User: Frank Speleman, Ghent University Hospital, Belgium
The project: Using BAC arrays left over from the Human Genome Project to screen cell lines for CNVs that associate with disease, including neuroblastoma and Hodgkin lymphoma.
The problem: BAC arrays are time-intensive to create, and the arrays themselves are not completely reproducible. Also, they might miss copy-number differences smaller than 50-100 kb.
The solution: Oligonucleotide and SNP arrays offer much quicker answers to copy-number questions, Speleman says. He expects that BAC arrays will die out in the near future, but his group continues to use them, mainly because of the time they've already invested in creating them. Moreover, since their genome coverage is comprehensive, BACs also offer more robust data than other arrays.
User: Ken Lo, Roswell Park Cancer Institute, Buffalo, New York
The project: Looking for CNVs in medulloblastoma and glioblastoma cell lines.
The problem: Tumor samples have two characteristics that make copy-number analyses difficult: abnormal numbers of chromosomes, and a heterogeneous cell composition.
The solution: Lo's group uses both BAC arrays and Affymetrix SNP arrays, and picks through their data to manually correct suspected errors. For example, a detected single-copy gain might actually be a loss in a tetraploid cell or a gain in a diploid cell. They then look for a loss of heterozygosity (the loss of one parental allele) to decide which is the case.
Current platforms "are all designed based on the premise that the natural ploidy state of your DNA sample is two," Lo says. What's more, he adds, "sometimes, you can't tell whether [a data problem] is a ploidy issue or a tumor heterogeneity issue." Tumors are often mixtures of many types of cells, and copy-number changes occur differently in each. Since the cells are merged before DNA extraction, the results reflect an average across different cell types. His group is collaborating with Yuhang Wang, a computer scientist at Southern Methodist University in Dallas, who is developing algorithms to control for these issues. For now, says Lo, "you have to really, really think about the results."
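The averaging effect Lo describes can be made concrete with a small worked example. The sketch below assumes a deliberately simplified model in which the measured signal is proportional to the mean copy number across all cells in the sample, compared against a two-copy reference. The function name and input format are hypothetical.

```python
import math

def expected_log2_ratio(cell_populations, reference_copies=2):
    """Expected log2 ratio for a mixed sample.

    cell_populations: list of (fraction_of_cells, copies_at_locus) pairs.
    Assumes signal scales with the mean copy number (a simplification).
    """
    total = sum(fraction for fraction, _ in cell_populations)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    mean_copies = sum(fraction * copies for fraction, copies in cell_populations)
    if mean_copies <= 0:
        raise ValueError("mean copy number must be positive")
    return math.log2(mean_copies / reference_copies)
```

Under this model, a pure population with three copies gives `expected_log2_ratio([(1.0, 3)])`, about 0.58, while the same gain present in only half the cells gives `expected_log2_ratio([(0.5, 3), (0.5, 2)])`, about 0.32. The diluted signal is easy to mistake for noise or for a different copy-number state, which is why the results need careful interpretation.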
User: George Zogopoulos, University of Toronto
The project: Genome-wide scans to detect CNVs in the general population and in patients with gastrointestinal cancer.
The problem: Genotyping platforms like the Affymetrix array that Zogopoulos uses can generate noisy data and don't cover the whole genome, particularly regions rich in repetitive sequences that are likely to contain CNVs.
The solution: "Given that [SNP arrays] weren't primarily designed for this, it's important to validate using a second laboratory approach," Zogopoulos says. He and his colleagues confirm their results with quantitative PCR. PCR is often just sensitive enough to detect copy-number changes, he says, provided that a high number of replicates is run to generate statistical power.
Despite their drawbacks, Zogopoulos stuck with SNP-based arrays for their sensitivity to very fine-scale copy-number changes, and the ability to detect both SNPs and CNVs in a single assay. "We had generated the data for a different project, and we took advantage of the wealth of genetic data and reanalyzed it for copy-number variation."
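The article does not say how Zogopoulos's group converts its PCR readings into copy numbers. One widely used approach is the comparative Ct (delta-delta Ct) calculation, sketched here as an assumption rather than as their method. It averages replicate threshold cycles (Ct) for the target region and for a reference gene, in both the test sample and a calibrator sample assumed to carry two copies, and it assumes the signal doubles with each cycle. All names are hypothetical.

```python
from statistics import mean

def estimate_copy_number(target_ct_sample, ref_ct_sample,
                         target_ct_calibrator, ref_ct_calibrator,
                         calibrator_copies=2):
    """Estimate copy number from replicate Ct values (comparative Ct method).

    Each argument is a list of replicate Ct values. Assumes the calibrator
    carries calibrator_copies copies and amplification doubles per cycle.
    """
    # Averaging replicates reduces the effect of well-to-well noise.
    delta_sample = mean(target_ct_sample) - mean(ref_ct_sample)
    delta_calibrator = mean(target_ct_calibrator) - mean(ref_ct_calibrator)
    delta_delta = delta_sample - delta_calibrator
    # One extra cycle to reach threshold means half as much template.
    return calibrator_copies * 2 ** (-delta_delta)
```

For instance, if the target in the test sample crosses the threshold one cycle later than in the calibrator, with identical reference-gene readings, the estimate is one copy, consistent with a single-copy deletion. Because one copy differs from two by only a single cycle, replicates matter.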
User: Matthew Hurles, Genome Dynamics and Evolution Group, The Wellcome Trust Sanger Institute, Cambridge, UK
The project: Screening genomic samples from thousands of individuals to look for CNVs associated with common conditions such as diabetes, rheumatoid arthritis, and hypertension.
The problem: Results from such a large sample are often plagued by what Hurles terms "batch effects." Quality between sets of extracted DNA can vary, or discrepancies in DNA processing might arise simply because different people run the samples at different times. "Systematic differences can really screw up your association studies," Hurles says.
The solution: Most manufacturers offer "multiplex" arrays, which contain fewer probes but multiple sets of the same probes, Hurles says. "They're not going to give you whole-genome coverage, but they're targeted towards the CNVs that you already know exist." One of the best ways to control for batch issues is to run control genomic samples on each subarray in the multiplex setup, he says. "That kind of approach minimizes any systematic differences that you might get between cases and controls. It doesn't get rid of the effect, but it means it has less of an impact on your association testing." As for quality differences in your original DNA samples, Hurles says, "ideally you control that by not having it in the first place."
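How a control run on each subarray might be used downstream can be sketched as follows. This is one simple possibility, not Hurles's stated procedure: it assumes the same control genome is run in every batch and subtracts that control's log2 ratio from each sample in the same batch. The function name and data layout are hypothetical.

```python
def center_on_batch_controls(samples, control_by_batch):
    """Subtract each batch's control log2 ratio from its samples.

    samples: list of (batch_id, sample_id, log2_ratio) tuples.
    control_by_batch: dict mapping batch_id to the log2 ratio measured
    for the shared control genome in that batch (an assumed design).
    """
    adjusted = {}
    for batch_id, sample_id, log2_ratio in samples:
        if batch_id not in control_by_batch:
            raise KeyError(f"no control recorded for batch {batch_id!r}")
        # A shift shared by the whole batch cancels out in the difference.
        adjusted[sample_id] = log2_ratio - control_by_batch[batch_id]
    return adjusted
```

A correction like this removes only a uniform per-batch shift. As Hurles notes, such controls reduce the impact of batch effects without eliminating them.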