Ira M. Hall, PhD
Structural Genome Variation & Evolution; Genomic Instability; Epigenetic Inheritance
The goal of my laboratory’s work is to understand how genome architecture changes over different time scales, and in response to selective pressure. Our work is focused on mammalian genomes, mainly human and mouse. The genetic differences that change genome architecture are collectively known as “structural variation” (SV) and include duplications, deletions and other rearrangements. While microscopically visible rearrangements have long been recognized to cause sporadic human disorders and to account for a large proportion of the alterations found in tumor genomes, it is only since the development of genome-wide techniques that the prevalence of submicroscopic variants has been recognized. It is now apparent that two “normal” human individuals differ by ~2,000 structural variants, that similar levels exist in other mammals, and that these differences underlie a significant fraction of phenotypic variation and disease. The genome is thus a structurally plastic entity, and it is important to understand how and why this is so.
Our work is divided into four projects. First, we are developing techniques to identify genomic rearrangements. Massively parallel DNA sequencing technologies now offer the unprecedented opportunity to reconstruct whole-genome architecture on a routine basis, but the practical utility of these methods remains limited by the computational challenges posed by proper data interpretation, and by cost. Over the past year we have developed novel experimental and computational tools, and we are now able to comprehensively map structural variation in mammalian genomes, at reasonable cost and with modest computing power (Quinlan et al., submitted). In the coming years, we will continue to develop methods that exploit evolving sequencing technologies and ever more abundant genomic data.
Second, we are examining the mechanisms by which new structural mutations arise in the germline. Besides the obvious implications for natural variation and evolution, this is an important question for human health because de novo SVs are increasingly found to underlie spontaneous human diseases, including autism and schizophrenia. Thus far our work in this area has focused on the mouse genome. Our rationale for this is the following: 1) the underlying mechanisms will be the same between human and mouse; and 2) once understood in the mouse it will be possible to investigate the genetic and environmental factors that affect this process. We previously used genome-wide microarrays to show that much new variation is the product of recurrent mutation at “hotspots” (Egan et al., 2007, Nature Genetics, Nov; 39(11)). More recently we have developed a method to identify, assemble and interpret SV breakpoints at single-nucleotide resolution. We have used this method to characterize 3,316 breakpoints present between the genomes of two mouse strains. We have identified the causative mutational mechanism at most of these breakpoints, and we have discovered a number of interesting breakpoint features that have not been previously described. In addition, this work has identified a new mechanistic cause for elevated mutation rates at hotspots. In the future we plan to extend these studies to humans by mining the vast amounts of sequence data being deposited in public databases by large sequencing projects (e.g., 1000 Genomes Project).
Third, we are measuring the prevalence of new SV in somatic lineages. The genetic diversity present among the cells of an individual is a question of central importance to human biology, and one that we know almost nothing about. Each cancer begins with a single cell that has diverged from its relatives, and genomic instability has been proposed as a major cause of aging. The major obstacle to studying somatic variation is obtaining pure samples of a given lineage. Current genome-wide methods cannot detect variants that are rare within a population of cells, and the vast majority of somatic mutations are rare. We are overcoming this obstacle by collaborating with groups that use induced reprogramming to isolate stem cell lines from single somatic cells. By using mice we can isolate induced pluripotent stem cell (iPSC) lines from diverse lineages, and we can analyze mice of different ages or from different genetic backgrounds. For each iPSC line we are using paired-end DNA sequencing to assess the prevalence and mechanistic origin of de novo structural variants.
Finally, we are initiating work to compare patterns of new SV in “normal” germline and somatic lineages to the genomic aberrations found in cancers. We will apply our computational methods to rapidly accumulating data from ongoing cancer genome sequencing projects. Our method is to predict rearrangement breakpoints by paired-end mapping, assemble breakpoint-containing contigs using reads that map to the predicted breakpoint interval, and interpret breakpoints by comparison to the reference genome. The main goal of this work is to compare rearrangement mechanisms between normal and tumor genomes, and thus to identify the molecular pathways that make cancer genomes so remarkably unstable.
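The first step of this approach, predicting breakpoints by paired-end mapping, can be illustrated with a minimal sketch. This is not the laboratory's software: the `ReadPair` type, the `classify_pair` function, the category names and the insert-size parameters (mean 500 bp, standard deviation 50 bp) are all hypothetical, and the span is approximated by the distance between read start positions. The sketch assumes a standard paired-end library in which concordant pairs map to the same chromosome, face each other (+ on the left, - on the right), and are separated by roughly the expected insert size; pairs that violate these expectations suggest a rearrangement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped read pair (names are illustrative)."""
    chrom1: str
    pos1: int      # leftmost mapped coordinate of read 1
    strand1: str   # "+" or "-"
    chrom2: str
    pos2: int      # leftmost mapped coordinate of read 2
    strand2: str   # "+" or "-"


def classify_pair(pair, mean_insert=500, sd_insert=50, n_sd=3):
    """Label a read pair by the kind of rearrangement its mapping suggests.

    Assumed library model: concordant pairs map +/- (left/right) with a span
    within n_sd standard deviations of mean_insert.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        return "inversion"

    # Order the two reads by position so "left" is the upstream read.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] != "+":
        # Reads point away from each other (everted orientation).
        return "tandem_duplication"

    span = right[0] - left[0]
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"    # reads map too far apart on the reference
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"   # reads map too close together on the reference
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(example))
```

In practice, a single discordant pair is weak evidence; overlapping discordant pairs of the same class would be clustered to define a predicted breakpoint interval, which then guides the assembly and interpretation steps described above.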
For more information please visit: