Many of my posts have mentioned using DNA sequencing both to distinguish cancers and to tailor treatments. While interpretations of DNA sequencing data are often highlighted in the news, how sequencing works and how its results are analyzed can be a mystery to non-scientists. In the two decades since sequencing was used to map the human genome, its applications and the analysis of human sequences have evolved significantly, making it feasible for everyone to get their DNA sequenced. But what is DNA sequencing, and how did it become so popular?
Nearly every cell in our bodies contains a set of DNA, called the genome, which stretches out to over 6 feet long. These DNA strands, however, are 100,000 times thinner than a strand of hair! Even under the most powerful microscopes, we cannot read a DNA sequence. In the 1970s, Frederick Sanger developed a way to read DNA sequences. Sanger, one of the few people to win two Nobel Prizes, opened up a new world of biological research with this method. The invention of DNA sequencing allowed scientists to identify genes, manipulate their sequences, and uncover sequences associated with disease.
It was not until the 1990s and the launch of the Human Genome Project that the DNA sequencing revolution really gained broad momentum. With the goal of sequencing and mapping the human genome, this project was unprecedented, as the human genome was estimated to contain 3 billion DNA letters. Sanger’s sequencing method, however, could only read small stretches of DNA, about 500 letters at a time. There was also no way to determine the order of these short sequences relative to each other within the genome.
To overcome these barriers, the Human Genome Project enlisted the power of over 200 laboratories around the world at a cost of $2.7 billion! In addition to drawing on these vast resources, the scientists physically broke up the genome into fragments that were within the capability of the sequencing machinery. Because these fragments partially overlapped each other, their sequences could be matched at the overlaps and joined, slowly building a continuous line of DNA sequence, a process which altogether took about 13 years. The new sequencing and computing technology developed for this project enabled new applications of sequencing and analysis of the genome. While an enormous accomplishment, the Human Genome Project created more questions about human health than it answered.
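To make the idea of joining overlapping fragments more concrete, here is a toy sketch in Python. It is not the method the Human Genome Project used; it is a simple "greedy" approach that repeatedly joins the two fragments sharing the longest overlap. The fragment sequences, the function names, and the minimum overlap of 3 letters are all made up for illustration, and real assembly software also has to cope with sequencing errors and repeated stretches of DNA.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    longest = min(len(left), len(right))
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments: list[str], min_overlap: int = 3) -> str:
    """Greedily join the two fragments with the largest overlap until one remains."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(pieces):
            for j, right in enumerate(pieces):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            raise ValueError("fragments do not overlap enough to be joined")
        # Keep the left fragment whole and add only the new letters from the right one.
        merged = pieces[best_i] + pieces[best_j][best_size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces[0]


# Three made-up fragments that overlap by four letters each.
fragments = ["GATTACAGG", "CAGGTTCA", "TTCAGCCT"]
print(assemble(fragments))  # GATTACAGGTTCAGCCT
```

Even in this tiny example, every fragment has to be compared with every other fragment, which hints at why assembling millions of fragments demanded so much computing power.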
In the roughly 15 years since the completion of the Human Genome Project, computing power and sequencing technology have advanced significantly. While the basic mechanism of DNA sequencing is still very similar to Sanger’s original method and the methods used in the Human Genome Project, the new technology, called next-generation sequencing, allows DNA sequencing to occur faster, which means more DNA can be sequenced by fewer people. Additionally, advancements in computer technology allow more streamlined assembly and analysis of the sequences. As a result of these advancements, a human genome can now be sequenced in a single day.
In addition to enabling rapid genome sequencing, these technological advancements also lowered the cost barrier to sequencing for research purposes. Today it costs about $1,000 to sequence a human genome. While this is extremely inexpensive compared to the cost of the Human Genome Project, individuals do not commonly pay $1,000 to get their genomes sequenced. Instead, we tend to see at-home genetic testing kits from companies like Ancestry, 23andMe and Counsyl, all in the price range of $100. So why are these at-home genetic tests so much cheaper? The simple answer is that genetic testing companies do not sequence the entire genome. Less simply, each company has a different way of choosing what to sequence and how to analyze it.
Genetic tests that trace a person’s ancestry often compare similarities or differences in specific DNA letters that are associated with certain populations or traits. I previously discussed in The Double Edged Sword of Mutation that a specific DNA letter may become more frequent in a population through inbreeding. When two populations with different DNA letters at a specific place in the genome are isolated and do not interbreed, the prevalence of these differences, called variants, becomes more pronounced. Three specific DNA letters in the BRCA1 and BRCA2 genes that increase breast cancer risk, for instance, are more prevalent in eastern European populations as a result of breeding within small isolated populations.
On a larger scale, human populations were geographically isolated until the past few hundred years. Isolation over the preceding hundreds of thousands of years led to increases in unique DNA sequences in different populations. Like the BRCA1 and BRCA2 gene mutations, other variants associated with traits, such as skin color, became more common in isolated populations and now serve as genetic markers of those populations. Scientists can determine which geographical populations contributed to an individual’s DNA by comparing the patterns of unique sequences and variants in that person’s DNA with reference human genomes taken from individuals from these regions around the world.
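As a rough illustration of this kind of comparison, the sketch below scores how well a person's DNA letters at a few marker positions match each of two reference populations. Everything in it is hypothetical: the population names, marker names, and frequencies are invented, and the scoring (adding up the logarithms of how common each observed letter is) is just one simple way to combine evidence across markers. Real ancestry tests use hundreds of thousands of markers, account for the two copies of each chromosome we carry, and estimate mixtures of several populations rather than picking a single best match.

```python
import math

# Hypothetical reference panel: for each made-up population, the fraction of
# reference genomes carrying each DNA letter at each made-up marker.
REFERENCE_FREQUENCIES = {
    "population_A": {
        "marker_1": {"A": 0.9, "G": 0.1},
        "marker_2": {"C": 0.8, "T": 0.2},
        "marker_3": {"G": 0.7, "T": 0.3},
    },
    "population_B": {
        "marker_1": {"A": 0.2, "G": 0.8},
        "marker_2": {"C": 0.3, "T": 0.7},
        "marker_3": {"G": 0.4, "T": 0.6},
    },
}


def log_likelihood(person, frequencies, floor=0.01):
    """Sum of log-frequencies of the person's letters in one population."""
    total = 0.0
    for marker, letter in person.items():
        freq = frequencies.get(marker, {}).get(letter, 0.0)
        # A small floor keeps a never-observed letter from ruling a population out entirely.
        total += math.log(max(freq, floor))
    return total


def rank_populations(person, reference=REFERENCE_FREQUENCIES):
    """Return population names ordered from best to worst match."""
    scores = {name: log_likelihood(person, freqs) for name, freqs in reference.items()}
    return sorted(scores, key=scores.get, reverse=True)


# A made-up person whose letters are the ones more common in population_B.
person = {"marker_1": "G", "marker_2": "T", "marker_3": "T"}
print(rank_populations(person))  # ['population_B', 'population_A']
```

No single marker settles the question here; it is the pattern across many markers that points toward one population over another.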
A similar approach of analyzing specific sequences in the genome can be useful for identifying risk for diseases. Analyzing a few select DNA variants works well for diseases that have defined genetic components, such as those in BRCA1 and BRCA2. Our understanding of the genetic contribution to most diseases, however, is very limited, leaving large amounts of genetic risk undetermined through these tests. The three BRCA1 and BRCA2 mutations I mentioned, often cited as examples of our understanding of genetic risk, are some of the best-defined variants in terms of their disease risk. These mutations, however, are not the only ones in the BRCA1 and BRCA2 genes that can put a person at an elevated risk for developing breast cancer or other cancers. Testing only for these three mutations, therefore, can miss the presence of other mutations contributing to disease susceptibility.
Only a small portion of DNA letters have well-defined disease risks; many more either have not been uncovered yet or have weaker connections to disease risk. Some genetic testing companies therefore sequence entire genes that are implicated in disease risk, allowing for detection of all mutations within those genes. These mutations can then be classified based on available information about their link to disease. These tests are often useful to a broader range of people as they take a more unbiased approach to detection of variants. Lauren Ryan, a genetic counselor with Color, discussed this process, as well as variants of uncertain significance, in her episode of the BRCA Foundation’s Positive Perspectives podcast.
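The difference between checking a few known letters and sequencing a whole gene can be sketched in a few lines of Python. The reference sequence, the sample, and the table of known variants below are all invented for illustration and have nothing to do with the real BRCA1 or BRCA2 sequences; the sketch also assumes the two sequences are the same length, so it ignores insertions and deletions. Positions are counted from zero.

```python
# Hypothetical reference sequence for a short stretch of a gene (made up).
REFERENCE = "ATGGCCTTAGACCGT"

# Hypothetical classifications, keyed by (position, reference letter, observed letter).
KNOWN_VARIANTS = {
    (3, "G", "A"): "pathogenic",
    (9, "G", "C"): "benign",
}


def find_variants(reference, sample):
    """List every position where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (pos, ref, obs)
        for pos, (ref, obs) in enumerate(zip(reference, sample))
        if ref != obs
    ]


def classify(variants, known=KNOWN_VARIANTS):
    """Label each variant; anything not in the table is of uncertain significance."""
    return {variant: known.get(variant, "uncertain significance") for variant in variants}


# A made-up sample that differs from the reference at positions 3 and 12.
sample = "ATGACCTTAGACTGT"
for variant, label in classify(find_variants(REFERENCE, sample)).items():
    print(variant, label)
# (3, 'G', 'A') pathogenic
# (12, 'C', 'T') uncertain significance
```

A test that looked only at the positions listed in the table would report the change at position 3 and never see the one at position 12, which is the gap that sequencing the whole gene closes.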
These tests that sequence limited amounts of the genome, either specific DNA letters or entire genes with known disease risk, are cheaper than sequencing the entire genome. As the function of most of the genome remains unknown, sequencing every letter in the genome would not yield much more useful health information than these tests are currently able to provide. It is important, though, to understand the different information each company analyzes and the limitations of each test.
An exciting aspect of these genetic tests is that our knowledge of disease risk and our ability to refine ancestry mapping will improve as more people are tested. Similar to the idea behind the All of Us research program, increasing the amount of information we can access by sequencing more genomes will allow scientists to find more links between variants and diseases and learn more about the known connections. As more DNA is sequenced, more variants will be classified based on their association with disease risk and can yield useful information in genetic tests.
One of the goals of increasing access to genetic information by sequencing more genomes is to diversify sources of information. A vast majority of genetic information currently used to analyze the genetic basis of disease stems from individuals of European descent. This inherently biases genetic tests, as less is known about variants that are prevalent in other populations. Studies such as All of Us, therefore, aim to increase diversity in testing and obtain genetic information for research purposes so more can be learned about disease risk for other ethnicities. Sequencing genomes from people all over the world not only helps scientists learn more about biology and disease but also benefits everyone, as genetic tests will be able to incorporate information that affects a wider range of people.
To summarize, DNA sequencing technology has existed for close to 40 years. The technological revolution of the past 20 years, however, has made sequence analysis widely accessible. The resulting drop in the cost of DNA sequencing is allowing individuals to learn about their risks for certain diseases, as well as their ancestry. While the Human Genome Project was an enormous accomplishment, it did not answer all of the questions about human health, or even what comprises our genome. Roughly 15 years after the completion of the Human Genome Project, our understanding of disease risk is still relatively limited in relation to the size of the genome. Increasing the amount of health and genetic information that can be analyzed will help define the functions of the poorly understood parts of the genome and help us understand how mutations in these regions contribute to disease risk.
While genetic testing and ancestry mapping are the uses of DNA analysis most accessible to the general public, the advent of DNA sequencing and analysis technology has enabled biological research in many more ways. DNA sequencing is becoming an important tool in diagnosing diseases and in selecting and personalizing treatments. DNA sequencing technology also has huge implications for the development of therapies for cancer, including those that I have covered in my previous posts, such as PARP inhibitors and immunotherapy. Other diseases and conditions, such as mental illness, obesity, diabetes and heart disease, also have genetic components, and a better understanding of these components will greatly improve their treatment. While there remains a lot to learn about disease from DNA sequencing, editing the genome by utilizing the information we already have is clearly the next frontier in treating diseases.