A dying science gets new life for the 21st century. But that might mean saying good-bye to Homo sapiens—and hello to “ACGATTCAGCATCAG...”
Over a quarter of known living creatures—animal, vegetable, and otherwise—are beetles. There are 350,000 species of them. In the early 1900s, an English archbishop turned to the biologist (and noted atheist) J.B.S. Haldane one night at a formal dinner and asked Haldane what his scientific research had taught him about the nature of the divine creator. Haldane thought for a moment before answering, “He must have an inordinate fondness for beetles.”
It took naturalists centuries of patient labor to discover and catalogue the inordinate number of beetle species, collecting them in mason jars and nets, dissecting them, pinning them to specimen trays, preserving them in chemicals, and keeping them safe and dry in museums.
The classification of beetles, and insects generally, into their hundreds of phylogenetic families and orders is one of the great mass efforts in science history, the equivalent of producing thousands of periodic tables. And the work of taxonomy has hardly stopped. New species pop up regularly in every biological kingdom. Even a class of animals as well-known as Mammalia has expanded by 408 new species in the past fifteen years. What’s more, what scientists thought they knew well can turn out to be wrong, and well-established species sometimes must be rearranged according to new evidence.
But while the labor of taxonomy—the science of sorting species—remains the same as it was in the 1700s, biology generally looks far different, far more clinical. When students go into biology these days, most jump into molecular biology, the study of the motor parts that make cells run. Fewer and fewer ever set foot in a marsh or forest or even touch a live creature. Many of the new generation of biologists are no more adept than a layman at differentiating between June bugs and stinkbugs. The lack of knowledge extends far beyond insects. The number of experts for most orders of plants and animals (aside from a few popular creatures like birds) has dwindled and will continue to dwindle.
Meanwhile, the number of specimens collected in ecological fountainheads like Brazil, Ecuador, and Costa Rica continues to swell. At a conference in Barcelona this summer, one biologist noted that, at the current rates of progress, it would take at least two thousand years, and possibly up to fifty thousand, to classify by hand the veritable landfill of specimens sitting in unsorted piles in museum storerooms around the world. Overall, there’s an awful need for thousands of people to sit down and go through the piles and sort out truly new discoveries from variations of known species, and an equally awful shortage of people qualified to do so.
Some traditional biologists, like Harvard University’s E.O. Wilson, have rallied new interest in describing and classifying species with the Encyclopedia of Life project, which aims to put up a Web page for every known species. As with Wikipedia, anyone can contribute to the encyclopedia, making it a curious mix of fusty taxonomy and Web 2.0 technology (check its status at www.eol.org). The eventual hope is to renew citizens’ interest in local species around the world, with an eye toward preserving them. But even the founders of the project admit that while they’re confident they’re going to get an overload of information about popular flora and fauna like butterflies and orchids, there’s little hope of finding enough experts to explicate the mysteries of all those beetles, not to mention even more obscure critters.
Facing that reality, an international consortium of scientists on all seven continents is turning to none other than molecular biology, especially work on DNA. They’re hoping the very tools that have pushed aside the old models of shoot-and-stuff biology can actually save taxonomy and update it for the twenty-first century.
The folk classification of species extends at least as far back as Leviticus, with its prohibitions against eating creatures with cloven hooves or shells. Only around 1750 did Carl Linnaeus, a Swedish naturalist, begin to classify species based on internal anatomy, an innovation that shifted biologists’ focus from superficial traits to deeper structure. It also helped biologists group and classify animals by lineage, which paved the way for evolutionary thinking. Nevertheless, the Linnaean system is based on dissection and sight-identification, and if science has proved any one thing in its history it’s that human senses are fallible. Many subtle differences between creatures remain hidden, and some modern scientists are starting to betray impatience.
“Two hundred fifty years after the registration of life began, we cannot recognize the species around us,” Paul Hebert, a biologist at the University of Guelph, in Ontario, Canada, recently lamented. “Biology remains the only advanced science that hasn’t embraced identifying the fundamental particles of its discipline.”
To that end, Hebert and other biologists published a paper in 2003 about so-called DNA barcoding. It’s a scary name, reminiscent of the old science-fiction trope of labeling people (sometimes with tattoos of a universal product code [UPC] on the scruff of their necks) and sorting them into castes based on the traits encoded in their genes. But the advocates of barcoding are less interested in judging creatures than mapping their genealogies, and there’s really no better metaphor for the process of identifying specimens on a molecular level.
To find a creature’s genetic barcode, scientists first disintegrate a few of its cells to free up DNA, then use chemical enzymes to chop the DNA up into millions of small strands. Each strand, however, still contains many hundreds of “base pairs”—the four chemicals that actually encode genes, chemicals usually abbreviated A, C, G, and T. Each species has unique genes, so by scanning one of those fragments, scientists in theory can tell every species apart, even if they have only one cell of each.
The tricky part of DNA barcoding is figuring out which DNA sequence to scan. Scientists need a sequence that all creatures share, since locking onto a sequence that produces, say, hair would be little help in identifying fish; but not a sequence so fundamental to life and so ancient that all cells share it, since there would be nothing to tell apart. Hebert and others eventually keyed in on a stretch of 648 DNA base pairs in the mitochondria, the organelle inside cells that produces and stores energy. This stretch is common enough for some variant to exist in all animals but varied enough to provide unique identification. (So far, the mitochondrial work allows biologists to identify only animals, since mitochondria don’t exist in some microbes and mitochondria in plants evolve too slowly to differentiate between species. Microbiologists and botanists are working on finding suitable sequences for other forms of life, and expect to soon.)
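To make the idea of matching a barcode concrete, here is a minimal sketch in Python. It assumes the sequences have already been aligned to the same length, and it uses invented 16-letter strings in place of the real 648-base-pair stretch. The species names, the sequences, and the function names are all hypothetical, and real barcoding software is considerably more sophisticated.

```python
# Toy reference library: invented names and invented 16-base sequences.
# A real barcode would be the 648-base-pair mitochondrial stretch.
REFERENCE = {
    "species_A": "ACGATTCAGCATCAGT",
    "species_B": "ACGATTCTGCATGAGT",
    "species_C": "TCGTTTCAGGATCAGA",
}


def percent_identity(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def identify(query: str, reference: dict) -> tuple:
    """Return the best-matching reference name and its identity score."""
    best = max(reference, key=lambda name: percent_identity(query, reference[name]))
    return best, percent_identity(query, reference[best])


query = "ACGATTCAGCATCAGA"  # hypothetical read from an unknown specimen
print(identify(query, REFERENCE))  # ('species_A', 0.9375)
```

The unknown specimen differs from species_A at one position out of sixteen, so it is reported as the closest match. Whether a match that close counts as the same species is a separate judgment that this sketch does not make.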
Though recognized as a world expert in “aquatic invertebrates, especially microcrustaceans,” Hebert has become synonymous with barcoding since the 2003 paper. A white-haired man with a bald pate, he speaks (at least in public) in loud bursts, as if almost every sentence in his presentation were bolded and italicized. He’s the public face of a consortium called the Barcode of Life Initiative, which represents the interests of the 150 barcoding projects that have popped up in South Africa, Japan, Russia, Nigeria, Brazil, Kenya, Taiwan, and fifty other countries, including the archipelagoes and islands of Polynesia. Overall, Hebert has set the ambitious goal of barcoding all the world’s vertebrates, one-quarter of its plants, all human pathogens, most blights and agricultural pests, and all polar life by 2010, and for just $92 million.
For a field not even six years old, that’s a lot of activity. So why all the enthusiasm? The major advantage of DNA barcoding, especially over traditional taxonomy, is tempo. Even a discount DNA sequencer can scan a sample in hours, and high-end models take ninety minutes. Moreover, while most samples cost about five dollars to identify today, the best machines (though more expensive upfront, obviously) can identify them for two dollars apiece. And DNA technology is accelerating in this century at about the same pace that microchip production did in the last century. As the Barcode of Life Initiative writes in a brochure on the technology, “In the coming years, barcoding will probably cost pennies and take only minutes.” At some point in the indefinite but near future, biology, still largely a descriptive science, will transform into a hybrid science with a strong quantitative edge.
Barcoding can also scrabble down into creatures’ genes and root out sub-anatomical features, especially details about divergent evolution. With just the small snippet of DNA used in barcoding, biologists probably cannot find definitive evidence of divergent evolution (and any evidence they do find will be restricted to the mitochondria). But small differences provide a hint that a closer examination of the entire genome might be in order, and any further discoveries about molecular differences could provide the justification for separating one monolithic species into multiple branches. In the past five years, scientists have used barcoding to unbraid three different species of dwarf crocodiles in a population once thought to be homogeneous. A similar examination of Astraptes fulgerator, a luscious, black-and-shimmering-green skipper butterfly, teased out ten species.
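One way to picture how small barcode differences might hint at hidden species is to group specimens whose sequences sit close together and see how many groups fall out. The Python sketch below does this with a simple chained grouping. The 10 percent cutoff, the specimen labels, and the 20-letter sequences are invented for illustration and are not taken from any barcoding study. As the paragraph above notes, groups like these would be only a prompt for closer examination, not proof of separate species.

```python
# Invented specimens, all nominally "one species", with toy 20-base sequences.
SPECIMENS = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",
    "s3": "TTGTACCTAGGTACATACGT",
    "s4": "TTGTACCTAGGTACATACGA",
}


def divergence(a: str, b: str) -> float:
    """Fraction of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs: dict, threshold: float) -> list:
    """Group specimens that are linked by a chain of close pairs."""
    clusters = []
    for name, seq in seqs.items():
        # Existing clusters with at least one member within the threshold.
        linked = [c for c in clusters
                  if any(divergence(seq, seqs[m]) <= threshold for m in c)]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Hypothetical cutoff: treat pairs that differ by 10 percent or less as one group.
groups = cluster(SPECIMENS, threshold=0.10)
print(groups)  # two groups: s1 with s2, and s3 with s4
```

Here the four specimens split into two tight pairs that differ from each other at a quarter or more of their positions, the kind of pattern that would send a biologist back for a fuller look at the genome.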
What’s more, while an immature fledgling or a badly damaged carcass would be useless in traditional taxonomy, which works best on da Vinci-like ideal specimens, DNA barcoding works just as well on creatures living or dead, old or young, preserved or rotten. There are good, prudent reasons to preserve taxonomic samples in the old style, in parallel with barcoding, but identification no longer requires expert attention. Most DNA barcoding work could be done by unskilled technicians who can’t tell sparrows from iguanas: the DNA machines will spit out the answer. In this case, the really hard work isn’t the years of patient labor that biologists once put in to master the details of a creature’s anatomy, but the programming that computer scientists must do and the bioinformatics that people have to code and input into the sequencers before the taxonomy even starts.
The final major benefit of barcoding is that scientists can begin to take a realistic census of the most extensive and persistent form of life on the planet, bacteria, whose microscopic size makes them more difficult than other species to find, much less classify. Craig Venter, the sometime rival of the federal Human Genome Project, recently underscored this ignorance with a highly publicized expedition through the Sargasso Sea, a region near the Bermuda Triangle in the Atlantic Ocean.
The trip was something of a retreat to the wilderness for Venter, who had recently been forced out of the for-profit DNA sequencing company he had helped run, Celera. Deciding he needed to rededicate himself to pure science, Venter and his staff trawled the Sargasso waters for bacterial samples. The results were astounding: They catalogued 1,800 species of bacteria, far more than anyone expected to find in such brackish waters. They also discovered 150 new species in just a few months’ work, bacteria that harbor 1.2 million wholly new genes.
The Barcode of Life Initiative is in some ways a grander version of Venter’s trip, an exploration of not just oceans but rain forests, deserts, and polar ice sheets. Still, a close look at Venter’s work actually highlights a major shortcoming of barcoding. Though Venter and his colleagues discovered scores of new species and over a million new genes, the necessary process of cutting genes into small pieces means that neither Venter nor anyone else knows what those genes do, nor how cells turn them on and off. Nor does anyone even know which species have which genes, nor how common any one gene is. Venter’s high-output, highly statistical approach to DNA sequencing has transformed biology generally. But it can only be a start for molecular taxonomists, who are more interested in individual details of species than bulk statistics.
Though the Barcode of Life Initiative can plausibly say as of January 2009 that it has collected DNA records for half a million species (an appreciable fraction of life on Earth), that figure is inflated. Only one-tenth of those half-million records can be linked to formally described species. The other hundreds of thousands of records are like an unsorted, unalphabetized version of an old library card catalog. In this case everyone knows that each card refers to some book on the shelf; it’s just that no one knows which book, or where it is, or sometimes even what library it’s in.
That’s why, in parallel with barcoding new species, biologists must compile a barcode reference library from known and identified samples, a library whose core collection is the storerooms of specimens in natural history museums across the world. Beyond their historical importance—museums in Europe have collections dating back hundreds of years, some supplemented by handwritten notes from Darwin about his beloved barnacles, or from Linnaeus about his beloved plants—such collections are the only way biologists can link DNA strings to taxonomic groups and actually identify what ants or bacteria or whatever turn up during ventures like Venter’s. Unfortunately, the embalming fluids used to preserve samples can sometimes alter or damage DNA, so scientists have to take special steps to clean and extract genetic information. Overall, scientists have only just begun to build the comprehensive catalog they need of the world’s species.
And this work cannot wait: Last summer, in August, a few scientists documented the dangers of not having a thorough enough reference library in a paper in the Proceedings of the National Academy of Sciences (PNAS). Two teams of biologists had set out to barcode two well-known animals, grasshoppers and crayfish, but they soon ran into contradictions trying to pick out the 648-base-pair-long sequence used to sort animals. All cells have a fair amount of extraneous DNA inside them, DNA that once served a function but no longer does and still lives mutely inside cells. Because cells no longer use that DNA, mutations accumulate freely along the sequence. (The vestigial DNA mutates at the same rate as any DNA, but because it doesn’t have any function, harmful mutations are not weeded out, as they are in functioning DNA, which is more conservative about change.)
The frustration is that some vestigial DNA looks just enough like mitochondrial DNA to fool barcoding machines. And because of the mutations, DNA barcoders “read” the sequence as a new species. In the PNAS paper, barcoding identified seventeen distinct species of grasshoppers even though the research team had drawn samples from a pool of six, while the crayfish barcoding picked out twenty-five species from a pool of seven. To complicate matters, not all species have the specific string of extra DNA that bollixes the test. There are ways to control for such errors, but the moral of the story is that while DNA barcoding can be quick and easy, there’s such a thing as too quick and too easy.
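The article does not say which controls the researchers had in mind. One screen that is easy to illustrate rests on the point made above: functioning DNA is conservative about change, while vestigial copies are free to pick up damaging mutations. The Python sketch below flags reads that contain a "stop" signal in the middle of what should be a working gene. The choice of TAA and TAG as stop signals is a simplifying assumption (mitochondrial genetic codes vary between animal groups), and the read names and 12-letter sequences are invented.

```python
# Assumption: TAA and TAG act as stop signals in the mitochondrial code in use.
# Real pipelines would pick the code table that matches the animal group.
STOP_CODONS = {"TAA", "TAG"}


def has_premature_stop(seq: str, frame: int = 0) -> bool:
    """True if any complete three-letter codon in the frame is a stop codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return any(codon in STOP_CODONS for codon in codons)


def screen(reads: dict) -> tuple:
    """Split reads into plausibly functional ones and suspect ones."""
    kept = {n: s for n, s in reads.items() if not has_premature_stop(s)}
    suspect = {n: s for n, s in reads.items() if has_premature_stop(s)}
    return kept, suspect


# Invented reads: the second carries a stop codon (TAA) mid-sequence.
READS = {
    "read_1": "ATGGCACTTGGA",
    "read_2": "ATGGCATAAGGA",
}
kept, suspect = screen(READS)
print(sorted(kept), sorted(suspect))  # ['read_1'] ['read_2']
```

A screen like this catches only vestigial copies that happen to have acquired a stop signal. A copy with subtler mutations would pass, which is one reason such checks supplement a well-curated reference library rather than replace it.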
The PNAS paper provided an implicit warning about the disconnect between lab biology (barcoding) and field biology (the tactile handling of samples). But there’s also another sense of disconnect in barcoding, in the new names species will receive. By necessity, barcodes will soon supersede the old-fashioned, Latin binomial system of naming species (Canis lupus for wolves, Felis catus for cats), especially for species (like Venter’s bacteria) that no one has ever captured alive and that no one may ever capture. But biology will undoubtedly lose something if that system dies out. The binomial system remains about the only holdout left in science of what was once the dominant scientific language, Latin. Early scientists (Galileo being an exception) conversed in professional matters almost exclusively in Latin, a language they learned as part of their classical, gentlemanly education. Treatises not written in Latin almost didn’t count among learned societies, and any “low” craftsman who did discover something important enough to write a scientific paper about would lace his pages with apologies for his “rude” tongue. Even after Latin fell out of favor among scientists in the 1800s, for the same democratic reasons that Protestants jettisoned the Latin mass and Vulgate Bible a few centuries before, biologists continued to employ it. Today they frequently encode clues in the names about where a species was first described, who discovered it, or what its most interesting features were. A recently discovered fossil of an ancient (and pregnant) whale was given the name Maiacetus inuus. Maiacetus means “mother whale” and Inuus was a Roman fertility god, an appropriate name for a creature that, according to the structure of its flippers and the position of the fetus in utero, was a missing link: It spent half its life on land, including giving birth on land, and eventually (like modern whales) returned to the sea of its ancestors.
That’s a charm a string of A’s, T’s, C’s, and G’s can’t capture.
Once DNA barcoding takes hold, those historic names will likely fall into disuse except among a few nostalgic hold-outs. Instead, creatures’ official names will read something like “GATTACAAGCGATCGA ...” At a recent conference, Hebert posted one species’ new name on a slide during his presentation and quipped, “If you thought pronouncing Welsh town names was tough, try pronouncing that.” The crowd laughed, and his point was well-taken: DNA barcoding is exhaustive and precise. But part of the reason some people, especially religious folk, fear modern biology can be traced back to the message behind Hebert’s slide. People worry about biology becoming too quantitative and reductive, and for all its benefits, DNA barcoding doesn’t help that image. Few people read Latin any more, but at least it looked like a human language. “GATTACAAGCGATCGA ...” resembles computer code. Biology will undeniably lose something if names like Tyrannosaurus rex go the way of the Raphus cucullatus (the common dodo).
Nostalgic for the old names or not, biologists will continue to press forward with barcoding. Since Linnaeus, humans have catalogued an estimated ten percent of all species. But with a significant percentage of species facing extinction worldwide, biologists say they cannot afford to do anything but race forward. DNA first, names later could be their motto. Extinction is most acute in rain forests and near polar ice caps, which is why scientists have picked those two regions as two of the first they will catalogue. (One of Hebert’s groups, for instance, focuses on the Churchill estuary in northern Canada, on the Hudson Bay, and on various lush areas in the Polynesian islands.)
But there’s also an added, hidden benefit to targeting at least the rain forests, a benefit many biologists involved in barcoding are not shy in talking about: money.
“One great thing is that [DNA barcoding] has to involve all countries of the world,” said David Schindel in a recent talk. Schindel, executive secretary of the Consortium for the Barcode of Life, a project sponsored by the Smithsonian’s Museum of Natural History, further emphasized, “Unlike most biotechnology, this has to be global.”
Specifically, Schindel and others argue that many countries stand to gain a lot economically from setting up barcoding projects. Whereas most molecular biology today follows the money, with labs ending up in wealthy countries in North America and Europe, barcoding depends on collecting samples in countries that are often impoverished. Such fieldwork can only get done with a country’s goodwill and support.
Of course, the notion of rich countries trying to develop biotechnology in poor countries isn’t new, but DNA barcoding does promise to break from the exploitative, almost colonial model that has tarnished such relationships in the past. Many small countries feel burned by companies who scour rain forests for new genes and biochemicals to copy and develop, usually for pharmaceuticals. Royalty and patent rights rarely trickle down to the countries of origin. The feelings that linger are in some cases so rancorous that one of the key factors in the success of barcoding won’t be technological, but whether scientists can, as Schindel had it, “establish a trusting relationship with third-world countries.”
To meet its goals, the Barcode of Life Initiative has produced multiple brochures to sell its ideas to foreign countries. And while the brochures don’t underplay the science, they certainly play up the economic and social benefits of joining the barcoding movement. Among the benefits are “controlling agricultural pests,” since barcoding can identify pests even in larval states; “identifying disease vectors” like mosquitoes; “sustaining natural resources” by tracking poachers; “monitoring water quality” by tracking bacterial populations; and providing jobs and training for local scientists. Hebert of Guelph also talks of developing handheld DNA barcoding detectors, like portable UPC guns at the grocery store, to further help poor countries participate.
Up front, it sounds a little utopian for so many benefits to flow from one technology, but if DNA barcoding does spread far enough and wide enough, it’s not a stretch to think it really could help eradicate diseases and save potable water supplies. The point of DNA barcoding is to identify and conserve Earth’s other million-some forms of life, but in the end the ultimate beneficiaries might be Earth’s most dominant species, Homo sapiens.
Sam Kean is associate editor of Search.

