If you think being a scientist is all test tubes and no social life, think again. These days just about anyone can call themselves a scientist – and all from the comfort of their own home, as guest contributor David Smith discovered on one trip to his local coffee shop…
I know it’s bad coffee-shop etiquette, but I keep peeking at the laptop screen of the young woman beside me. I see long stretches of the letters A, T, G and C, and a large, colorful image that I recognize as a map of many different genes. She’s definitely a geneticist. “Sorry to disturb you,” I say, tilting my own laptop screen towards her to reveal a diagram of all the genes found in a jellyfish, “but it looks like we’re both working on similar things.”
Soon I’m telling her about my research on jellyfish genomes – the ‘library’ of genes carried by a typical jellyfish – and she’s describing to me her work on the genetics of salmon. I ask her if she’s a PhD student. She laughs and says, “I’m actually interning as an investment analyst.”
An unusual pastime
She explains how a few years ago a friend introduced her to the world of bioinformatics – the use of computer technology to explore and analyze genetic information. The friend taught her some basic bioinformatics skills and showed her how to download DNA sequences from the Internet. “Ever since, I’ve been piecing together DNA sequences in my spare time – trying to find the pieces to make up whole genomes and then seeing what I can learn from the results. I have no formal biology training, but I’ve learned the basics through a few textbooks I ordered online. It’s a weird hobby, but a great way to unwind from work.”
She’s not alone. Across the world, ‘hobby geneticists’ are exploring the huge number of DNA sequences freely available from public databases such as GenBank and EMBL-Bank. These online sequence stores hold everything from the human genome to the smallpox genome, as well as some remarkably ancient DNA sequences from the woolly mammoth and from our extinct close relatives, the Neanderthals. A search of GenBank using the keyword “dog” returns more than 200,000 entries, including complete genome sequences of the North American coyote, gray wolf, and domestic dog. Anyone with an Internet connection can download these sequences in minutes.
I ask my new acquaintance at the coffee shop how she makes sense of her DNA sequences. She recites a long and impressive list of computer programs, many of which I use in my own research. “I’ve tried lots of free programs,” she says. “Some are great, but you need to know some computer programming to use them. Last Christmas, I convinced my parents to buy me an all-in-one bioinformatics suite that’s easier to use without specialist know-how. I argued that it cost about the same as an Xbox or an iPad. I showed them some of the cool things I could do with it, like figuring out how honeybees evolved or looking at the sex chromosomes of frogs. They couldn’t believe that DNA sequences from all these different species were on the Internet for everyone to explore.”
The next generation
The number of DNA sequences stored in databases like GenBank is growing exponentially. In the year 2000, GenBank contained around one million DNA entries. Now it boasts more than 150 million. This rapid increase is thanks to recent advances in DNA sequencing technologies (often called “next-generation” techniques), which have made gathering DNA sequences cheap, easy, and fast.
So where do all these data come from? When scientists publish in academic journals, they are required to deposit all of the DNA data reported in their articles in online sequence repositories. As well as handing over the complete ‘jigsaws’ – the genes and genome sequences that they’ve pieced together – many scientists also submit the raw data that come directly from next-generation sequencing machines: millions of short, unassembled DNA sequences, or ‘reads’ – the individual jigsaw pieces themselves.
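To make the jigsaw analogy a little more concrete, here is a toy sketch of the simplest way to put the pieces back together: a greedy overlap assembler, which repeatedly glues together the two pieces whose end and start overlap the most. The function names, the `min_len` parameter and the four short reads are hypothetical illustrations made up for this example. Real assemblers working on next-generation data use more sophisticated methods (de Bruijn graphs, for instance) and must cope with sequencing errors and repeated stretches of DNA, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    start = 0
    while True:
        # Look for the first min_len letters of `b` somewhere in `a`.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from that point to the end of `a` is a prefix of `b`, the pieces overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler (hypothetical helper): merge the best-overlapping pair until none fit."""
    unique = list(dict.fromkeys(reads))  # drop duplicate reads, keeping their order
    # Drop reads wholly contained inside another read; they add no new letters.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # No remaining pieces fit together: return the assembled stretches ("contigs").
            return contigs
        # Join the two pieces, writing the overlapping letters only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Four made-up reads cut from the imaginary sequence ATGCGTACGTTAGC.
    print(greedy_assemble(["ATGCGT", "CGTACG", "ACGTTA", "TTAGC"]))  # ['ATGCGTACGTTAGC']
```

Greedy merging is easy to follow, but it can glue the wrong pieces together when a genome contains repeats longer than the reads. That is one reason real assembly software is far more elaborate.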
The raw reads are stored in the Sequence Read Archive, a companion database to GenBank. These raw data are valuable to researchers and hobby geneticists alike because they often contain information that the original researchers ignored or overlooked. For example, the data obtained from sequencing the DNA in a sample of green algae could also contain DNA sequences from the many viruses and bacteria that live alongside the algae. A hobby geneticist could use these data to assemble previously unknown viral and bacterial genomes.
Fishing for fun
My hobby-geneticist friend tells me how she is piecing together the genomes of Atlantic salmon and the sea lice that parasitize them. She got the idea from watching a documentary on British Columbia’s salmon fishery. “One part of the film showed how biologists are using DNA sequences to study the impact of sea lice on the salmon farming industry,” she says. “I found DNA sequences for both the salmon and their sea-lice parasites on the Internet, and now I’m testing to see whether the two species have swapped any DNA.”
I ask her if she plans to publish any of her work. “I mostly do this for my own enjoyment,” she explains. “But I did email some of my findings to a professor at the University of British Columbia who has been very helpful and encouraging. He’s even asked if he could use some of my results in a paper that he’s writing. If it works out, I’ll have a scientific publication to my name – not that it will help me much in the world of investment banking!”
After saying goodbye, I return to my work on jellyfish genomes. But soon I’m distracted by a nagging image: an army of hobby geneticists descending upon my hard-earned data – all of which I sent to GenBank last week! I wonder whether somebody, somewhere has already found the solution I’m looking for…