Data Not Shown: science 101


Thursday, 27 August 2009

Saved by Science (NHM) Photo Series

I'm now six installments into a twitter photo series I've been calling "Saved by Science (NHM)" and I've decided I'm enjoying myself enough to warrant formalizing it a bit more.

It all started when I was browsing SEED magazine's special Darwin bicentenary collection (as a professional Darwin groupie is wont to do) and saw a link to an article by Carl Zimmer called 'The Awe of Natural History Collections'. I clicked it (as a Natural History Museum employee is wont to do) and was immediately enthralled, from the subtitle--'visiting the hidden side of natural history museums, where the vast collections of scientific specimens are kept'--through to the end. It's a real Zimmer gem, if you ask me.

Anyways, the article links to an audio slide show by Justine Cooper called "Saved by Science". It's not your average window-dressing to an article; it absolutely steals the show. And considering how good Carl Zimmer's writing is, that's really saying something. It's a brilliant piece of stand-alone journalism. Some of the slides are astonishingly intimate and poignant. Go there now and watch (and listen to) the whole thing.

I was immediately struck by the familiarity of the photographs. They were so similar to scenes I'd witnessed myself at the Natural History Museum in London, where I work. And then I realized that there was a communication void just waiting to be filled; I realized that I really should start taking candid behind-the-scenes photos at 'my' museum along the same lines as Justine Cooper's photos of AMNH.

And so began "Saved by Science (NHM)", a series of tweets (1, 2, 3, 4, 5, 6...) with my Cooperesque NHM photos attached as twitpics. Of course, my photos are usually taken with my iPhone, not a large-format camera, and of course they're not nearly as good as Cooper's, but the point is to reveal the hidden side of the Natural History Museum to a wider audience.

To keep up on my series you can follow me on twitter, but as twitter is a pretty ephemeral thing, and as not everyone tweets (I know--shocking), and as I'd hate for the series to get lost in that ever-growing graveyard of old, unarchived tweets, I've created a permanent archive on my website.

Monday, 3 August 2009

Gene angst: finding a DNA barcode for plants

I've been incubating this post since September 2008, so it's kind of cathartic to finally be writing it. I think it will be a good representation of the title and purpose of this blog in the sense that it's a window to some of those things that go on in science - and in the lives of scientists - that don't make it into the peer-reviewed publications.

So why the wait? On top of the fact that it's inappropriate to talk in public about a piece of research before it's published unless all your co-authors agree (and a quick peek at the number of co-authors on this paper will explain why that was a non-starter), this work involved a lot of personalities and politics - even more than the usual paper - and some rather sensitive discussions and debates were being had right up to the publication date.

Speaking of the publication date, you'd be forgiven for thinking this open access PNAS paper came out on Tuesday; there was, after all, a rash of online and print news items¹ and press releases² about the paper that day, even radio and television interviews. But the paper wasn't published in the Early Edition until Thursday. See, PNAS does this weird thing where they lift the press embargoes on all of the papers in each week's issue on Monday night, even though the papers themselves may come out any day that week. I'm not sure why they do this and I find it a little annoying, largely because though we see a flood of news about a paper on Tuesday, it isn't actually available to non-journalists - you know, like those scientist and taxpayer schmucks - until a few days later. The result is that by the time the paper is out it's too late to influence or even critically filter any of the media surrounding it.

But I digress.

'A DNA barcode for land plants' is the culmination of 4 years' ~~blood, sweat and tears~~ work by a global consortium of researchers called the Plant Working Group (PWG) of the Consortium for the Barcode of Life (CBOL).

The purpose of the PWG is to bring plants up to speed with animals in an international effort to build standardised reference libraries of DNA sequences from known and unknown species. These libraries of 'DNA barcodes' will ultimately enable the rapid identification of unknown specimens (or fragments of specimens) even by non-experts. In the meantime the collaborations and frameworks created to build the libraries will, in the words of John F. Kennedy from his famous "We choose to go to the Moon" speech, "serve to organize and measure the best of our energies and skills."

Because I've blogged about DNA barcoding several times before³, both here and on The Beagle Project Blog, I'm not going to give you a lengthy background on barcoding in this post. Rather, I'll explain briefly why plants needed bringing up to speed in the first place, but then move on quickly to how we did it, and what it was like to be involved.

Why have plants lagged behind animals in terms of amassing DNA barcode reference libraries? It's not that botanists aren't keen to participate. Rather, it's that the gene chosen (and officially endorsed by CBOL and therefore GenBank) to serve as the DNA barcode for animals, CO1, though present in plants, is not variable enough to use in species identification. So the search was on for a CO1 equivalent in plants: a region conserved enough through evolution to be found in and easily amplified from every plant's genome but carrying enough variation to distinguish species.

The approach CBOL took to finding such a region was to assemble a consortium of botanists actively working on DNA barcoding, and to pay for them to have meetings with each other in order to hash it out amongst themselves. As someone working on DNA barcoding plants at the Natural History Museum, I was invited - along with several others - to join in.

This was my first time as a direct participant in science-by-consortium and boy, was it an eye-opener. It turns out trying to get scientists - botanists no less (eek!) - to agree on something is not as easy as one might imagine. (There is a long and inglorious history of botanists disagreeing, but I've already indulged in one digression today...)

The PWG has met several times, most notably at a side meeting during the 2nd International Barcode of Life conference in Taipei in September 2007, and then again at the Royal Botanic Garden Edinburgh in September 2008.

The Taipei meeting was widely believed and reported to be something of a mess, with lots of claim-staking but not much progress towards the all-important Final Decision. I vividly remember one moment from the meeting in which we used a white board to list all of the candidate plant barcode regions (and combinations of regions). I photographed the white board (right). Looking back at it now, I think this picture speaks a thousand words with regard to the indecision that was left hanging in the air after Taipei.

The Edinburgh meeting, on the other hand, was more focused, with a mandate to have a decision made before everyone went home. Ably chaired by Pete Hollingsworth, head of the Genetics and Conservation section at the Garden, we spent two days (rather than two hours, as in Taipei) focused on the task.

I can't speak for anyone else, but I personally found the Edinburgh meeting to be a whole lot of fun. In essence, we - 15 plant DNA barcoding specialists from around the world - locked ourselves in a small room and agreed not to come out until we had made a decision. Coffee was administered by IV drip and snacks and sandwiches delivered to an adjacent room for when our brains ran out of ATP. Unlike the Taipei meeting, we had lots of data to hand in Edinburgh. Print-outs of spreadsheets and figures flew around the room like so much confetti and got annotated by hand as they were discussed.

Participants of the Plant Working Group meeting in Edinburgh emerged briefly from their self-confinement for a group photo.

I mentioned data. Our group from the Natural History Museum in London contributed amplification success rates and DNA sequences for six regions from 138 flowering-plant specimens. These specimens were collected during our project to repeat Darwin's botanical survey of Great Pucklands Meadow at Down House (pause for one of those 'oh if Darwin only knew about DNA' moments). This might seem like an impressive amount of data but in fact it was a modest contribution; some of the other groups contributed not hundreds but thousands of sequences. All in all the various research groups contributed data from 907 specimens from 550 species representing the major groups of land plants (including 670/445 angiosperm, 81/38 gymnosperm, and 156/67 cryptogam samples/species) for up to seven candidate regions that had been flagged in Taipei. These regions are, in no particular order, the genes rpoC1, rpoB, matK and rbcL and the inter-genic regions psbK-psbI, atpF-atpH and trnH-psbA.
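For anyone who likes to check the sums, here is a small sketch that tallies the per-group counts quoted above. The numbers are copied straight from the paragraph; the variable names are just mine for illustration.

```python
# Samples and species per plant group, as quoted in the post.
counts = {
    "angiosperm": (670, 445),
    "gymnosperm": (81, 38),
    "cryptogam": (156, 67),
}

# Add up the first element (samples) and second element (species) of each pair.
total_samples = sum(samples for samples, _ in counts.values())
total_species = sum(species for _, species in counts.values())

print(total_samples, total_species)  # 907 550
```

The three groups do add up to the 907 specimens and 550 species given for the whole data set.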

Back to our little room in Edinburgh. In some cases we analyzed this mountain of data right then and there, and in other cases, as when there were gaps in our data set that still needed filling, we agreed to go back home and churn out those data pronto.

One of the more illuminating analyses we did was to compare how well all possible combinations of one, two, three and seven candidate regions performed in terms of discriminating species. We were (or at least I was) surprised to find that while increasing the number of regions used in combination from one to two improved the power of species discrimination, combinations of three or more weren't any better (right, Figure 1C from the paper).
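To make the idea of "all possible combinations" a bit more concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the region names A, B and C, the species, and the one-letter "variants" are toy stand-ins, not the working group's data, and the scoring rule (a species counts as discriminated if its combined variant is shared with no other species) is my simplification rather than the method used in the paper. I picked the toy values so that a plateau shows up, so treat it as a picture of the pattern and not as evidence for it.

```python
from itertools import combinations

# Toy data (hypothetical): species -> variant observed at each region.
TOY_VARIANTS = {
    "sp1": {"A": "x", "B": "p", "C": "m"},
    "sp2": {"A": "x", "B": "q", "C": "m"},
    "sp3": {"A": "y", "B": "p", "C": "m"},
    "sp4": {"A": "y", "B": "q", "C": "n"},
}
REGIONS = ["A", "B", "C"]


def discrimination_rate(regions):
    """Fraction of species whose combined variant profile is unique."""
    profiles = [tuple(v[r] for r in regions) for v in TOY_VARIANTS.values()]
    unique = sum(1 for p in profiles if profiles.count(p) == 1)
    return unique / len(profiles)


# Score every combination of one, two and three regions.
results = {}
for k in range(1, len(REGIONS) + 1):
    for combo in combinations(REGIONS, k):
        results[combo] = discrimination_rate(combo)

# Best score achievable with k regions.
best_by_size = {
    k: max(score for combo, score in results.items() if len(combo) == k)
    for k in range(1, len(REGIONS) + 1)
}
print(best_by_size)  # {1: 0.25, 2: 1.0, 3: 1.0}
```

In this toy set, going from one region to two helps a lot, and adding a third changes nothing because the best pair already separates every species.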

In addition to discriminatory power, we also looked at practical issues like universality (i.e., the rate at which we were able to successfully amplify any given region from our collection of specimens) and sequence quality (e.g., the frequency of high-quality sequences obtained for each region, the amount of manual editing required and the concordance of bidirectional sequence reads).

Ultimately, after all of these analyses, there was no obvious winner, no gleaming silver bullet. And so began the war of attrition, during which we said our tearful goodbyes to certain regions that were okay in terms of universality and sequence quality, but pretty useless for species discrimination (as was the case for two regions, rpoC1 and rpoB), or good at species discrimination but with poor amplification success rates and sequence quality (as was the case for psbK-psbI).

After this weed-out process, we were left with three regions - two genes, matK and rbcL, and one intergenic spacer region, trnH-psbA. Though these three outperformed the rest, none of them alone performed ideally for all three criteria.

At this stage there was an intense discussion about whether we should recommend all three as a combinatorial plant DNA barcode to CBOL, or just two of the three. Some in the group preferred the better-safe-than-sorry approach of a three-region barcode that could be pruned down to two at a later date if one of the three proved superfluous. The majority, however, thought a two-region barcode preferable, both because it would be less expensive in terms of sequencing costs and because it was felt that we needed to be decisive; many would-be plant barcoding projects were being denied funding as a result of funding agencies' fears that their money might be wasted if CBOL shifted the goalposts. Moreover, as I said above, though two regions are better than one at discriminating species, three are not better than two.

So of the three remaining regions, we tasked ourselves to decide which two in combination to recommend to CBOL as 'the' plant DNA barcode. It made sense to choose two regions which would complement each other: one with high universality and sequence quality and good, but not great, discriminatory power (rbcL), the other with better discriminatory power but needing further technical work to improve universality (matK) or sequence quality (trnH-psbA). In the end, the group felt it was easier to overcome the universality difficulties posed by matK than the sequence quality difficulties posed by trnH-psbA.

And there we have it: the Plant Working Group recommends that CBOL adopt⁴ the combination of rbcL and matK as the official plant DNA barcode.

So that's the story of the scientific process that the Plant Working Group went through to select a DNA barcode for plants, but before I end I want to say a little bit more about the political and social process. If you read between the lines of my account here, you can probably guess that there were some intense disagreements between various members of the working group over how many, and which, regions to select. This raises the question: why would anyone care? It's supposed to be cold, hard, evidence-based science, right?

As PWG member Damon Little carefully said in his WNYC radio interview, '...when this started, a lot of people...[had] their favorite region for various reasons,...because they were the ones that discovered it or...because it was a region that had worked well for them in the past...' In other words, different research groups involved had to some extent pinned their reputations on certain candidate regions. As a result, they advocated those regions for a combination of political and historical reasons as well as scientific reasons.

But it wasn't all sorrow and strife. As you can imagine, after the workshop was over, there was a sense of relief and accomplishment - and for some, lingering frustration - and how better to mark the occasion than by refreshing ourselves at the Scotch Malt Whisky Society Vaults in Leith (right)?

And now we have finally come to my last bit of data in this blog post ...consider it supplementary data to Science Creative Quarterly's 'manuscript' entitled 'Scientists will geek out under any circumstances': at the Whisky Society, we were treated to PWG chairman Pete Hollingsworth's expert tutelage in whisky tasting. Here are some of the various drams we tried:

Whisky tasting with the Plant Working Group. Crop at right shows drams labeled by distillery (actually they don't tell you which distillery they're from, so these are actually Pete's guesses).

As is only natural, our conversation turned to DNA barcoding, and we noticed that, just as whiskies have their own personalities, so do the plant barcode candidate regions. Moreover, we figured these personalities could be mapped onto one another...

rbcL=Highland Park, trnH-psbA=Longmorn, CO1=Laphroaig,
matK=Caol Ila and rpoC1=Glen something
...obviously.

...because that's what we humans do. We identify things, and we classify things. And I hope that the new plant DNA barcode helps us do that a wee dram better.

Reference:

CBOL Plant Working Group (2009). A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America, 106 (31), 12794-12797. DOI: 10.1073/pnas.0905845106

Footnotes:

¹Notable press coverage (last updated Saturday, 1st August, 2009): BBC, CBC News (Canada), The Citizen (South Africa), Guelph Mercury, Science, Science Daily, Scientific American, The Scotsman, The Sydney Morning Herald, The Telegraph
²Press releases: Consortium for the Barcode of Life, Imperial College, Natural History Museum, Royal Botanic Gardens, Kew, University of Guelph
³Data Not Shown: Barcode of plants ~~mapped~~ ~~identified~~ tested; The Beagle Project Blog: Would that which we call a rose by a DNA barcode smell as sweet? and Arbor DNA
⁴Notice I wrote 'CBOL', not 'everyone'. This is because 1) the next step is for the PWG to submit a formal application to CBOL to have the two-region barcode approved (and this is important because CBOL alone can tell GenBank to rubber-stamp these two regions with the keyword 'BARCODE') but also 2) no matter what the PWG or CBOL says, individual researchers can always sequence whatever they want from whatever plant species they want, for whatever purposes they want. It's only if they want to participate in, and derive useful data from the international DNA barcoding effort, that this recommendation even matters.

Friday, 19 June 2009

DNA-encrypted recipes

This morning I woke up with an idea for a science education/outreach project in my head. The idea was born out of a fun exchange on twitter yesterday which occurred at the tail end of a long series of frustrated tweets about some problems I'm having submitting DNA sequences to GenBank:

kejames: Perhaps I should just tweet the sequences to Genbank: ctagctgctgttgaagttccatctataaatggataagactttggtcttagtatatacgagttcttgaaagtaaaggaacaata

TwistedBacteria: chloroplast Prunus laurocerasus (cherry laurel) RT @kejames: Perhaps I should just tweet the sequences to Genbank: ctagctgctgttgaagttccatcta

That's right, TwistedBacteria actually thought to take my DNA fragment - tweeted in a moment of pure, hands-thrown-in-air frustration - and see if he could identify what species the fragment came from. What he did is essentially DNA barcoding (but using GenBank instead of the voucher-specimen-linked BARCODE-tagged databases 'approved' by CBOL).

The really cool thing is that even though I tweeted such a short sequence (just 83bp), and even though I had copied that sequence from a randomly chosen place in my data set, TwistedBacteria correctly identified the genus if not the species of my specimen; the fragment I tweeted is from blackthorn (Prunus spinosa), not cherry laurel (Prunus laurocerasus).
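If you want to check the "just 83bp" for yourself, a few lines of Python will do it. This sketch only measures the tweeted fragment and confirms it contains nothing but the four bases; it does not attempt the database search itself. The sequence is copied from the tweet above, and the variable names are mine.

```python
# The fragment as tweeted, split in two here only to keep the lines short.
tweeted = (
    "ctagctgctgttgaagttccatctataaatggataagactttggtcttagtatatacgagttctt"
    "gaaagtaaaggaacaata"
)

# Number of bases in the fragment.
length = len(tweeted)

# True if every character is one of a, c, g or t.
only_bases = set(tweeted) <= set("acgt")

print(length, only_bases)  # 83 True
```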

It was entirely by accident that I happened to choose a blackthorn sequence to tweet, but because I did, I was reminded of a little haiku I did for the Science Creative Quarterly a while back:

SLOE GIN INGREDIENTS
a haiku by Karen James

Prunus spinosa
Juniperus communis
Triticum sp.

And that's when the idea hit me: why not take this 'recipe' one step further and make a fun and educational puzzle out of it by leaving the title of the recipe blank and encrypting the ingredients as DNA sequences? And why not do this for a bunch of recipes and make a whole DNA-encrypted recipe book? Here's what my sloe gin recipe might look DNA-encrypted:

The following ingredients make up what alcoholic beverage?

gcacaggctgaaacaggtgaaatcaaagggcattacttgaacgctactgcaggtacatgcgaagagatgatgaa
aagagctgcatttgccagagaattgggggttcctatcgtaatgcatgattacttaacagggggattcactgcaaata
ctaccttggctcattattgccgagataatggtttacttcttcacatccaccgtgcaatgcatgcagttattgatagaca
gaagaatcatggtatgcactttcgtgtactagctaaagcgttacgtatgtctggtggagatcatatacacgctggtac

ggatgtactatcaaaccaaaattgggtctatctgccaagaattatggtagagcggtttatgaatgtctccgtggtgga
cttgattttaccaaggatgatgaaaacgtgaattcccaaccatttatgcgctggagagatcgtttctgcttttgtgcag
aagcactttataaagctcaggctgagacgggtgagattaagggacattacctgaatgcgactgcagggacatgtga
agaaatgatgaaaagagcagtattcgccagagaattgggagttcctatagtcatgcatgactatctgactggaggtt

aagaaatgattaagagagctgtatttgcaagagaattaggggttcctattgtaatgcatgactacttaactggggga
ttcaccgcaaatactactttggctcattattgccgcgacaatggcctacttcttcacattcaccgtgcaatgcatgcagt
tattgatagacagaaaaatcatggtatgcatttccgtgtattagctaaagcattgcgtatgtctgggggagatcatatc
cactccggtacagtagtaggtaagttagaaggggaacgcgaaatgactttaggttttgttgatttattgcgcgatgatt

One could mix it up a bit and use some amino acid sequences too, and for ingredients that are pure products of biochemical pathways (sugar, alcohol, etc.), one could use sequences of genes that function in those pathways.
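For the amino acid version of the puzzle, here is a rough sketch of how a DNA sequence could be translated into protein using the standard genetic code. It is a toy, not a polished tool: it assumes the sequence is already in the right reading frame and on the coding strand (which may well not be true of the fragments above, since I have not checked their frame), and the function name `translate` and the demo sequence are just for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA to one-letter amino acids, reading from the first base.

    Stop codons become '*', unrecognised codons become 'X', and any
    trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


demo = translate("atggcttaa")
print(demo)  # MA*
```

A puzzle-maker would want to confirm the reading frame against the annotated gene record before printing amino acids in a recipe book.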

Lessons would include:

our food is (or was, or was produced by) living organisms with DNA in them (this is an important lesson - I've heard that children are generally unaware that what they ate for breakfast consisted of plants and animals)
you can identify species by their DNA
genes encode proteins, which have functions in the cells of plants and animals
practice using GenBank and BOLD databases

So, what do you think?

Tuesday, 28 October 2008

Down syndrome research pop quiz: fruit flies 94, Sarah Palin zilch

This story has already been covered to death (or at least I hope so) on teh interwebs, but I must have my say. You see, when Palin dissed fruit flies...

...she didn't just diss fruit flies and the general and specific importance of model organism research (for which she is rightly and expertly skewered by Christopher Hitchens, Kevin Berger and others). She also dissed me.

As mentioned previously here and here, I did my PhD research on the fruit fly* Drosophila melanogaster. Specifically, I worked my butt off for six years to understand some of the myriad and complex functions of a fruit fly gene called Ras (the human counterpart of which plays a role in the onset and/or development of almost every kind of cancer) on the ovarian and embryonic development of the fruit fly, and how this has been modulated during fruit fly evolution.

That's right, friends, I am officially (and, it must be said, very proudly) on Sarah Palin's shit list: not only did I do research on fruit flies (booooo!) but I also did research on evolution (hissssss!).

But enough about me.

Almost unbelievably, Palin's sneer came directly on the heels of her own call to help families with special needs kids like her nephew with autism and her son with Down syndrome. See, in Palin's (surprisingly young) universe, it cannot possibly be fathomed that something as obscure as a fruit fly could help special needs kids. But a quick search on PubMed for 'Drosophila and "Down syndrome"' yields 94 peer-reviewed research articles including this one [my emphases]:

Dscam guides embryonic axons by Netrin-dependent and -independent functions.

Andrews GL, Tanglao S, Farmer WT, Morin S, Brotman S, Berberoglu MA, Price H, Fernandez GC, Mastick GS, Charron F, Kidd T.

Development. 2008 Oct 23. [Epub ahead of print]

Developing axons are attracted to the CNS midline by Netrin proteins and other as yet unidentified signals. Netrin signals are transduced in part by Frazzled (Fra)/DCC receptors. Genetic analysis in Drosophila indicates that additional unidentified receptors are needed to mediate the attractive response to Netrin. Analysis of Bolwig's nerve reveals that Netrin mutants have a similar phenotype to Down Syndrome Cell Adhesion Molecule (Dscam) mutants. Netrin and Dscam mutants display dose sensitive interactions, suggesting that Dscam could act as a Netrin receptor. We show using cell overlay assays that Netrin binds to fly and vertebrate Dscam, and that Dscam binds Netrin with the same affinity as DCC. At the CNS midline, we find that Dscam and its paralog Dscam3 act redundantly to promote midline crossing. Simultaneous genetic knockout of the two Dscam genes and the Netrin receptor fra produces a midline crossing defect that is stronger than the removal of Netrin proteins, suggesting that Dscam proteins also function in a pathway parallel to Netrins. Additionally, overexpression of Dscam in axons that do not normally cross the midline is able to induce ectopic midline crossing, consistent with an attractive receptor function. Our results support the model that Dscam proteins function as attractive receptors for Netrin and also act in parallel to Frazzled/DCC. Furthermore, the results suggest that Dscam proteins have the ability to respond to multiple ligands and act as receptors for an unidentified midline attractive cue. These functions in axon guidance have implications for the pathogenesis of Down Syndrome.

In other words, research on fruit flies is helping us to understand Down Syndrome better. The same can be said for almost all human biology, both pathogenic and 'normal' (whatever that means).

And here's where Palin's mocking is even more resonant: the reason fruit fly biology illuminates human biology is because our genomes are so similar and the reason our genomes are so similar is because we inherited them from the last common ancestor of humans and fruit flies [cue Sarah Palin's head popping off and steam shooting out].

*Though not technically correct, 'fruit fly' is the colloquial name for the monumentally important model organism Drosophila melanogaster. True fruit flies belong to the insect family Tephritidae and it was in fact a research project on these Tephritid flies that Palin was so gleefully skewering as wasteful earmark spending. Some have argued that this fact exonerates Palin-- i.e. that she was not mocking D. melanogaster research because she knows how important that is (right, as if Palin knows the difference between true fruit flies and model organism 'fruit flies') but rather she was mocking Tephritid fruit fly research. Problem is that the project she mocked is more applied to human benefit (in this case agricultural productivity) than most D. melanogaster research, not less, so there's that argument out the window.

Friday, 25 April 2008

Plant genomes made easy

Science has a new multimedia feature on plant genomes "From evolutionary insights to crop development" which does a pretty good job demystifying plant genomics in reasonably* plain language.

*If you've had a basic biology class and know the meaning of such words as "chromosome", "gene" and "DNA replication" you should be able to follow along.
