Tuesday 1 December 2009

'Impact' crater

Tonight I attended the twitter-inspired 'Blue skies ahead?' debate in which science minister Lord Paul Drayson gamely engaged a youthful panel (and audience) of scientists on 'the prospects for UK science'.

The first half of the debate was dominated by one word: 'impact'. It's an unfortunate word choice, really: it's vague, loaded and unidirectional, suggesting science impacts society but not the other way around. There was lively disagreement regarding the extent to which science funding should hinge on retrospective and/or predicted impact.

As evidenced by my flush of tweets during and after the event, I have a lot to say about 'impact', but in this post I'm going to set aside my opinions and instead tell a personal story of how 'impact' impacted me.

I wouldn't have thought to tell this story (it happened a while ago and as it has a happy ending I don't give it too much thought anymore), but after two respected tweeps, Ed Yong and Evidence Matters, specifically asked for it, I thought it might merit daylight.

In 2002, shiny new PhD in hand, I was looking for a job in the UK. I was casting a wide net, applying and interviewing for not only postdoctoral research positions but also assistant editorships at peer-reviewed journals and various jobs involving popular science communication. During my PhD years, I had enjoyed writing and communicating science to both expert and non-expert audiences, and moreover I think it is a scientist's civic duty to engage the broader public, to improve general science literacy but also to pave the way for future science funding.

One of the postdoc fellowships for which I interviewed seemed perfect; the project addressed some fascinating evolutionary developmental-genetic questions using a range of new and old techniques, the lab seemed like it was thriving, and the lab head had written books and popular science articles which I not only admired but which also suggested that he might be a good mentor for that element of my training.

My interview seemed to go very well: my CV was strong, I was happy with my presentation, and I had good discussions with the lab head and the other members of the lab, during which I asked questions, made suggestions and even proposed an experiment that, it turned out, they hadn't thought of yet.

At one point during my interview, I mentioned how keen I was to stay active in public outreach, through writing and perhaps other forms of engagement, and that I admired his own accomplishments in that area.

And that's where it all went wrong. In a sudden change of tone, the lab head started asking me probing questions about my commitment to the project, suggesting that I might not be up to seeing it through. He said I might be more suited to a career as a journal editor or science communicator. I reiterated my commitment to the research project, and said that I thought that shouldn't preclude engagement with the wider public; indeed, his own success in both research and popular science writing showed that it was possible to do both, and to do them well.

But it was too late. He had made up his mind. He wanted the people in his lab to have their noses to the research grindstone; he saw public outreach as icing on the cake, something you did only once you'd achieved success in your research career and were running your own lab.

Sure enough, a few weeks later he emailed me to say he'd decided to give the job to someone else. He cited his concern about my 'level of interest and commitment to the project', repeating a phrase he had used the day of the interview after I'd divulged my sordid secret interest in improving public understanding of science.

For a while I had some regret, but then I came to realize that it was better this way. I'm glad I didn't go to his lab only to find out too late that my 'extra-curricular' interests wouldn't be looked upon favorably.

Ultimately I found a job at an institution with a genuine commitment to both scientific research and public engagement with science. There will always be a natural tension between the two - after all, there are only so many hours in the day and science is a demanding career - but I'm glad to be in a place where public outreach isn't considered a character flaw.

Coming back to the 'Blue skies ahead' debate, I hope my story illustrates what every research scientist already knows: career progression depends primarily on one's (peer-reviewed) publication record and, to a lesser extent, one's history of winning research grants. Anything that takes time away from these two activities is therefore by definition a drag on one's career. Many of us do it anyways, because we enjoy it and think it's important. But there will not be any significant increase in the number of scientists engaging in public outreach until recognition of these activities is incorporated into research career progression criteria.

Sunday 6 September 2009

My first mix on 8tracks

8tracks is a simple way to share music mixes online. Here's my first attempt:

Friday 28 August 2009

I love the NHS but not their Ramadan health FAQs

The health care debate taking place in my homeland right now is immensely important. The outcome will affect all 300 million Americans, especially the 46 million that are uninsured, and if reform doesn't pass now, we probably won't get another shot at it for another decade or two.

It's also important to me, personally, as I do hope to repatriate one day. I am absolutely pro-reform and I find two aspects of the debate particularly infuriating:
  1. the spread of outright lies about the proposed reforms by the small but very screechy anti-reform camp (debunked here), including the slinging of vast quantities of mud across the Atlantic at the UK's National Health Service
  2. the sheer number of Americans--64%--who 'don't want to pay more taxes to expand health coverage to the uninsured'
As an American living in the UK, I feel it is my particular duty to counter the misinformation about the NHS that is circulating in the States right now. I've been counter-circulating as much information and testimonials by email and facebook as I can, and even have an 'I [heart] NHS' twibbon on my twitter avatar as a sign of my support.

I do, by the way. [Heart] the NHS, that is. It is difficult to overemphasize the peace of mind it gives me that those I love and I will never be unexpectedly refused coverage as a result of some policy small print about, for example, pre-existing conditions, nor financially ruined by a health problem. What a relief it is to be able to go to the doctor without having to fill out any forms or make any co-payments. Oh, and prescriptions are either free or £6.95 depending on whether you are capable of paying. I could go on but that's not what this post is about, and others have said it much better than me.

This post is about something the NHS did that has me pretty irked. I know, I know, given all of the above, maybe now isn't the best time to point out flaws in the NHS, but to that I say: a) this flaw has nothing to do with the general premise of the NHS or the health care they provide and b) I think it's right to be honest even when it's not politically expedient. Ahem.

So. The NHS has this website called 'Healthy Ramadan' which offers advice on staying healthy if you happen to have chosen to observe the daylight fasting that is part of the Muslim holy month of Ramadan. Of course, the word 'chosen' is tricky because it's difficult to quantify the extent to which religious indoctrination limits one's perceived if not real choices, but I digress.

The site seems like a pretty good idea: there are pages containing general advice on healthy fasting, suggestions on what to eat and what not to eat, and even a suggested meal plan. There's also an important section that lists the health risks that can be associated with fasting, and the site urges people to use Ramadan as an opportunity to quit smoking.

But then we get to the page, 'Ramadan health FAQs'. This page got my hackles up immediately with its introductory note that explains that 'the answers have been put together by medical experts and Islamic scholars and researchers'. I can see why Islamic scholars and researchers might help with devising the questions - after all, they are the experts on what the likely FAQs are going to be. But why should they be involved with putting together the answers? This is supposed to be health advice. It should come from the medical experts alone.

The first several Q&As about diabetes, migraines and blood pressure were okay, I suppose, though I was a little uncomfortable with how the questions were worded: each one was based around the question, 'should I fast?' when they really should have asked, 'is it alright to fast?' because then the answer would be less likely to be interpreted as prescriptive rather than permissive. But then I got to this one:

Is fasting harmful when a woman is expecting a baby? Must pregnant women fast?

There's medical evidence to show that fasting in pregnancy is not a good idea. If a pregnant woman feels strong and healthy enough to fast, especially during the early part of the pregnancy, she may do so. If she doesn't feel well enough to fast, Islamic law gives her clear permission not to fast, and to make up the missed fasts later. If she is unable to do this, she must perform fidyah (a method of compensation for a missed act of worship).

Let's just start with the question, shall we? 'Must' should never appear in front of or inside the phrase 'pregnant women fast', and certainly not on a national health service website. In fact the only time those two phrases should ever go together on any kind of government literature is if 'not' is inserted directly after 'must'.

The answer to the question starts out a bit better--using 'may' instead of 'must'--but then it all goes downhill. 'Islamic law gives her permission not to fast...' is useful information, as it may give uncertain women the religious argument they are looking for to give themselves permission not to fast (though of course that opens up a whole can of worms that I'm not going to go into today). But that last sentence is abhorrent. It's missing a big fat 'Islamic law says' before 'she must'. The way it is now, it looks like the NHS is the one telling her that she must perform fidyah!

I suppose one could argue that 'Islamic law' is mentioned in the penultimate sentence and therefore it is meant to indicate that Islamic law, and not the medical establishment, is the authority in both of the final two sentences. And I suppose that if this were the only problem with the website then I might have given them the benefit of the doubt. But two questions later it gets worse, and this time there's no qualifying 'Islamic law says' anywhere to be found:

From what age can children fast safely?

Children are required to fast from the age of puberty. It isn't harmful. Fasting before this age is tolerated differently depending on the child’s general health, nutrition and attitude. Fasting under the age of seven or eight isn't advisable. It is a good idea to make children aware of the practice of fasting and to practise fasting for a few hours at a time.

Look at that first sentence and tell me the NHS--the NHS!--didn't just say that children are required to fast during Ramadan!

The next few Q&As are okay, I suppose. They're about asthma, swimming and blood transfusions and there are occasional qualifiers like 'Muslim experts say...' and 'in their view...'. But I don't like how the answer to the asthma question contains an implication that it's somehow incumbent on Muslims to 'achieve good control' of their asthma ...as if it's some kind of personal failing if your asthma isn't under control. And I don't like that the answer to the transfusion question mandates fidyah with no qualifiers in sight. But I'm passing over these so that I can address this final doozie:

Does a breastfeeding woman have to fast?

No. Islamic law says a breastfeeding mother does not have to fast. Missed fasts must be compensated for by fasting or fidyah once breastfeeding has stopped.

As with the Q&A about pregnancy, the question itself contains an implication that the answer is mandatory rather than permissive. And again, missed fasts 'must' be compensated, no qualifiers, unless you count the one in the first sentence, but at this stage I'm not exactly inclined to give them a pass.

So, all you NHS web content editors out there, would you please do us all a favor and go in there with a red pen and change 'must' and 'should' to 'can' and 'could'? And while you're at it, add a liberal sprinkling of 'Muslim scholars say...' and 'Islamic law says...' before each sentence in which fasting is 'permitted' or fidyah 'suggested'? Oh, and could you please pay particular attention to those Q&As regarding women and children? Because I've noticed that those were the most prescriptive and least qualified of all.

I'd do it myself, but I'm hungry.

Thursday 27 August 2009

Saved by Science (NHM) Photo Series

I'm now six installments into a twitter photo series I've been calling "Saved by Science (NHM)" and I've decided I'm enjoying myself enough to warrant formalizing it a bit more.

It all started when I was browsing SEED magazine's special Darwin bicentenary collection (as a professional Darwin groupie is wont to do) and saw a link to an article by Carl Zimmer called 'The Awe of Natural History Collections'. I clicked it (as a Natural History Museum employee is wont to do) and was immediately enthralled, from the subtitle--'visiting the hidden side of natural history museums, where the vast collections of scientific specimens are kept'--through to the end. It's a real Zimmer gem, if you ask me.

Anyways, the article links to an audio slide show by Justine Cooper called "Saved by Science". It's not your average window-dressing to an article; it absolutely steals the show. And considering how good Carl Zimmer's writing is, that's really saying something. It's a brilliant piece of stand-alone journalism. Some of the slides are astonishingly intimate and poignant. Go there now and watch (and listen to) the whole thing.

I was immediately struck by the familiarity of the photographs. They were so similar to scenes I'd witnessed myself at the Natural History Museum in London, where I work. And then I realized that there was a communication void just waiting to be filled; I realized that I really should start taking candid behind-the-scenes photos at 'my' museum along the same lines as Justine Cooper's photos of AMNH.

And so began "Saved by Science (NHM)", a series of tweets (1, 2, 3, 4, 5, 6...) with my Cooperesque NHM photos attached as twitpics. Of course, my photos are usually taken with my iPhone, not a large-format camera, and of course they're not nearly as good as Cooper's, but the point is to reveal the hidden side of the Natural History Museum to a wider audience.

To keep up on my series you can follow me on twitter, but as twitter is a pretty ephemeral thing, and as not everyone tweets (I know--shocking), and as I'd hate for the series to get lost in that ever-growing graveyard of old, unarchived tweets, I've created a permanent archive on my website.

Monday 3 August 2009

Gene angst: finding a DNA barcode for plants

I've been incubating this post since September 2008, so it's kind of cathartic to finally be writing it. I think it will be a good representation of the title and purpose of this blog in the sense that it's a window to some of those things that go on in science - and in the lives of scientists - that don't make it into the peer-reviewed publications.

So why the wait? On top of the fact that it's inappropriate to talk in public about a piece of research before it's published unless all your co-authors agree (and a quick peek at the number of co-authors on this paper will explain why that was a non-starter), this work involved a lot of personalities and politics - even more than the usual paper - and some rather sensitive discussions and debates were being had right up to the publication date.

Speaking of the publication date, you'd be forgiven for thinking this open access PNAS paper came out on Tuesday; there was, after all, a rash of online and print news items1 and press releases2 about the paper that day, even radio and television interviews. But the paper wasn't published in the Early Edition until Thursday. See, PNAS does this weird thing where they lift the press embargoes on all of the papers in each week's issue on Monday night, even though the papers themselves may come out any day that week. I'm not sure why they do this and I find it a little annoying, largely because though we see a flood of news about a paper on Tuesday, it isn't actually available to non-journalists - you know, like those scientist and taxpayer schmucks - until a few days later. The result is that by the time the paper is out it's too late to influence or even critically filter any of the media surrounding it.

But I digress.

'A DNA barcode for land plants' is the culmination of 4 years of blood, sweat and tears by a global consortium of researchers called the Plant Working Group (PWG) of the Consortium for the Barcode of Life (CBOL).

The purpose of the PWG is to bring plants up to speed with animals in an international effort to build standardised reference libraries of DNA sequences from known and unknown species. These libraries of 'DNA barcodes' will ultimately enable the rapid identification of unknown specimens (or fragments of specimens) even by non-experts. In the meantime the collaborations and frameworks created to build the libraries will, in the words of John F. Kennedy from his famous "We choose to go to the Moon" speech, "serve to organize and measure the best of our energies and skills."

Because I've blogged about DNA barcoding several times before3, both here and on The Beagle Project Blog, I'm not going to give you a lengthy background on barcoding in this post. Rather, I'll explain briefly why plants needed bringing up to speed in the first place, but then move on quickly to how we did it, and what it was like to be involved.

Why have plants lagged behind animals in terms of amassing DNA barcode reference libraries? It's not that botanists aren't keen to participate. Rather, it's that the gene chosen (and officially endorsed by CBOL and therefore GenBank) to serve as the DNA barcode for animals, CO1, though present in plants, is not variable enough to use in species identification. So the search was on for a CO1 equivalent in plants: a region conserved enough through evolution to be found in and easily amplified from every plant's genome but carrying enough variation to distinguish species.

The approach CBOL took to finding such a region was to assemble a consortium of botanists actively working on DNA barcoding, and to pay for them to have meetings with each other in order to hash it out amongst themselves. As someone working on DNA barcoding plants at the Natural History Museum, I was invited - along with several others - to join in.

This was my first time as a direct participant in science-by-consortium and boy, was it an eye-opener. It turns out trying to get scientists - botanists no less (eek!) - to agree on something is not as easy as one might imagine. (There is a long and inglorious history of botanists disagreeing, but I've already indulged in one digression today...)

The PWG has met several times, most notably at a side meeting during the 2nd International Barcode of Life conference in Taipei in September 2007, and then again at the Royal Botanic Garden Edinburgh in September 2008.

The Taipei meeting was widely believed and reported to be something of a mess, with lots of claim-staking but not much progress towards the all-important Final Decision. I vividly remember one moment from the meeting in which we used a white board to list all of the candidate plant barcode regions (and combinations of regions). I photographed the white board (right). Looking back at it now, I think this picture speaks a thousand words with regard to the indecision that was left hanging in the air after Taipei.

The Edinburgh meeting, on the other hand, was more focused, with a mandate to have a decision made before everyone went home. Ably chaired by Pete Hollingsworth, head of the Genetics and Conservation section at the Garden, we spent two days (rather than two hours, as in Taipei) focused on the task.

I can't speak for anyone else, but I personally found the Edinburgh meeting to be a whole lot of fun. In essence, we - 15 plant DNA barcoding specialists from around the world - locked ourselves in a small room and agreed not to come out until we had made a decision. Coffee was administered by IV drip and snacks and sandwiches delivered to an adjacent room for when our brains ran out of ATP. Unlike the Taipei meeting, we had lots of data to hand in Edinburgh. Print-outs of spreadsheets and figures flew around the room like so much confetti and got annotated by hand as they were discussed.

Participants of the Plant Working Group meeting in Edinburgh emerged briefly from their self-confinement for a group photo.

I mentioned data. Our group from the Natural History Museum in London contributed amplification success rates and DNA sequences for six regions from 138 flowering-plant specimens. These specimens were collected during our project to repeat Darwin's botanical survey of Great Pucklands Meadow at Down House (pause for one of those 'oh if Darwin only knew about DNA' moments). This might seem like an impressive amount of data but in fact it was a modest contribution; some of the other groups contributed not hundreds but thousands of sequences. All in all the various research groups contributed data from 907 specimens from 550 species representing the major groups of land plants (including 670/445 angiosperm, 81/38 gymnosperm, and 156/67 cryptogam samples/species) for up to seven candidate regions that had been flagged in Taipei. These regions are, in no particular order, the genes rpoC1, rpoB, matK and rbcL and the inter-genic regions psbK-psbI, atpF-atpH and trnH-psbA.
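For anyone who likes to check the sums, here is a minimal sketch that adds up the per-group figures quoted above. The dictionary layout and variable names are just illustrative choices; the numbers are the ones in the paragraph.

```python
# (samples, species) for each major group, as quoted in the paragraph above.
counts = {
    "angiosperm": (670, 445),
    "gymnosperm": (81, 38),
    "cryptogam": (156, 67),
}

# Sum the first element of each pair for samples, the second for species.
total_samples = sum(samples for samples, _ in counts.values())
total_species = sum(species for _, species in counts.values())

print(total_samples, total_species)  # 907 550
```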

Back to our little room in Edinburgh. In some cases we analyzed this mountain of data right then and there, and in other cases, as when there were gaps in our data set that still needed filling, we agreed to go back home and churn out those data pronto.

One of the more illuminating analyses we did was to compare how well all possible combinations of one, two, three and seven candidate regions performed in terms of discriminating species. We were (or at least I was) surprised to find that while increasing the number of regions used in combination from one to two improved the power of species discrimination, combinations of three or more weren't any better (right, Figure 1C from the paper).
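To give a feel for the size of that comparison, here is a minimal sketch that simply enumerates the combinations of one, two, three and seven regions. It is not the analysis from the paper: it only lists which combinations exist, and the function name `region_combinations` is a made-up helper. Scoring each combination for species discrimination would need the actual sequence data.

```python
from itertools import combinations

# The seven candidate regions named in this post.
REGIONS = ["rpoC1", "rpoB", "matK", "rbcL", "psbK-psbI", "atpF-atpH", "trnH-psbA"]


def region_combinations(sizes=(1, 2, 3, 7)):
    """Return {size: list of region tuples} for each combination size."""
    return {k: list(combinations(REGIONS, k)) for k in sizes}


combos = region_combinations()
for k, group in combos.items():
    print(f"{k}-region combinations: {len(group)}")
```

With seven regions that works out to 7 single regions, 21 pairs, 35 triples and 1 all-seven combination, so even this 'simple' comparison means scoring 64 different candidate barcodes.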

In addition to discriminatory power, we also looked at practical issues like universality (i.e., the rate at which we were able to successfully amplify any given region from our collection of specimens) and sequence quality (e.g., the frequency of high-quality sequences obtained for each region, the amount of manual editing required and the concordance of bidirectional sequence reads).

Ultimately, after all of these analyses, there was no obvious winner, no gleaming silver bullet. And so began the war of attrition, during which we said our tearful goodbyes to certain regions that were okay in terms of universality and sequence quality, but pretty useless for species discrimination (as was the case for two regions, rpoC1 and rpoB), or good at species discrimination but with poor amplification success rates and sequence quality (as was the case for psbK-psbI).

After this weed-out process, we were left with three regions - two genes, matK and rbcL, and one intergenic spacer region, trnH-psbA. Though these three outperformed the rest, none of them alone performed ideally for all three criteria.

At this stage there was an intense discussion about whether we should recommend all three as a combinatorial plant DNA barcode to CBOL, or just two of the three. Some in the group preferred the better-safe-than-sorry approach of a three-region barcode that could be pruned down to two at a later date if one of the three proved superfluous. The majority, however, thought a two-region barcode preferable both because it would be less expensive in terms of sequencing costs and because it was felt that we needed to be decisive; many would-be plant barcoding projects were being denied funding as a result of funding agencies' fears that their money might be wasted if CBOL shifted the goalposts. Moreover, as I said above, though two regions are better than one at discriminating species, three are not better than two.

So of the three remaining regions, we tasked ourselves to decide which two in combination to recommend to CBOL as 'the' plant DNA barcode. It made sense to choose two regions which would complement each other: one with high universality and sequence quality and good, but not great discriminatory power (rbcL), the other with better discriminatory power but needing further technical work to improve universality (matK) or sequence quality (trnH-psbA). In the end, the group felt it was easier to overcome the universality difficulties posed by matK than the sequence quality difficulties posed by trnH-psbA.

And there we have it: the Plant Working Group recommends that CBOL adopt4 the combination of rbcL and matK as the official plant DNA barcode.

So that's the story of the scientific process that the Plant Working Group went through to select a DNA barcode for plants, but before I end I want to say a little bit more about the political and social process. If you read between the lines of my account here, you can probably guess that there were some intense disagreements between various members of the working group over how many, and which, regions to select. This raises the question, why would anyone care? It's supposed to be cold, hard, evidence-based science, right?

As PWG member Damon Little carefully said in his WNYC radio interview, '...when this started, a lot of people...[had] their favorite region for various reasons,...because they were the ones that discovered it or...because it was a region that had worked well for them in the past...' In other words, different research groups involved had to some extent pinned their reputations on certain candidate regions. As a result, they advocated those regions for a combination of political and historical reasons as well as scientific reasons.

But it wasn't all sorrow and strife. As you can imagine, after the workshop was over, there was a sense of relief and accomplishment - and for some, lingering frustration - and how better to mark the occasion than by refreshing ourselves at the Scotch Malt Whisky Society Vaults in Leith (right)?

And now we have finally come to my last bit of data in this blog post ...consider it supplementary data to Science Creative Quarterly's 'manuscript' entitled 'Scientists will geek out under any circumstances': at the Whisky Society, we were treated to PWG chairman Pete Hollingsworth's expert tutelage in whisky tasting. Here are some of the various drams we tried:


Whisky tasting with the Plant Working Group. Crop at right shows drams labeled by distillery (actually they don't tell you which distillery they're from, so these are actually Pete's guesses).

As is only natural, our conversation turned to DNA barcoding, and we noticed that, just as whiskies have their own personalities, so do the plant barcode candidate regions. Moreover, we figured these personalities could be mapped onto one another...


rbcL=Highland Park, trnH-psbA=Longmorn, CO1=Laphroaig,
matK=Caol Ila and rpoC1=Glen something
...obviously.

...because that's what we humans do. We identify things, and we classify things. And I hope that the new plant DNA barcode helps us do that a wee dram better.

Reference:

CBOL Plant Working Group (2009). A DNA barcode for land plants. Proceedings of the National Academy of Sciences of the United States of America, 106 (31), 12794-12797. doi: 10.1073/pnas.0905845106


Footnotes:

1Notable press coverage (last updated Saturday, 1st August, 2009): BBC, CBC News (Canada), The Citizen (South Africa), Guelph Mercury, Science, Science Daily, Scientific American, The Scotsman, The Sydney Morning Herald, The Telegraph
2Press releases: Consortium for the Barcode of Life, Imperial College, Natural History Museum, Royal Botanic Gardens, Kew, University of Guelph
3Data Not Shown: Barcode of plants mapped identified tested; The Beagle Project Blog: Would that which we call a rose by a DNA barcode smell as sweet? and Arbor DNA
4Notice I wrote 'CBOL', not 'everyone'. This is because 1) the next step is for the PWG to submit a formal application to CBOL to have the two-region barcode approved (and this is important because CBOL alone can tell GenBank to rubber-stamp these two regions with the keyword 'BARCODE') but also 2) no matter what the PWG or CBOL says, individual researchers can always sequence whatever they want from whatever plant species they want, for whatever purposes they want. It's only if they want to participate in, and derive useful data from, the international DNA barcoding effort that this recommendation even matters.

Tuesday 21 July 2009

'Man must explore'

At this very moment, exactly forty years ago, two men set foot upon the surface of the Moon.

The Moon, people!

There are many celebrations taking place 'in real life' and online; some of the most compelling of these are the real-time-plus-40-years commemorations like the Apollo 11 Radiocast, We Choose the Moon and ApolloPlus40.

As with all anniversaries (and boy, do I speak from experience), this is a time to reflect on the past and contemplate the future - in this case, of space exploration.

Almost as if to emphasize this, a review of the US Human Space Flight program is taking place right now, and they want our feedback. That's right: they want us rabble to tell them what we think about the future of manned missions into orbit and beyond.

So, here's what I think:

I think Apollo 15 Commander Dave Scott knew exactly what he was talking about when, upon becoming the 7th man to walk on the Moon, he said, "As I stand out here in the wonders of the unknown at Hadley, I sort of realize there’s a fundamental truth to our nature: man must explore. And this is exploration at its greatest."



I think that John F. Kennedy was absolutely right when, in his famous "we choose to go to the moon" speech, he said that we should go to the moon "because that goal will serve to organize and measure the best of our energies and skills...".

And I think that science and exploration are not just icing on a cultural cake to be undertaken during economically flush times, not just things we do to reap cold, hard, profitable benefits, but a core part of who we are as human beings.

As Brian Greene wrote in his brilliant NY Times Op-Ed piece 'Put a Little Science in Your Life', "science is a language of hope and inspiration, providing discoveries that fire the imagination and instill a sense of connection to our lives and our world. [snip] We must embark on a cultural shift that places science in its rightful place alongside music, art and literature as an indispensable part of what makes life worth living."

Update (21st July 12:15pm): I also think that people are holding science in general and the space program in particular to a double standard when it comes to federal funding. Have a look at Death and Taxes, an excellent and intuitive visualization of the federal budget. Is the NASA circle bigger or smaller than you expected?

Friday 19 June 2009

DNA-encrypted recipes

This morning I woke up with an idea for a science education/outreach project in my head. The idea was born out of a fun exchange on twitter yesterday which occurred at the tail end of a long series of frustrated tweets about some problems I'm having submitting DNA sequences to GenBank:
kejames: Perhaps I should just tweet the sequences to Genbank: ctagctgctgttgaagttccatctataaatggataagactttggtcttagtatatacgagttcttgaaagtaaaggaacaata

TwistedBacteria: chloroplast Prunus laurocerasus (cherry laurel) RT @kejames: Perhaps I should just tweet the sequences to Genbank: ctagctgctgttgaagttccatcta
That's right, TwistedBacteria actually thought to take my DNA fragment - tweeted in a moment of pure, hands-thrown-in-air frustration - and see if he could identify what species the fragment came from. What he did is essentially DNA barcoding (but using GenBank instead of the voucher-specimen-linked BARCODE-tagged databases 'approved' by CBOL).

The really cool thing is that even though I tweeted such a short sequence (just 83bp), and even though I had copied that sequence from a randomly chosen place in my data set, TwistedBacteria correctly identified the genus, if not the species, of my specimen; the fragment I tweeted is from blackthorn (Prunus spinosa), not cherry laurel (Prunus laurocerasus).
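Here's a minimal sketch of the idea behind that kind of lookup: slide the query along each reference sequence, score the best ungapped match, and report the top hit. This is a toy, not how a real GenBank search works (that would use an alignment tool such as BLAST), and the two 'reference species' below are invented placeholders built from the tweeted fragment itself, not real sequences. The function names are made up for illustration.

```python
# The fragment as tweeted (83 bp).
TWEETED = (
    "ctagctgctgttgaagttccatctataaatggataagactttggtcttagtatatacgagttctt"
    "gaaagtaaaggaacaata"
)


def percent_identity(query, reference):
    """Best ungapped percent identity of query slid along reference."""
    best = 0.0
    for start in range(len(reference) - len(query) + 1):
        window = reference[start:start + len(query)]
        matches = sum(q == r for q, r in zip(query, window))
        best = max(best, 100.0 * matches / len(query))
    return best


def identify(query, library):
    """Return (name, score) for the best-scoring reference in library."""
    scored = ((name, percent_identity(query, seq)) for name, seq in library.items())
    return max(scored, key=lambda pair: pair[1])


# Invented toy references: A contains the fragment exactly,
# B contains it with a single base changed.
toy_library = {
    "toy species A": "acgt" * 5 + TWEETED + "ttga" * 5,
    "toy species B": "acgt" * 5 + TWEETED.replace("ataaat", "attaat") + "ttga" * 5,
}

print(len(TWEETED))                    # 83
print(identify(TWEETED, toy_library))  # ('toy species A', 100.0)
```

The toy also hints at why a short fragment can land on the right genus but the wrong species: closely related references may differ from the query by only a base or two, so the top hits are nearly tied.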

It was entirely by accident that I happened to choose a blackthorn sequence to tweet, but because I did, I was reminded of a little haiku I did for the Science Creative Quarterly a while back:

SLOE GIN INGREDIENTS
a haiku by Karen James

Prunus spinosa
Juniperus communis
Triticum sp.

And that's when the idea hit me: why not take this 'recipe' one step further and make a fun and educational puzzle out of it by leaving the title of the recipe blank and encrypting the ingredients as DNA sequences? And why not do this for a bunch of recipes and make a whole DNA-encrypted recipe book? Here's what my sloe gin recipe might look like DNA-encrypted:

The following ingredients make up what alcoholic beverage?

gcacaggctgaaacaggtgaaatcaaagggcattacttgaacgctactgcaggtacatgcgaagagatgatgaa
aagagctgcatttgccagagaattgggggttcctatcgtaatgcatgattacttaacagggggattcactgcaaata
ctaccttggctcattattgccgagataatggtttacttcttcacatccaccgtgcaatgcatgcagttattgatagaca
gaagaatcatggtatgcactttcgtgtactagctaaagcgttacgtatgtctggtggagatcatatacacgctggtac

ggatgtactatcaaaccaaaattgggtctatctgccaagaattatggtagagcggtttatgaatgtctccgtggtgga
cttgattttaccaaggatgatgaaaacgtgaattcccaaccatttatgcgctggagagatcgtttctgcttttgtgcag
aagcactttataaagctcaggctgagacgggtgagattaagggacattacctgaatgcgactgcagggacatgtga
agaaatgatgaaaagagcagtattcgccagagaattgggagttcctatagtcatgcatgactatctgactggaggtt

aagaaatgattaagagagctgtatttgcaagagaattaggggttcctattgtaatgcatgactacttaactggggga
ttcaccgcaaatactactttggctcattattgccgcgacaatggcctacttcttcacattcaccgtgcaatgcatgcagt
tattgatagacagaaaaatcatggtatgcatttccgtgtattagctaaagcattgcgtatgtctgggggagatcatatc
cactccggtacagtagtaggtaagttagaaggggaacgcgaaatgactttaggttttgttgatttattgcgcgatgatt

One could mix it up a bit and use some amino acid sequences too, and for ingredients that are pure products of biochemical pathways (sugar, alcohol, etc.), one could use sequences of genes that function in those pathways.
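As a rough illustration of the amino acid variant, here is a minimal sketch that translates a DNA string into one-letter amino acid codes using the standard genetic code. The `translate` function and its `frame` parameter are illustrative names, and the right reading frame for any given puzzle sequence is something you would have to work out (or try all three); this sketch just takes your word for it.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna from the given frame; '*' is stop, 'X' is unknown."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("atggcc"))            # MA
print(translate("aatggcc", frame=1))  # MA
```

A puzzle-setter could run each ingredient's gene sequence through something like this and print the protein instead of the DNA, which would also give students a reason to practise protein searches as well as nucleotide ones.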

Lessons would include:
  • our food is (or was, or was produced by) living organisms with DNA in them (this is an important lesson - I've heard that children are generally unaware that what they ate for breakfast consisted of plants and animals)
  • you can identify species by their DNA
  • genes encode proteins, which have functions in the cells of plants and animals
  • practice using GenBank and BOLD databases
So, what do you think?