CITSCribe & Notes from Nature Hackathon

Post by Austin Mast

The CITSCribe Hackathon, co-organized by Zooniverse’s Notes from Nature Project (www.notesfromnature.org) and iDigBio (www.idigbio.org), brought together over 30 programmers and researchers from the areas of biodiversity research and digital humanities for a week to further enable public participation in the transcription of biodiversity specimen labels.  There are approximately 1 billion biodiversity research specimens in US collections alone, but it is estimated that information from just 10% of them is currently digitized and online.  Digitization of these specimens gives researchers access to vast quantities of information in their investigations of timely subjects such as climate change, invasive species, and the extinction crisis.  The magnitude of the task of bringing those specimens into digital format far exceeds current capacity and requires new, Internet-scale approaches to engage the public to help with the task and learn more about biodiversity collections.  Participants in the hackathon were energized by the opportunity to work on groundbreaking citizen-science projects with immediate and strong impacts in the areas of biodiversity and applied conservation.

The event opened on December 16, 2013, at iDigBio’s University of Florida (Gainesville, FL) center with the co-organizers Rob Guralnick (University of Colorado, Boulder) and Austin Mast (Florida State University) introducing the group to the process of digitization of biodiversity specimens, the heterogeneity of specimen labels, and the role that public participation tools and public participants play in the digitization workflow.  This was followed by a brief introduction to the development tracks that sub-groups might like to tackle during the week: (1) interoperability between public participation tools and biodiversity data systems, (2) transcription quality assessment/quality control (QA/QC) and the reconciliation of replicate transcriptions, (3) integration of optical character recognition (OCR) into the transcription workflow, and (4) user engagement.   The brief introductions and expressions of interest that followed made it clear that there would be a critical mass of complementary interests and competencies in each track for the week (Yay!).

After Cody Meche (an Agile Trainer and Coach at Davisbase Consulting) energized the group with a talk on agile development best practices (thanks for volunteering your time, Cody!), Alex Thompson (iDigBio) presented some of the digital resources that iDigBio had assembled prior to the hackathon (including a Vagrant script to build a virtual machine for the Notes from Nature web interface) and helped the programmers set up their development environments in a “Tech-up!” session.  Yonggang Liu presented the new iDigBio Image Ingestion Appliance for the iDigBio Cloud, a storage resource for public participation tools.  The hackathon participants then self-organized into development tracks to plan deliverables and development roadmaps in Team-up! sessions, activities that culminated in presentations to the whole group in a Stand-up! session after lunch on Day 2.

Huge progress was made in a series of Code-sprints and Stand-up! sessions that made up much of the second half of Day 2 and all of Days 3 and 4.  These were punctuated by occasional Mix-up! sessions in which either pairs of development teams met to discuss areas of overlap or the participants were completely randomized into new groups to discuss new directions not yet taken.  A call-in from Laura Whyte, the Director of Citizen Science at Adler Planetarium, provided an exciting overview of the latest activities at Zooniverse, including GalaxyZoo Quench (a project that is engaging the public from the process of data collection to data analysis to manuscript writing) and ZooTeach (a site where teachers can find lesson plans that complement Zooniverse projects).  And an excursion to the Florida Museum of Natural History (including its colorful Butterfly Rainforest) on Wednesday afternoon provided a bit of a breather from all of the coding.

On the final day, hackathon tracks presented their final Stand-up!, a parade of creative and useful solutions for public participation in transcriptions.  The interoperability track (Alex T., Ted H., Matthew M., Ed G., Robert B., Greg R., Yonggang L.) introduced their code to produce a Darwin Core Archive that describes discrete projects (sometimes called “Expeditions” or “Missions”) for ingestion by public participation tools and export from those tools back to the data providers.  This includes code to generate descriptions of the project (e.g., taxonomic and geographic scope) in Ecological Metadata Language along with record-level description of images and digitization projects using Audubon Core and Darwin Core.  Parts of this code were added to a beta version of the iDigBio image ingestion appliance and Symbiota, a biodiversity data management tool.  Much of the further development in this area will involve creation of a public participation management tool to create and manage projects of this type and download and process publicly generated data.

The QA/QC track (Jun L., Tony K., Al M., Chuck M.) tackled a big challenge in citizen science transcription: how to take the outputs of citizen science transcription and assure the highest quality end result.  Team QA/QC introduced an innovative pipeline for building a consensus from multiple replicate transcriptions by aligning either characters or tokens with the MAFFT alignment tool, a tool typically used for DNA sequence alignment.  They demonstrated ca. 35% agreement between the consensus that the two methods generate and gold standard data (transcribed by highly trained digitizers) for exact matches.  They also wrote a script to normalize name strings (e.g., from “A. R. and F. T. Smith” to “A. R. Smith, F. T. Smith”). Much of the further development in this area will involve optimizing the alignment algorithm for this task and making the consensus builder into a web service that can take input replicate transcriptions and output a consensus transcription.
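As a rough illustration of the consensus idea (not the team's actual pipeline), the sketch below assumes the replicate transcriptions have already been aligned to equal length, with "-" marking gaps, for example by an alignment tool such as MAFFT. It then takes a simple per-column majority vote. A second helper handles only the shared-surname pattern from the example above. The function names, the gap symbol, and the regular expression are all assumptions made for illustration.

```python
import re
from collections import Counter

GAP = "-"  # assumed gap symbol in the pre-aligned replicates


def column_consensus(aligned):
    """Majority vote per column over equal-length, pre-aligned strings."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("replicates must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        # Most frequent symbol in this column; ties go to the first one seen.
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != GAP:  # a winning gap means "no character here"
            out.append(symbol)
    return "".join(out)


# Hypothetical pattern: initials, "and" or "&", initials, then a shared surname.
_SHARED_SURNAME = re.compile(
    r"^((?:[A-Z]\.\s*)+)\s*(?:and|&)\s+((?:[A-Z]\.\s*)+)(\S.*)$"
)


def normalize_shared_surname(name):
    """Expand 'A. R. and F. T. Smith' into 'A. R. Smith, F. T. Smith'."""
    match = _SHARED_SURNAME.match(name.strip())
    if not match:
        return name  # leave anything that does not fit the pattern untouched
    first, second, surname = (group.strip() for group in match.groups())
    return f"{first} {surname}, {second} {surname}"


if __name__ == "__main__":
    replicates = ["Gainesville, FL", "Gain-sville, FL", "Gainesville, Fl"]
    print(column_consensus(replicates))
    print(normalize_shared_surname("A. R. and F. T. Smith"))
```

A per-column vote is the simplest possible consensus rule; a token-level variant would apply the same vote to aligned words instead of characters.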

The integration of OCR track (Go Team Ll Ll!; William U., Deb P., Andrea M., Sylvia O., Miao C., Jason B.) created word clouds (using n-gram scoring, faceting, and Solr for indexing + Carrot2 for visualization) and explored their use in two steps of the pipeline: a step in which the public participant selects a subset of specimens with a word of interest from the word cloud and a data cleaning step, where infrequent words are highlighted by the system.  They also created an interface for exploring the words using histograms, rather than word clouds.  Much of the further development in this area will involve integration of the word selection step into public participation tools and integration of the visualization for data cleaning into a processing tool, such as the public participation management system.
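The data cleaning step, in which infrequent words are highlighted, can be sketched in a few lines. This is only an assumed simplification: the team used n-gram scoring with Solr and Carrot2, whereas the hypothetical `rare_words` function below just counts single words across a batch of OCR label texts and returns those at or below a chosen count.

```python
import re
from collections import Counter


def rare_words(labels, max_count=1):
    """Return words that occur at most max_count times across all labels."""
    counts = Counter(
        word
        for label in labels
        for word in re.findall(r"[a-z]+", label.lower())
    )
    # Rare words are candidates for OCR or transcription errors.
    return sorted(word for word, count in counts.items() if count <= max_count)


if __name__ == "__main__":
    ocr_labels = [
        "Alachua County Florida",
        "Alachua County Florida",
        "Alachau County Florida",
    ]
    print(rare_words(ocr_labels))
```

A rare word is not necessarily wrong (it may be an uncommon locality or collector name), so a list like this is best treated as a prompt for human review rather than an automatic correction.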

The user engagement track (Go Team Honey Badger!; Julie A., Matthew B., David B., Paul F., Lisa L., Paul K.) made progress on a diversity of useful fronts.  Their completed “ditto” function, which uses a key binding to autocomplete Notes from Nature fields from previous entries, is sure to make data entry in that system far more efficient.  Other code from that group added functionality in Notes from Nature to see all target fields at once in a single window for easy tabbing between them and to flag specimens with explanations for skipping them (e.g., specimen label obscured, specimen label illegible).  The group brainstormed dashboard functionality for public participation tools, created a mock-up for a dashboard in Notes from Nature, and coded a dashboard (tentatively called My Dashboard) in Atlas of Living Australia’s Biodiversity Volunteer Portal.  These dashboards provide such things as a map of specimens transcribed by the public user, the user’s badges, and completed missions in which the user participated.  The group also produced white papers on ideas to encourage user sign-in, gamification ideas for Notes from Nature and the Biodiversity Volunteer Portal, and classification of user experience.  Much of the further progress in this area will involve testing and implementation of this new functionality in the production versions of Notes from Nature and Atlas of Living Australia’s Biodiversity Volunteer Portal.

Hackathon participants represented a broad range of career stages (undergraduate students, graduate students, postdoctoral scholars, computer programmers, and university faculty) and institutions, including the Adler Planetarium, University of California–Berkeley, Cornell University, Harvard University, King’s College London, Australian Museum, Smithsonian, New York Botanical Garden, Botanical Research Institute of Texas, Illinois Natural History Survey, Atlanta University Center, National Ecological Observatory Network, and many others.  Digital humanities projects represented at the hackathon included the University of Iowa Libraries’ DIYHistory Transcription Project, Indiana University’s Data to Insight Center, the Outreach Ethnomusicology project, and the FromThePage.com transcription project.  Biodiversity projects represented included Notes from Nature, iDigBio, VertNet, Atlas of Living Australia, Symbiota, Filtered-push, Morphbank, Smithsonian Digital Volunteers, and the Biodiversity Heritage Library.

Documentation of the hackathon can be found at the CITSCribe wiki (https://www.idigbio.org/wiki/index.php?title=Transcription_Hackathon).  This includes a complete participant list and many recorded presentations.  Hackathon participants used the hashtag #CITSCribe, and a few additional photos are available at https://www.facebook.com/iDigBio/photos_stream.

Cookies From Nature

‘Tis the season to be thankful for friends, family, and citizen science. And you can combine all three by making Notes from Nature cookies to serve to those around you:

Ingredients

2 3/4 cups all-purpose white flour
1 1/4 cups granulated sugar
1 teaspoon baking powder
1/4 teaspoon table salt
2 large egg yolks
3/8 cup sour cream
1 tablespoon vanilla
1 1/2 sticks (12 tablespoons/175 g) unsalted butter

Preparation

1. Melt the butter and set aside to cool slightly.

2. In a large bowl, combine dry ingredients and whisk together.

3. In a smaller bowl, whisk egg yolks, sour cream, and vanilla until combined. Slowly add melted butter, whisking constantly, until mixture is smooth and homogeneous.

4. Pour wet ingredient mixture into dry ingredients; mix until flour is completely incorporated and the dough roughly makes a ball.

5. Turn dough out onto sheet of parchment paper (lightly floured if necessary), and separate into two halves. Form each half into a book-shaped rectangle and wrap with cling film.

This is one half of the dough. This recipe makes a lot of cookies.

Once this is done, put the dough in the fridge for an hour or so to chill it out somewhat. You want the butter in the dough cold enough to keep its shape and not stick to things, but warm enough that the dough doesn’t break when you roll it out. Once you’re there, knead the dough a bit to get rid of any cracks and form a ball shape, then roll it out on a piece of parchment paper. In my experience the dough doesn’t stick to the paper or the rolling pin, but you can use a bit of flour here if necessary to prevent sticking.

Just before you start rolling, preheat the oven to 325F (160C). Roll the dough out to about 1/8-inch (~3 mm) thickness. Then start cutting out shapes. If you have cookie cutters for leaves, butterflies, and other Notes from Nature objects, great! I didn’t, though, so I free-handed them instead, with the help of a friend.

This is a pretty standard recipe for butter cookies, and was adapted from Cook’s Country. The key point here is that you can knead it, roll it out, cut out shapes, collect scraps, knead them, roll them out, etc… for as long as you like, and the dough won’t get tough. So go ahead — get creative with the shapes!

If you’re doing this freehand, it might take a few tries to get a decent moth shape. Unless you’re my assistant/friend, who was awesome from the start.

Bake until the edges of the cookies are golden brown. The timing depends on the size of the cookies and whether or not you have a convection/fan oven. The original recipe was for a conventional oven and recommended baking for 16 minutes, which will need to be shortened considerably if you have small cookies and a fan oven. It might be best to set a timer for 7 minutes and then rotate the cookie sheets and check for doneness.

Of course, when you bake cookies of wildly different sizes on the same sheet, it’s not easy to get them all to “golden brown” at once.

To decorate, you can make a simple icing with 1/2 cup of icing sugar and 1 tablespoon of milk (plain icing) or lemon juice (sweet-tart icing). You can also use food coloring or a drop or two of flavor extracts like almond or mint. The icing can be the decoration by itself, or it can be used as an underlaying glue for various sprinkles and edible beads, or it can be layered over melted (then chilled to set) chocolate. The sky is the limit for decorating, or you can just leave these plain, as the cookies are delicious on their own. But a little sparkle can be fun too:

I brought these into work the next day, where my Zooniverse colleagues were happy to help me taste test.

Which one is your favorite shape? What would you add? There’d be plenty of room for it: the above picture shows about one-quarter of the cookies this recipe made. You won’t be lacking for cookies. Enjoy!

Zooniverse team member Grant Miller prefers the colorful butterfly!

Adventures in the field: How do insect museums get these specimens anyway?

As you pore over images of our fascinating CalBug specimens, you may ask yourself how these insects ended up in the museum in the first place. Many of the labels you are transcribing date back 60-100 years, but don’t let that fool you into thinking that museums are places that just store old specimens. Scientists are still adding to museum collections every day, and we now often use specimens in ways that entomologists 60 years ago could not have imagined.

As a PhD student in the Essig Museum of Entomology, I have had many opportunities to work with insect specimens within a museum. However, this summer I had the chance to go on a month-long expedition in the Appalachian Mountains of North America to collect live insects in the field. My dissertation research involves understanding the diversification and evolution of ground beetles in the genus Scaphinotus. Often referred to as “snail-eaters,” these nocturnal beetles have developed an elongate head and mouthparts, including escargot fork-like jaws and huge sensory palps that allow them to find and feed on snails and slugs. They are flightless and live in predominantly montane habitats. This makes them interesting candidates for studying how body-forms of species change over time, possibly adapting to feeding preferences.

Insect specimens already housed in museums provide a great deal of information about morphology, distribution, seasonality and even behavior; however, there is one thing they generally cannot provide: good-quality DNA! So today entomologists are frequently heading to the field to collect specimens specifically to extract their DNA. This is why I went on my recent trip to the Appalachians, where I hoped to collect as many as 20 Scaphinotus species to use in my research.

A month-long field excursion requires careful planning and preparation. My trip included visits to 5 states and as many National Forests, where I camped and hiked long-forgotten trails in search of these elusive little beetles. Of course no amount of planning can prevent one from running into a month-long bout of stormy weather! And so it was, my first big trip into the field was vexed by torrential rains, flooding, lightning, thunder, and even a tornado! But in spite of all that heavy weather, rain and mud, I did manage to find a few Scaphinotus (the beetles were possibly as unhappy about the weather as I was!).

I came away from the trip with a far greater understanding and appreciation of what it is like to be in the field collecting specimens first hand. I also chalked up nine additional species whose DNA will contribute to my dissertation research, and will be made available to other scientists worldwide via CalBug and the Essig Museum of Entomology at UC Berkeley.

-Meghan Culpepper

What are Macrofungi?

. . . You may be wondering.  It’s really just a fancy Latin term for “Big Fungi.”  What Macrofungi all have in common is that they form structures called fruiting bodies or sporocarps – these sporocarps are typically the above-ground part of the mushroom that you see.

When you see a sporocarp, this indicates that the macrofungus is in reproductive mode. When not in reproductive mode, these fungi consist of nothing more than a network of nearly invisible threads, called mycelia, which run through soil or decaying wood.  But, when environmental conditions are favorable for reproduction (for example, when temperatures are warm and there is lots of rain), these threads coalesce into the woody or fleshy sporocarp. These can take a wide variety of shapes, but somewhere on or in all sporocarps, tiny reproductive units called spores will be formed.  The spores of macrofungi act like seeds in a plant: they are dispersed by the sporocarp, and if a spore lands on a suitable spot, it will produce mycelia, and eventually may form a new sporocarp.

Microfungi, by contrast, are mostly invisible for their whole lifetime, except when they produce millions of colorful spores.  You may have seen the black spores of bread mold or the blue-green spores of Penicillium in your refrigerator, on occasion!

The most familiar group of macrofungi is the mushrooms.  In a typical mushroom, the spores are produced on the surfaces of the gills on the underside of the cap, as shown below.  The fungus shown here belongs to the genus Marasmiellus, and was collected in Belize.

What Motivates You?

Greetings, Citizen Scientists!

Some of you may remember me from my (months-earlier!) blog post on behalf of Notes from Nature, for which I was a beta tester as well as doing some copy work for the site. For those of you who don’t, let me make introductions!

My name is Aly Seeberger. I am a master’s student in the Museum & Field Studies program at CU Boulder. My thesis focuses on examining and improving citizen science volunteer motivation evaluation. Essentially, I am interested in what makes Notes from Nature and Zooniverse volunteers tick – why do you give your time so willingly and enthusiastically to these projects?

Museums and other organizations that rely heavily on volunteers do a lot of motivation evaluation in order to determine their volunteers’ needs and how best to satisfy them. However, thus far, this research has been focused mainly on volunteers inside the physical space of the museum. A new frontier for museums is developing citizen science efforts that operate outside the museum, often on the Internet. How museums engage and build participatory mechanisms given a digitally connected public is still evolving, and because of that, organizations are often working more on fine-tuning their projects than getting to know their volunteers.

There has been some research done in this area, to be sure, but it has always been very project-specific. My hope is to establish the use of a set of evaluations that can be applied across projects, in order to be able to compare results and populations in the same way. Doing so will create a streamlined, effective way to evaluate any volunteer population and get comparable results no matter the project. Any institution that hosts a citizen science project will be able to understand its user population – who they are and what they hope to get out of volunteering. Once users’ needs are identified, each project will be able to work toward meeting them. This will create a more productive, fulfilling experience for volunteers!

If this is something that interests you, I hope that you will be willing to take a quick online survey. This survey will look at you, the citizen scientist, and your motivations, and it will be used in the research described above. The survey takes about 15 minutes to complete and is very straightforward. You will not be required to identify yourself, nor will you be required to answer every question. The data from your responses will be used in an article that will be published, but you will not be personally associated with that data in any way. If you have questions about the project, or you just want to say hi, feel free to drop me an email at alysee1@gmail.com!

The survey can be found at http://survey.qualtrics.com/SE/?SID=SV_0vSOngLw1nDdT7L.

Thanks for taking the time to read this, and I hope to hear from you in the near future!

Aly before Derby Practice

Best wishes and many thanks,  Aly
(who is getting ready for a roller derby match above)

The Macrofungi Collection – Some Background

The Macrofungi project on Notes from Nature is off to a great start!!! Thanks so much to all who have contributed so far.

Some transcribers have been a bit confused when there are several different bits of paper presented for transcribing. Usually there is one “official” label with the basic collection information, e.g., the name of the specimen, where it was collected, when and by whom. There may be a second label that just repeats some of the official label information. Occasionally there is even a third label, often handwritten, and sometimes quite lengthy, that is filled with unfamiliar terminology. Learning a bit more about how macrofungi collections are documented may help you to understand what is going on here.

Macrofungi are usually short-lived. As soon as you pick one, it begins to change, and if left alone after picking, it may become a slimy mess in an astonishingly short time. So if a mycologist (that is, someone who studies fungi) plans to make a scientifically useful specimen from a macrofungus he or she collects, there is work that has to be done right away.

First, the mycologist will take habitat photographs as shown here, sometimes picking a few individuals and arranging them so all the important parts are showing.

Habitat photograph

Then he or she will make written notes about the features of the fungus that are going to disappear once it is dried, namely the odor, the color, the taste (yes, all mushrooms are tasted, even poisonous ones, but of course they aren’t swallowed!) and whether or not the fungus is dry, sticky, or slippery to the touch. These characteristics, as well as measurements of size and descriptions of shapes are important for identifying the fungus, and must be recorded before the specimen is put on the drier.

These days mycologists record these field notes on computers, but before this was possible, the information was often recorded in cramped handwriting on small bits of paper, as shown here, that could be folded up and would follow the fungus on its journey to becoming a permanent collection.

Handwritten fieldnotes

At the end of a collecting trip, the mycologist has to prepare the specimen for permanent storage in an herbarium (or fungarium, as some mycologists like to call these collections). This involves making the official label and placing the specimen and the field notes in a cardboard box.

Dried macrofungi collections

Placement of the official label varies between and even within collections – it is very convenient for future users if the label is glued to the box top where it is easily seen, but to save space and money, we use the smallest possible box for each collection, and this often means that the label won’t fit on the box top. In such cases, the label is put inside the box, and some of the label information, usually the name of the specimen and the collector name and number, sometimes the state or country, is written or printed on the box top.  The picture below shows a collection with all three label types.  Hopefully, armed with the information presented here, you will now be able to tell which is the official label, which is the box label, and which are field notes when Notes from Nature Macrofungi presents you with multiple pieces of paper to transcribe. You should always transcribe from the official label, but sometimes looking at the box top label can be helpful in interpreting handwriting or abbreviations. If not, please keep posting comments!

A macrofungi image in Notes from Nature with three labels

If you would like to learn more about how mycologists make collections of macrofungi, you can download a document called “Recommendations for Collecting Mushrooms for Scientific Study” that explains the process in more detail (http://sweetgum.nybg.org/boletineae/collecting_illustrated.pdf).

If you would like to collect macrofungi yourself, contact a mushroom club in your area – you can find a complete listing at the North American Mycological Association website (http://namyco.org).

Macrofungi Added to Notes from Nature!

The Notes from Nature team is excited to announce the addition of content from the Macrofungi Collection Consortium!  This collection is a partnership of 35 institutions across the U.S. that collectively will digitize about 1.5 million specimens that have been collected over the past 150 years.  Macrofungi are important to humans in many ways – many people like to eat them, but some species are also deadly poisonous.  Macrofungi are also key to the health of our forests – indeed, most forest trees could not survive if their roots did not form a relationship (called a mycorrhiza) with a macrofungus that helps tree roots absorb water and minerals from the soil.  Macrofungi are also an important source of food for forest animals and they serve as homes for many soil insects and other small organisms that are also part of a healthy forest ecosystem. Many macrofungi are very beautiful, and are a favorite subject of nature photographers.  Their pigments may be used for dyeing wool or cotton, and for paper-making. Macrofungi are important religious symbols in some cultures.  Recently it has been discovered that macrofungi can play a role in the cleanup of environmental disasters.  Through a process called “mycoremediation,” macrofungi are able to break down or remove contaminants such as pesticides and fuel oils.

The Macrofungi Collection comprises mushrooms and related fungi.  After collection, specimens of macrofungi are dried on a vegetable dehydrator or similar type of dryer, and then are placed in a box or packet with a specimen label that gives the name of the fungus and when, where, and by whom the specimen was collected.  Because macrofungi are often very short-lived, documenting their occurrence with specimens is critically important for knowing which macrofungi grow where.

To answer the many remaining questions about these foundational organisms, scientists need access to data from collections.  Our project is to digitize these specimens and make the data available in a standardized, searchable form through the MycoPortal.

Although macrofungi (mushrooms and mushroom-like organisms) are not plants, they are still stored as dried specimens in herbaria.  The dried mushroom (which often looks nothing like the fresh mushroom!) is stored in a box or paper packet and is accompanied by a label that gives the name of the mushroom, where it was collected, when, and by whom.

You can contribute to a better understanding of these environmentally critical organisms by helping to transcribe data from the specimen labels into a structured format.  The folks who are capturing the images of these specimens have already recorded the name of the fungus, so what we need your help with is transcribing the collection locality and date, as well as the collector’s name and number.

If you want to learn more about macrofungi, there are many sources of information.  Online, Encyclopedia of Life, which is also linked to the macrofungi collections in Notes from Nature, is a reference for images and descriptions of many of these fungi.  Mushroom Observer is a site where citizen scientists and professional mycologists meet to discuss macrofungi of interest.  There are also many clubs around the country where participants go on mushroom collecting trips, host lectures for members and teach the general public about these organisms.  You can learn about clubs in your area through the North American Mycological Association website.

Seeking participants for December hackathon!

iDigBio and Zooniverse’s Notes from Nature Project are pleased to invite you to participate in a hackathon to further enable public participation in online transcription of biodiversity specimen labels.  The event will occur from December 16-20, 2013, at iDigBio in Gainesville, FL, though you may choose to participate in a subset of the days based upon the schedule.   We are especially looking for participation from the most enthusiastic and committed citizen science transcribers!  This is a great opportunity to have a direct influence on expanding this tool in the directions you would like to see it go.

The hackathon will produce new functionality and interoperability for Zooniverse’s Notes from Nature and similar transcription tools.  There are four areas of development that will be progressively addressed throughout the week.

  1. Linking images registered to the iDigBio Cloud with transcription tools in order to alleviate storage issues.  (Monday)
  2. Transcription QA/QC and the reconciliation of replicate transcriptions.  (Remainder of week)
  3. Integration of OCR into the transcription workflow.  (Remainder of week)
  4. New UI features and novel incentive approaches for public engagement.  (Remainder of week)

There will be opportunities to narrow the focus in each category of activity in a teleconference tentatively scheduled for early in the week of November 25 (and also at the TDWG meeting and the iDigBio Summit, if you are attending either of those events).

If you are interested, please get in touch with Austin Mast (amast@bio.fsu.edu) by Wednesday, Nov 1.  iDigBio has budgeted some funds to support travel costs.

With best regards,

Austin and Rob Guralnick (CU-Boulder), co-organizers