The Sitch with the Stitch—The CITStitch Hackathon
This post is cross-posted with iDigBio and co-authored with Libby Ellwood and Austin Mast.
Internet-scale public engagement in the digitization of biodiversity research specimens, as seen at Notes from Nature, DigiVol, the Smithsonian’s Transcription Center, and FromthePage.com, offers a clear win-win: it helps us motor through a digitization backlog of hundreds of millions of specimens while advancing science literacy. However, developing engagement at that scale presents some large cyberinfrastructure challenges, because public engagement tools do not yet interoperate seamlessly with one another or with more established biodiversity data platforms such as Symbiota and iDigBio. This problem was first widely discussed at iDigBio’s 2012 Public Participation in Digitization of Biodiversity Research Specimens Workshop, which led iDigBio to develop Biospex, a prototype public participation project management system.
For the second year in a row, iDigBio and Notes from Nature co-organized a citizen science (CITSCI) hackathon focused on these cyberinfrastructure gaps. You can read about last year’s CITSCribe Hackathon here. The main goal of this year’s CITStitch Hackathon (Dec 3–5) was to build interoperability among the projects that enable public participation in digitization, in ways that are useful and exciting for both project managers and public participants. This year’s 24 participants included developers, data managers, and publishers from public participation tools, as well as people who work with cool tools for data visualization (e.g., CartoDB and Zooniverse), data cleaning (e.g., VertNet, Encyclopedia of Life, Global Names, FilteredPush, Kurator, SALIX), and georeferencing (GeoLocate and CoGE).
On Day 1, Austin Mast (Florida State Univ.) and Rob Guralnick (Univ. of Florida) welcomed everyone and gave brief introductions to iDigBio, Biospex, and Notes from Nature, followed by lightning intros from each participant. Next, Libby Ellwood (Florida State Univ.), Austin, and Rob introduced the proposed hackathon activities. These had been developed by iDigBio’s Interoperability for Public Participation in Digitization working group, which served as the hackathon’s organizing committee (those three plus Ed Gilbert, Nelson Rios, Ben Brumfield, Paul Flemons, and Greg Newman). The activities were organized into two tracks. The first track focused on innovative cross-platform ways to deploy and manage public participation projects, visualize and analyze progress for project managers, and ingest data and provenance back into data management systems; this group would later be dubbed “Team Tardigrade.” The second track focused on novel ways to engage citizen scientists (e.g., via visualizations of individual and collective contributions); this would become “Team Honey Badger.” After this, Cody Meche gave an engaging talk on Agile development best practices, and we split into our teams to set priorities, define goals for deliverables, and draft a road map.
Days 2 and 3 brought a series of code sprints interspersed with animated stand-ups, all fueled by plenty of coffee, hot tea, and food. In the end, subsets of each team’s members produced several deliverables that far exceeded expectations. Details on the deliverables are on the CITStitch wiki page, and we summarize the work briefly below. But as we emphasized at the start of the hackathon, success will also be measured by the number of long-term collaborations initiated over dinner at the Reggae Shack, The Top, or Andaz Indian Restaurant.
Team Tardigrade
Stuart Lynn (Zooniverse) produced a broadly useful web service and data explorer for the (now 1 million!) transcriptions in Notes from Nature.
Ed Gilbert (Symbiota) and Daryl Lafferty (SALIX) produced a SALIX web service that takes an OCR text string and directs it to the correct SALIX-enabled Symbiota portal for processing; the SALIX-parsed data can then be sent to a transcription tool for proofreading.
John Wieczorek (UC–Berkeley), David Lowery (Harvard), and Dmitry Mozzherin (Marine Biological Lab, Woods Hole) produced web services for assessing the fitness for use of data and for data cleaning, including validators for scientific name, collection year, locality coordinates, and coordinate uncertainty.
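To give a flavor of what record-level fitness-for-use checks involve, here is a minimal Python sketch. It is not the code behind the hackathon’s web services: the function name `validate_record` and the specific rules (e.g., rejecting collection years in the future) are assumptions for illustration. Only the field names come from the Darwin Core standard (`scientificName`, `year`, `decimalLatitude`, `decimalLongitude`, `coordinateUncertaintyInMeters`).

```python
# Illustrative fitness-for-use checks; NOT the CITStitch web services.
# Field names are Darwin Core terms; the rules themselves are assumptions.
from datetime import date
from typing import List, Optional


def _is_number(value) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(record: dict, today: Optional[date] = None) -> List[str]:
    """Return human-readable problems found in one specimen record."""
    today = today or date.today()
    problems = []

    # Scientific name: a non-empty string (a real service would also look it
    # up in a names index rather than only checking that it is present).
    name = record.get("scientificName")
    if not isinstance(name, str) or not name.strip():
        problems.append("scientificName is missing")

    # Year collected: an integer that is not later than the current year.
    year = record.get("year")
    if not isinstance(year, int) or isinstance(year, bool):
        problems.append("year is missing or not an integer")
    elif year > today.year:
        problems.append(f"year {year} is in the future")

    # Coordinates: latitude within [-90, 90], longitude within [-180, 180].
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if not (_is_number(lat) and _is_number(lon)):
        problems.append("coordinates are missing or not numeric")
    else:
        if not -90 <= lat <= 90:
            problems.append(f"decimalLatitude {lat} is out of range")
        if not -180 <= lon <= 180:
            problems.append(f"decimalLongitude {lon} is out of range")

    # Coordinate uncertainty: optional, but if present a positive number of meters.
    uncertainty = record.get("coordinateUncertaintyInMeters")
    if uncertainty is not None and not (_is_number(uncertainty) and uncertainty > 0):
        problems.append("coordinateUncertaintyInMeters must be a positive number")

    return problems


if __name__ == "__main__":
    record = {"scientificName": "Quercus alba", "year": 2099,
              "decimalLatitude": 95.0, "decimalLongitude": -84.3}
    print(validate_record(record, today=date(2014, 12, 5)))
    # ['year 2099 is in the future', 'decimalLatitude 95.0 is out of range']
```

Returning a list of problems, rather than a single pass/fail flag, matches the idea of “fitness for use”: a record with a missing uncertainty value may still be fine for a country-level map but not for fine-scale niche modeling.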
Ben Brumfield (FromthePage.com), Greg Riccardi (Florida State Univ.), and Robert Bruhn (iDigBio’s Biospex) expanded the Biospex data model to include ledger and field book pages in anticipation of adding FromthePage as an actor in Biospex project workflows.
Finally, Greg Riccardi, Ed Gilbert, Nelson Rios (Tulane Univ.), Ben Brumfield, Robert Bruhn, and Austin Mast established an example manifest file in JSON that tools and project management systems can use to communicate about public participation projects.
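The value of a shared manifest is that a sender and a receiver agree on one machine-readable description of a project. The Python sketch below illustrates the round trip only. Every key in it (`project_id`, `sender`, `receiver`, `subjects`, and so on) and both function names are hypothetical; the actual CITStitch manifest example is documented on the hackathon wiki and is not reproduced here.

```python
# A HYPOTHETICAL project manifest, for illustration only. Every key here is an
# assumption; the real CITStitch manifest example is on the hackathon wiki.
import json
from typing import Iterable

REQUIRED_KEYS = {"project_id", "title", "sender", "receiver", "subjects"}


def build_manifest(project_id: str, title: str, sender: str, receiver: str,
                   image_urls: Iterable[str]) -> dict:
    """Describe a batch of specimen images handed from one system to another."""
    return {
        "project_id": project_id,
        "title": title,
        "sender": sender,      # e.g. a project management system
        "receiver": receiver,  # e.g. a transcription tool
        "subjects": [
            {"subject_id": f"{project_id}-{i}", "image_url": url}
            for i, url in enumerate(image_urls, start=1)
        ],
    }


def read_manifest(text: str) -> dict:
    """Parse a manifest and fail early if it is not an object or lacks keys."""
    manifest = json.loads(text)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")
    missing = REQUIRED_KEYS - set(manifest)
    if missing:
        raise ValueError(f"manifest is missing keys: {sorted(missing)}")
    return manifest


if __name__ == "__main__":
    outgoing = build_manifest("demo-001", "Herbarium sheets", "Biospex",
                              "Notes from Nature",
                              ["https://example.org/images/1.jpg"])
    incoming = read_manifest(json.dumps(outgoing, indent=2))
    print(incoming["subjects"][0]["subject_id"])  # demo-001-1
```

Validating required keys on the receiving side means a malformed hand-off fails loudly at ingest time instead of surfacing later as missing transcriptions.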
Team Honey Badger
Chris Snyder (Zooniverse), with help from Libby Ellwood (Florida State Univ.) and Rob Guralnick (Univ. of Florida), created functionality in Notes from Nature that checks entered taxonomic names against the GBIF Name API and tells the citizen scientist whether the name exists in GBIF and how many records are associated with it, giving the participant a sense of the significance of their contribution.
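A minimal Python sketch of this kind of name feedback follows. It assumes the public GBIF v1 endpoints `/species/match` (which returns a `matchType` such as `EXACT`, `FUZZY`, `HIGHERRANK`, or `NONE`, plus a `usageKey` for matched taxa) and `/occurrence/search` with `limit=0` (which returns only a total `count`). The function names and message wording are hypothetical, and the Notes from Nature feature may have queried GBIF differently.

```python
# Sketch of name feedback against GBIF. ASSUMPTIONS: the public GBIF v1
# endpoints /species/match and /occurrence/search; the function names and
# message wording are hypothetical, not the Notes from Nature code.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"


def fetch_json(path: str, params: dict) -> dict:
    """GET a GBIF API path with query parameters and decode the JSON body."""
    with urlopen(f"{GBIF_API}/{path}?{urlencode(params)}", timeout=10) as resp:
        return json.load(resp)


def feedback_message(name: str, match: dict, record_count: int) -> str:
    """Turn a species-match result and an occurrence count into feedback."""
    match_type = match.get("matchType", "NONE")
    if match_type == "NONE":
        return f'"{name}" was not found in GBIF; please double-check the spelling.'
    if match_type == "HIGHERRANK":
        return f'"{name}" only matched a higher-rank name in GBIF.'
    matched = match.get("scientificName", name)
    return f'"{name}" matches {matched} in GBIF, which has {record_count:,} records.'


def check_name(name: str) -> str:
    """Look up a transcribed name and count GBIF occurrence records for it."""
    match = fetch_json("species/match", {"name": name})
    count = 0
    if match.get("matchType") in ("EXACT", "FUZZY") and "usageKey" in match:
        # limit=0 asks for no result rows, only the total "count".
        search = fetch_json("occurrence/search",
                            {"taxonKey": match["usageKey"], "limit": 0})
        count = search.get("count", 0)
    return feedback_message(name, match, count)
```

Calling `check_name("Quercus alba")` requires network access. Keeping the message formatting in a separate pure function makes the feedback logic testable without contacting GBIF.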
Julie Allen (Illinois Natural History Survey), Charlotte Germain (Univ. of Florida), Sophia B. Liu (USGS iCoast), and Andrew Hill (CartoDB) created dynamic maps to visualize citizen science contributions; for example, a participant could upload a dataset to cartodb.com and select subsets of the data to display the country of origin of specimens that have been transcribed. Click here for a map of countries with specimens that have been digitized in Notes from Nature.
Paul Kimberly (Smithsonian), Paul Flemons (Australian Museum), Deb Paul (iDigBio), Libby, Rob, and Austin fleshed out a proposal for a 4-day global transcription blitz organized by the major transcription centers, including its timing, name, goals, funding, and organizational structure, and scoped the functionality of its website for year 1 and beyond (more on that in future blog posts!).
And, especially relevant to the global transcription blitz website, Alex Thompson (iDigBio) produced a prototype that uses Elasticsearch to integrate results across transcription platforms, generate summary results, and support further exploration.
Thank you to all of our participants. What a great experience! You can check out more photos in the iDigBio CitStitch Hackathon album on Facebook.