Cumulative transcription effort: How are we doing a year (plus) in?

Now that we have passed our one-year anniversary here at Notes from Nature, it might be worth doing a little reflecting and data mining. Our favorite activities! So we decided to ask some simple questions: What are the trends in rates and cumulative activities on Notes from Nature? Do we find, for example, that projects show distinctive trends over time? Do they start hot and then settle into a comfortable groove? Maybe you have had some questions too, so feel free to ask us if you want to see a particular metric.

Below we show the cumulative transcription numbers, which should also give a fairly good idea of cumulative effort. However, each skipped record is currently also counted as a transcription here, so the numbers are somewhat inflated.

Image
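To make the effect of skips concrete, here is a minimal Python sketch of how a cumulative curve could be built from a per-day event log, with and without skipped records. The log format (project, day index, skipped flag) is a hypothetical example, not the actual Notes from Nature data model; the point is only that a curve counting every event sits above one that excludes skips.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical event log: (project, day_index, skipped).
# This is NOT the real Notes from Nature schema; it only illustrates the idea.
events = [
    ("Calbug", 0, False),
    ("Calbug", 0, True),
    ("SERNEC", 1, False),
    ("Calbug", 1, False),
    ("SERNEC", 2, True),
    ("SERNEC", 2, False),
]


def cumulative_counts(events, n_days, include_skips=True):
    """Return running totals of transcriptions, one entry per day."""
    per_day = Counter(
        day for _project, day, skipped in events
        if include_skips or not skipped
    )
    daily = [per_day[day] for day in range(n_days)]  # missing days count as 0
    return list(accumulate(daily))


with_skips = cumulative_counts(events, n_days=3)
without_skips = cumulative_counts(events, n_days=3, include_skips=False)
print(with_skips)     # [2, 4, 6]
print(without_skips)  # [1, 3, 4]
```

The gap between the two curves is exactly the number of skips recorded so far, which is why the plotted totals should be read as an upper bound on real transcriptions.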

Making this simple plot yielded some (pleasant) surprises! For example, both Calbug and the Natural History Museum London (NHML) Ornithology project have shown increasing numbers of transcriptions over the past months. The Ornithology project in particular stands out, because the effort there has really picked up. This is also the first time we’ve counted transcriptions for the NHML Ornithology project. When we set this one up, we focused on pages completed rather than transcriptions, so our current counts don’t include the work on birds. As you can see, a lot is getting done quickly on those ledgers, which is fantastic.

There are a lot of blips and bumps in the daily transcription rates, and we’d like to spend more time correlating those to things like national or international press, efforts to promote the project, new badges coming online, etc. But again, we mostly want to thank all those folks who’ve put time into the 669,666 (give or take) transcriptions on Notes from Nature. That is a herculean (collective) effort.

 

Natural history citizen science crowdsourcing

This is a guest post from Dag Endresen, who is the manager for the Norwegian participant node of the Global Biodiversity Information Facility (GBIF).
______________________________________

Are you interested in natural history? Help us capture label information from images of specimens in the Norwegian natural history collections in Oslo. The Natural History Museum at the University of Oslo (NHM-UiO) and GBIF-Norway have released a new citizen science crowdsourcing portal for transcribing specimen label information (figure 1). This new portal was developed in collaboration with the Notes from Nature (NfN) team and follows the design example established by NfN (Hill et al. 2012). The crowdsourcing portal was presented last week at the 200-year jubilee of the Botanical Garden in Oslo (NHM-UiO).

image1

Figure 1: A new citizen science platform to capture label information from photographs of specimens in the natural history collections of Norway. Available at http://gbif.no/transcribe/

Primary biodiversity information (where, when and what):

Data on where, when and what for biological species define the so-called primary biodiversity information, which is recognized as the minimum information required for fundamental biodiversity research. Species distribution modelling is one of the important research tools for understanding the ecology of species, and it depends on available primary biodiversity information (where, when and what) (Soberón and Peterson 2004). Other notable uses of primary biodiversity data include refining range map boundaries, understanding inventory completeness, validating knowledge about habitat associations, understanding trait variation, providing knowledge about preparations that include tissues, and more.

The total number of museum specimens in natural history collections worldwide is estimated to be somewhere between 1.2 billion and 3 billion (Ariño 2010, Duckworth et al. 1993). Specimens archived in natural history museums over roughly 250 years provide a unique resource for understanding biological and ecological processes across time and place. Only a very low proportion of these museum specimens has been recorded electronically in databases, perhaps as low as 5% for many collections (Krishtalka and Humphrey 2000). The biggest challenge for efficient use of natural history collections remains the lack of access to electronic information on the specimens (Smith and Blagoderov 2012, Pensoft Publishers 2012). Large-scale digitization efforts, such as the prioritized and ongoing digitization of the natural history collections in Norway, must focus on registering only the bare minimum of information for as many specimens as possible in order to make reasonable progress with these enormous collections. To maintain maximum digitization speed, specimens are photographed and only the scientific name of the species and the country where the specimen was collected are registered. The specimen images include a visual representation of the label, and for most specimens this is the only source of information about the collecting site and the date when the specimen was collected (the so-called fundamental primary biodiversity information of where, when and what). Crowdsourcing the electronic registration of these details from the specimen labels could be one realistic approach to capturing this information in a reasonable timeframe (Hill et al. 2012).

Why participate and contribute to citizen science transcription:

* Discovery of biodiversity information: Transcribing label information and registering it electronically in online databases greatly improves the discoverability of museum specimens for scientific research and other public use.
* Education: Students from high school to graduate and post-graduate level can engage with the photographs of museum specimens and take part in a first-class learning experience by interacting with this resource of primary biodiversity information.
* Scientific research: Scientists who study natural history need ready access to the primary biodiversity information that museums make available through their online databases. Using the transcription portal, they can take a direct part in making the information they need for their own research available, by transcribing the labels for the species groups and/or countries they study.
* Public good, open and free online biodiversity information: The information we gather through the transcription portal will flow into the museum specimen database and be published to open and free data portals such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EOL). This valuable information documenting historic biodiversity patterns is thus not only preserved for future generations but also made available for current research through up-to-date web technologies.

Digitization of natural history collections in Norway:

The collections at the Natural History Museum in Oslo include an estimated total of more than 6 million specimens (Mehlum et al. 2011). The collections in Oslo are estimated to hold more than 65% of the specimens held by natural history museums in Norway. Digitization of the Norwegian natural history collections has high priority, and more than 50% of the specimens have now been recorded in an electronic database system. This is a high proportion compared with other large natural history collections worldwide, but the effort required to properly register all remaining specimens is still enormous. In 2013 the Natural History Museum in Oslo started a large-scale digitization activity in which specimens are photographed and only the bare minimum of information, the scientific name and the country where the specimen was collected, is registered. Capturing additional information such as the collecting location (where), collecting date (when) and the verified current scientific name (what) will substantially increase the scientific value of these data records.

Lichen herbarium, Hildur Krog collection from eastern Africa:

The first specimen collection loaded into the Norwegian citizen science transcription portal includes lichens collected in East Africa by the Norwegian biologist Hildur Krog and others. Professor Hildur Krog was originally introduced to lichenology as a student of Professor Eilif Dahl (1916-1993). Eilif and Hildur pioneered chemical methods for identifying lichen species. Hildur was appointed curator of the lichen herbarium at the Botanical Museum of the University of Oslo in 1971. Between 1972 and 1996, Hildur Krog and T.D.V. Swinscow systematically explored the lichen genera of East Africa for the flora “Macrolichens of East Africa” (figure 2). More than 10,000 specimens from Africa have already been digitized and registered in the museum database, but some 2,850 still exist only as image files. We are asking volunteer citizen scientists for help registering the label information from these image files. This collection was imaged in late 2013 and early 2014 by Silje Larsen Rekdal and Even Stensrud, coordinated by lichen curator Einar Timdal and Siri Rui (NHM-UiO) and funded by GBIF-Norway (figure 3).

image2

Figure 2: Collecting sites for the already registered East African Macrolichens collected by Hildur Krog et al. (1972-1996). The images loaded to the crowdsourcing portal describe specimens from similar locations in the same region.

image3

Figure 3: Curator Einar Timdal photographing specimens from the lichen type herbarium at the Natural History Museum in Oslo, January 2013. Photo: Dag Endresen (NHM-UiO) CC-by-4.0.

Notes from Nature as a platform for connecting crowdsourcing portals
Our aim is to better integrate this Norwegian crowdsourcing portal for lichen specimens into the NfN platform. The NfN site already provides a gateway overview of natural history collections and herbarium datasets from several distributed institutions. So far, all of these connected datasets use transcription software provided as an integrated part of the NfN portal. However, to allow the NfN platform to grow, we propose a slightly modified model in which NfN acts more as a gateway “consortium” platform, with more flexibility for individual datasets of specimen images to run on locally installed transcription services, including services built on different software. In this model, NfN would serve mainly as the entry-point portal and common gateway for volunteer citizen scientists, and less as the technical software solution for making the transcriptions. A more modular design would also make it easier for a team of somewhat loosely connected developers to collaboratively improve smaller parts of the system.

Participating institutes could use a simple web interface, for example one that exchanges a standardized JSON format, to report progress and other key properties of the sub-collections offered for transcription to volunteer citizen scientists. Technical solutions such as OpenID login could simplify user registration and keep user information synchronized for volunteers who contribute to the transcription of many datasets.
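As an illustration only, here is a minimal Python sketch of the kind of JSON progress report a locally hosted portal might publish for a gateway such as NfN. Every field name (`sub_id`, `images_total`, `transcriptions_done`, and so on) is a hypothetical choice made for this sketch, not an agreed standard, and the example numbers are placeholders rather than real progress figures.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical progress record for one sub-collection offered for
# transcription on a locally hosted portal. The field names are
# illustrative only; no such schema has been agreed.
@dataclass
class SubCollectionProgress:
    sub_id: str
    title: str
    images_total: int
    images_complete: int
    transcriptions_done: int
    transcriptions_per_image: int = 4  # default mirrors NfN's current practice

    def percent_complete(self) -> float:
        """Transcriptions done as a share of transcriptions needed."""
        needed = self.images_total * self.transcriptions_per_image
        return 100.0 * self.transcriptions_done / needed if needed else 0.0


def build_report(institution: str, portal_url: str,
                 subs: list[SubCollectionProgress]) -> str:
    """Serialize a portal's sub-collection progress as a JSON string."""
    payload = {
        "institution": institution,
        "portal_url": portal_url,
        "subcollections": [
            {**asdict(s), "percent_complete": round(s.percent_complete(), 1)}
            for s in subs
        ],
    }
    # ensure_ascii=False keeps characters such as æ, ø and å readable
    return json.dumps(payload, indent=2, ensure_ascii=False)


# Illustrative values only; these are not real progress figures.
report = build_report(
    institution="Example Museum",
    portal_url="https://example.org/transcribe/",
    subs=[SubCollectionProgress("demo-lichens", "Demo lichen batch",
                                images_total=100, images_complete=40,
                                transcriptions_done=220)],
)
print(report)
```

A gateway could then fetch one such document from each partner and add up the figures; how often it polls and how partners authenticate are deliberately left open here.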

References

Ariño A (2010). Approaches to estimating the universe of natural history collections data. Biodiversity Informatics 7: 81–92.

Duckworth WD, Genoways HH and Rose CL (1993). Preserving Natural Science Collections: Chronicle of Our Environmental Heritage. Washington, DC: National Institute for the Conservation of Cultural Property. 140 pp.

Hill A, Guralnick R, Smith A, Sallans A, Gillespie R, Denslow M, Gross J, Murrell Z, Conyers T, Oboyski P, Ball J, Thomer A, Prys-Jones R, de la Torre J, Kociolek P and Fortson L (2012). The notes from nature tool for unlocking biodiversity records from museum records through citizen science. ZooKeys 209: 219–233. DOI: 10.3897/zookeys.209.3472

Krishtalka L and Humphrey PS (2000). Can natural history museums capture the future? BioScience 50(7): 611–617. DOI: 10.1641/0006-3568(2000)050[0611:CNHMCT]2.0.CO;2

Mehlum F, Lønnve J and Rindal E (2011). Samlingsforvaltning ved NHM – strategier og planer [Collection management at NHM: strategies and plans]. Version of 30 June 2011. Naturhistorisk museum, Universitetet i Oslo. Report no. 18, pp. 1–89. ISBN: 978-82-7970-030-2. Available at http://www.nhm.uio.no/forskning/publikasjoner/rapporter/NHM-rapport-18-samlingsplan.pdf, accessed 28 May 2014.

Pensoft Publishers (2012). No specimen left behind: Mass digitization of natural history collections [special issue]. Editors: Blagoderov V and Smith V. ZooKeys 209: 1–267. ISBN: 9789546426451. Available at http://www.pensoft.net/journals/zookeys/issue/209/

Smith VS and Blagoderov V (2012). Bringing collections out of the dark. ZooKeys 209: 1–6. DOI: 10.3897/zookeys.209.3699

Soberón J and Peterson AT (2004). Biodiversity informatics: managing and applying primary biodiversity data. Philosophical Transactions of the Royal Society of London B: Biological Sciences 359: 689–698. DOI: 10.1098/rstb.2003.1439

Badges? We don’t need no stinkin’ badges! Or do we?

We always appreciate all the hard work spent transcribing records on Notes from Nature, and we want to celebrate your accomplishments. As you gain expertise on Notes from Nature, you earn badges that are added to your “Transcriber’s Life” page (if you have a Zooniverse account; it’s so easy to get one, and totally worth the 20 seconds it takes).

When we launched Notes from Nature, we had badges for SERNEC and Calbug. Since some of you might not know all the badges available, and since right now we only show three on each “collection page”, I wanted to walk you through them all, especially because we just added some new ones! In particular, we added one new badge each for SERNEC and Calbug, and three new badges for our Macrofungi project.

Here are the original 5 SERNEC badges (representing seed, sprouts and young tree), earned when you transcribe 1, 10, 25, 75, and 250 records.
image

Now you can earn a “mature tree badge” for 1000 records transcribed.

tree3

Here are the three original Calbug badges (egg, caterpillar, and butterfly), earned when you transcribe 1, 25, and 100 records.

badges_designs_bugs

Now you can also get the “butterfly swarm badge” when you transcribe 500 records.

butterflies_stage4_88px

And introducing the new Macrofungi badges (spore, mycelium, and mushroom) for transcribing 1, 25, and 100 records! Sweet!

macrofungi

We hope you want to earn all 13. I guess we do like and need those badges, and we hope you do too.
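For anyone who likes to see the thresholds in one place, here is a small Python sketch listing the 13 badges described in this post and reporting which ones a given record count has earned. The thresholds come from the post; the five original SERNEC badges are not individually named here, so the sketch labels them only by threshold. This is an illustration, not how the site itself computes badges.

```python
# Badge thresholds (records transcribed) as described in this post.
# The five original SERNEC badges are labelled only by threshold,
# because the post does not name each one individually.
BADGES = {
    "SERNEC": [(1, None), (10, None), (25, None), (75, None), (250, None),
               (1000, "mature tree")],
    "Calbug": [(1, "egg"), (25, "caterpillar"), (100, "butterfly"),
               (500, "butterfly swarm")],
    "Macrofungi": [(1, "spore"), (25, "mycelium"), (100, "mushroom")],
}


def earned_badges(project: str, records: int) -> list[str]:
    """Return labels of the badges earned in a project at a record count."""
    return [
        name or f"{threshold}-record badge"
        for threshold, name in BADGES[project]
        if records >= threshold
    ]


def next_badge(project: str, records: int):
    """Return (threshold, records still needed) for the next badge, or None."""
    for threshold, _name in BADGES[project]:
        if records < threshold:
            return threshold, threshold - records
    return None


total = sum(len(levels) for levels in BADGES.values())
print(total)                          # 13
print(earned_badges("Calbug", 120))   # ['egg', 'caterpillar', 'butterfly']
print(next_badge("SERNEC", 300))      # (1000, 700)
```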

Earth Day – Our One Year Anniversary

A year ago today, on Earth Day 2013, we launched Notes from Nature. Our choice of launch date reflects our hope that Notes from Nature can be part of the Earth Day mission of creating a sustainable, green future. Every transcription done over the last year adds a new data point to our map of biodiversity. Taken together, these transcriptions let us finally assemble a richer and deeper understanding of this amazing world we all inhabit. With that knowledge, we will be better able to make wise choices about our planet’s future. Whether bugs, mushrooms, plants, birds or any of the myriad millions of other species, described or undescribed, our world is brimming with life, and our understanding of it is deepened and strengthened by the meticulous effort to collect and catalog that diversity.

So we are having a birthday party, in case you couldn’t guess from our front page, and we want you to join us! Come celebrate Earth Day and Notes from Nature both. Help us do something we never even thought possible: help us reach our 500,000th transcription. We are that close, and we think it might be possible to get over the hump on our birthday. How cool would that be?

We mostly wanted to take the opportunity today to again thank every single transcriber out there for tackling even one record on Notes from Nature. We continue to be awed and amazed and appreciative of the effort you put in. We are pleased to have some new and exciting features coming your way at Notes from Nature in the next days and weeks, and we hope we can keep this engaging, fun and productive in the long term for everyone. We also hope that the cool specimens and organisms you see on Notes from Nature speak for themselves. We think they tell the story of this green planet, on Earth Day no less, better than anything else.

Astraptes enotrus
Herbarium sheet
DSCN4364

Thanks again and take a second to wish us a happy one year anniversary!

FAQs and Useful Tools

We have put together a list of frequently asked questions and useful tools related to transcribing records. These are based on discussion threads in the NfN discussion forums (linked at the bottom). Please take a look, and let us know if there are any questions we missed or useful tools that you can suggest. These will eventually go on a page of the Notes from Nature website.

We really appreciate your input and all that you do for Notes from Nature. Thank You!

Common issues or questions that you may encounter while transcribing Notes from Nature records:

1.) Interpretation: In general, you should minimize interpretation of open-ended fields and enter information verbatim. This way, we can better achieve consensus when checking multiple records against one another (see below, on that process). However, some discretion would be nice. Here are examples:

Interpretation that you should make: Simple spacing errors (e.g. “3miN. of Oakland” should be “3 mi N. of Oakland”)

Interpretation you should leave to us: Don’t interpret abbreviations; we’ll sort that out (e.g. “Convict Lk.” should remain “Convict Lk.”).

2.) Not in English: Transcribe exactly as written. Match label content to transcription fields as best as you can.

3.) Abbreviations: Transcribe exactly as written.

4.) Spelling mistakes: Transcribe exactly as written, unless you have looked it up and are absolutely certain of a simple spelling mistake. In this case, you can enter the correct spelling.

5.) Problem records: If you come across a problem record that may need to be addressed by a scientist, like a faulty image or a record with illegible handwriting, you can flag the record by commenting on it (e.g. with the hashtag #error) and indicate what is in error.

6.) Provinces: Provinces go in the Location field (e.g. Coastal Plain Province, Piedmont Province).

7.) Capitalization: Sometimes information may be in all capital letters on the labels. Unless this is an abbreviation, you should capitalize only the first letter of every word in your transcription (e.g. “COASTAL PLAIN PROVINCE” should be “Coastal Plain Province”).

8.) Many collectors: In many cases, collectors may be listed on different lines of the label with no punctuation separating them. In your transcription, please separate the collectors with commas.

9.) Missing information: What should you do when there is no information available for a field? When information is not given on the label, you should leave the field blank (in the case of open-ended fields) or select “Unknown” or “Not Shown” in the drop-down lists.

10.) Inconsistent collector names: You will often find several variations of the same collector name (e.g. “R. Kral” or “R.Kral”, “RWG” or “R.W.Garrison”). Use the same discretion with these variations as with localities (see item 1).

Interpretation that you should make: Simple spacing errors (e.g. “R.Kral” should be “R. Kral”)

Interpretation you should leave to us: Don’t interpret abbreviations; we’ll sort that out (e.g. “RWG” should remain “RWG”).

11.) Many scientific names: For Calbug, you do not need to enter the species name (we have this info already), but if there is a scientific name that is different from what is listed in the record, put it into the “Other Notes” field. These could be old names, or plant host names.

For SERNEC Herbarium specimens, copy only the most recent name. This can be determined based on the date that appears on the ‘annotation label.’ If you do not see a date then enter the name that appears on the primary label.

12.) Varieties and subspecies (SERNEC): Record the variety or subspecies, but omit the scientific authors’ names. So “Cyperus odoratus var. squarrosus (Britton) Jones, Wipff & Carter” becomes “Cyperus odoratus var. squarrosus”. “Echinodorus cordifolius (Linnaeus) Grisebach ssp. cordifolius” becomes “Echinodorus cordifolius ssp. cordifolius”.

13.) Scientific name (SERNEC): Provide the most recent name, whether it is a species name (a two-word combination of the genus and what is called the “specific epithet” in botanical nomenclature) or a one-word name at a higher taxonomic rank (e.g., just the genus or family name). Names at higher taxonomic ranks than species are used when a more precise identification has not been made. The species name should typically take the form of a genus name that begins with a capital letter and a specific epithet that begins with a lowercase letter. If any of the names are given in all capitals, such as “CYPERUS ODORATUS”, the name should be entered using the typical convention, “Cyperus odoratus” in this case.

14.) Latitude and Longitude: How do you enter latitude and longitude values, and where do these values go? Enter them exactly as written. You can find symbols in Word or by searching online (e.g. 33° 62’ 22” N 116° 41’ 42” W). You can also produce the degree symbol ° using key combinations (Option + Shift + 8 on a Mac; Alt + 0176 on a PC, using the numeric keypad on the right side of your keyboard). This information should go into the “Location” or “Locality” field, depending on the project you are working on.

15.) Special Characters: What should you type when a text string contains a special character, such as a degree symbol or language-specific characters? You can do a Google search for the symbol or copy and paste it from Microsoft Word’s symbols. There are also key combinations for common symbols. As mentioned above, you can produce the degree symbol ° using key combinations (Option + Shift + 8 on a Mac; Alt + 0176 on a PC, using the numeric keypad on the right side of your keyboard).

16.) Elevation: Enter elevation verbatim into the “Other Notes” field for Calbug and the “Habitat and Description” field for SERNEC Herbarium records.

17.) County: If the county is not given on the label, please find the appropriate county using a Google search. However, if there are multiple potential counties for a locality, please leave the county field blank.

18.) Checking your transcription: You can use the link to the left of the “Finish Record” button (e.g. “1/9” or “9/9”) to check the information that you entered. Just click on any of the fields to make any necessary edits to your transcription.

19.) When is a record finished?: These blog posts describe the data checking process that uses 4 transcriptions of the same record to derive a consensus.

http://blog.notesfromnature.org/2014/01/14/checking-notes-from-nature-data/

http://soyouthinkyoucandigitize.wordpress.com/2014/01/14/412/
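Several of the rules above are mechanical enough to sketch in code, which also shows why consistent, verbatim entry helps the consensus step in item 19. The Python below is a simplified illustration under our own assumptions: it applies three of the listed conventions (item 7 for all-caps text, item 13 for all-caps scientific names, item 8 for collectors on separate lines) and then takes a strict-majority vote over independent transcriptions of one field. It is not the checking procedure described in the linked posts, and all function names are hypothetical.

```python
import re
from collections import Counter


def collapse_spaces(text: str) -> str:
    """Trim the ends and reduce runs of whitespace to a single space."""
    return re.sub(r"\s+", " ", text).strip()


def fix_all_caps(text: str, keep=frozenset()) -> str:
    """Item 7: 'COASTAL PLAIN PROVINCE' -> 'Coastal Plain Province'.

    Words listed in `keep` (known abbreviations) are left untouched.
    Text that is not entirely upper case is returned unchanged.
    """
    if not text.isupper():
        return text
    return " ".join(w if w in keep else w.capitalize() for w in text.split())


def fix_sci_name(name: str) -> str:
    """Item 13: capitalized genus, lower-case remainder ('CYPERUS ODORATUS')."""
    words = name.split()
    if not words:
        return name
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def join_collectors(label_text: str) -> str:
    """Item 8: collectors on separate label lines -> comma-separated list."""
    names = [line.strip() for line in label_text.splitlines() if line.strip()]
    return ", ".join(names)


def consensus(values: list[str]):
    """Strict-majority vote over independent transcriptions of one field.

    Returns the winning value, or None when no single value was entered
    by more than half of the transcribers (e.g. a 2-2 split of four).
    """
    if not values:
        return None
    counts = Counter(collapse_spaces(v) for v in values)
    value, n = counts.most_common(1)[0]
    return value if n > len(values) / 2 else None


print(fix_all_caps("COASTAL PLAIN PROVINCE"))            # Coastal Plain Province
print(fix_sci_name("CYPERUS ODORATUS var. SQUARROSUS"))  # Cyperus odoratus var. squarrosus
print(join_collectors("R. Kral\nR.W.Garrison"))          # R. Kral, R.W.Garrison
print(consensus(["3 mi N. of Oakland", "3 mi  N. of Oakland",
                 "3 mi N. of Oakland", "3miN. of Oakland"]))  # 3 mi N. of Oakland
```

The last call shows the idea behind item 1: three transcribers who fixed the spacing agree, so their version wins, while an unexpanded abbreviation would still match other verbatim entries instead of splitting the vote.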

 

Some Useful Tools (discovered by NfN users)

Counties and Cities: Wikipedia’s lists are good tools for finding counties and the like. There are lists of municipalities for each state of the USA (and similar lists for other places). For example, https://en.wikipedia.org/wiki/List_of_municipalities_in_Florida (the link box on that page lets you switch to other states).

Mountains: https://en.wikipedia.org/wiki/Category:Lists_of_mountains_of_the_United_States

Uncertain Localities: Geographic Names Information System, U.S. Geological Survey.

https://geonames.usgs.gov/pls/gnispublic

Mapping tool with topo quads (for finding uncertain counties or localities): http://mapper.acme.com

Collector Names: The Essig Museum of Entomology database has lists of collector names and periods of activity for many collectors you will find in Calbug records.  http://essigdb.berkeley.edu/query_people.html

Hard-to-read text: Use “Sheen”, the visual webpage filter, for some hard-to-read handwriting written in pencil. (Tip was from the War Diary Zooniverse project) https://chrome.google.com/webstore/detail/sheen/mopkplcglehjfbedbngcglkmajhflnjk?hl=en-GB

Special symbols: You should be able to find symbols in Word or through a Google search, then copy and paste them. Here are a few:

- degree symbol for coordinates: °
- plus minus: ±
- fractions: ⅛ ¼ ⅓ ⅜ ½ ⅝ ⅔ ¾ ⅞
- non-English symbols: Ä ä å Å ð ë ğ Ñ ñ õ Ö ö Ü ü Ž ž

The Plant List (search for scientific names of plants): http://www.theplantlist.org/

List of Trees: https://en.wikipedia.org/wiki/Category:Trees_of_the_United_States

Integrated Taxonomic Information System (ITIS): http://www.itis.gov/

Essig database (for Calbug): Unsure whether you spelled a collector name or species name correctly? It might be worth checking similar entries already in the database: http://essigdb.berkeley.edu/

 

NfN Discussion threads that this is based on:

FAQs

http://talk.notesfromnature.org/#/boards/BNN0000001/discussions/DNN000024q

Useful Tools

http://talk.notesfromnature.org/#/boards/BNN0000001/discussions/DNN00001vl

A Big Thank You to Ornithology Ledger Transcribers

Phase 1 of the Ornithological Collections transcription project has been successfully completed. Thank you to everyone who participated in the first stage of the project. Over a thousand users worked on the project, producing nearly 100,000 transcriptions from 1,037 register pages.

Image

The resulting dataset is going to be tremendously valuable to Natural History Museum curators and researchers all over the world. We have included some pictures of museum specimens that now have useful electronic data thanks to your work as citizen scientists. Initial analysis suggests citizen scientists were especially busy over the mid-winter period.

Image

Special mention must go to Snowysky, Estron21, OneUniverse, Dmbrgn, mbhook and AstroboyOW for their sterling work. We’d love to hear from you. But all contributions are valued, and we’d love to hear from anyone who has participated in the first phase, especially as we enter an exciting second stage with six more registers that need transcribing. We hope you will all support us in this new phase, which is ultimately aimed at getting all of the Natural History Museum’s bird specimens online.

Image

With thanks and best wishes,

NHM Bird Group

 

Making Progress Clear on Notes from Nature

Notes from Nature is something of a departure for a Zooniverse project. Rather than a single organization asking for help with the exact same tasks, Notes from Nature is, like its subject matter, diverse. So we have labels of bugs, sheets of plants, fungal specimen labels, and ledgers of birds. And we have a lot (and I mean A LOT) of images that need transcription. Not only that, but each of those images is transcribed more than once: as mentioned in previous posts, right now each image gets 4 separate transcriptions.

All of this is a preface to the main topic of this post: how do we measure “progress” on the task of transcribing all of this data? The science team on Notes from Nature has talked a lot about this, including a number of complexities related to making sure that the numbers are transparent to you, our volunteers. This post is mostly about how to measure overall progress. We also know that there have been issues with transcription counts for individual volunteers. We believe that we have solved those issues, but we’ll cover them separately in another blog post.

So, here are two of the main issues we have been dealing with and some recent solutions that have been implemented across Notes from Nature:

Issue 1: Do we measure the total number of transcriptions or the total number of images that are “finished” (i.e. transcribed four times)?

Solution: We have decided to measure total transcriptions completed, both across all projects and within projects. This is a change from our previous strategy, which mixed and matched these different counts on different pages. We think the most obvious measure is the overall effort put in, even if this makes it harder to know how many images have been done.

Issue 2: Should we even measure “completeness” within a project (e.g. Calbug)? This is an issue because most projects on Notes from Nature have posted only a small subset of their available images, and there are many more “waiting in the wings”. We don’t want to say “hey, only 1,000 more images to transcribe” and then a little later go “Oh! Just kidding, there are now 50,000 more!” Our ultimate goal is to stage the many remaining images as smaller batches with compelling themes derived from their research or other societal value (e.g. all specimens from a particular national park, or all collected by an important historical figure). This will give us a chance to celebrate the success of completion more regularly. At the moment, we are seeking funding to do this.

Solution: We do want to show that progress is being made on the current batch of images on Notes from Nature, but we want to avoid any confusion if more images are made available once the current sets are close to being done. So we show a percentage representing the total number of transcriptions completed over the total number needed for a batch, and we link to this very blog post to explain why that percentage may change. We also provide some information on progress with the images themselves, namely counts of “total images”, “active images” and “complete images”. Below is a definition of each of those terms:

total images - The number of source images currently available.
active images - The number of images that are either being transcribed or waiting for transcription.
complete images - The number of images that have been independently transcribed four times.
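The arithmetic behind these numbers can be sketched in a few lines of Python. The sketch assumes a hypothetical list holding, for each image in the current batch, how many transcriptions it has received so far; it derives the three image counts defined above and the batch percentage (transcriptions completed over transcriptions needed, at four per image). Capping each image at four is our own assumption, so that extra transcriptions of an image cannot push the percentage past 100.

```python
from dataclasses import dataclass

REQUIRED = 4  # each image currently gets four independent transcriptions


@dataclass
class BatchProgress:
    total_images: int
    active_images: int
    complete_images: int
    percent: float


def batch_progress(counts: list[int], required: int = REQUIRED) -> BatchProgress:
    """Summarize a batch from per-image transcription counts (hypothetical input)."""
    total = len(counts)
    complete = sum(1 for c in counts if c >= required)
    active = total - complete  # being transcribed or still waiting
    done = sum(min(c, required) for c in counts)  # cap at `required` per image
    needed = total * required
    percent = 100.0 * done / needed if needed else 0.0
    return BatchProgress(total, active, complete, percent)


p = batch_progress([4, 4, 2, 0, 1])
print(p)  # BatchProgress(total_images=5, active_images=3, complete_images=2, percent=55.0)
```

Adding five untouched images to the same batch (`batch_progress([4, 4, 2, 0, 1] + [0] * 5)`) drops the percentage to 27.5 even though no work was lost, which is exactly the kind of change the percentage on the site may show when new images are staged.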

We hope the new changes lead to greater transparency! We are also awestruck by and appreciative of all the participation as we pass the 5,000 registered participants and 400,000 transcriptions marks; we can’t thank you enough or tell you how important public participation has been for this endeavor. Finally, we love comments and suggestions. Please feel free to provide some, and we’ll get right back to you.