Notes from Nature and the Push To A Million

Some quick Notes from Nature updates! As you might have noticed, our transcription numbers have occasionally grown by leaps and bounds over the last few months. What gives? The shortest possible answer is “Ornithology Ledgers” and the truly impressive effort happening there. Back in late June, we had 149,537 Ornithology transcriptions. As of last Sunday, Oct. 12, 2014, that number is 294,973. Wow. That is a lot of work in 4 months. None of those Ornithology records had been included in the total counts that show up on our homepage, because we had originally focused on counting ledger “pages”, not transcriptions. We still have to generate those ledger record counts separately from our logs every few weeks and add them to the total by hand. We hope to do away with these manual additions in the near future. So if you see the count “jump” every few weeks, that’s why.

Speaking of counting, many weeks ago we tried to better report the “per collection” statistics, in particular the amount of effort needed to complete work on a collection. We have recently refined those numbers YET AGAIN, and we hope the current reporting is (at least) less confusing. Long story short, each record is transcribed multiple times, usually 4. We have plans to make this more efficient in the future, but until then, this is a workable number of replicate transcriptions. However, occasionally that number is 5 or 6 (for reasons that have to do with both history and some technology glitches). When you add all this up, it has been hard to give an exact number of total transcriptions needed.

Now, if you look at the introduction page for any collection – take the Herbarium project (http://www.notesfromnature.org/#/archives/herbarium) – you will see the total number of images, the number of active images, the number of complete images and a count of transcriptions completed. The active and complete image numbers should add up to the total number. The total number of transcriptions gives an overall assessment of effort by our citizen science volunteers. The percent completed is now calculated in terms of images, not transcriptions (i.e., completed images divided by total images).
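For the curious, the bookkeeping described above can be sketched in a few lines of Python. This is only an illustration and not the code behind the site: the function name `collection_progress` and the example numbers are made up. The only things taken from the text are that active plus complete images add up to the total, and that percent complete is complete images divided by total images.

```python
def collection_progress(total_images, active_images, complete_images):
    """Return percent complete for a collection, measured in images.

    Hypothetical helper: the name and the consistency check are
    assumptions for illustration, not the Notes from Nature code.
    """
    if active_images + complete_images != total_images:
        raise ValueError("active + complete images should equal the total")
    if total_images == 0:
        return 0.0
    return 100.0 * complete_images / total_images


# Made-up numbers, purely for illustration.
percent = collection_progress(total_images=1000, active_images=750, complete_images=250)
print(f"{percent:.1f}% complete")
```

Counting in images rather than transcriptions sidesteps the problem that some records were transcribed 5 or 6 times instead of 4.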

We mention all this because Notes from Nature is closing in on a HUGE MILESTONE: one million transcriptions! We only have 113,000 transcriptions or so to go. We will mention this more in upcoming blog posts as we make what we hope is a BIG PUSH to 1 million!

ONE MILLION TRANSCRIPTIONS

Notes from Nature Remembers

Notes from Nature is super pleased to introduce something that our citizen scientist transcribers have wanted forever – a ditto function! What does the ditto function do? Simply put: you can pull up your last few entries nearly instantaneously and select one of them. Yeah, Notes from Nature now “remembers” your entries! This means less scrolling through long “pick lists” (we hope). On the downside, we have only implemented it – so far – in CalBug. We hope to have this working in other interfaces as soon as possible and we’ll, of course, keep you posted via the blog.

So here are more details about this new function. To access your last 5 entries, you just need to press Ctrl-M while in any field. That pulls up those past entries; click any one of them and, voila, you have finished that field. For CalBug, where a lot of records are from the USA, and from California in particular, being able to quickly select those values from record to record should make for speedier transcribing. But let us know what you think! We hope it is a major improvement.
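If you are wondering how a “remember my last 5 entries” feature can work, here is a minimal sketch in Python. It is not the CalBug implementation: the class `DittoMemory`, its method names, and the choice to move a repeated value to the front rather than store it twice are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class DittoMemory:
    """Remember the most recent distinct entries typed into each field.

    Illustrative only: the class, its methods, and the handling of
    duplicates are assumptions, not the actual CalBug code.
    """

    def __init__(self, size=5):
        self.size = size
        self._recent = defaultdict(deque)

    def remember(self, field, value):
        entries = self._recent[field]
        if value in entries:
            entries.remove(value)  # a repeated value moves to the front
        entries.appendleft(value)
        while len(entries) > self.size:
            entries.pop()  # drop the oldest entry

    def recall(self, field):
        """Return remembered entries for a field, most recent first."""
        return list(self._recent[field])


memory = DittoMemory()
for country in ["USA", "Mexico", "USA"]:
    memory.remember("Country", country)
print(memory.recall("Country"))  # ['USA', 'Mexico']
```

Keeping a separate short list per field is what lets a value like “USA” stay one click away from record to record.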

ditto
The ditto function for Country showing two previous entries


The ditto function was the work of Lisa Larson, who is a web developer and project leader at the Cornell Lab of Ornithology, and who attended our first Citizen Science hackathon back in December 2013. She has been great about seeing through this fantastic idea, which came from a team focusing on how to improve public participation.

So, to reiterate, in CalBug, pressing Ctrl-M will help you “remember” the entries you have made earlier, and let you quickly “click through” commonly used entries. You can always check the “help text” for a reminder. Finally, if you are using a small screen, you might not have a lot of room at the bottom of the screen for the ditto function entries to show up. You might want to move the transcription tool (you can drag the main part of the tool wherever you like on the canvas) or otherwise increase screen real estate.

Working Across Collections in Notes from Nature

We know some people love bugs, others birds, and yet others plants or fungi. But some people just find all the diversity cool and help across multiple collections. And we have just put together some new badges to reward those restless spirits. It is just a start, but we have beginner and intermediate “multicollections” badges for those who work in the plant, insect and fungi collections already up and ready for you to go get.

Beginner_plant-insect-fungi

You get this badge for transcribing 1 or more records in the plant, insect and fungi collections.


plant-fungi-insect

You get this badge for transcribing 25 or more records in the plant, insect and fungi collections.
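The two badge rules above can be written down as a small Python sketch. This is a guess at the logic, not the site's code: it assumes the threshold (1 or 25 records) has to be met in each of the three collections separately, and the names `BADGE_THRESHOLDS`, `COLLECTIONS` and `multicollection_badge` are invented for the example.

```python
BADGE_THRESHOLDS = {"beginner": 1, "intermediate": 25}
COLLECTIONS = ("plant", "insect", "fungi")


def multicollection_badge(counts):
    """Return the highest multicollection badge earned, or None.

    `counts` maps a collection name to the number of records a person
    has transcribed there. Assumes the threshold applies to each
    collection separately (an interpretation, not confirmed).
    """
    fewest = min(counts.get(name, 0) for name in COLLECTIONS)
    earned = None
    for badge, threshold in sorted(BADGE_THRESHOLDS.items(), key=lambda kv: kv[1]):
        if fewest >= threshold:
            earned = badge
    return earned


print(multicollection_badge({"plant": 40, "insect": 3, "fungi": 26}))  # beginner
```

Under this reading, the collection where you have done the least work decides which badge you hold.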

Happy transcribing!

How long and when do you transcribe records in Notes from Nature, and other neat ways to look at your (amazing) effort.

This post is co-written by Julie Allen and Rob Guralnick

Last post, we provided a look at cumulative transcription efforts across our four current projects. These summary numbers hide all kinds of interesting details about how these transcriptions take place and where the data come from. Thanks to efforts on the part of guest bloggers Julie Allen (http://wwx.inhs.illinois.edu/directory/show/juliema) and Libby Ellwood (https://www.idigbio.org/content/welcome-libby-ellwood-new-postdoctoral-scholar-idigbio), we have some more in-depth statistics to show you.

First, where has your effort helped us get a better understanding of biodiversity? The map below shows a count of the number of records transcribed per country across the world. Not surprisingly, a huge number of transcribed records come from the United States (where we have 407,575 transcriptions completed). However, we now have transcriptions from 175 countries! For example, 6,766 records have been transcribed from Costa Rica, 28 from Thailand and 26 from Mali. Explore more in the map below:


We have also been asking some questions about transcription effort and time put in by all the amazing people who have worked on Notes from Nature. So, simple questions: When do people transcribe records? And how long does it take? In terms of the “when”, here is a graph of the number of transcriptions by time of day for the Herbarium collection (based on GMT). Although it’s hard to know what time zone our transcribers might be in, we see a lot of activity at all hours (even, and maybe especially, late at night). We wish we could provide more coffee for those working late.

TimeofDay.7.28.14-page-0
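A time-of-day graph like this one boils down to counting timestamps by hour. Here is a minimal Python sketch of that step. It is not the analysis code used for the figure: the function name and the sample timestamps are made up, and it assumes each timestamp carries its time zone so it can be converted to GMT/UTC.

```python
from collections import Counter
from datetime import datetime, timezone


def transcriptions_by_hour(timestamps):
    """Count transcriptions in each hour of the day (0-23), in GMT/UTC.

    Assumes timezone-aware datetimes (an assumption for this sketch).
    """
    hours = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [hours.get(h, 0) for h in range(24)]


# Made-up timestamps; real ones would come from the transcription logs.
sample = [
    datetime(2014, 7, 28, 2, 15, tzinfo=timezone.utc),
    datetime(2014, 7, 28, 2, 40, tzinfo=timezone.utc),
    datetime(2014, 7, 28, 14, 5, tzinfo=timezone.utc),
]
counts = transcriptions_by_hour(sample)
print(counts[2], counts[14])  # 2 1
```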

Not all records are created equal in terms of the time needed to transcribe them, and, surprisingly, not all collections are equal either! We know that some records are harder to transcribe than others, but on average it takes 3:05 min to transcribe a Herbarium record, only 2:16 for a Macrofungi record and, fastest of all, 2:04 for a CalBug record. There is a lot of variation around those times, as you can see from the plot below.

BoxPlot.TimeTranscriptions.7.28.14-page-0
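Average times such as 3:05 come from per-record durations grouped by collection. The Python sketch below shows one way to summarize them. It is an illustration only: the helper names and the durations are invented, and the median is included simply because averages can be skewed when times vary a lot.

```python
from statistics import mean, median


def format_mmss(seconds):
    """Format a duration in seconds as m:ss, e.g. 185 -> '3:05'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


def summarize_times(durations_by_collection):
    """Return {collection: (mean, median)} as m:ss strings."""
    return {
        name: (format_mmss(mean(times)), format_mmss(median(times)))
        for name, times in durations_by_collection.items()
    }


# Made-up durations in seconds, for illustration only.
sample = {"Herbarium": [120, 185, 250], "CalBug": [90, 124, 158]}
print(summarize_times(sample))
```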

Amazingly, there have been 20,761 people helping to transcribe records across the four collections! There is a classic pattern in citizen science transcription projects where the majority of transcriptions are performed by just a few people, while most folks only contribute briefly (a record or two) and then move on. We’d really like to find a way to engage folks for the long term and change that equation, but as you can see from the graph below, we have a similar result, with a few people doing the lion’s share of the work and one person logging an amazing 92,528 transcriptions!!

page-0
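One simple way to put a number on the “few people do most of the work” pattern is to ask what fraction of all transcriptions came from the most active contributors. The Python sketch below does that. It is not taken from the project's analysis code: the function name and the tiny sample log are made up.

```python
from collections import Counter


def top_contributor_share(user_ids, top_n):
    """Fraction of all transcriptions made by the top_n most active users.

    `user_ids` holds one entry per transcription (hypothetical input).
    """
    per_user = Counter(user_ids)
    total = sum(per_user.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in per_user.most_common(top_n))
    return top_total / total


# Made-up log: one prolific user and several one-time visitors.
log = ["ana"] * 7 + ["ben", "cy", "dee"]
print(top_contributor_share(log, top_n=1))  # 0.7
```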

We hope these summaries provide some useful information. We learn a lot from them, all with the goal of improving how well Notes from Nature can work in the future. We have some neat ideas about those improvements and will be sharing them just as soon as they are ready. If you want to see the code that generated these results, we’ve posted it to GitHub (https://github.com/juliema/NfN_DataParsing) and will make some of the results data available soon too.

Cumulative transcription effort: How are we doing a year (plus) in?

Now that we have passed our one year anniversary here at Notes from Nature, it might be worth doing a little reflecting and data mining. Our favorite activities! So we decided to ask some simple questions: What are the trends in rates and cumulative activities on Notes from Nature? Do we find, for example, that projects show distinctive trends over time? Do they start hot and then settle into a comfortable groove? Maybe you have had some questions too, so feel free to ask us if you want to see a particular metric.

Below we show the cumulative transcription numbers, which should also give a fairly good idea of cumulative effort. However, we should note that each skipped record is also, at this time, counted as a transcription here, so the numbers are somewhat inflated.
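A cumulative plot like this is just a running total of daily counts. The Python sketch below shows the idea, along with what removing skipped records would look like if they were logged separately. Both the function names and the numbers are invented for illustration. Separating out skips is an assumption, since the counts shown here currently include them.

```python
from itertools import accumulate


def cumulative_counts(daily_counts):
    """Turn per-day transcription counts into a running total."""
    return list(accumulate(daily_counts))


def without_skips(daily_totals, daily_skips):
    """Subtract skipped records from each day's count."""
    return [total - skipped for total, skipped in zip(daily_totals, daily_skips)]


# Made-up daily numbers, for illustration only.
totals = [120, 95, 140]
skips = [10, 5, 20]
print(cumulative_counts(totals))                        # [120, 215, 355]
print(cumulative_counts(without_skips(totals, skips)))  # [110, 200, 320]
```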

Doing this simple plot yielded some (pleasant) surprises! For example, both CalBug and the Natural History Museum London (NHML) Ornithology project have shown trends of increasing transcriptions over the past months. Ornithology, in particular, is noteworthy in that the effort has really picked up. As well, this is the first time we’ve counted transcriptions for the NHML Ornithology project. When we set this one up, we focused on pages completed, not transcriptions, and our current counts don’t include the work on birds. As you can see, a lot is getting done quickly on those ledgers, which is fantastic.

There are a lot of blips and bumps in the daily transcription rates, and we’d like to spend more time correlating those to things like national or international press, efforts to promote the project, new badges coming online, etc. But again, we mostly want to thank all those folks who’ve put time into the 669,666 (give or take) transcriptions on Notes from Nature. That is a herculean (collective) effort.

 

Natural history citizen science crowdsourcing

This is a guest post from Dag Endresen, who is the manager for the Norwegian participant node of the Global Biodiversity Information Facility (GBIF).
______________________________________

Are you interested in natural history? Help us to capture label information from images of specimens from the Norwegian natural history collections in Oslo. The Natural History Museum at the University of Oslo (NHM-UiO) and GBIF-Norway have released a new citizen science crowdsourcing portal for transcription of specimen label information (figure 1). This new portal was developed in collaboration with the Notes from Nature (NfN) team and follows the design example established by NfN (Hill et al. 2012). The crowdsourcing portal was presented last week for the 200-year jubilee of the Botanical Garden in Oslo (NHM-UiO).

image1

Figure 1: A new citizen science platform to capture label information from photographs of specimens in the natural history collections of Norway. Available at http://gbif.no/transcribe/

Primary biodiversity information (where, when and what):

Data on where, when and what for biological species define the so-called primary biodiversity information and are recognized as the minimum information requirement for fundamental biodiversity research. Species distribution modelling is one of the important research tools for understanding the ecology of species, and it depends on available primary biodiversity information (where, when and what) (Soberón and Peterson 2004). Other notable uses of primary biodiversity data include refining range map boundaries, understanding inventory completeness, validating knowledge about habitat associations, understanding trait variation, providing knowledge about preparations that include tissues, and more.

The total number of museum specimens in natural history collections worldwide is estimated to be somewhere between 1.2 billion and 3 billion (Ariño 2010, Duckworth et al. 1993). The specimens archived in natural history museums over roughly 250 years provide a unique resource for understanding biological and ecological processes across time and place. A very low proportion of these museum specimens have been recorded electronically in databases, perhaps as low as 5% for many collections (Krishtalka and Humphrey 2000). The biggest challenge for efficient utilization of the natural history collections remains the lack of access to electronic information on the specimens (Smith and Blagoderov 2012, Pensoft Publishers 2012). Large-scale digitization activities, such as the prioritized and ongoing digitization of the natural history collections in Norway, are forced to register only the bare minimum of information for as many specimens as possible in order to make reasonable progress with these enormous collections. To maintain maximum speed of digitization, specimens are photographed and only the scientific name of the species and the country where the specimen was collected are registered. The specimen images include a visual representation of the label, and for most specimens this is the only source of information regarding the collecting site and the date when the specimen was collected (the so-called fundamental primary biodiversity information of where, when and what). Crowdsourcing the electronic registration of these details from the specimen labels could be one realistic approach to capture this information in a reasonable timeframe (Hill et al. 2012).

Why participate and contribute to citizen science transcription:

* Discovery of biodiversity information: Transcription of label information and electronic registration into online databases greatly improve the discoverability of museum specimens for the purpose of scientific research and other public use.
* Education: Students from high school level to graduate and post-graduate level can engage with the photographs of the museum specimens and take part in a first-class learning experience in interaction with this resource of primary biodiversity information.
* Scientific research: Scientists who study natural history need ready access to primary biodiversity information made available from museums and their online databases. Using the transcription portal, they can directly help make available the primary biodiversity information they need for their own research by transcribing the labels for the species groups and/or countries that they study.
* Public good, open and free online biodiversity information: The information that we gather from the transcription portal will flow into the museum specimen database and be published to open and free data portals such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EOL). The valuable information documenting historic biodiversity patterns is thus not only preserved for future generations, but also made available for current research using up-to-date web technologies.

Digitization of natural history collections in Norway:

The collections at the Natural History Museum in Oslo include an estimated total of more than 6 million specimens (Mehlum et al., 2011). The collections in Oslo are estimated to hold more than 65% of the specimens held by natural history museums in Norway. The digitization of the Norwegian natural history collections has high priority and has reached a level of more than 50% of the specimens recorded and added into an electronic database system. This is a high proportion digitized when compared to other large natural history collections worldwide, but the estimated efforts required to complete the appropriate registration of all remaining specimens are enormous. The Natural History Museum in Oslo started a large-scale digitization activity in 2013 where specimens are photographed and only the very minimum information of the scientific name and the country where the specimen was collected is registered. Capturing additional information such as the collecting location (where), collecting date (when) and the verified current scientific name (what) will substantially increase the scientific value of these data records.

Lichen herbarium, Hildur Krog collection from eastern Africa:

The first specimen collection that has been loaded to the Norwegian citizen science transcription portal includes lichens collected by the Norwegian biologist Hildur Krog and others in East Africa. Professor Hildur Krog was originally introduced to lichenology as a student of professor Eilif Dahl (1916-1993). Eilif and Hildur pioneered the work on chemical methods for identification of lichen species. Hildur was appointed curator of the lichen herbarium at the Botanical Museum of the University of Oslo in 1971. Between 1972 and 1996 Hildur Krog and T.D.V. Swinscow systematically explored the lichen genera of East Africa for the development of the flora “Macrolichens of East Africa” (figure 2). More than 10,000 specimens from Africa have already been digitized and registered in the museum database, but some 2,850 still exist only as image files. We are asking for the help of volunteer citizen scientists to register the label information from these image files. The imaging of this collection was done in late 2013 and early 2014 by Silje Larsen Rekdal and Even Stensrud under the coordination of lichen curator Einar Timdal and Siri Rui (NHM-UiO) and with funding from GBIF-Norway (figure 3).

image2

Figure 2: Collecting sites for the already registered East African Macrolichens collected by Hildur Krog et al. (1972-1996). The images loaded to the crowdsourcing portal describe specimens from similar locations in the same region.

image3

Figure 3: Curator Einar Timdal photographing specimens from the lichen type herbarium at the Natural History Museum in Oslo, January 2013. Photo: Dag Endresen (NHM-UiO) CC-by-4.0.

Notes from Nature as a platform for connecting crowdsourcing portals
The aim is to integrate this Norwegian crowdsourcing portal for lichen specimens better into the NfN platform. The NfN site already provides a gateway overview to natural history collections and herbaria datasets from several distributed institutions. So far, all of these connected datasets use transcription software provided as an integrated part of the NfN portal. However, to allow the NfN platform to grow, we propose a slightly modified model where NfN could act more as a gateway “consortium” platform with more flexibility for individual datasets of specimen images to run on locally installed transcription services, including services using different software implementations. The appropriate role of NfN could be more that of an entry point portal and a common gateway for volunteer citizen scientists, and less that of the actual technical software solution for making the transcriptions. A more modular design would also make it easier for a team of somewhat loosely connected developers to improve on smaller parts of the system in a collaborative manner.

Participating institutes could use a simple web interface, such as a standardized JSON format, for reporting progress and other key properties of sub-collections that are offered for transcription to volunteer citizen scientists. Technical solutions such as OpenID login could provide easier user registration and synchronization of user information for users contributing to the transcription of many datasets.
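As a rough idea of what such a JSON progress report might contain, here is a minimal Python sketch. No such standard has been agreed: every field name, the function `progress_report` and the dataset identifier are hypothetical. The fields simply mirror the per-collection statistics NfN already shows, and the image count echoes the Krog collection described above.

```python
import json


def progress_report(dataset_id, institution, total_images, complete_images, transcriptions):
    """Build a JSON progress report for one sub-collection.

    All field names are hypothetical; no such standard has been agreed.
    """
    report = {
        "dataset_id": dataset_id,
        "institution": institution,
        "total_images": total_images,
        "complete_images": complete_images,
        "active_images": total_images - complete_images,
        "transcriptions": transcriptions,
    }
    return json.dumps(report, indent=2, sort_keys=True)


# Illustrative values only; the identifier is invented.
print(progress_report("krog-east-africa-lichens", "NHM-UiO", 2850, 0, 0))
```

A gateway could fetch one such report per participating portal and present the numbers side by side, whatever transcription software each portal runs.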

References

Ariño A (2010). Approaches to estimating the universe of natural history collections data. Biodiversity Informatics, 7:81–92.

Duckworth WD, Genoways HH and Rose CL (1993). Preserving Natural Science Collections: Chronicle of Our Environmental Heritage. Washington, DC: National Institute for the Conservation of Cultural Property. 140 pp.

Hill A, Guralnick R, Smith A, Sallans A, Gillespie R, Denslow M, Gross J, Murrell Z, Conyers T, Oboyski P, Ball J, Thomer A, Prys-Jones R, de la Torre J, Kociolek P, Fortson L (2012). The notes from nature tool for unlocking biodiversity records from museum records through citizen science. ZooKeys 209: 219-233. doi: 10.3897/zookeys.209.3472

Krishtalka L and Humphrey PS (2000). Can natural history museums capture the future? BioScience 50(7): 611–617. DOI:10.1641/0006-3568(2000)050[0611:CNHMCT]2.0.CO;2

Mehlum F, Lønnve J, and Rindal E (2011). Samlingsforvaltning ved NHM – strategier og planer. Versjon 30. juni 2011. Naturhistorisk museum, Universitetet i Oslo. Rapport nr. 18, pp. 1-89. ISBN: 978-82-7970-030-2. Available at http://www.nhm.uio.no/forskning/publikasjoner/rapporter/NHM-rapport-18-samlingsplan.pdf, accessed 28 May 2014.

Pensoft Publishers (2012). No specimen left behind: Mass digitization of natural history collections [special issue]. Editors: Blagoderov, V. and Smith, V. ZooKeys 209: 1-267. ISBN: 9789546426451. Available at http://www.pensoft.net/journals/zookeys/issue/209/

Smith VS and Blagoderov V (2012). Bringing collections out of the dark. ZooKeys 209: 1-6. DOI: 10.3897/zookeys.209.3699

Soberón J and Peterson AT (2004). Biodiversity informatics: managing and applying primary biodiversity data. Phil. Trans. R. Soc. Lond. B 359: 689–698. DOI:10.1098/rstb.2003.1439
