Results from Label Babel 2
We want to share some preliminary results from the Label Babel 2 expedition. Your results look spectacular. The vast majority of the data looks like the image below and will make excellent training data for the models.
The blue boxes in the image above are the outlines that you drew. The red boxes are the final crops for the labels, where we merged the overlapping blue boxes into a single “best” interpretation of each label. This is beautiful! There is close agreement on where and what the labels are. Only the things we wanted identified as labels are outlined in blue. The tag, stamp, and ruler/color guide are not outlined, which is correct. The majority of the data looks this good.
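For readers curious how several blue outlines might be combined into one red box, here is a minimal sketch in Python. It is not our actual processing code. It assumes boxes are stored as (left, top, right, bottom) pixel coordinates, groups outlines that overlap strongly, and takes the median of each edge. The function names and the 0.5 overlap threshold are made up for illustration.

```python
from statistics import median

# A box is a tuple of (left, top, right, bottom) pixel coordinates.
# This layout is an assumption made for the example.


def iou(a, b):
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    intersection = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def merge_outlines(boxes, threshold=0.5):
    """Group outlines that overlap the first box of a group, then return
    one box per group built from the median of each edge."""
    groups = []
    for box in boxes:
        for group in groups:
            if iou(box, group[0]) >= threshold:
                group.append(box)
                break
        else:
            # No existing group overlaps enough, so start a new one.
            groups.append([box])
    return [
        tuple(median(b[i] for b in group) for i in range(4))
        for group in groups
    ]


# Three volunteers outline the same label; one outlines a second label.
outlines = [
    (10, 10, 110, 60),
    (12, 8, 108, 62),
    (11, 11, 112, 59),
    (200, 200, 260, 240),
]
merged = merge_outlines(outlines)
```

The median is used here instead of the mean so that a single outline drawn far too large or too small does not drag the merged box with it.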
The data from this expedition was generally great, but there were some wrinkles in the output that make it harder to process the data and find the best interpretation of each label. Some people outlined the wrong things or nothing at all, but by far the most common problem (unique to this expedition) was the lumping of several labels into a single outline (see image below). Here, we have added a new color, green, to mark outlines that enclose several labels together. Unfortunately, these kinds of entries can’t be used for training the models in the next step of our process.
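One way to spot a lumped outline automatically is to check whether it swallows two or more of the individual label boxes. The Python sketch below illustrates that idea under stated assumptions and is not our actual processing code. Boxes are assumed to be (left, top, right, bottom) pixel coordinates, and the function names and the 0.8 coverage cutoff are hypothetical.

```python
# A box is a tuple of (left, top, right, bottom) pixel coordinates.
# This layout is an assumption made for the example.


def covered_fraction(inner, outer):
    """Fraction of the inner box's area that lies inside the outer box."""
    width = min(inner[2], outer[2]) - max(inner[0], outer[0])
    height = min(inner[3], outer[3]) - max(inner[1], outer[1])
    if width <= 0 or height <= 0:
        return 0.0
    inner_area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return (width * height) / inner_area


def is_lumped(outline, label_boxes, min_covered=0.8):
    """Return True when the outline mostly covers two or more label boxes."""
    covered = sum(
        1
        for label in label_boxes
        if covered_fraction(label, outline) >= min_covered
    )
    return covered >= 2


# Two labels stacked on a sheet, one outline around both, one around the first.
labels = [(10, 10, 110, 60), (10, 70, 110, 120)]
lumped = is_lumped((5, 5, 115, 125), labels)
single = is_lumped((12, 8, 108, 62), labels)
```

In this example the large outline covers both labels and would be flagged, while the outline around the first label alone would not.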
Next, we will use the data you provided to train a model that finds labels automatically. After that, we will automate the classification step, using your answers about whether the text on each label was typewritten. We plan on using an artificial neural network that does both in one swell foop, with these annotations as its training data.
We are eager to see the results and to put this data to use. We’ll give another update on our progress in a few weeks. In the meantime, we want to thank all of the participants in this expedition and note how impressed we are with the results.
— The Notes from Nature Team