Completed Classifications and the Wisdom of Crowds

If you scroll down on our front page, you can keep tabs on all of the effort (so much effort!) that has gone into our first season of images so far. We’ve had some questions about what these values actually mean, so let’s dig into what each of these statistics is actually tracking:

Volunteers. The first number tells us how many unique users (you!) have participated in the project so far. If a new volunteer makes even a single classification, they get counted here, but this number also includes our regular and super-users who have helped us plow through hundreds or even thousands of images.

Classifications. The second number tells us how many times our volunteers have clicked “DONE” on one of our images. In the first few weeks of our project, you guys have classified our images over 270,000 times. Keep up the amazing work!

Subjects. This is the number of image sets that we uploaded in our current season. You may have noticed that our camera trap images tend to be grouped in sets of three, which you can play through to see a short sequence of animal activity. Every time a camera is triggered, it captures these three-image-long bursts, which we call “subjects”. For this first batch, we uploaded 34,917 subjects (or over 100,000 individual images). This number will only change when we upload new material for our next seasons.

Completed Subjects. The final statistic presented on Zooniverse is the number of subjects that have been classified enough times to “retire” – that is, we have enough information from our volunteers that we remove the image from circulation. This part is a little confusing: why, if 274,275 classifications have been made on only 34,917 subjects, are only ten thousand or so of those subjects considered “complete”?

To understand how this works, we need to talk about how Zooniverse relies on the wisdom of the crowd: the final “official” classification of an image is based on the collective opinion of a group of individuals rather than the single opinion of one expert. In many settings, averaging a large number of best guesses has been shown to produce remarkably accurate estimates (statistically, averaging reduces the variability, or “noise,” associated with each single guess, revealing the true underlying value being estimated).
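To see the averaging effect in action, here is a tiny simulation (the numbers are made up for illustration): each simulated volunteer guesses an animal count with some random error, and the crowd’s average lands much closer to the truth than most individual guesses do.

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

true_count = 7  # the actual number of animals in our imaginary picture

# Each volunteer's guess is the true count plus some random "noise".
guesses = [true_count + random.gauss(0, 2) for _ in range(1000)]

# Averaging the noisy guesses cancels out much of the noise.
crowd_estimate = sum(guesses) / len(guesses)

print(f"Crowd estimate: {crowd_estimate:.2f} (true value: {true_count})")
```

Any single guess here can easily be off by two or three animals, but the average of a thousand guesses is typically within a fraction of an animal of the true count.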

So how does this work with image classifications, where we’re asking you not to estimate a single number, but rather to identify species, counts, and behaviors? As a baseline, each subject on Eyes on the Wild is classified by 15 users before it is “retired,” or completed. That means we get 15 separate classifications of species, counts, and behaviors for each subject. We then use an algorithm that takes this collection of classifications and pulls out the most commonly identified species (or set of species), the average count selected, and the proportion of classifications reporting that a certain behavior is taking place. This produces the final single line of data for that subject that we use in our analyses, and it explains why retirement requires roughly 15 times as many classifications as there are subjects!
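A minimal sketch of what that aggregation step might look like (the field names and function are illustrative, not the project’s actual schema or code):

```python
from collections import Counter

def aggregate(classifications):
    """Combine volunteer classifications into one consensus record.

    Each classification is a dict like:
      {"species": "deer", "count": 2, "behaviors": ["foraging"]}
    """
    n = len(classifications)

    # Plurality vote: the most commonly identified species wins.
    species_votes = Counter(c["species"] for c in classifications)
    consensus_species, _ = species_votes.most_common(1)[0]

    # Average the counts submitted by volunteers.
    mean_count = sum(c["count"] for c in classifications) / n

    # Proportion of volunteers who reported each behavior.
    behavior_votes = Counter(b for c in classifications for b in c["behaviors"])
    behavior_props = {b: votes / n for b, votes in behavior_votes.items()}

    return {"species": consensus_species,
            "count": round(mean_count, 2),
            "behaviors": behavior_props}

# Three toy classifications instead of the real fifteen:
example = [
    {"species": "deer", "count": 2, "behaviors": ["foraging"]},
    {"species": "deer", "count": 3, "behaviors": ["foraging", "moving"]},
    {"species": "elk",  "count": 2, "behaviors": []},
]
print(aggregate(example))
```

With these three toy answers, the consensus comes out as “deer” (2 of 3 votes), an average count of about 2.33, and “foraging” reported by two-thirds of volunteers.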

And it’s not that we don’t trust our users to produce the best classifications they can. These images can be hard, even for experts! Some animals are particularly tricky to identify - is that long, furry, dark thing you’re looking at in a night picture a fisher or a raccoon? Some are difficult to see or count! We often miss birds in the trees if there’s a big deer center-frame, or don’t pick up on that fourth deer hiding behind a bush in the distance. There’s also variation in how behaviors get recorded (a deer foraging only in the third image of a set might be missed by users who only look at the first picture), which comes out in the wash when we average across multiple identifications. The more eyes on a picture, the better our ultimate answers. This also means you shouldn’t be afraid to take your best guess if you’re not 100% certain of a classification or a count! Aggregating Zooniverse answers in this way produces classification data that has been shown to be as good as (or even better than!) data produced by a single expert researcher.

Whose ear is this? Take your best guess - along with 14 other volunteers!

But we also know that people are much better at identifying some animals and objects than others. While you might struggle a bit with the fisher-vs.-raccoon conundrum, you’re generally going to be spot-on at picking out humans, cars, and, of course, our ever-present deer. To focus as much of our volunteers’ effort as possible on the really difficult classifications, we’ve modified the “retirement rules” for blank images and for those containing humans, vehicles, or deer, shifting these pictures out of the pool sooner (while still maintaining high confidence that they have been correctly identified). If the first five classifications on an image (or ten classifications in total) say “nothing there,” the image gets retired and removed from circulation. The same goes for images classified as containing humans or deer. In the future, we’ll have an artificial intelligence algorithm that identifies these images before they ever go online, leaving only the most interesting and challenging pictures for human brains! (Stay tuned for more about images and AI in a future blog post.)
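Putting the thresholds together, the retirement decision for a subject could be sketched like this (the label strings, the set of “easy” labels, and the function itself are our illustration of the rules described above, not Zooniverse’s actual code):

```python
def should_retire(classifications,
                  easy_labels=frozenset({"nothing there", "human", "deer"})):
    """Decide whether a subject can be retired.

    `classifications` is the list of labels submitted so far, oldest first.
    Thresholds follow the rules described above: easy labels retire early
    (first 5 agree, or 10 total agree); everything else waits for 15.
    """
    n = len(classifications)
    for label in easy_labels:
        # Early retirement: the first five answers all agree on an easy label...
        if n >= 5 and all(c == label for c in classifications[:5]):
            return True
        # ...or ten answers in total agree on it.
        if sum(1 for c in classifications if c == label) >= 10:
            return True
    # Otherwise, the standard rule: retire after 15 classifications.
    return n >= 15

print(should_retire(["nothing there"] * 5))        # blank image retires early
print(should_retire(["deer"] * 4 + ["raccoon"]))   # no agreement yet, keep going
```

A tricky fisher-or-raccoon subject, by contrast, stays in circulation until all 15 classifications are in.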

More questions about retirement or Zooniverse statistics? Leave them in the comments below!