Completed Classifications and the Wisdom of Crowds
If you scroll down on our front page, you can keep tabs on
all of the effort (so much effort!) that has gone into our first season of
images so far. We’ve had some questions about what these values actually mean,
so let’s dig into what each of these statistics is actually keeping track of:
Volunteers. The
first number tells us how many unique users (you!) have participated in the
project so far. If a new volunteer makes even a single classification, they get
counted here, but this number also includes our regular and super-users who
have helped us plow through hundreds or even thousands of images.
Classifications. The
second number tells us how many times our volunteers have clicked “DONE” on one
of our images. In the first few weeks of our project, you guys have classified
our images over 270,000 times. Keep up the amazing work!
Subjects. This is
the number of image sets that we uploaded in our current season. You may have
noticed that our camera trap images tend to be grouped in sets of three, which
you can play through to see a short sequence of animal activity. Every time a
camera is triggered, it captures one of these three-image bursts, which we call a “subject”.
For this first batch, we uploaded 34,917 subjects (or over 100,000 individual
images). This number will only change when we upload new material for our next
seasons.
Completed Subjects.
The final statistic presented on Zooniverse is the number of subjects that have
been classified enough times to “retire” – that is, we have enough information
from our volunteers that we remove the image from circulation. This part is a
little confusing: why, if 274,275 classifications have been made on only 34,917
subjects, are only ten thousand or so of those subjects considered “complete”?
To get into how this works, we need to bring up how
Zooniverse operates on the wisdom of the crowd,
which is to say, the final “official” image classification is based on the
collective opinion of a group of individuals rather than the single opinion of
one expert. Across numerous situations, it has been shown that averaging a
large number of best guesses actually produces incredibly accurate estimates (statistically,
averaging helps to reduce the variability or “noise” associated with each
single guess to get at the true underlying number that is being evaluated).
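For the statistically inclined, here’s a tiny toy simulation in Python (nothing to do with our actual pipeline; the “true” count and the noise level are invented) showing how the average of many noisy guesses closes in on the real value:

import random

random.seed(42)

TRUE_COUNT = 7   # hypothetical "true" number of animals in a picture
NOISE = 3        # how far off a single guess might be, in either direction

def one_guess():
    # each volunteer's guess is the truth plus some random error
    return TRUE_COUNT + random.uniform(-NOISE, NOISE)

for n in (1, 5, 15, 100):
    guesses = [one_guess() for _ in range(n)]
    crowd_average = sum(guesses) / n
    print(f"{n:3d} guesses -> crowd average {crowd_average:.2f} (truth: {TRUE_COUNT})")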
So how does this work with image classifications, where we’re
asking you not to estimate some kind of single number, but rather to identify species,
counts, and behaviors? As a baseline, each subject on Eyes on the Wild is classified
by 15 users before it is “retired”, or completed. That means we get 15 separate
classifications of species, counts, and behaviors for every subject.
We then run an algorithm that takes this collection of
classifications and pulls out the most common species (or set of species) identified,
the average count selected, and the proportion of classifications reporting
that a certain behavior is taking place. This produces the final single line of
data for that subject that we use in our analyses, and it explains why it
takes many times more classifications than subjects to lead to image
retirement!
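To make that concrete, here’s a rough Python sketch of that kind of aggregation (this isn’t our actual code, and the field names and example values are made up):

from collections import Counter
from statistics import mean

# Fifteen classifications for one (imaginary) subject; each records a species,
# a count, and whether the "eating" behavior box was ticked.
classifications = [
    {"species": "white-tailed deer", "count": 2, "eating": True},
    {"species": "white-tailed deer", "count": 3, "eating": True},
    {"species": "raccoon",           "count": 2, "eating": False},
    # ...and so on, up to 15 entries...
]

def aggregate(classifications):
    # most common species identified across the whole set of classifications
    species_votes = Counter(c["species"] for c in classifications)
    consensus_species = species_votes.most_common(1)[0][0]

    # average of the animal counts selected
    average_count = mean(c["count"] for c in classifications)

    # proportion of classifications that reported the behavior
    proportion_eating = sum(c["eating"] for c in classifications) / len(classifications)

    return {"species": consensus_species,
            "count": average_count,
            "eating": proportion_eating}

print(aggregate(classifications))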
And it’s not that we don’t trust our users to produce the
best classifications they can. These images can be hard, even for experts! Some
animals are particularly tricky to identify - is that long, furry, dark thing you’re
looking at in a night picture a fisher or a raccoon? Some are difficult to see
or count! We often miss birds in the trees if there’s a big deer center-frame,
or don’t pick up on that fourth deer hiding behind a bush in the distance. There’s
also variation in how behaviors get recorded (a deer foraging only in the
third image of a set might be missed by users who only look at the first
picture), but that variation comes out in the wash when we average across multiple users. The more
eyes on a picture, the better our ultimate answers. This also means you shouldn’t
be afraid to take your best guess if you’re not 100% certain of a classification
or a count! Aggregating Zooniverse answers in this way produces classification
data that has been shown to be as good as (or even better than!) data produced
by a single expert researcher.
Whose ear is this? Take your best guess - along with 14 other volunteers!
But we also know that humans are much better at identifying
some animals and objects than others. While you might struggle a bit with the
fisher vs. raccoon conundrum, you’re generally going to be spot-on at picking
out humans, cars, and of course, our ever-present deer. To focus as
much of our volunteers’ effort as possible on the really difficult classifications, we’ve
modified the “retirement rules” for blank images and those that contain humans,
vehicles, or deer, shifting these pictures out of the pool sooner (while still
having high confidence that they have been correctly identified). If an image’s first
five classifications all come in as “nothing there”, or ten of its classifications do
in total, it gets retired and removed from circulation. The
same goes for images classified as containing humans, vehicles, or deer. In the future, we’ll have an
artificial intelligence algorithm that identifies these images before they ever
go online, leaving only the most interesting and challenging pictures for human
brains! (Stay tuned for more about images and AI in a future blog post.)
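If you like to think in code, the retirement logic described above works roughly like the sketch below (a simplification based only on the rules in this post; the label names, and the way the real Zooniverse retirement engine is configured, will differ):

# Labels that let a subject retire early (names here are just illustrative).
EASY_LABELS = {"nothing there", "human", "vehicle", "deer"}

def should_retire(labels):
    """Decide whether a subject can leave circulation, given the labels
    volunteers have submitted so far, in the order they arrived."""
    for easy in EASY_LABELS:
        first_five_agree = len(labels) >= 5 and all(l == easy for l in labels[:5])
        ten_total_agree = sum(l == easy for l in labels) >= 10
        if first_five_agree or ten_total_agree:
            return True
    # everything else stays in circulation until it has 15 classifications
    return len(labels) >= 15

print(should_retire(["nothing there"] * 5))                    # True: first five all say it's blank
print(should_retire(["raccoon", "fisher"] + ["raccoon"] * 8))  # False: a hard image keeps circulating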
More questions about retirement or Zooniverse statistics?
Leave them in the comments below!