I've received several e-mails from birders concerned that "confirmation bias" might be affecting the data in what I posted on the topic of Tropical vs. Couch's Kingbirds.
The main concerns as I understood them were:
(1) Wouldn't "false positives" for Tropical Kingbirds distort the seasonal range maps produced from eBird?
(2) Wouldn't "false negatives" for Couch's Kingbirds have a similar but opposite effect?
Given that these two species are practically indistinguishable in the field except by vocalizations, these are reasonable questions. They can't be answered in absolute terms, but (as I stated earlier) all useful science operates in the realm of probability.
The concept of "tolerance" is also important for scientific investigation. By that I don't mean "getting along with people who are different from you," though that can be important too. What I mean is understanding what a reasonable tolerance for errors is in your experimental design. Any good scientist needs to consider this question when planning a program of measurements or field observations. "Perfect" measurements or observations can never be achieved.
If you're too demanding in your expectations for data on a given scientific question, you're going to burn through a lot of extra money (if you're well funded) or, in the more typical case for bird investigations, a lot of extra volunteer energy.
An anecdote: In my first job as a research technician, about 37 years back, on my very first day on the job, a post-doctoral researcher fresh out of Cornell came to where I was working to tell me how I should prepare some specimens. I asked him, "What's the tolerance?" At first he just looked puzzled, so I explained that "tolerance" meant how close I needed to come, plus or minus, on the dimensions of the samples. Maybe 2 or 3 thousandths of an inch on a 3-inch-long sample? His response was, "No, they need to be exact."
I asked the research machinist I was working with how to handle this, and he gave me some more reasonable tolerances that were both achievable and would meet the needs of the experiment. He also advised me not to tell the young post-doc from Cornell. As I spent more time in that capacity, I learned an expression, "Good enough for Berkeley," that was common among the practical staff who had experience dealing with high-level scientists, including many Nobel prize winners. The campus street in front of our building was full of signs reserving parking spots for Nobel prize winners (we had dozens of them), but one thing I learned: for every one of those signs there were 3 or 4 skilled research technicians who had learned to convert unreasonable designs into practical terms.
So coming back to birds: Now we have eBird which (coincidentally or not) also springs out of Cornell. The system is designed to capture as much data as possible, though the eventual application of all of those data is still unclear.
Without knowing the final application, how do we know what TOLERANCE is acceptable? Some folks keep pushing for 100% accuracy and I can assure you, they're not going to get it. That's just not the way the real, nitty-gritty world works. What we need is a sense of what's "good enough for Berkeley" even if it might not meet all of the demands of the folks from certain Ivy League schools.
How do we measure what's "good enough"? Probability is our guide.
Suppose that there was an equal probability of Couch's vs. Tropical Kingbirds showing up along the Pacific Northwest Coast. In that case, what are the chances that 12 straight vocal birds (as Bob O'Brien turned up in his search of eBird reports) would all sound like Tropical Kingbirds? That's an easy calculation:
p = (0.5)^12 ≈ 0.00024, or 0.024%
So the hypothesis that Couch's and Tropical Kingbirds are equally probable can be rejected at about a 99.98% level of confidence.
Further, recognizing that (spoiler alert!) not all relevant records are in eBird, since eBird is still a pretty new thing, based on what I've heard there are probably more like 30 records of vocal Tropical Kingbirds in Oregon, and zero records of Couch's Kingbird. Suppose that the real ratio of Tropical:Couch's Kingbirds is 9:1. What are the odds of 30 straight reports of vocal Tropical Kingbirds and zero Couch's Kingbirds? Again, this is an easy calculation:
p = (0.9)^30 ≈ 0.042, or 4.2%
So the hypothesis that even 1 out of 10 Couch's/Tropical Kingbirds in Oregon is in fact a Couch's can be rejected at a 95% level of confidence.
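Both of these probabilities are simple enough to check in a couple of lines. Here's a quick sketch in Python (the function name is just for illustration):

```python
def p_all_tropical(n, p_tropical):
    """Chance that n independent vocal kingbirds all turn out to be
    Tropicals, if each bird is a Tropical with probability p_tropical."""
    return p_tropical ** n

print(p_all_tropical(12, 0.5))  # equal-odds hypothesis: ~0.00024, or 0.024%
print(p_all_tropical(30, 0.9))  # 9:1 ratio hypothesis: ~0.042, or 4.2%
```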
To run this out a little further, I took a look through a few eBird records from Washington and British Columbia. In a quick look through a small sample (they've had lots more records than Oregon) I found 23 that had notes of vocalizations supporting Tropical Kingbird, and only one that (arguably) might support Couch's Kingbird, as the observer heard just two sharp pips.
Let's suppose that last observation should be counted as a Couch's Kingbird. Again, if the Tropical:Couch's ratio is 9:1, then for roughly 54 observations of vocal TR/COKIs, the probability of at most one Couch's (zero Couch's, plus exactly one) is:
p = (0.9)^54 + 54 x (0.9)^53 x 0.1 ≈ 0.003 + 0.020 ≈ 0.024, or about 2.4%
So the hypothesis that even 1 out of 10 Couch's/Tropical Kingbirds in the Pacific Northwest is in fact a Couch's can be rejected at roughly a 98% level of confidence.
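For anyone who wants to verify the arithmetic, here's a small Python sketch of the "at most one Couch's" calculation (the function name and the 9:1 ratio are just for illustration):

```python
from math import comb

def p_at_most_one_couchs(n, p_tropical=0.9):
    """Chance of zero or one Couch's among n independent vocal birds,
    under the hypothetical 9:1 Tropical:Couch's ratio."""
    p_couchs = 1 - p_tropical
    p_zero = p_tropical ** n                               # no Couch's at all
    p_one = comb(n, 1) * p_tropical ** (n - 1) * p_couchs  # exactly one Couch's
    return p_zero + p_one

print(round(p_at_most_one_couchs(54), 3))  # about 0.024
```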
Coming back to the original questions, what does this mean?
For question (1) (wouldn't "false positives" for Tropical Kingbirds distort the seasonal range maps produced from eBird?), the answer is NO, unless you think a 1-in-10 identification error rate would swamp all of the other sources of error in eBird.
For question (2) (wouldn't "false negatives" for Couch's Kingbirds have a similar but opposite effect?), the answer is yes, but mainly because Couch's Kingbirds are so improbable that any outlier could radically change the map. I question whether this is of any significance whatsoever for bird conservation.