Judging the Judges: Study Shows Wine Judges Aren't That Reliable (or does it...?)

The Journal of Wine Economics has just published a study authored by Robert T. Hodgson titled An Examination of Judge Reliability at a major U.S. Wine Competition. The reported findings should provide fodder for about 10,000 wine blog articles over the next few weeks.

The study tracked the ability of wine competition judges to replicate the scores that they gave to wines (during blind tasting competition) at the California State Fair. The study found that (emphasis is mine):

…judges were perfectly consistent… about 18 percent of the time. However, this usually occurred for wines that were rejected. That is, when the judges were very consistent, it was often for wines that they did not like…

Let the blood-letting commence!

I fear that the media will take hold of this and start to sound the death knell for the ability of so-called experts to taste and rate wines (again), or use it to reinforce an already arguably unfavorable view that wine appreciation and competition are the height of snobbery.

Neither is true, and this study does little to bolster either point. Why? Because wine tasting is, at its heart, a subjective exercise.

The study is clear on its intention, which was not to shake up the world of wine competition, but to “provide a measure of a wine judge’s ability to consistently evaluate replicate samples of an identical wine. With such a measure in hand, it should be possible to evaluate the quality of future wine competitions using consistency as well as concordance with the goal to continually improve reliability and to track improvements associated with procedural changes…”

To understand why this study doesn’t ring so true with me, I need to give you a little detail on the mechanics of the study:

When possible, triplicate samples of all four wines were served in the second flight of the day randomly interspersed among the 30 wines. A typical day’s work involves four to six flights, about 150 wines… The judges first mark the wine’s score independently, and their scores are recorded by the panel’s secretary. Afterward the judges discuss the wine. Based on the discussion, some judges modify their initial score; others do not. For this study, only the first, independent score is used to analyze an individual judge’s consistency in scoring wines.

In summary: the judges weren’t consistent when faced with tasting hundreds of wines in a day, and their revised scores (based on panel discussion – which can have a huge impact on how you would evaluate a wine) weren’t used.
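For the stats-minded, here is a rough Python sketch of how replicate scores like these could be turned into a consistency measure. Everything in it is an assumption for illustration: the judge and wine names, the scores, and the two helper functions are made up, and "perfectly consistent" is taken here to mean a judge gave all three replicates the identical score, which may not match the definition the study actually used.

```python
from statistics import pstdev

# Hypothetical first-pass scores: judge -> wine -> three replicate scores.
# These numbers are invented for illustration; they are not from the study.
scores = {
    "judge_a": {"wine_1": [90, 90, 90], "wine_2": [84, 92, 88]},
    "judge_b": {"wine_1": [80, 94, 86], "wine_2": [82, 82, 82]},
}


def replicate_spread(replicates):
    """Return (range, population standard deviation) for one wine's replicate scores."""
    return max(replicates) - min(replicates), pstdev(replicates)


def share_perfectly_consistent(by_wine):
    """Fraction of wines for which a judge gave every replicate the same score."""
    perfect = sum(1 for reps in by_wine.values() if len(set(reps)) == 1)
    return perfect / len(by_wine)


for judge, by_wine in scores.items():
    for wine, reps in by_wine.items():
        spread, sd = replicate_spread(reps)
        print(f"{judge} {wine}: range={spread}, sd={sd:.2f}")
    share = share_perfectly_consistent(by_wine)
    print(f"{judge}: perfectly consistent on {share:.0%} of wines")
```

The range is the easiest number to read (how many points apart were the highest and lowest scores for the same wine), while the standard deviation is less sensitive to a single outlying pour.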

If the study proves anything, I think it shows that trying to judge hundreds of wines in a day is a first-class non-stop ticket to palate fatigue, even for experienced wine judges.

Now that I think about it, blind tasting is so notoriously difficult that I give the judges in this study credit for being consistent almost 20% of the time. That would be a respectable hitting percentage in baseball (not sure… I don’t follow baseball actually)…

While the media may latch onto this one, the study hinted that there is some modicum of possible salvation for the madness surrounding wine competitions in general – not by way of wine judges, but by way of the ultimate judges of wine: the Consumer.

…a recent article in Wine Business Monthly (Thach, 2008) conducted as a joint effort by 10 global universities with specialties in wine business and marketing found that consumers are not particularly motivated by medals when purchasing wine in retail stores. If consumer confidence is to be improved, managers of wine competitions would be well advised to validate their recommendations with quantitative standards.

Interesting conclusion. And a hopeful one.

Cheers!
(images: legaljuice.com, wine-economics.org)

8 thoughts on “Judging the Judges: Study Shows Wine Judges Aren’t That Reliable (or does it…?)”

Thanks Tish – totally with you on that; great comments!

What does the inconsistency of judges really tell us? Tasters are human; tasting is subjective; taste a lot of wines at one sitting and the results are hard to replicate. Seems to me to be a perfect rationale for us to discount glossy mag wine ratings even more than competitions.

I am not a fan of “medals” usually, but at least major competitions include some give and take among multiple palates on a panel; that tempers the fallibility. In “rating” situations, it’s one person, one time through, and the score becomes as permanent as a tattoo. As this study implies, the same taster sampling the same wine at a different time would give it a different number. Not fair to the wine, I’d say. Which is why I trust good retailers and wine-loving colleagues more than any “blind” critic.

It’s not the inconsistency in and of itself which is cause for concern, it’s the standard deviation of 14 points which becomes an issue. Two or three points is, in my mind, totally acceptable. Wines change in the glass, and palates are influenced by the wine before. But a 14 point swing? Makes me question the choice of judges, and wine raters in general.

As for medals not selling wine, that’s no surprise, but sometimes it’s about the winery getting their name out. Still, this study shows it’s an expensive risk at $75 – $150 per entry.

Hey Chris – I see that someone remembers their stats class ;-). ANOVA, anyone?

I understand your concerns. I am still having a hard time getting past the volume of wine – if you’re going to have 100+ wines in a competition, can’t they rotate judges and give some folks a palate break? That would seem fairer (to me anyway) for the wines in the competition.

Judging wine is subjective if there are no definitions of exactly what are the desired medal winning qualities or flavor profiles the judges are tasked with rating. These competitions will always be worthless if done by judges chosen for their restaurant, or their retail shop, or their ability to write, or their ability to sell wine, or administrate a company. When judges with formal training in sensory analysis at the university level and proven abilities of accuracy and consistency are chosen, this might change. Until then, from what I have seen in publications and competitions, wines will for the most part continue to be rated by amateurs.

Thanks, Morton. Wonder if the MW would qualify… which would seriously limit the number of judges I suppose…

Not only is judging wine subjective, but so is rating wine – let us not forget that fact as well . . .

Palate fatigue is something we should ALL be concerned with. Too many wines tasted at one sitting is certainly problematic – and they really should cut down on the number of wines tasted in a day at events like this – period.

The other thing NOT mentioned in this study is variability in the wines themselves. Bottle variation in MOST wines is huge – and therefore the same wine tasted four times from four different bottles COULD taste and smell quite different.

And I would tend to disagree with the study re: medals/awards – consumers generally flock to ‘award winners’ in ALL categories, including wines.

Cheers!