Monday, June 26, 2017

What happened to Decanter when it changed its points scoring scheme

In a previous post (How many wine-quality scales are there?), I noted that at the end of June 2012 Decanter magazine changed from using a 20-point ratings scale to a 100-point scale for its wine reviews (see New Decanter panel tasting system). In order to do this, they had to convert their old scores to the new scores (see How to convert Decanter wine scores and ratings to and from the 100 point scale).

It turns out that there were some unexpected consequences associated with making this change, which means that it was not as simple as it might seem. I think that this issue has not been appreciated by the wine public, or probably even by the people at Decanter; and so I will point out some of the consequences here.


We do expect that a 20-point scale and a 100-point scale should be interchangeable in some simple way, when assessing wine quality. However, there is actually no intrinsic reason why this should be so. Indeed, Wendy Parr, James Green and Geoffrey White (Revue Européenne de Psychologie Appliquée 56:231-238. 2006) tested this idea, by asking wine assessors to use both a 20-point scale and a 100-point scale to evaluate the same set of wines. Fortunately, they found no large differences between the use of the two schemes, for the wines they tested.

This makes it quite interesting that when Decanter swapped scoring systems it did seem to change the way it evaluated wines. This was discovered by Jean-Marie Cardebat and Emmanuel Paroissien (American Association of Wine Economists Working Paper No. 180), in 2015, when they looked at the scores for the red wines of Bordeaux.

Cardebat & Paroissien looked at how similar the quality scores were for a wide range of critics, and then compared them pairwise using correlation analysis. If the scores from any given pair of critics were perfectly related then their correlation value would be 1, and if they were completely unrelated then the value would be 0; otherwise, the values lie somewhere between these two extremes. Cardebat & Paroissien provide their results in Table 3 of their publication.
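Cardebat & Paroissien do not publish their code, but the core calculation is straightforward. Here is a minimal sketch in Python, using made-up scores (the critic names and numbers are purely illustrative), and assuming Pearson correlation computed over the wines that each pair of critics has in common:

```python
import pandas as pd

# Made-up quality scores for six wines from three critics; None marks a
# wine that a critic did not review.
scores = pd.DataFrame({
    "decanter": [92, 88, 95, 90, 85, None],
    "critic_a": [94, 87, 96, None, 84, 91],
    "critic_b": [90, 85, 89, 93, None, 88],
})

# Pairwise Pearson correlations; pandas uses only the wines that each
# pair of critics has scored in common.
print(scores.corr(method="pearson"))
```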

Of interest to us here, Cardebat & Paroissien treated the Decanter scores in two groups, one for the scores before June 2012, which used the old 20-point system, and one for the scores after that date, which used the new 100-point system. We can thus directly compare the Decanter scores to those of the other critics both before and after the change.

I have plotted the correlation values in the graph below. Each point represents the correlation between Decanter and a particular critic — four of the critics have their point labeled in the graph. The correlation before June 2012 is plotted horizontally, and the correlation after June 2012 is plotted vertically. If there had been no change in the correlations at that date, then the points would all lie along the pink line.

Change in relationship to other critics when the scoring system was revised

For two of the critics (Jeff Leve and Jean-Marc Quarin), there was indeed no change at all, exactly as we would expect if the 20-point and 100-point systems are directly interchangeable. For seven other critics the points are near the line rather than on it (Tim Atkin, Bettane & Desseauve, Jacques Dupont, René Gabriel, Neal Martin, La Revue du Vin de France, Wine Spectator), and such small differences might be expected by random chance (depending, for example, on which wines were included in the dataset).

For the next two critics (Robert Parker, James Suckling), the points lie rather too far from the line to be dismissed so easily. At this juncture, it is interesting to note that the majority of the points lie to the right of the line. This indicates that the correlations between Decanter and the other critics were greater before June 2012 than afterwards. That is, Decanter started disagreeing with the other critics to a greater extent after adopting the 100-point system; and it started disagreeing with Parker and Suckling even more than with the others.

However, what happens with the remaining two critics is quite unbelievable. In the case of Jancis Robinson, before June 2012 Decanter agreed quite well with her wine-quality evaluations (correlation = 0.63), although slightly less than for the other critics (range 0.63-0.75). But afterwards, the agreement between Robinson and Decanter plummeted (correlation = 0.36). The situation for Antonio Galloni is the reverse of this — the correlation value went up, instead (from 0.32 to 0.56). In the latter case, this may be an artifact of the data, because only 13 of Galloni's wine evaluations before June 2012 could be compared to those of Decanter (and so the estimate of 0.32 may be subject to great variation).

What has happened here? Barring errors in the data or analyses provided by Cardebat & Paroissien, this change seems quite difficult to explain. Mind you, I have shown repeatedly that the wine-quality scores provided by Jancis Robinson are usually at variance with those of most other critics (see Poor correlation among critics' quality scores; and How large is between-critic variation in quality scores?), but this particular example does seem to be extreme.

For the Cardebat & Paroissien analyses, both Jancis Robinson and Antonio Galloni have the lowest average correlations with all of the other critics, with 0.46 and 0.45, respectively, compared to a range of 0.58-0.68 for the others. So, in this dataset there is a general disagreement between these two people and the other critics, and also a strong disagreement with each other (correlation = 0.17). It is thus not something that is unique to Decanter, but it is interesting that the situation changed so dramatically when Decanter swapped scoring schemes.

References

Jean-Marie Cardebat, Emmanuel Paroissien (2015) Reducing quality uncertainty for Bordeaux en primeur wines: a uniform wine score. American Association of Wine Economists Working Paper No. 180.

Wendy V. Parr, James A. Green, K. Geoffrey White (2006) Wine judging, context and New Zealand sauvignon blanc. Revue Européenne de Psychologie Appliquée 56:231-238.

Monday, June 19, 2017

Yellow Tail — wine imports into the USA do fit a "power law"

Some weeks ago I posted a discussion of whether sales by US wine companies fit the proverbial "power law". The Power Law describes phenomena where large events are rare but small ones are quite common. I concluded that US wine sales in 2016 did, indeed, fit a Power Law, with the exception of the largest company, E&J Gallo Winery. To fit in with the rest of the wine companies, E&J Gallo should have sold c. 3.5 times as much wine as it actually did. Apparently, it is rather hard to dominate US domestic wine sales in the way predicted by a simple Power Law.


Power Laws are of interest because of their practical consequences. For example, the 80:20 Rule (or Pareto Principle), which says that for many events roughly 80% of the effects come from 20% of the causes, is one example of a Power Law.

Power Laws appear across a remarkably wide range of natural and social phenomena, and so there is no reason why they should not also appear in the wine industry. One of the more obvious places that we might expect to find them is in wine sales — there are likely to be a few brands that sell very well and many more with much smaller sales. As I showed in the earlier post, this appears to be generally true for domestic wine production in the USA; and so it is of interest to see whether it also applies to imported wines.

Yellow Tail and the Power Law

Currently, the biggest-selling imported wine in the USA is Yellow Tail (from Casella Wines, in Australia), with more than 8 million cases shipped to the US per year. This would place it at no. 9 in the current Wine Business Monthly top-30 list of wine companies in the USA. In July 2016, The Drinks Business placed Yellow Tail at no. 6 in its list of the Top 10 biggest-selling wine brands in the world, based on sales in 2015.

Unfortunately, I do not have a list of the sales of imported wine in the USA for any of the most recent years. However, in a presentation at the U.S. Beverage Alcohol Forum, which is part of the Wine & Spirits Wholesalers of America annual convention, Mike Ginley provided the US sales data for the top 25 imported table-wine brands in 2012. So, I will use this dataset for the analysis.

As I noted in the previous analysis, one special case of the Power Law is known as Zipf's Law, which relates the "size" of each event to its rank order of size. This is what we are looking at here. For each wine brand, the "size" is the number of cases of wine sold during 2012, and the brands are listed in rank order of their sizes (largest to smallest). The standard way to evaluate the Zipf pattern is to plot the data with both axes of the graph converted to logarithms. Under these circumstances, the data should form a straight line.
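For anyone who wants to try this at home, here is a minimal sketch of the calculation in Python. The sales figures are made up for illustration (the real numbers come from the Ginley presentation); on log-log axes, fitting a Power Law is just fitting a straight line:

```python
import numpy as np

# Made-up case sales (thousands of cases) for brands ranked 1-10.
sales = np.array([8200, 4100, 2800, 2100, 1700, 1400, 1200, 1050, 930, 840])
ranks = np.arange(1, len(sales) + 1)

# Zipf's Law predicts: log(size) = intercept + slope * log(rank),
# i.e. a straight line when both axes are logarithmic.
slope, intercept = np.polyfit(np.log(ranks), np.log(sales), 1)
print(f"fitted slope = {slope:.2f} (Zipf's Law in its strict form implies -1)")
```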

Here is the graph of the 2012 sales data for the top 25 imported wine brands. Only the best-selling wine is labeled.

A Power Law fitted to the sales of wines imported to the USA

As you can see, all of the data lie roughly along a straight line, and thus do indeed fit a Power Law. That is what we would expect.

However, it is worth noting here that all of the wine brands do fit the same Power Law, including Yellow Tail. This is different from what we found for the domestic wines (where the no. 1 winery under-performed relative to the Power Law model). Indeed, the Power Law indicates that Yellow Tail actually sold 28% more cases than would be expected from the sales of the other wine brands. So, in 2012 Yellow Tail slightly out-performed the expectation from the mathematical model, whereas E&J Gallo greatly under-performed the expectation in 2016.
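The post does not spell out how such percentages are derived, but one plausible way to quantify the top brand's performance is to fit the line to the lower-ranked brands only, and then extrapolate back to rank 1. Here is a sketch, reusing the made-up numbers from the previous snippet:

```python
import numpy as np

# Same made-up sales figures as in the previous snippet.
sales = np.array([8200, 4100, 2800, 2100, 1700, 1400, 1200, 1050, 930, 840])
ranks = np.arange(1, len(sales) + 1)

# Fit the Power Law to ranks 2 onward, then extrapolate to rank 1.
slope, intercept = np.polyfit(np.log(ranks[1:]), np.log(sales[1:]), 1)
expected_top = np.exp(intercept)  # since log(1) = 0
print(f"expected rank-1 sales: {expected_top:.0f}, actual: {sales[0]}, "
      f"ratio: {sales[0] / expected_top:.2f}")
```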

It is also worth noting the presence in the 2012 top-25 list of some of the best-selling wines from 30 years earlier. The data for 1980 and 1981 are provided in an article from the New York Times (Lambrusco rates high with U.S. consumers). The imported wine brands that have managed to hang on over the decades are: Riunite (no. 5 in 2012, but no. 1 back in 1980 & 1981), Folonari (12 now vs. 4 then), Bolla (18 vs. 3) and Cella (20 vs. 2). In 2012, these brands sold only 20-50% of their 1981 case sales, which is why they have dropped down the ranking.

Previous top-10 imported wine brands that have fallen by the wayside include: Zonin, Giacobazzi, Blue Nun, Mateus, Yago, and Lancers. Perhaps you remember some of them?

Monday, June 12, 2017

Why is wine often cheaper in Sweden than elsewhere?

In spite of considerable complaining by certain Swedes, a lot of wines are cheaper in Sweden than elsewhere in the European Union (EU), particularly European wines. Furthermore, Australian wines are sometimes cheaper in Sweden than they are in Australia; and occasionally even US wines can be cheaper than in the USA. This happens as a direct result of wine retail economics, and the fact that Sweden has a single government-owned retail chain for alcohol sales.

Not all wine is cheaper in Sweden, of course, but the ones I am interested in usually are cheaper; and so I thought that I would write about it.


The bottle shop / liquor store / off-licence (depending on your English idiom) is called Systembolaget (which translates as The System Company), and is wholly owned by and operated on behalf of the Swedish government. It has a monopoly on retail sales in Sweden, but not trade sales (for which there are several hundred importers), nor private imports from elsewhere within the EU.

Since I live in Sweden, I principally get my wine from Systembolaget, but I also get wine sent to me from elsewhere in the EU. I often read reviews written by people in the USA, and check up on their recommendations; and I am interested in Australian wine, since that is what I learned first. It is for these reasons that I am familiar with the prices of wines both inside and outside Sweden, and I can thus make direct comparisons of the prices of the same wine in several countries.

I therefore make the categorical statement that fine wine is cheaper in Sweden than in most other places into which it is imported (see the example at the end of the post). But not cheap wine — that is often less expensive elsewhere.

Wine economics

Several people have looked at the economics of wine retail in the United Kingdom, but not so many in the USA. The latter is possibly because bottle prices can vary from state to state, due to differences in taxes, plus the economics of the three-tier distribution system. Economics in the USA is not always a simple thing!

So, as my example of the economics of wine retail sales, I will use the UK, because the situation is simpler. As far as I can tell, the basic economics are no different in most other places, although the actual percentages will vary somewhat. [Note: The UK government has recently announced an increase in excise duty on alcohol; and the Average price of a bottle of wine in UK has reached a new high thanks to Brexit. Neither of these facts affects my analysis.]

The economic breakdown of the price of a bottle of wine in the UK has been dissected independently on several blogs. The Bibendum analysis has been updated yearly, and so I will use their data for March 2017. Their analysis breaks down the bottle cost into these components: retailer margin, excise duty, value added tax (VAT), packaging, logistics, and the wine itself. They do this for bottles with four different retail prices.

In the first graph I have plotted the percentage of the UK final bottle price that goes to the retailer and to the winery. For comparison, £10 ≈ $12 ≈ 110 kronor.

Retailer and manufacturer margins for a bottle of wine in the UK

As you can see, the margins for the retailer and manufacturer increase as the bottle price increases — neither of them makes as much money on a cheap bottle of wine as on an expensive one, both in straight money terms and as a percentage. Furthermore, the retailer is the one making the most money on wines costing less than about £15 ($20).

These economics may not apply directly to large supermarket chains, which frequently market their own-label wines. In these cases, the relationship between the manufacturer and the retailer is blurred. This also applies in the USA, where it has been noted (Reverse Wine Snob, by Jon Thorsen, 2015):
Costco's average margin (per their financial filings) is about 12 percent. Costco has stated that the highest margin they will take on a non-Costco brand is 13 percent and they strive to keep it closer to 10 percent. On private label items (Kirkland Signature) they will go up to 15 percent margin, but of course the price is still lower than other brands because they cut out the middleman.
Sweden

We can now compare the UK economic model to the one used by Systembolaget in Sweden. Their model has been a fixed fee per bottle, which differs among products (beer/cider, wine, spirits), plus a fixed percentage. Up to 1 March 2017, the fee for wine was 3.5 kr (£0.30) plus 19%; from that date it has been 5.2 kr (£0.45) plus 17.5%. I have plotted both of these models onto the next graph.
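Expressed as code, the two margin models look like this. This is a minimal sketch assuming the percentage is applied to the supplier's price (the post does not spell out the exact base); the crossover point, above which the new model gives the smaller margin, falls at about 113 kr:

```python
def systembolaget_margin(supplier_price_kr: float, new_model: bool = True) -> float:
    """Systembolaget's retail margin per bottle of wine, in kronor.

    Before 1 March 2017: 3.5 kr + 19% of the supplier price.
    From that date:      5.2 kr + 17.5% of the supplier price.
    """
    if new_model:
        return 5.2 + 0.175 * supplier_price_kr
    return 3.5 + 0.19 * supplier_price_kr

# Cheap bottles got dearer under the new model; expensive ones got cheaper.
for price in (50, 113, 200):
    old = systembolaget_margin(price, new_model=False)
    new = systembolaget_margin(price, new_model=True)
    print(f"{price:>3} kr bottle: old margin {old:5.2f} kr, new margin {new:5.2f} kr")
```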

Retailer and manufacturer margins for a bottle of wine in the UK and Sweden

It is now easy to see why wine is cheaper in Sweden, except for the most inexpensive wines. If we define "good wine" as anything above £10 ($12), then Swedes are doing very well, indeed; and the more expensive the wine, the better off they are. The reason for this is quite straightforward — Systembolaget's stated goal is: "To minimize alcohol-related problems by selling alcohol in a responsible way, without profit motive." Needless to say, I am quite pleased with this situation, as a buyer of fine wines.

However, it is also easy to see why a lot of Swedes might complain. They are no different from wine drinkers anywhere else, and therefore a lot of wine purchases are at the inexpensive end of the market. For example, according to Systembolaget, in the first 3 months of this year 35% of wine sales were of bottles costing less than 80 kr (£7, $9). At this price, wine in Sweden is not as cheap as elsewhere, and Swedes know it; and as you can see in the graph, it recently got noticeably more expensive, as well.

Systembolaget addresses this issue by virtue of being one of the largest alcohol retail chains in the world (reportedly third, behind Tesco, in the UK, and the Liquor Control Board of Ontario, in Canada). This position gives it a lot of bargaining power with the manufacturers and importers. In fact, Systembolaget puts a lot of the most inexpensive wines directly out to tender (as do their equivalents, ALKO, in Finland, and Vinmonopolet, in Norway) — you can see the current list of tenders here. (Note that not everyone is necessarily impressed with this idea.)

Finally, it is worth noting that most of the other bottle costs are similar in Sweden and the UK. For example, the excise duty imposed on alcohol in the UK is currently a fixed £2.16 per bottle of wine, while the Swedish alcohol tax is currently 26 kr (£2.30). However, the UK goods and services tax (VAT) is 20%, compared to the Swedish VAT (moms) of 25% — this government tax significantly offsets the reduced retailer margin in Sweden. Sigh.
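Putting the pieces together, here is a rough sketch of how a shelf price could be assembled in each country. The order in which margin, excise and VAT are applied is my assumption (VAT last, on the full amount), and the UK retailer margin is left as an input, since it varies with the price point in the Bibendum data:

```python
def swedish_shelf_price(supplier_price_kr: float) -> float:
    """Approximate Swedish shelf price under the post-March-2017 model."""
    ALCOHOL_TAX_KR = 26.0                      # flat excise per bottle
    margin = 5.2 + 0.175 * supplier_price_kr   # Systembolaget's markup
    return (supplier_price_kr + margin + ALCOHOL_TAX_KR) * 1.25  # 25% moms

def uk_shelf_price(supplier_price_gbp: float, retailer_margin: float) -> float:
    """Approximate UK shelf price; retailer_margin is a fraction, e.g. 0.3."""
    EXCISE_GBP = 2.16                          # flat excise per bottle
    return (supplier_price_gbp * (1 + retailer_margin) + EXCISE_GBP) * 1.20

# A bottle with a 100 kr (roughly £9) supplier price, assuming a 30% UK margin:
print(f"Sweden: {swedish_shelf_price(100):.0f} kr")
print(f"UK:     £{uk_shelf_price(9.0, 0.30):.2f}")
```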

Note: The excise rates for alcohol in Sweden and the UK are among the highest in the EU, along with Ireland and Finland (see AAWE). On the other hand, EU goods and services taxes generally vary between 20 and 25%.

Example

The next graph shows the advertised price (on April 14, 2017) of a single bottle of Seghesio Family Vineyards Cortina Zinfandel 2013 (from California), for eight US stores, three UK stores, and Systembolaget. The Swedish price includes delivery to the nearest service point in Sweden (438 shops plus c. 500 drop-off locations), but the others exclude delivery.


The US price depends on the store location, with the highest price being 25% greater than the lowest price. The Swedish price is equal to the maximum US price, while being 5-10% less than the UK prices.

Monday, June 5, 2017

How many 100-point wine-quality scales are there?

In the previous post (How many wine-quality scales are there?) I discussed the range of ratings systems for describing wine quality that use 20 points. However, perhaps of more direct practical relevance to most wine drinkers in the USA is the range of systems that use 100 points (or, more correctly, 50-100 points).

The 100-point scale is used by the most popular sources of wine-quality scores, including the Wine Spectator, Wine Advocate and Wine Enthusiast; and so wine purchasers encounter their scores almost every time they try to purchase a bottle of wine. But how do these scores relate to each other? Using the metaphor introduced in the previous post, how similar are their languages? And what do we have to do to translate between languages?


All three of these popular scoring systems have been publicly described, although I contend that it might be a bit tricky for any of the rest of us to duplicate the scores for ourselves. However, there are plenty of other wine commentators who provide scores without any explicit indication of how they derive those scores. This means that some simple comparison of a few of the different systems is in order.

As explained in the last post, in order to standardize the various scales for direct comparison, we need to translate the different languages into a common language. I will do this in the same manner as last time, by converting the different scales to a single 100-point scale, as used by the Wine Advocate. I will also compare the quality scales based on their scores for the five First Growth red wines of the Left Bank of Bordeaux, as I did last time.
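Mechanically, the translation can be as simple as fitting a line between the two sets of scores for the same wines. Here is a minimal sketch; the numbers are made up for illustration, and the straight-line fit is my assumption (the relationships in the graph below are curved for some critics):

```python
import numpy as np

# Made-up scores for the same five wines from the Wine Advocate and
# from some other critic.
advocate = np.array([95.0, 96.0, 98.0, 99.0, 100.0])
critic   = np.array([93.0, 94.0, 96.0, 97.0, 98.0])

# Fit: advocate_score ~ intercept + slope * critic_score.
slope, intercept = np.polyfit(critic, advocate, 1)

def to_advocate_scale(score: float) -> float:
    """Translate the other critic's score into Wine Advocate 'language'."""
    return intercept + slope * score

print(f"a 95 from this critic is about {to_advocate_scale(95):.1f} Advocate points")
```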

The scales for nine different scoring systems are shown in the graph. The original scores are shown on the horizontal axis, while the standardized score is shown vertically. The vertical axis represents the score that the Wine Advocate would give a wine of the same quality. If the critics were all speaking the same language to express their opinions about wine quality, then the lines would be sitting on top of each other; and the further apart they are, the more different are the languages.

Nine different 100-point wine-scoring systems

There are lots of different lines here, which indicates that each source of scores uses a different scheme, and thus is speaking a different language. Many of the lines are fairly close, however, and thus many of the languages are not all that different from each other. Fortunately for us, they are most similar to each other in the range 85-95 points.

First, note that the line for the Wine Spectator lies exactly along the diagonal of the graph. This indicates that the Wine Advocate and the Wine Spectator are using exactly the same scoring system — they are speaking the same language. In other words, a 95-point wine from either source means exactly the same thing. If they give different scores to a particular wine, then they are disagreeing only about the quality of the wine — this is not true for any other pair of commentators, because in their case a different score may simply reflect the difference in language.

It is worth noting that almost all of the Wine Advocate scores came from Robert Parker, while most of the Wine Spectator's were from James Suckling, along with a few from Thomas Matthews, James Molesworth and Harvey Steiman (who have all reviewed the red wines of Bordeaux for that magazine), plus some that were unattributed.

Second, the line for the Wine Enthusiast always lies below the diagonal of the graph. This indicates that the Wine Enthusiast scores are slightly greater than those of the Wine Advocate (and Wine Spectator) for an equivalent wine. For example, if the Enthusiast gives a score of 80 then Parker would give (in the Advocate) 78-79 points for a wine of the same quality. This situation has been noted in Steve De Long's comparison of wine scoring systems, although it is nowhere near as extreme as he suggests.

Third, the line for Stephen Tanzer always lies above the diagonal of the graph, indicating that his scores are usually slightly less than those of the Wine Advocate (and Wine Spectator). Indeed, a 100-point Parker wine would get only 98-99 points from Tanzer.

All of the other lines cross the diagonal at some point. This indicates that sometimes their scores are above those of the Advocate and sometimes they are below. Interestingly, most of these systems converge at roughly 91 points, as indicated by the dashed line on the graph. So, a 91-point wine means more-or-less the same thing for most of these commentators (except Tanzer and the Enthusiast) — it is the only common "word" in most of the languages!

The most different of the scoring schemes is that of James Suckling, followed by those of Jeannie Cho Lee and Richard Jennings (which are surprisingly similar). Suckling is a former editor of Wine Spectator, and he actually provided most of the scores used here for that magazine — this makes his strong difference in scoring system on his own web site particularly notable, as it implies that he has changed language since departing from the Spectator.

Finally, it is important to recognize that all I have done here is evaluate the similarity of the different scoring systems. Whether the scores actually represent wine quality in any way is not something I can test, although I presume that the scores do represent something about the characteristics of the wines. Nor can I evaluate whether the scores reflect wines that any particular consumer might like to drink, or whether they can be used to make purchasing decisions. Nor can I be sure exactly what would happen if I chose a different set of wines for my comparisons.

Conclusions

The short answer to the question posed in the title is: pretty much one for each commentator, although some of them are quite similar. Indeed, the Wine Spectator and the Wine Advocate seem to use their scores to mean almost the same thing as each other, while the Wine Enthusiast gives a slightly higher score for a wine of equivalent quality.

While there are not as many wine-quality rating systems as there are languages, the idea of translating among them is just as necessary in both cases, if we are to get any meaning. That is, every time a wine retailer plies us with a combination of critics' scores, we have to translate those scores into a common language, in order to work out whether the critics are agreeing with each other or not. Different scores may simply reflect differences in scoring systems, not differences in wine quality; and similarity of scores does not necessarily represent agreement on quality.

Averaging the scores from the different critics, as is sometimes done, notably by Wine-Searcher and 90plus Wines, is unlikely to be mathematically valid. Given the results from this and the previous post (How many wine-quality scales are there?), calculating an average score would be like trying to calculate an average language. Jean-Marie Cardebat and Emmanuel Paroissien (American Association of Wine Economists Working Paper No. 180. 2015) have correctly pointed out that the different scoring systems need to be converted to a common score (i.e. a common language) before any mathematics can validly be applied to them.
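To make that last point concrete, here is a small sketch. The translation is made up, loosely inspired by the Tanzer line above (his scores run a point or two below the Advocate's); converting before averaging gives a different, and more meaningful, result than averaging the raw scores:

```python
# Made-up translation for a Tanzer-like critic: advocate ~ critic + 1.5.
def to_common_scale(critic_score: float) -> float:
    return critic_score + 1.5

advocate_score = 96.0
critic_score = 95.0

naive_average = (advocate_score + critic_score) / 2
valid_average = (advocate_score + to_common_scale(critic_score)) / 2

print(f"naive average:   {naive_average:.2f}")  # 95.50
print(f"converted first: {valid_average:.2f}")  # 96.25
```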