Wizards loves to talk about the openness and diversity of Modern. Both the GP Charlotte and GP Copenhagen Day 2 Metagame Breakdowns made claims like this about the format, stating that Modern was “packed with a diversity of archetypes” and that the breakdown “shows the immense diversity of the Modern format.” Although we Modern players like to tout our format’s diversity, sometimes the Day 2 metagames don’t look as diverse as we want them to. GP Copenhagen was no exception to this, with a Day 2 field that was 15.8% “Splinter Twin” in its many variants, following a trend set the previous weekend at GP Charlotte where 17.7% of the Day 2 field was on some variant of Twin. Can fields like this really be labeled “diverse” or “open”, even if some rogue Lantern of Insight, Death’s Shadow, and Ad Nauseam decks are showing up?
The purpose of this article is to try and quantify the hazy definition of “format diversity”, especially as it pertains to Grand Prix events. Drawing on data from GPs since July 2014, this article compares different Day 2 metagames across Modern’s recent history. We can then situate GP Copenhagen and Charlotte in this broader picture, seeing if they are really as diverse as Wizards claims. As with any statistics article, I want to take us through not just the results but also the method of reaching those results and why that method makes sense. Hopefully, this approach will help you conduct similar analyses of your own.
If you’re like most Modern players, you’ve probably characterized a metagame as diverse, stagnant, warped, or some other similar adjective at one point in time. But it’s often unclear what underlies these characterizations, and the terms often come off as arbitrary. Not even Wizards gets it right, especially without the benefit of hindsight. Here’s a real gem from the Day 2 metagame of a recent Grand Prix, quoted from the mothership no less: “All in all it looks like Modern is still a very diverse format. And rumor has it one player is even running a Sliver deck today.” Recognize the GP? That’s from the paragon of diversity itself, GP Milan, a December 2014 event during the height of Treasure Cruise’s and Birthing Pod’s reign. Maybe Wizards was just being optimistic (and to some extent, the people who write those metagame articles aren’t necessarily representing Wizards-wide views), but if that kind of misevaluation can happen during December 2014, it makes us question the reliability of terms like “diverse”.
This gets at the importance of having consistent, transparent, and supported benchmarks for loaded terms like “diverse” and “warped”. Ideally, these benchmarks would be quantitative markers that we could look to in any given metagame, e.g. if a deck is over N% of the event, then that tournament wasn’t very diverse. But there are a few dangers here. First, we can’t just choose a number that “seems” high, like 15% or 20%. This kind of gut-instinct approach gets us in the exact same trouble we were in when we used terms like “diverse” or “open”: it’s important to follow our intuition, but we need to balance that against being arbitrary. Another option for calculating the N% cutoff is using metagame averages. This is by far the most common approach I see in other metagame breakdowns, but it also has the chance to be the most misleading. A hypothetical Metagame 1 with five decks with 23%, 7%, 5%, 5%, and 2.5% shares is totally different from Metagame 2 with 11%, 10%, 8%, 7%, and 6.5% shares, even though both metagames have the same average deck prevalence of 8.5%. But it’s clear that the first is completely warped around some monstrous 23% deck and the other is quite balanced.
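To make the point about averages concrete, here is a short Python sketch using the two hypothetical metagames above. The variable names are my own, and the population standard deviation is used purely as one convenient measure of spread, not as the only possible choice.

```python
from statistics import mean, pstdev

# Hypothetical Day 2 shares (in percent) from the paragraph above.
metagame_1 = [23, 7, 5, 5, 2.5]
metagame_2 = [11, 10, 8, 7, 6.5]

for name, shares in (("Metagame 1", metagame_1), ("Metagame 2", metagame_2)):
    # pstdev treats the list as the whole population of decks, not a sample.
    print(f"{name}: average {mean(shares):.1f}%, spread {pstdev(shares):.1f}%")

# Metagame 1: average 8.5%, spread 7.4%
# Metagame 2: average 8.5%, spread 1.7%
```

Both metagames share the same 8.5% average, but the spread around that average is more than four times larger in Metagame 1.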
Given these dangers, I don’t want us to think of a firm N% cutoff. Instead, I want us to look at a range of values between a lower bound N1% and an upper bound N2%. That is to say, decks in this metagame tended to fall between N1% and N2% of the Day 2 field. How do we define these bounds? By using the variance between all the different deck metagame shares. As an example, let’s look at the hypothetical metagames in the paragraph above. Metagame 1 has extremely high variance, with one deck at 23% and the next highest at 7%. Metagame 2, however, is much more clustered around the average 8.5% value. So instead of looking at a single 8.5% cutoff, we construct a range of values around that average metagame share. For the more balanced Metagame 2, that would be a very reasonable 7% – 10%. For Metagame 1, it’s a much wilder 2% – 15% range. Metagames with narrow ranges tend to be much more balanced, where many decks are viable and nothing is dragging the range up. But if we get a metagame with a large range, that suggests we have some problematic decks polarizing the metagame. A Deathrite Shaman kind of problematic.
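The exact formula behind these ranges is not spelled out here, so treat the following as an assumption: taking the average share plus or minus 1.96 times the population standard deviation divided by the square root of the number of decks reproduces both hypothetical ranges (roughly 7% – 10% for the balanced metagame and 2% – 15% for the warped one). The function name `share_range` and the default `z` value are mine.

```python
from math import sqrt
from statistics import mean, pstdev

def share_range(shares, z=1.96):
    """Return (low, high, margin) around the average metagame share.

    Assumed formula: margin = z * population_sd / sqrt(number_of_decks).
    """
    margin = z * pstdev(shares) / sqrt(len(shares))
    center = mean(shares)
    return center - margin, center + margin, margin

# Hypothetical shares (in percent) from the earlier example.
warped = [23, 7, 5, 5, 2.5]      # one deck at 23%
balanced = [11, 10, 8, 7, 6.5]   # shares clustered near 8.5%

for label, shares in (("warped", warped), ("balanced", balanced)):
    low, high, margin = share_range(shares)
    print(f"{label}: {low:.1f}% to {high:.1f}% (+/- {margin:.1f}%)")
```

The wider the margin, the more a handful of heavily played decks is pulling the range away from the average.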
With this method set up, we can now turn to the GPs from the past few years and see how GP Copenhagen and GP Charlotte stack up.
Metagame Share Ranges and GP Day 2s
Let’s apply this method to the Day 2s of these past tournaments. As some of you more statistically-inclined readers might recognize, this is another way of using the same confidence intervals that we use on the Top Decks page. The big difference today is that we are applying it to Grand Prix events and not to the general metagame. This distinction is important for three reasons. First, it means we are working with a population of decks and not a sample, which changes both the math itself and also our understanding of the numbers: there’s no uncertainty in what made Day 2 because we know all the decks. Second, it means we have fewer decks and “cases” (i.e. our N) than in the overall metagame. This makes the numbers harder to extrapolate from, but also concentrates the population around the decks that matter most at GPs (the big dogs like Twin, Affinity, etc.). Finally, GP dynamics are very different from those at a local event, which means a lot for things like breakers, random bad matchups, etc. This is one reason I don’t often perform this kind of analysis on Top 8 decks: the difference between 18th and 4th can often just be bad luck.
Using this approach, here are the metagame-share ranges for all GPs since July 2014. I have adjusted and edited Wizards’ breakdowns to both separate archetypes and expand categories. Also note that I exclude Pro Tour Fate Reforged because Modern decks made Day 2 based only partially on their Modern performance. For each event, I give the prevalence low-end, the high-end, and then the +/- margin around the average.
GP Day 2 Metagame Share Confidence Intervals
|Event||Low-end||High-end||+/- Margin|
|1. GP Boston||1.7%||3.5%||.9%|
|2. GP Madrid||2.1%||4.3%||1.1%|
|3. GP Milan||2.4%||6%||1.8%|
|4. GP Omaha||2.3%||5.1%||1.4%|
|5. GP Vancouver||1.8%||5.3%||1.7%|
|6. GP Charlotte||1.4%||3.3%||.9%|
|7. GP Copenhagen||2.3%||4%||.8%|
If we were to read this table for GP Boston, we would see that the middle range of deck prevalences runs from 1.7% to 3.5%, which is a margin of .9% on either side of the average.
Looking over this table, we can quickly identify some themes. Day 2 metagames that were part of less balanced formats have much larger interval margins than the more balanced ones. GP Charlotte, which had dozens of strange decks and tier 2-3 contenders on Day 2, has one of the lowest margins at just .9%. By contrast, GP Milan, which took place at the height of the Pod/Cruise season, has a much higher margin at 1.8%. Higher margins suggest very polarized metagames with lots of upper-end outliers (e.g. Pod and Delver in December 2014). Lower margins suggest much more open metagames where lots of decks are clustered around a central range.
The second indicator I notice is the relative size of the low-end and high-end values. The larger the high-end value, the more polarized that event was toward the most-played decks. Here, we see GP Milan with its warpage towards Delver and Pod (high-end of 6%), and GP Vancouver with tons of Abzan and Twin (high-end of 5.3%). By contrast, more open metagames like GP Boston and GP Charlotte have much smaller high-end values, 3.5% and 3.3% respectively. We can also see this at the low end. When an event has a really large low-end value, like GP Milan’s 2.4%, this suggests there wasn’t a lot of action happening at the bottom of Day 2. Compare this with GP Charlotte, with a low-end of 1.4%: there were a ton of less-played decks bringing down the range.
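If you want to check these comparisons yourself, the sketch below ranks the seven events on each of the three indicators. The numbers are copied straight from the table; the constant names and the `ranked` helper are my own.

```python
# (event, low-end %, high-end %, +/- margin %), copied from the table above.
EVENTS = [
    ("GP Boston", 1.7, 3.5, 0.9),
    ("GP Madrid", 2.1, 4.3, 1.1),
    ("GP Milan", 2.4, 6.0, 1.8),
    ("GP Omaha", 2.3, 5.1, 1.4),
    ("GP Vancouver", 1.8, 5.3, 1.7),
    ("GP Charlotte", 1.4, 3.3, 0.9),
    ("GP Copenhagen", 2.3, 4.0, 0.8),
]
LOW, HIGH, MARGIN = 1, 2, 3  # column positions within each row

def ranked(column):
    """Event names sorted from the smallest to the largest value in a column."""
    return [row[0] for row in sorted(EVENTS, key=lambda row: row[column])]

by_margin = ranked(MARGIN)
by_low = ranked(LOW)
by_high = ranked(HIGH)

print("Narrowest margin:", by_margin[0], "| widest:", by_margin[-1])
print("Lowest low-end:", by_low[0], "| highest:", by_low[-1])
print("Lowest high-end:", by_high[0], "| highest:", by_high[-1])
```

On all three indicators, GP Milan sits at the polarized end of the list, while GP Copenhagen and GP Charlotte take the most open spots between them.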
GP Copenhagen and Day 2 Diversity
Based on all this, where does GP Copenhagen fall in the mix? Or GP Charlotte, another recent event that was lauded as one of Modern’s most open fields in months?
From the perspective of metagame-range margin, GP Copenhagen is actually the most diverse, followed closely by GP Charlotte. With a .8% and .9% margin respectively, these events were not at all polarized around a few decks. This is in stark contrast to something like GP Vancouver, where a huge subset of the field was on Abzan and that brought up the margin significantly. Looking back to Copenhagen and Charlotte, this quantitative assessment fits our qualitative understanding of the different events. Any event where you have Merfolk, Scapeshift, Ad Nauseam, Griselbrand, etc. as viable decks is a very diverse one. It’s when you are stuck on the top-tier decks like Abzan, Jund, Affinity, etc. that the margin widens and the Day 2 starts to look much less diverse. So in that regard, both GP Copenhagen and GP Charlotte were quite successful.
What about the low-end values? Remember that the low-end value suggests how many less-common decks made Day 2, i.e. decks like Martyr Proc, Mill, Mono U Tron, etc. with only a handful of pilots (or even just 1). Surprising no one, GP Charlotte is the hands-down winner here, with a low-end of 1.4%. This perfectly reflects all the tier 3 or lower decks we saw at the event, and all the buzz around Charlotte as being so diverse. GP Copenhagen, however, has a much larger low-end value at 2.3%. To me, this indicates a metagame where there weren’t a lot of low-end outlier decks, with most people piloting more established builds in tier 1 or tier 2. The Day 2 metagame breakdown for Copenhagen also indicates this, with a lot of familiar faces and not a lot of decks with only 1-2 pilots. This points to GP Copenhagen being less diverse at the bottom than it otherwise could have been. We don’t see the same crazy decks that we did at Charlotte, although there are some standouts here like Dredgevine and Death and Taxes.
The last indicator of Day 2 diversity is the high-end range value. This is again where we see the influence of polarizing decks: GP Vancouver and GP Milan have the largest high-end values because of their warpage around Abzan and Delver/Pod. GP Copenhagen and GP Charlotte, however, are much better. Again, Charlotte stands out as being the most diverse, with a really small high-end value compared with the rest of the GPs in the table (3.3%). Copenhagen is right behind with a 4% high-end value. Just looking over the events, these assessments make perfect sense. Neither GP was dominated by one particular deck-type, even if they did have archetypes that saw more play than others. In Copenhagen’s case, this does lead to the question of Twin decks and their metagame role (more on this point in a second). But first, the Twin share isn’t nearly as problematic as we have seen in past metagames, even if we do group them all. And second, the rest of the event was much more open around decks like Merfolk, Grixis Control, Naya Company, Tron, and a number of other strategies that haven’t received a lot of press until recently.
Deck Supertypes and Next Steps
When classifying decks, one of the most controversial decisions is whether or not to group decks by supertypes. Should we talk about Splinter Twin decks or keep them separate as UR Twin, Temur Twin, and Grixis Twin? Is BGx one archetype? Or is there something to be said for variation between Jund, Abzan, and BG Rock? These kinds of decisions obviously have a huge impact on how the math works in metagame analysis. A Day 2 might be 15% “Twin” decks, but that also might be split pretty evenly between Temur, Grixis, and UR Twin. Making matters worse, it’s unclear how this factors into Wizards’ assessments of format diversity. Did Kiki Pod’s small share factor into the ultimate Birthing Pod ban? My guess is it didn’t: Wizards probably would have been thrilled to not have Abzan/Melira Pod decks and just have Kiki Pod ones.
In this article, I split up all the supertypes into distinct decks, but I also want to re-run this analysis at the end of the month with supertypes instead. Although there are appreciable differences between individual decks within a supertype, this often suggests deck diversity more than it suggests card diversity. And even there, if all the decks are built around the card in the same way, it might not even suggest deck diversity at all! This gives us a good opportunity to re-run the analysis with a different frame after all the June GPs have wrapped up (Singapore is this weekend).
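To see how much the grouping choice can move the numbers, here is a small Python sketch. Every share in it is hypothetical (chosen to mirror the evenly split 15% Twin example above), and the supertype mapping is just one possible classification, not an official one.

```python
from collections import defaultdict

# Hypothetical Day 2 shares in percent; illustrative numbers only.
shares = {
    "UR Twin": 5.0,
    "Temur Twin": 5.0,
    "Grixis Twin": 5.0,
    "Jund": 6.0,
    "Abzan": 4.0,
    "Merfolk": 4.0,
}

# Hypothetical archetype-to-supertype mapping; unlisted decks stay as they are.
supertype_of = {
    "UR Twin": "Twin",
    "Temur Twin": "Twin",
    "Grixis Twin": "Twin",
    "Jund": "BGx",
    "Abzan": "BGx",
}

def group_by_supertype(shares, supertype_of):
    """Sum deck shares under their supertype, leaving unmapped decks alone."""
    grouped = defaultdict(float)
    for deck, share in shares.items():
        grouped[supertype_of.get(deck, deck)] += share
    return dict(grouped)

grouped = group_by_supertype(shares, supertype_of)
print("Largest share, split:", max(shares.values()))     # 6.0 (Jund)
print("Largest share, grouped:", max(grouped.values()))  # 15.0 (Twin)
```

With the decks split, nothing in this made-up field tops 6%. Grouped, "Twin" jumps to 15%, which would widen any range built on these shares.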
We’ll be doing more GP Copenhagen review all week long, and this is a great starting point in situating Copenhagen in the broader Modern context. By many counts, GP Copenhagen looked like a diverse and open event, although it was certainly no GP Charlotte. Overall, Modern is looking healthier than it has in a long time, although there are still some lingering questions about how Twin-style decks might be shaping the metagame. We’ll have to amass more data before we can answer that question, and I’m excited to see what the rest of the month holds for the Modern community.