Modern MTGO Deep Dive: Win Rate Analysis

Posted on April 13, 2015 by Sheridan Lardner


I'm a big fan of using MTGO data to analyze deck performance. But one of the biggest issues with the data available on the Wizards website is that it's incomplete. We only see 4-0/3-1 decks, we only see one event per day, and we have no idea what matchups happened on the way to those finishes. Although you can still use this public data to perform some interesting analysis at varying depths (see my past articles on Mono U Tron and Infect), the dataset's limitations always leave us wanting more.

Working with a few friends of mine from the MTGSalvation Modern forums, we decided to take an in-depth snapshot of the current Modern MTGO metagame. To do this, we looked past the public data into the MTGO client itself, observing and recording dailies over the past few weeks. The result is almost a dozen events to dig through without any of the limitations we typically see in the public data. This dataset lets us look not just at winning decks but at all decks, seeing which decks are performing well relative to every strategy in any given event. The statistical possibilities are almost endless, but this week we are going to use this data to get a sense of the top performing decks on MTGO. I'm sure you will see one of them coming, but another was a total surprise even to me.

Using some more statistics (didn't think you'd seen the last of that, did you?), I'm going to look into the match win percentages of these decks to try and identify a few of Modern MTGO's top decks. In doing so, I'm looking both for decks performing significantly above average and for some possible explanations of why this might be happening. I'll also give a general rundown of the current Tier 1 and Tier 2 decks on MTGO, showing how their match win percentages rank relative to the rest. So let's dive into this rich dataset and see how we can use these numbers to improve our understanding of the format.


Dataset Description

If you've ever read research articles for work or school (all those 30+ page PDFs you got assigned in college and then only read the abstract), you'll know the importance of describing a dataset before we start. This is particularly important when we aren't all looking over the dataset together. This is actually quite common in both the "hard" sciences and the "soft" ones, although it is relatively rare in Magic where most data is just a click away on a website (e.g. our Top Decks page). So to get us on the same page, here's a bit of background on what we are looking at.

The data covers 11 MTGO events from 3/24 to 4/12. The average event size was about 70 players and there are roughly 750 decks captured, covering thousands of individual matches between those decks. The big difference between this dataset and those we have looked at in the past is the inclusion of 2-2 or worse decks, not just 4-0/3-1 ones.

This might be the most exciting Modern data I have gotten to work with in a while, but it still has its limitations. For one, this is by no means a complete population of all MTGO events in the time frame. It's just 11 dailies over a 2-3 week period, which is actually less than an event per day. Although our deck and match Ns are very large, the event N is quite small, so we need to understand that in analyzing the numbers. Perhaps more importantly, it's not even a random sample. Dailies were just observed whenever my partners and I had time, which is only semi-random at best. These limitations (and others I am sure you can think of; data entry errors are always at play in big, manually recorded datasets) do not undercut the value of what we are doing, but they do force us to consider their effect on our conclusions.

Also, as with all other articles of this kind, all statistics and data analysis disclaimers apply!

Tier 1 Deck Performance

One of the enduring problems of metagame statistics like the ones on our Top Decks page is that they are almost entirely based on prevalence. They do not explicitly account for the performance of those decks; a deck that gets played a lot might win more events simply because it has more chances to! Of course, that's not saying any of those decks are bad. We wouldn't keep seeing them time and time again if decks like Abzan, Twin, and Burn were "bad" decks. But prevalence is only one piece of the metagame story. To get to a deck's performance, particularly for decks played less than the top ones, we need to dig deeper.

To try and get at this issue, I analyzed the match win percentage (MWP) of decks in the dataset. MWP is just the number of matches a deck won divided by the total number of matches it played. So a deck that went 4-0 on MTGO would have a 100% MWP, a 3-1 deck would have a 75% MWP, etc. We can aggregate this across events to get an overall MWP for any deck; a build with 50 match wins out of 100 matches played would have a 50% overall MWP. We can even augment this statistic with game win percentage (GWP) to see how many games the deck is winning as well. After all, a deck that goes 4-0 off of four 2-1 records wins 8 of 12 games (a 66.7% GWP), while a deck that does the same off of four 2-0 matches wins 8 of 8 (100%). But they would both have that 100% MWP, which is why it is helpful to have both.
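Here is a minimal sketch of the two statistics in Python, assuming each match is stored as a hypothetical `Match` record of games won and lost from one deck's perspective (draws and unfinished matches are ignored). It works through the example above: four 2-1 wins and four 2-0 wins both give a 100% MWP, but the GWPs are 8/12 and 8/8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record of one best-of-three match from a deck's perspective."""
    games_won: int
    games_lost: int

    @property
    def won(self) -> bool:
        # Whoever wins more games takes the match.
        return self.games_won > self.games_lost


def mwp(matches: list[Match]) -> float:
    """Match win percentage: matches won divided by matches played."""
    return sum(m.won for m in matches) / len(matches)


def gwp(matches: list[Match]) -> float:
    """Game win percentage: games won divided by games played."""
    won = sum(m.games_won for m in matches)
    played = sum(m.games_won + m.games_lost for m in matches)
    return won / played


# Two 4-0 runs: one grinds out every match, one sweeps every match.
grindy = [Match(2, 1) for _ in range(4)]
clean = [Match(2, 0) for _ in range(4)]
print(mwp(grindy), round(gwp(grindy), 3))  # 1.0 0.667
print(mwp(clean), gwp(clean))              # 1.0 1.0
```

Because both functions pool individual matches, passing in every match a deck played across all events gives the overall aggregate described above (50 wins in 100 matches is 50%).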

If you were worried there weren't enough statistics here, fear not my loyal readers! I'm also going to give the corresponding statistical significance (p value) for both the GWP and the MWP, to check whether each number falls outside expected variance. After all, even a fair coin may flip 3 heads and 7 tails in just 10 trials, and Modern MTGO is much more complicated than that. To check this, I compare the GWPs and MWPs to the weighted average of GWPs/MWPs for all decks in the dataset. This gives us a sense of whether a deck is actually under- or over-performing relative to the average. Remember, we are looking for small P values: anything less than .10 is interesting, and anything less than .05 should really catch our attention.
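The article doesn't say which test produced its P values, so the sketch below is an assumption rather than a reconstruction and won't necessarily reproduce the tables. It treats each match as an independent coin flip weighted to the field average and runs an exact two-sided binomial test with only Python's standard library. The `(wins, played)` record format is hypothetical.

```python
from math import comb


def weighted_average(records: list[tuple[int, int]]) -> float:
    """Field-wide rate from (wins, played) pairs, weighting each deck by how much it played.

    Weighting by matches played is the same as pooling: total wins / total played.
    """
    total_wins = sum(wins for wins, _ in records)
    total_played = sum(played for _, played in records)
    return total_wins / total_played


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n independent matches with win chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test.

    Adds up the probability of every outcome that is no more likely than the
    observed one, assuming the deck's true rate equals the field rate p.
    """
    observed = binom_pmf(k, n, p)
    tail = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        # small relative tolerance so floating-point ties are still counted
        if binom_pmf(i, n, p) <= observed * (1 + 1e-7)
    )
    return min(1.0, tail)


# The coin example: 3 heads in 10 flips of a fair coin.
print(binom_two_sided_p(3, 10, 0.5))  # ~0.344, well inside expected variance
```

Testing one deck would then look like `binom_two_sided_p(deck_wins, deck_matches, weighted_average(all_records))`, where all three names are placeholders for your own data.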

Finally, note GWPs/MWPs are both adjusted to account for byes, mirror matches, drops, splits, and other MTGO quirks skewing our dataset.
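The article doesn't spell out how each adjustment was made. As a hedged illustration of two of them, this sketch drops byes (a free win says nothing about the deck) and mirror matches (an archetype always goes exactly one win and one loss against itself, which drags its rate toward 50%) before computing MWP. The `MatchRecord` layout and the "None means bye" convention are hypothetical, and drops and splits would need rules of their own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchRecord:
    """Hypothetical per-match row: the deck, its opponent's deck, and the result."""
    deck: str
    opponent_deck: Optional[str]  # None marks a bye in this sketch
    won: bool


def adjusted_mwp(records: list[MatchRecord], deck: str) -> Optional[float]:
    """MWP for one archetype after skipping byes and mirror matches."""
    counted = [
        r
        for r in records
        if r.deck == deck
        and r.opponent_deck is not None  # bye: no opponent, no evidence
        and r.opponent_deck != deck  # mirror: always nets out to 50%
    ]
    if not counted:
        return None  # nothing left to measure
    return sum(r.won for r in counted) / len(counted)


burn_run = [
    MatchRecord("Burn", "Affinity", True),
    MatchRecord("Burn", None, True),  # bye, skipped
    MatchRecord("Burn", "Burn", False),  # mirror, skipped
    MatchRecord("Burn", "UR Twin", True),
    MatchRecord("Burn", "Abzan", False),
]
print(adjusted_mwp(burn_run, "Burn"))  # 2 wins in 3 counted matches
```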

With the stat-speak out of the way, it's time to dive right in. Here are the results of this analysis for the Tier 1 decks on our Top Decks page, as of the 3/1 - 4/1 metagame period. I'm also putting in an entry for the MTGO-wide averages, just as a point of comparison. The table's default sort is on MWP.

Deck	N (decks in dataset)	GWP	GWP (p value)	MWP	MWP (p value)
Infect	20	50.29%	0.639	53.73%	0.277
Abzan	40	43.32%	0.833	49.26%	0.584
Affinity	60	46.60%	0.802	49.21%	0.528
Burn	78	48.53%	0.532	47.91%	0.744
MTGO weighted averages	-na-	44.99%	-na-	46.90%	-na-
UR Twin	64	43.96%	0.870	46.58%	0.923

One of the biggest challenges in looking at averages is understanding variance. Those P values help us in that respect. We shouldn't look at UR Twin and think "below average?? WORST TIER 1 DECK ALERT" or something similar. Instead, we should look at all of those P values and see that all of the tier 1 decks are well within the expected performance range for any deck in the format. Infect is a bit higher (which I expected, given my analysis in a previous article), but it's not that much higher than the rest.

When I see the tier 1 decks, I don't see decks above or below an MTGO-wide average. I see decks that get a lot of play, which means they attract a lot of different players. Masters will pilot these decks to 4-0 finishes and MTGO newcomers will pick them up and flop out at 0-2. Tier 1 status comes with a high profile, which means a lot of people are going to be trying these decks and pulling the GWP/MWP both up and down.

This all suggests the tier 1 decks are actually pretty similar in terms of their performance. Infect is a bit higher, but overall they are in the same performance band. This is more or less what we should expect of these decks. You can win with them, you can lose with them, and you can get everywhere in between with them too. Because Infect does not have a significant P value (it doesn't even have that large of an N), I also am unwilling to conclude Infect is the "best" of that bunch. They are all strong choices in their own right, depending on playstyle, metagame, preference, etc.

Tier 2 Deck Performance

In moving to the tier 2 decks, our biggest concern starts to be N. With the exception of Grixis Delver, which is more popular on MTGO right now than Ring Pops were in my youth, these decks do not have a lot of representatives. Lower representation can polarize GWPs/MWPs and makes every individual showing count more. With fewer decks in the sample, a handful of lucky or unlucky runs can swing a deck's P value a long way, but that doesn't mean we should discount those results entirely. Sample size is just one of many factors to consider, and if our quantitative conclusions match our qualitative knowledge and suspicions, N might matter much less.
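One way to see how small samples polarize win rates is a Wilson score interval, a standard confidence interval for a proportion (the article itself doesn't use it, so treat this as an illustration only). At the same 50% MWP, a hypothetical deck with 13 wins in 26 matches, roughly Temur Twin's sample size, gets a 95% interval about three times wider than a hypothetical deck with 150 wins in 300 matches.

```python
from math import sqrt


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate; z = 1.96 gives roughly 95% coverage."""
    p_hat = wins / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Same observed 50% MWP, very different amounts of evidence.
for wins, n in [(13, 26), (150, 300)]:
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n}: {low:.1%} to {high:.1%} (width {high - low:.1%})")
```

Any true rate inside the wider interval, from the low 30s to the high 60s, is reasonably consistent with the small-sample deck's results, which is why a few matches can move its MWP so much.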

Here are the tier 2 decks laid out in the same way as the tier 1 decks were. MWP is still the default sorting variable, but pay attention to those P values, because we have some winners in this batch.

Deck	N (decks in dataset)	GWP	GWP (p value)	MWP	MWP (p value)
Temur Twin	7	58.98%	0.485	66.67%	0.085 *
Abzan Liege	18	54.34%	0.436	58.46%	0.079 *
Amulet Bloom	25	55.78%	0.289	56.82%	0.075 *
Jund	25	49.79%	0.634	52.81%	0.275
Merfolk	22	47.43%	0.820	52.70%	0.329
Bogles	13	43.89%	0.938	52.27%	0.489
Blue/Temur Moon	3	36.67%	0.799	50.00%	0.862
UWR Control	15	46.92%	0.882	48.08%	0.868
Grixis Delver	49	47.07%	0.771	47.53%	0.873
MTGO weighted averages	-na-	44.99%	-na-	46.90%	-na-
Scapeshift	21	39.33%	0.608	43.94%	0.635
Living End	8	45.07%	0.997	43.33%	0.707
RG Tron	15	45.33%	0.979	42.55%	0.560

Nothing excites statisticians quite like low P values (except maybe beta values...mmm). It's a rare thrill to come across significant P values in social science analyses like these. Abzan Liege, Amulet Bloom, and Temur Twin really deliver.

Before we jump into these three decks, let's take a look at the overall picture. A few N's are so small that we almost have to throw them out (poor Blue Moon on MTGO), but most of these N's aren't so tiny that we can ignore their GWPs/MWPs entirely. A lot of these decks have a respectable number of finishes, certainly enough to analyze with different statistical sampling distributions. Grixis Delver, for example, appears very average in this dataset, despite having a ton of playtime online. Indeed, it has very similar values to those in the tier 1 table, which might suggest it's just as viable on MTGO as any of the big guns like Abzan, Twin, and Burn. A number of other decks are also performing in that expected performance band, except maybe Bogles, whose GWP is a little lower, and Scapeshift, whose GWP is a lot lower (but still not significantly so). Overall, although I would like to see more representatives of many of these decks, I still think we can look at this and conclude most of the tier 2 decks are all pretty viable. They aren't too different from the heavy hitters in tier 1, and they aren't too different from each other.

But three decks actually are pushing past the average performance of MTGO decks. None of them quite clears the magical (i.e. semi-arbitrary) 95% significance cutoff we want to see with P < .05, but they are close enough to grab our eye. I want to talk about these decks one by one, both to unpack their stats and to give some thoughts on why they are doing so well.
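The .10 and .05 cutoffs are easy to apply mechanically. This small Python sketch flags the tier 2 decks from the table by MWP P value. It also adds one hedged caveat that the cutoffs don't capture on their own: when a dozen decks are each tested, about one of them would be expected to dip under .10 by chance even if no deck truly differed from average (assuming independent tests and well-calibrated P values).

```python
# MWP P values copied from the tier 2 table.
tier2_mwp_p = {
    "Temur Twin": 0.085,
    "Abzan Liege": 0.079,
    "Amulet Bloom": 0.075,
    "Jund": 0.275,
    "Merfolk": 0.329,
    "Bogles": 0.489,
    "Blue/Temur Moon": 0.862,
    "UWR Control": 0.868,
    "Grixis Delver": 0.873,
    "Scapeshift": 0.635,
    "Living End": 0.707,
    "RG Tron": 0.560,
}


def flag(p_values: dict[str, float], cutoff: float) -> list[str]:
    """Decks whose P value falls below the cutoff, alphabetized."""
    return sorted(deck for deck, p in p_values.items() if p < cutoff)


print(flag(tier2_mwp_p, 0.10))  # ['Abzan Liege', 'Amulet Bloom', 'Temur Twin']
print(flag(tier2_mwp_p, 0.05))  # []

# Expected number of decks under .10 by chance alone if none truly differed
# from average (12 decks * 0.10, assuming independent, well-calibrated tests).
expected_by_chance = len(tier2_mwp_p) * 0.10
```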

Temur Twin (MWP = 66.67%, P = .085)

I guess good things do come to those who just add Goyf to their deck. Let's start with the bad news for Temur Twin fans out there; of the three decks I'm highlighting in tier 2, this is the one with by far the most alarming N. We're only looking at 7 decks spanning just over 25 matches. It's not really enough data to normalize the effects of luck and chance, even in a dataset we know has a small N overall. So we need to remember this when figuring out why RUG Twin might be good.

If Temur Twin is in fact a good deck, and not just a false positive because of a small N, I think a big part of this is actually the prominence of UR Twin itself. You can't leave home without anti-Twin tech in your sideboard and an anti-Twin game 1 plan in your maindeck. Most players also know how to respect the combo and play with it in mind. Tarmogoyf messes with that calculus. A lot of the stuff that works well in fighting stalled Twin board states doesn't do much when facing down a 4/5 Goyf on turn 3. Choke is a great card to land on turn 2-3 against Twin, but it does nothing to shut down Magic's most efficient beatstick. It's easy to misstep against a deck that can win from so many angles, which might make it a great choice for MTGO, where player experience can vary widely. Another possible explanation is in the sideboard, with cards like Ancient Grudge and/or Nature's Claim, which are very strong in a metagame that is increasingly Affinity. I'm not totally sold that RUG Twin would keep its status if we added more matches to the dataset, but if it did, it would likely be due to these reasons.

Abzan Liege (MWP = 58.46%, P = .079)

If a deck designed to target the current metagame didn't make this list, all of us would feel very misled by the innovators over at Team Face to Face. But Abzan Liege remains a metagame predator today as much as it was at the PT, reflected in its 58.46% MWP almost reaching the coveted .05 P value. The deck's matchups also reflect its suitability to our current metagame. Across its approximately 65 matches, it went 4-2 vs. Affinity, 8-5 vs. Burn, 4-2 against UR Twin, 3-0 vs. Junk, and 3-3 against Grixis Delver, which is exactly the kind of match win rates you want to see against top MTGO decks. The deck appears to do less well against weirder, off-beat decks (e.g. Norin the Wary, Smallpox Loam, 4C Gifts, etc.), and is only average against the faster, linear decks like Amulet Bloom, Infect, Bogles, etc. But because MTGO is so dominated by format mainstays like Affinity/Burn/Twin/Grixis Delver and so on, it looks like a great deck to hedge your bets on.
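Tallying the matchup records quoted above shows how strong the deck's combined results against the mainstays were. This covers only those five matchups (34 matches), not the full set of roughly 65, and each individual matchup is only a handful of matches, so treat the per-matchup rates as rough.

```python
# Abzan Liege's records against the format mainstays, as quoted in the text.
liege_vs = {
    "Affinity": (4, 2),
    "Burn": (8, 5),
    "UR Twin": (4, 2),
    "Junk": (3, 0),
    "Grixis Delver": (3, 3),
}

wins = sum(won for won, _ in liege_vs.values())
losses = sum(lost for _, lost in liege_vs.values())
print(f"Combined: {wins}-{losses}, MWP {wins / (wins + losses):.1%}")  # Combined: 22-12, MWP 64.7%

# Per-matchup rates: each is based on only 3 to 13 matches.
for opponent, (won, lost) in liege_vs.items():
    print(f"{opponent}: {won}-{lost} ({won / (won + lost):.0%} over {won + lost} matches)")
```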

A big reason for the deck's success is the MTGO metagame's predictability. All those decks Abzan Liege beats or breaks even with are everywhere online, so between your maindeck and your sideboard, you are ready for a big chunk of the format. The deck's core also has a surprising degree of customization to fill in holes against more erratic matchups like Burn (which can be die-roll dependent) and Twin (which can hinge on drawing/not drawing a critical removal or disruption spell). But unlike with other customizable decks, like Abzan itself, this flexibility doesn't come at the cost of faster starts and aggression, which means you are balancing a proactive, focused gameplan with some reactive elements. So long as the MTGO metagame looks like it does now, this deck is going to remain a safe choice.

Amulet Bloom (MWP = 56.82%, P = .075)

On the one hand, this deck did not turn out to be some unstoppable MTGO force as I feared it might when I started the analysis. On the other hand, Amulet Bloom is still just a really good deck. Overly reactive decks just get crushed in these matchups: 3-0 vs. Scapeshift, 3-0 vs. UWR Control, 3-0 vs. Esper Control, etc. Decks that try to play too fairly also have big trouble here. Grixis Delver looks like a massacre, with Amulet Bloom going 5-1 against that deck. But Jund and Junk remain formidable policing presences against Amulet Bloom. The deck was 2-4 against Jund and 0-4 against Junk. Merfolk also gave Amulet Bloom a lot of trouble, combining hyper-efficient disruption with a fast clock to cause Bloom to go 1-5 against the fish-squad.
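Grouping the quoted records the way the paragraph does makes the split explicit. These are only the matchups named in the text, each with three to six matches, so the totals describe this sample rather than settled matchup percentages. The helper name `group_record` is just for illustration.

```python
# Amulet Bloom's quoted records, grouped the way the text describes them.
bloom_vs = {
    "reactive or fair": {
        "Scapeshift": (3, 0),
        "UWR Control": (3, 0),
        "Esper Control": (3, 0),
        "Grixis Delver": (5, 1),
    },
    "disruption plus a clock": {
        "Jund": (2, 4),
        "Junk": (0, 4),
        "Merfolk": (1, 5),
    },
}


def group_record(records: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum (wins, losses) across every matchup in a group."""
    won = sum(w for w, _ in records.values())
    lost = sum(lost_ for _, lost_ in records.values())
    return won, lost


for group, records in bloom_vs.items():
    won, lost = group_record(records)
    print(f"{group}: {won}-{lost} over {won + lost} matches")
# reactive or fair: 14-1 over 15 matches
# disruption plus a clock: 3-13 over 16 matches
```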

Despite its troubles against decks that are supposed to be good against combo anyway, Amulet Bloom remains a great deck. With the exception of perhaps Infect, no other top-tier deck punishes opponents harder for lacking interaction or not playing with disruption ready. And Bloom is actually harder than Infect to interact with in the first place. Even when you are trying to interact with Amulet Bloom, there are a few interesting contexts making this difficult. For one, stuff that is good against Amulet is not always relevant in the broader metagame. As an example, early Blood Moon can be great against this deck, but the kind of deck that packs Moon along with acceleration to power it out, disruption backup, or both is not very good on MTGO right now. Some Twin builds are doing this, but with Grixis Delver, Burn, and Affinity having such high metagame shares, Moon isn't always the card you want to rely on.

A second factor at play in beating Amulet is experience. We saw countless examples of this at Pro Tour Fate Reforged, where the Twitch stream exploded as a pro player made the wrong removal, countermagic, or discard decision against the deck. Thoughtseize misplays were the most heart-wrenching. Sure, fatigue and expected gameplay errors played a role in this, but a big piece of it was also deck knowledge. This is magnified on MTGO, where player experience is much more variable than on the PT. Amulet really takes advantage of this, both by punishing your deck choice (which itself might be ill-suited for Amulet interaction), and then again by punishing any inexperience you have going into the match.

A final factor here is relative metagame share. The more Amulet you see, the more likely you are to face anti-Amulet cards or players who have experience beating the deck. But when Amulet declines, the cards to beat it might decline alongside, and the collective MTGO experience declines as well. In many respects, this makes Amulet a kind of Dredge of Modern (although that's not entirely fair because Modern seems to have a LOT of "Dredges of Modern" these days). Of course, this leads us to some interesting questions and some even more interesting answers. Why is Amulet declining if it's so good? Are people preparing better for it? Are worse players trying and failing at the deck, leaving it in the hands of masters? Is it just luck? If Amulet were some 8%+ format monster, like Storm in the Seething Song days, then this MWP and its significance would be more straightforward. But because Amulet is seeing a declining share in both paper and MTGO, we need to think of reasons why the deck is both losing its share and still maintaining a strong performance.

Next Steps

My initial draft of this article had another section on some other high (and low!) performing decks that were not tier 1 or tier 2. But because the dataset is so rich and this kind of information so interesting, it makes more sense to split that off into a followup article. That way, we can spend as much time unpacking those decks as we did unpacking these, and we can even revisit some of our conclusions here to see how they fit in the broader metagame context.

Open question until then: what are some other dynamics that can affect MWP and its significance? How should these factors affect our analysis, conclusions, and actions based on those conclusions? As a conversation starter, we already know the most-played decks in the format are not those with the highest MWPs. But we also know that many of those high MWP decks aren't overperforming at higher levels. This is especially true in paper, where a deck like Amulet Bloom has all but disappeared from the paper scene in the past few months. Having just updated the Top Decks page with events from the past two weeks, I can tell you this deck is basically nowhere. How can we reconcile these potentially contradictory numbers? How would we apply that to other decks? I'll touch on some other answers to this in next week's article.

Until next time, keep thinking about MTGO data and how we can use different analytic tools to improve your Modern game.

11 thoughts on “Modern MTGO Deep Dive: Win Rate Analysis”

amalek0 says:

April 13, 2015 at 10:09 am

Part of the reason is confidence–There are many players I’ve spoken to who are familiar with the amulet bloom deck and play it locally, but who fear pulling the trigger on it at paper events with any sort of real prize structure–there’s this fear that the deck can pull its “oops, two inconsistent hands in a row, you lose a round” trick once in a tournament and leave you out of top 8 contention.

Unlike most fair and control decks, pulling the trigger on playing combo decks without a strong backup plan (like the twin beatdown/blood moon plan, or the pod-deck fair-beats plan) requires absolute faith that the power level of what you can do will go over the top of your opponents often enough to outweigh the times where you run cold and are left without the ability to try and leverage any sort of playskill to get back into the game. Better players, as a general rule of thumb, like to believe that they can leverage their own playskill to reduce variance, and playing bloom doesn’t quite have that as much as other combo decks, like splinter twin, and certainly has it less than abzan midrange, affinity, or other tier one mainstays.

In other words, it’s mostly appeared to be a psychological barrier, not a logical one.

1. Sheridan Lardner says:
  
  April 14, 2015 at 8:44 am
  
  @amalek0: Interesting. This would definitely explain why the deck’s paper prevalence is falling, especially at larger events. The more rounds you have to win as an Amulet player, the more that high variance can screw you out of top finishes. We see similar issues with other supposedly broken decks like Griselbrand Reanimator, a deck that was big on MTGO in Summer 2013 but then never converted that into paper finishes past a T16 by Todd Anderson.
  
  That said, I don’t think this would explain a similar decline in Amulet’s MTGO share. If anything, MTGO has been a historical stronghold for Amulet players, where you just need to win 3 rounds to get tickets. With so many events and such a low buy-in cost, this should be the perfect place to play higher variance decks and just hope for the steamroll and/or the good matchups. But we also see an MTGO decline in the deck. Perhaps this is related to the paper decline; MTGO players see the deck not putting up paper results and then avoid it themselves, even if the conditions that inhibit paper success might be less present online.
  
Roland F. Rivera Santiago says:

April 13, 2015 at 8:38 pm

A veritable treasure trove of data – thanks for taking the time to put this together. I’m surprised to see RG Tron on this list while mono-U Tron is absent (especially considering the small sample sizes you were allowing for). Did something change since you highlighted it in your previous piece?

1. Sheridan Lardner says:
  
  April 14, 2015 at 8:46 am
  
  @Roland: Happy the data is interesting! For Mono U Tron, the reason I excluded it is because it’s neither Tier 1 nor Tier 2 right now by our classification system. I didn’t highlight it or any of the other untiered decks in this current piece, but I’d like to revisit them later.
  
  Incidentally, Mono U Tron has super high variance once you look at a dataset like this. You see some players piloting it regularly and doing really well. But then you see a bunch of other players who flop out with it at 2-2 or worse. I think this is because the deck has the most favorable ratio of competitiveness to cost in MTGO, so it’s a good intro deck for new Modern players. These newer players may lack experience and could bring the deck performance down overall, even though other players are quite successful with it.
  
  1. Roland F. Rivera Santiago says:
    
    April 14, 2015 at 12:37 pm
    
    I’d be excited to see a piece on “notable untiered decks” using your data. I feel that a few decks like 8-Rack, Orzhov Tokens, Sultai Control, Martyr Life/Soul Sisters, and Mono-U Tron are lurking on the margins and could make some noise in the coming months, and you could potentially see it coming in the MTGO data.
    
Josh D says:

April 13, 2015 at 10:08 pm

Hey! As a solicitor, with a science degree (advanced mathematics major) that only plays modern – if there was ever an article written specifically for me, this one feels like it!!!

When you start to see the analysis like this come out of the statistics, it becomes apparent why Wizards do not release the statistics on each event. Considering the small sample size of events, imagining the data from the full range of dailies would be amazing. The numbers do not lie if the sample size is significant.

This feels like the start of a beautiful project. A time-consuming, thankless, beautiful project.

Some comments:

a) Twin – I completely agree with your views regarding twin. That it won the last modern pro tour, and is seen as the tier 1 of tier 1 decks cannot be overstated. It is middling in price, interactive and complicated to play – no wonder the deck has a high representation nearly equalling that of the comparatively cheap burn and affinity. This deck doesn’t give out free wins that occasionally come to burn infect and affinity – you have to work for the vast majority of them, with difficult unforgiving decision trees. While the sample size is small, I would expect twin to continue to be a middle performer, due solely to its exposure as the best modern deck and being played by people that expect the deck to win them matches due to winning the pro tour.

Anecdotally, I know a very experienced affinity player that has swapped over to twin – and is atrocious. It feels to me like the deck will not carry you, whereas in affinity picking the 5th best line is probably still good enough to get you through the match.

b) Jund and Merfolk – I am very excited about the data on these two decks becoming more solid. Jund is like the underexposed version of Junk – only played by those that actually really want to be on the deck. You would think that the average jund pilot would be better than the average junk pilot, just because the deck is less well known.

Merfolk is in a good place right now. The deck is fast, has disruption, and the main way to deal with it efficiently (sweepers) is not present in most of the top decks. Monastery Siege adding another kira effect (a better kira effect), could push this up the ranks.

I need to beware, as this seems a lot like confirmation bias. Hence why i am so excited to see further data. Keep up the great work.

Also – when are you putting forums in? modern is great, the best format, and although there is some good stuff elsewhere, wading through the crap of people that do not take modern seriously, and the “Here is my first deck pls critique” without testing is frustrating. This seems to be the place for serious modern players to read. statistics, actual testing, actual people defining their meta before posting.

Sheridan Lardner says:

April 14, 2015 at 9:02 am

@Josh: STATS ARE BEAUTIFUL! A MAN AFTER MY OWN HEART!

I agree with your assessment of Twin. This becomes a tricky statistical point, however, because the data does not suggest the deck is particularly good or bad. It just looks kind of average. But as we know, there are all sorts of other factors that are bringing down Twin’s overall performance even if the deck is still awesome. This is exactly the kind of caution it is important to keep in mind when analyzing deck performance, and I’m happy there are guys like you out there who think through this so critically.

It has been exciting to see Jund rising up the ranks. This makes a lot of sense given the comparative advantages to running Bolt and Blackcleave Cliffs over Path and a super painful shock/fetch manabase. It is also interesting that as Jund has gone up, Abzan has stayed relatively flat or even declined a bit. This suggests there is some ceiling on BGx style decks in the format. Or it’s just an artifact of player preference and metagame trends, and maybe the collective Jund/Junk share will rise later on. As for Merfolk, this was a deck I identified a while ago as a probable riser, for the very reasons you talked about. In particular, it has a good Burn matchup while still keeping the linear and aggressive elements that make Modern decks successful.

Forums are a tricky one. On the one hand, they are in line with our mission for providing quality Modern content. On the other hand, they are huge undertakings and many content sites have actually gotten out of the forum business (or forum sites getting out of the content business). So it’s an ambitious proposition that we are still considering, but the more feedback we get, the more information we will have to make a decision!

1. amalek0 says:
  
  April 14, 2015 at 11:32 am
  
  Leave the forum pages to be forum pages, leave the analysis to be analysis. Not that I mind Ktkenshinx bearing the torch of statistics on the mtgsalvation forums the same way I bear the torch of formal definitions (yo, what makes it a TEMPO deck?).
  
  One point I would like to make is the use of MWP and GWP–I’m not super familiar with MTGO, but is it possible to separate out whether a deck is on the play or draw in game 1, and how that affects the MWP of a deck? It might provide some nice hard numbers for deciding what the actual “speed” of the format is at a given point in time.
  
2. Joshua Davenport says:
  
  April 14, 2015 at 9:33 pm
  
  I was thinking more about your open questions – items that affect the MWP, and one I would like to throw in the ring is expense of the deck.
  
  TL;DR “The greater the price of the deck, the more likely there is an above average pilot behind it. The cheaper the deck, the more likely there is a below average pilot. This has the potential to make the more expensive decks have a MW% higher than the true MW%, and potentially the cheaper decks a lower MW% than if all pilots were equally skilled.”
  
  Following generalised reasons:
  
  a) those that play in mtgo modern tournaments are a subset of the competitive magic community. You do not see many kitchen table players just “having a go” in modern dailies. I would expect this subset to have a greater skill level on average than the crowd at an event that is open to the public (an SCG event for example).
  b) Of this subset of “better on average” players, those that are more committed to magic are more likely to use their funds to get the more expensive decks in the format. Pricing the decks in the format (mtgo prices from mtggoldfish):
  
  Burn: $174.02
  Grixis delver: $210.04
  Merfolk: $260
  Bloom Titan: $277
  Infect: $320.99
  Affinity: $321.57
  Abzan Liege: $400
  Twin: $492.81
  Tarmo Twin: $660
  Jund: $763.00
  Junk: $907.38
  
  (the pricing is not perfect, but serves to illustrate my point).
  
  If you are playing junk/jund online, you are a serious player. You aren’t playing it because you would like to have a go at modern – you are on it because it is the best deck in the format.
  
  I expect the more expensive the deck, the better the average player within this group. Therefore, I would expect the super expensive decks to be a few % points higher than if the decks were played by equally skilled pilots. To rephrase, I think it is possible that the expensive decks (jund and junk) win 1 or 2 matches out of 100 just because they have more skilled pilots than the pilots on other decks. The amount of money you have to spend on MTGO is a limiting factor when choosing a modern deck. I am sure there are well off people that choose the cheaper or more expensive decks. But i am also sure there are players that play cheaper decks because that is what they can afford.
  
  So I think Junk and Jund are probably a percentage point or two too high. You will respond, “That drags Junk down to 47%! That cannot be right.” I do not think it is possible to prove or disprove this hypothesis at present; it is just conjecture.
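One way to put numbers on this conjecture is a simple log-odds model of pilot skill. The sketch below is purely illustrative and is not the commenter’s method: it assumes a hypothetical `skill_gap`, measured in log-odds, by which a deck’s pilots outplay the field, and shows how that gap shifts an observed MW%. It also uses the rough “about two standard errors” rule of thumb to estimate how many matches a single deck would need before a shift that small stands out from noise.

```python
import math


def logistic(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def inflated_mwp(true_mwp: float, skill_gap: float) -> float:
    """Observed MWP if pilots are `skill_gap` log-odds stronger than the field
    (hypothetical model: skill simply adds to the deck's log-odds of winning)."""
    return logistic(logit(true_mwp) + skill_gap)


def deflated_mwp(observed_mwp: float, skill_gap: float) -> float:
    """Back out the MWP with equally skilled pilots (inverse of inflated_mwp)."""
    return logistic(logit(observed_mwp) - skill_gap)


def matches_to_resolve(shift: float, p: float = 0.5, z: float = 2.0) -> float:
    """Matches needed before `shift` equals z standard errors of an MWP near p,
    using SE = sqrt(p * (1 - p) / n)."""
    return z**2 * p * (1.0 - p) / shift**2


if __name__ == "__main__":
    for gap in (0.04, 0.08):
        print(f"gap {gap}: a true 50% deck is observed at {inflated_mwp(0.5, gap):.3f}")
    print("matches to resolve a 2-point shift:", round(matches_to_resolve(0.02)))
```

Near 50%, a gap of 0.04 log-odds is worth about one percentage point, so the “1 or 2 matches out of 100” claim corresponds to a gap of roughly 0.04 to 0.08. Separating a two-point distortion from noise would take on the order of 2,500 matches of one deck, which is consistent with the point that the hypothesis cannot be settled yet.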
  
  The opposite is true for the cheapest decks in the format. Look at the sweetheart (and most-played) deck in the format: Grixis Delver. It has the highest N value outside Tier 1 while costing little more than Burn. The cheaper buy-in is sure to appeal to the “average player on a budget” who doesn’t want to play Burn. I think these two decks’ MW% figures are probably a few percentage points too low.
  
  I think this also helps explain why Tarmo Twin is so dominant: it is not only a metagame call, but it is also probably piloted by better players who can afford the extra $167 or so over regular Twin for pixels.
  
  Abzan Liege: the exception. It is cheaper than Junk but outperforms it on all fronts. In my opinion, this deck is 100% a metagame call. With positive matchups against Affinity, Burn, Junk and Twin (4 of the 5 Tier 1 decks), the question isn’t how high its MW% is, but “what the hell is this deck losing against?” If any other deck had positive matchups against 4 of the 5 Tier 1 decks, there wouldn’t be 5 decks in Tier 1; there would be this deck and the anti-this deck. If Junk’s share of the metagame declines (which I think it will), there is a good chance this deck just becomes fringe playable.
  
  As for why Bloom Titan is underplayed relative to its success:
  Bloom Titan: fear. There is a lot of irrational fear over bannings at present, and this deck seems too good to be true. The ban announcement (at which some feared Summer Bloom would be banned) was on 23 March, and your data collection started on 24 March. This is a possible reason the deck appeared underrepresented during the period.
  
Timur Nurmagambetov says:

September 12, 2015 at 11:34 am

Your statistical findings are quite questionable.
You published three articles with three different results, naming three different decks (Blue Tron, Temur Twin and Infect) as the best deck.
You calculate confidence intervals, then throw them away, calling them old, and present new ones.
What is the null hypothesis you calculate p-values for?
Your use of p-values is quite strange: you say that you are looking for low p-values, then post results with huge p-values as if they were fine too.
Could you publish the raw data of the Modern matches?
These articles read as if you were taking a statistics class and just wanted to practice with some numbers without a real understanding of how the methods work.

1. Sheridan Lardner says:
  
  September 13, 2015 at 8:06 am
  
  Happy to address these concerns!
  
  1. For those earlier articles, I was using a dataset from a different time period and was still refining my methods. I eventually realized I couldn’t just use public MTGO data from the mothership, because the results were quite different from those obtained with data from the client. I also realized it wasn’t enough to look at raw MWP values, because the confidence intervals around some of those MWPs were so wide. I leave those articles up not because they are right, but because I think it’s important to show how the thought process on these kinds of questions evolved.
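To illustrate why a raw MWP can mislead, here is a minimal Python sketch of a 95% confidence interval for a match-win proportion. It uses the Wilson score interval, which is a swapped-in choice on my part; the articles may have computed their intervals differently. The win and match counts in the demo are made up for illustration and are not from the dataset.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, matches: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a match-win proportion (z=1.96 gives ~95%)."""
    if matches <= 0 or not 0 <= wins <= matches:
        raise ValueError("need 0 <= wins <= matches and matches > 0")
    p_hat = wins / matches
    z2 = z * z
    denom = 1.0 + z2 / matches
    # The interval is centred slightly toward 50%, which behaves better
    # than the plain p_hat +/- z * SE interval at small sample sizes.
    center = (p_hat + z2 / (2 * matches)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / matches + z2 / (4 * matches**2)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same 55% MWP, two different (hypothetical) sample sizes.
    for wins, matches in [(33, 60), (330, 600)]:
        low, high = wilson_interval(wins, matches)
        print(f"{wins}/{matches}: {low:.3f} to {high:.3f}")
```

With 60 matches, a 55% deck’s interval comfortably includes 50%; with 600 matches at the same rate, it no longer does. That is the sense in which a high raw MWP on a small sample says little on its own.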
  
  2. The null hypothesis is that the deck’s MWP is equal to the average MWP of all decks on MTGO. Decks that are significantly lower might not be optimal choices. Decks that are significantly higher might be the stronger choices in Modern.
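A test of that null hypothesis can be sketched as an exact two-sided binomial test on one deck’s match record. This is an assumption about the form of the test, not necessarily the one used in the articles. The null rate `p0` defaults to 0.5, which is the pooled match-win rate of the whole field when draws are excluded, since every match produces one win and one loss; pass a different `p0` if the field average is computed another way. The `significance_label` helper is a hypothetical convenience that buckets results at the .01, .05 and .10 cut-offs discussed in this reply.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k wins in n matches, computed in log space
    so large match counts don't overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def two_sided_binom_p(wins: int, matches: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: the deck's true MWP equals p0.

    Sums the probability of every outcome no more likely than the observed one.
    """
    if not 0 < p0 < 1 or matches <= 0 or not 0 <= wins <= matches:
        raise ValueError("need 0 < p0 < 1, matches > 0 and 0 <= wins <= matches")
    observed = binom_pmf(wins, matches, p0)
    cutoff = observed * (1 + 1e-7)  # small tolerance so floating-point ties count
    total = 0.0
    for k in range(matches + 1):
        prob = binom_pmf(k, matches, p0)
        if prob <= cutoff:
            total += prob
    return min(1.0, total)


def significance_label(p_value: float) -> str:
    """Bucket a p-value at the .01 / .05 / .10 cut-offs."""
    for cutoff in (0.01, 0.05, 0.10):
        if p_value < cutoff:
            return f"p < {cutoff:.2f}"
    return "not significant at .10"


if __name__ == "__main__":
    p = two_sided_binom_p(9, 10)  # a made-up 9-1 record against p0 = 0.5
    print(round(p, 4), significance_label(p))
```

For a made-up 9-1 record the exact p-value is 22/1024, about 0.021, which lands in the “p < 0.05” bucket. Note that a small p-value only says the record is unlikely under the null; it does not by itself say why the deck over- or under-performs.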
  
  3. I’m certainly looking (and hoping!) for low p-values, but I’m going to report on any findings I come across. Although I would prefer the p-values to be below .05 or .01, values below .10 are still worth reporting on, since they are small enough to suggest a potential outlier. Some of them, notably Amulet Bloom, continued their downward trend and ended up with some really exciting p-values around their MWP (Amulet was at .03 by May).

  4. As far as I can find, no one who conducts these sorts of metagame analyses publishes their dataset (including Frank Karsten and MTGGoldfish). Because it’s not public data, a lot of work goes into collecting it, and I don’t think authors and sites want to hand that data over to the public wholesale. It would be a bummer if we did all the work collecting the stats and some other site took the data and analyzed it on their own! So I’m not making the data public. I would love to incorporate these numbers and results into a database of some kind, but we are pretty far away from that from a site development perspective.
  
  I’m sorry you didn’t enjoy the articles and the methods. I have extensive experience in statistics, both in general and as applied to various social sciences, and I hoped this would be a fun way to bring those methods to a broader audience without making them too technical. Given the overwhelmingly positive reception, I think we did a pretty good job. Let me know if you have other concerns or questions!
  