Due to complications in my holiday plans, I didn’t have time to write an entirely new article this week. As a replacement, please re-enjoy the final results of my Stoneforge Mystic testing with some additional commentary that I left out from the original run. I promise to have something new to start 2017.
Here it is. The actual data from my investigation into Stoneforge Mystic. After well over 600 matches with my Abzan test decks, I can finally give a decidedly data-driven answer to whether or not Stoneforge deserves its place on the banlist and how it would impact Modern.
I tried to be as clinical and scientific as possible in my treatment of the material and how I approached my results. I initially intended to include an actual statistical study, complete with confidence intervals and regression analysis, but my n proved too small for a reasonable margin of error. That requires at least twice as much data, realistically triple, and I don’t have the time for that. If you want to collect your own data and add it to mine, you are welcome to try.
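To put the sample-size problem in concrete terms, here is a minimal Python sketch (an illustration, not part of the original testing) of the 95% margin of error on a win rate measured over 50 games. It uses the textbook normal-approximation (Wald) interval and the Wilson score interval, and it assumes every game is an independent trial, which repeated testing against the same partner only approximates. For an even 25/50 split, the Wald margin comes out near plus or minus 14 percentage points, and halving it takes four times as many games.

```python
import math


def wald_margin(wins: int, games: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) interval for a win rate.

    z = 1.96 gives a 95% interval. Assumes games are independent trials.
    """
    p = wins / games
    return z * math.sqrt(p * (1 - p) / games)


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval, which behaves better than Wald for small samples."""
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # A 25-25 split over 50 games, the sample size of each matchup in this test.
    print(f"Wald margin, 50 games: +/-{wald_margin(25, 50):.1%}")
    lo, hi = wilson_interval(25, 50)
    print(f"Wilson 95% interval, 50 games: {lo:.1%} to {hi:.1%}")
    # Four times the games only halves the margin.
    print(f"Wald margin, 200 games: +/-{wald_margin(100, 200):.1%}")
```

The Wald form is the one most people mean by "margin of error"; the Wilson form is included because it stays sensible near 0% or 100%, which matters for splits like 4/21.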
If the goal is to be as scientific as possible, then we need a hypothesis to actually test. You cannot just run experiments; you need to be trying to prove or disprove something. For this test I want to avoid something too broad, like “The viability of unbanning Stoneforge Mystic.” After looking through posts on the subject across the greater Modern community, I found a consensus about why Mystic should be unbanned. As best I can distill it, it is:
Stoneforge Mystic would allow more fair midrange and control decks to exist in Modern by slowing down aggressive decks and reducing their metagame presence.
Therefore, I will be performing my investigation with this hypothesis in mind. If Stoneforge slows the format down and makes fair decks a more attractive option, then it is a good candidate for unbanning. If it does not accomplish that goal, it should stay banned.
I’m going to provide general impressions of the matchups rather than detailed descriptions. Trying to be specific about 600+ matches is a Sisyphean task to write and incredibly boring to read. In addition, most of the games can simply be described as, “If Abzan runs the opponent out of resources it wins, if not it loses.” If you want more detail about a specific matchup, ask in the comments.
I took copious notes and kept statistics as I went, and I have included any particularly interesting and relevant ones alongside the research I was actually conducting. Hopefully they will add color to my results and insight into my conclusions.
Before each test I looked up the consensus wisdom on whether Abzan was favored or unfavored, partly to guide how I approached the matchup (I wasn’t an Abzan player before this test) and partly as a check on the validity of the testing. I will include the consensus evaluation alongside my own findings.
Edit: When all was said and done, I was actually really conflicted about Stoneforge Mystic in Abzan. Playing Liliana of the Veil on curve was so powerful that I didn’t want to play Stoneforge on turn two if I also had Liliana. Tutoring for a card and then holding it while discarding cards was a bit inefficient, though it was the correct play. Had I been more flexible about my testing methodology and allowed myself to change playstyles more often, I think that the Infect and Ad Nauseam matchups would have been more in Abzan’s favor.
The Stoneforge Data
I will begin by discussing Stoneforge Mystic herself. For the most part, the way I played Mystic did not change between matchups, so it makes sense to deal with the associated data separately. Unless otherwise noted, I played Mystic at the earliest possible opportunity and fetched Batterskull.
- Opening Hands Containing Mystic: 54%
- Games including Mystic: 72%
- Average Mystic Turn: 3.80
- % Total Games Mystic Played on Turn 2: 49%
This is remarkably close to the results that Sheridan reported. The decreased average Mystic turn is most strongly the result of play adjustments made for the Jeskai matchup. The increased turn two percentage comes from the additional games played and mulliganing decisions, held down again by the Jeskai adjustment, which will be explained in the appropriate section.
Edit: As mentioned above, I think that I really should have delayed playing Mystic more often, giving me a higher average Mystic turn.
Without further ado, in the order that I tested them, here are the results of my investigation by deck with sideboarding information. The specific decklists are in my article from last week. I didn’t record sideboarding strategies from my opponents unless they changed them because of Mystic. Also, I never made matchup-specific considerations for my game one mulligans, to keep things as “real world” as possible.
The community believes that Abzan is favored thanks specifically to Lingering Souls, but that it needs disruption to be safe. As a result, I tended to mulligan for Souls and/or Inquisition in all my matches.
- Game One Win %: 48% (24/50)
- Match Win %: 64% (32/50)
Games were not expected to go long, and grinding wasn’t a factor. Either Abzan died quickly or Infect was successfully exhausted in the opening few turns. I lost a number of game ones to hands with unplayable Tasigurs or to tapping out for Truths, so those cards were cut in sideboarding rather than interactive cards.
I mulliganed a little less because I kept hands with Stoneforge Mystic and disruption, but that didn’t impact the matchup very much. Mystic was decent against Glistener Elf, but less effective than Souls had been.
- Game One Win %: 50% (25/50)
- Total Match Win %: 62% (31/50)
I decided that the additional creature removal was more important here than discard, so I swapped Swords. Explosives and Slaughter Pact were the best removal in the sideboard, and I couldn’t cut too many threats or any interaction to bring in more. In the end I didn’t miss the Fulminator Mages, and cutting them didn’t affect the matchup. I did miss the “I win” aspect of Curse several times. Maelstrom Pulse didn’t have much impact in game one, so I didn’t think it was necessary in games two and three.
Edit: The only real effect was on Glistener Elf kills. Batterskull made them a little more difficult, but Blighted Agent and Inkmoth Nexus don’t care about a ground-pounder. Were Mystic to be unbanned, I suspect Infect would shift from Elf to Plague Stinger. That would mean a slightly slower goldfish, but a more reliable one.
Burn used to be considered a very good matchup for Abzan, but that was when it ran Kitchen Finks maindeck. Opinions on whether Siege Rhino actually replaces Finks appear mixed, and consensus on the matchup has weakened. I’ll assume that it has moved to being more even these days.
- Game One Win %: 40% (20/50)
- Total Match Win %: 52% (26/50)
Truths cost too much life and Timely was just as many cards if you follow the Philosophy of Fire. Nothing else was expected to have an impact. Thoughtseize was not cut because it gained an average of one life per use. There was also nothing else I wanted to bring in.
On the Draw:
Abzan didn’t give Burn that many targets for Searing Blaze, so those were cut for the Siege Rhino answers. On the draw, creatures were shaved on the assumption that they would be neutralized earlier and that Burn would have to play a longer game.
The extra lifegain was very important to pulling up Abzan in games two and three, but most wins still came from early disruption followed by Tarmogoyf. Siege Rhino was impactful, but also easy to anticipate and counter with a sandbagged Atarka’s Command or Skullcrack, frequently resulting in Burn victories.
I mulliganed fairly aggressively for Mystic in games two and three, assuming that Batterskull would be very important.
- Game One Win %: 66% (33/50)
- Total Match Win %: 60% (30/50)
Batterskull was quite important: in game ones where it came down on turn three, Burn won only once, and that win took a lot of Skullcracks.
My opponent wasn’t sure what to take out, but knew what to bring in. Answering Stoneforge and Batterskull was the priority and it happened enough to improve the win percentage. I was frequently forced to slow-roll Mystic thanks to hands containing multiple answers, which meant I often had no pressure.
Edit: My test partner kept testing to work out his sideboard strategy. He believes that Eidolon was correct but only one Spike should have been cut for Searing Blaze with no Paths. The critical thing to answer was the Batterskull and he now believes that you have to just race the other threats and forgo answering them.
I know that the Worlds coverage team claimed that combo decks are good against GBx decks, but in my experience the matchup is slightly tilted in GBx’s favor, since disruption plus a clock is good against combo and that’s all GBx does. This is especially true of combo decks that require multiple cards to win.
- Game One Win %: 42% (21/50)
- Total Matches Won: 46% (23/50)
- Ideal Abzan Opening Hands: 24% (12/50)
- Ideal Abzan Opening Win %: 91.7% (11/12)
- Times Ad Nauseam killed itself: 10% (5/50)
- Times Ad Nauseam would have died anyway: 60% (3/5)
A few things to explain: by “ideal Abzan opening” I mean a Thoughtseize, Tarmogoyf, Liliana opening. “Ad Nauseam killed itself” counts the times it died to Spoils of the Vault, either from exiling all its win conditions or from actually dying to the life loss. My opponent was usually forced to cast it to avoid dying the next turn, and more often than not it didn’t work.
Take out dead cards, add in less dead cards. Extraction comes in because taking all copies of a combo piece, especially Ad Nauseam itself, was frequently game over. This was balanced by how powerful Leyline of Sanctity was against Abzan.
It was correct to find Sword of Feast and Famine in this matchup. While the discard was too slow to stop a fast combo, it helped tighten the screws and ensure victory.
- Game One Win %: 40% (20/50)
- Total Matches Won: 48% (24/50)
- Times Ad Nauseam killed itself: 16% (8/50)
- Times Ad Nauseam would have died anyway: 87.5% (7/8)
I really missed the additional Thoughtseize in the other Abzan list. Sword never won a game where I was behind, but it put me over the top of a few.
All the extra discard had a significant impact, as did an additional answer to Leyline.
Edit: I often didn’t want to play Mystic at all in this matchup because it was a slower clock than anything else. You could never take a turn off of playing Liliana to use Mystic, so Tarmogoyf was almost always better. The ideal Abzan hand I describe was far more reliable than Mystic. It might be right to cut them entirely in this matchup.
Full disclosure: by this point I was well over 300 matches in with the two Abzan decks and the practice was having an effect on the results. My play and subsequent win rates improved by an unquantifiable amount as I moved through the decks.
Based on my experience I expected this to go Merfolk’s way. Abzan has less removal than Jund, and Merfolk can power through Lingering Souls more easily than other aggro decks. Abzan wins when it can race with large creatures.
- Game One Win %: 36% (18/50)
- Total Matches Won: 44% (22/50)
In game one Merfolk’s speed and mana disruption were decisive and Abzan struggled to find its feet. It required a critical mass of answers plus a good clock to power through Merfolk’s redundancy, and that didn’t come together that often.
Take out the less impactful or slow cards and find sweepers and board-cloggers. Truths stayed because you really needed to find a sweeper or two to win.
It didn’t change between decks, but I did record Merfolk’s sideboarding.
Merfolk wants more mana disruption and cantrips rather than tempo cards. Relic also answers Ooze, Tarmogoyf, and sometimes Souls.
The plan was to use Batterskull to race as much as possible. Having slightly less disruption made this far more important.
- Game One Win %: 44% (22/50)
- Total Matches Won: 46% (23/50)
Having fewer sweepers was crippling for the deck, and the win percentage suffered as a result. Swords were swapped mostly for their protection colors, as Sword of Fire and Ice killed very few creatures in games Abzan wasn’t going to win anyway.
Edit: I was generally unimpressed with the Swords. Their impact was unexpectedly low in a lot of games. I think two Batterskulls would be more reliable in actual practice.
Consensus apparently hasn’t been reached about Abzan vs. Death’s Shadow, except that Abzan really needs to watch out for an instant kill. On the one hand, Abzan is good against Zoo; on the other, it’s not so good against trampling double-strikers.
- Game One Win %: 46% (23/50)
- Total Matches Won: 52% (26/50)
- Game One Death’s Shadow is Zoo: 62% (31/50)
- Game One Win %, Death’s Shadow is Zoo: 65.2% (15/23)
- Adjusted Win %, Death’s Shadow is Zoo: 48% (15/31)
- Game One Death’s Shadow Combos: 38% (19/50)
- Game One Win %, Death’s Shadow Combos: 34.8% (8/23)
- Adjusted Win %, Death’s Shadow Combos: 42% (8/19)
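The Death’s Shadow lists report two ratios that are easy to confuse: the share of Abzan’s game one wins that came in each style of game, and the “adjusted” win rate, which is wins divided by the games of that style. Only the adjusted rate is comparable between the two Abzan versions, because the mix of Zoo and combo games differed. A small Python sketch of both calculations, using the counts above (`style_breakdown` is just an illustrative helper name):

```python
def style_breakdown(games: dict[str, int], wins: dict[str, int]) -> dict[str, dict[str, float]]:
    """For each game style, return how often it occurred, what share of
    Abzan's wins it accounts for, and Abzan's win rate within that style."""
    total_games = sum(games.values())
    total_wins = sum(wins.values())
    result = {}
    for style, n in games.items():
        w = wins[style]
        result[style] = {
            "share_of_games": n / total_games,  # e.g. "Death's Shadow is Zoo: 62%"
            "share_of_wins": w / total_wins,    # e.g. "Game One Win %, ... is Zoo"
            "adjusted_win_rate": w / n,         # e.g. "Adjusted Win %, ... is Zoo"
        }
    return result


if __name__ == "__main__":
    # Stock Abzan vs. Death's Shadow, game one: 31 Zoo games (15 wins), 19 combo games (8 wins).
    stock = style_breakdown(games={"zoo": 31, "combo": 19}, wins={"zoo": 15, "combo": 8})
    for style, stats in stock.items():
        print(style, {name: f"{value:.1%}" for name, value in stats.items()})
```

The same helper applied to the Mystic counts (29 Zoo games with 18 wins, 21 combo games with 4 wins) reproduces the adjusted figures in the next set of results.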
By Zoo vs. Combo I mean the games where Death’s Shadow played creatures and attacked over a number of turns and ground down Abzan’s life total like a traditional aggro deck, vs. wins by combining Temur Battle Rage and Become Immense. Pulling off the combo was harder than expected. DS was able to find the pieces easily enough—it was keeping Temur Battle Rage and Become Immense in hand long enough that proved challenging. The strategy for Abzan was to not lose and let DS nearly kill itself, then win with Spirit tokens.
In come the anti-creature cards; out go the clunkers. Gaining extra life and sweeping the board was very helpful. Ooze never got to grow due to mana constraints, and Truths was unnecessary. I wanted to leave in as many naturally large creatures as possible.
Death’s Shadow Sideboarding:
On the Draw:
-3 Steppe Lynx
On the play my opponent thought that the maindeck was fine, but on the draw they thought my discard would assist playing Mandrills early, which was correct for the most part.
Batterskull used to be very good at beating fair creature decks, so I assumed that reliable access would improve things for Abzan.
- Game One Win %: 44% (22/50)
- Total Matches Won: 52% (26/50)
- Game One Death’s Shadow is Zoo: 58% (29/50)
- Game One Win %, Death’s Shadow is Zoo: 81.8% (18/22)
- Adjusted Win %, Death’s Shadow is Zoo: 62% (18/29)
- Game One Death’s Shadow Combos: 42% (21/50)
- Game One Win %, Death’s Shadow Combos: 18.2% (4/22)
- Adjusted Win %, Death’s Shadow Combos: 19% (4/21)
I didn’t record whether the change in the number of Zoo vs. Combo games was because I disrupted the combo less or they found it more. Still, Abzan logged an improvement in games where DS was forced to play fair like traditional Zoo.
I swapped Swords mostly because discard wasn’t very relevant by the time it hit and I wanted to draw more answers.
Death’s Shadow Sideboarding:
On the Draw:
-3 Steppe Lynx
My opponent wanted some answers to Batterskull and cut the least impressive card to do so.
Edit: This matchup was really weird to test since even when I was very far ahead, I never felt safe. Couple that with all of Death’s Shadow Zoo’s cantrips and I never knew what to expect, which is weird for a Thoughtseize deck. I’ve tested Jund against this deck and felt much better since I could Bolt the face to win against the combo.
Consensus says it’s a very even matchup and whoever wins the attrition fight wins the match. Nahiri gives Jeskai the potential for free wins, but in practice she never delivers one unless Jeskai has already won the attrition fight.
- Game One Win %: 48% (24/50)
- Total Matches Won: 50% (25/50)
- Game One Jeskai suspends Ancestral Vision turn one: 46% (23/50)
- Game One Jeskai suspends Ancestral Vision turn one and wins: 87% (20/23)
Winning the attrition fight by drawing cards was quite good, and when Jeskai drew extra cards at no cost, it had the advantage.
-3 Abrupt Decay
Extra cards and mana disruption are pretty effective against Jeskai.
The same plan applies for Jeskai. Remand is great only against flashed-back Souls, so it was cut for more impactful cards.
After the practice games we adjusted how I played Stoneforge Mystic. Initially I just played it as soon as possible, but that frequently allowed Jeskai to adjust how it sequenced its plays to answer the equipment more effectively, so I began playing it as the last threat once Jeskai was down on cards.
- Game One Win %: 54% (27/50)
- Total Matches Won: 52% (26/50)
- Game One Jeskai suspends Ancestral Vision turn one: 52% (26/50)
- Game One Jeskai suspends Ancestral Vision turn one and wins: 80.8% (21/26)
Jeskai got a few extra Ancestrals on the draw due to Abzan missing one Thoughtseize, in addition to normal variance. The increased wins came from Stoneforge being a threat by itself and then finding another threat.
Additional disruption and another Sword to search for to make Spirits into real threats.
An answer was required for all my equipment. My opponent tried Stony Silence and found it dead too often to use. The lack of card draw hurt more than expected.
Edit: As I mentioned, simply tutoring for Batterskull was shockingly powerful. That card was surprisingly hard to answer once it resolved. Even with Stony Silence around, sometimes the 4/4 just went unanswered. Playing two Batterskulls maindeck rather than a Sword would have been better for Abzan.
No data set is ever perfect, and as a result no analysis will ever be perfect. There are limitations and flaws in any study, and unfortunately my testing was no exception. What I didn’t realize when testing began was how much the different sideboards would affect matchups. The Mystic list’s lack of the stock list’s sweepers had a noticeable effect on the creature matchups, as did its extra discard against combo and control. As a result, the difference in total matches won was fairly small, because the cards missing from each sideboard had a greater-than-expected impact on the overall win percentage.
To account for that, I will focus my analysis on the game one win percentages. The maindecks of the two test decks are very similar, so game one isolates the impact of Stoneforge Mystic rather than Stoneforge plus sideboard cards, which makes it more useful analytically.
With our limitations in mind, let’s look at the important numbers together.
- Stock vs. Infect Win %: 48% (24/50)
- Mystic vs. Infect Win %: 50% (25/50)
- Stock vs. Burn Win %: 40% (20/50)
- Mystic vs. Burn Win %: 66% (33/50)
- Stock vs. Ad Nauseam Win %: 42% (21/50)
- Mystic vs. Ad Nauseam Win %: 40% (20/50)
- Stock vs. Merfolk Win %: 36% (18/50)
- Mystic vs. Merfolk Win %: 44% (22/50)
- Stock vs. Death’s Shadow Win %: 46% (23/50)
- Mystic vs. Death’s Shadow Win %: 44% (22/50)
- Stock vs. Jeskai Win %: 48% (24/50)
- Mystic vs. Jeskai Win %: 54% (27/50)
That’s still pretty messy. Let’s simplify things by tracking the change between versions.
- Infect Win % Change: 2%
- Burn Win % Change: 26%
- Ad Nauseam Win % Change: -2%
- Merfolk Win % Change: 8%
- Death’s Shadow Win % Change: -2%
- Jeskai Win % Change: 6%
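Because every matchup was exactly 50 game ones, each game is worth 2 percentage points, and each change is just the difference in wins multiplied by two. Note that these are percentage-point differences, not relative percentages. A short Python sketch that recomputes them from the raw counts in the comparison list (the dictionary and function names are illustrative):

```python
GAMES = 50  # game ones played per matchup, per deck

# Game one wins out of 50, copied from the stock vs. Mystic comparison list.
STOCK = {"Infect": 24, "Burn": 20, "Ad Nauseam": 21, "Merfolk": 18,
         "Death's Shadow": 23, "Jeskai": 24}
MYSTIC = {"Infect": 25, "Burn": 33, "Ad Nauseam": 20, "Merfolk": 22,
          "Death's Shadow": 22, "Jeskai": 27}


def change_in_points(stock_wins: int, mystic_wins: int, games: int = GAMES) -> float:
    """Difference in game one win rate (Mystic minus stock), in percentage points."""
    return 100 * (mystic_wins - stock_wins) / games


if __name__ == "__main__":
    for deck in STOCK:
        diff_games = MYSTIC[deck] - STOCK[deck]
        points = change_in_points(STOCK[deck], MYSTIC[deck])
        print(f"{deck}: {points:+.0f} points ({diff_games:+d} games)")
```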
Clearly Stoneforge had a larger effect on some matchups than on others. We can discount the 2% changes, as those represent only a single game’s difference, easily ascribed to normal variance. Jeskai, at only three games’ difference, is right on the cusp of being relevant. I will ascribe a weak impact there and a moderate impact for Merfolk. However, Burn has been severely and unequivocally impacted by Stoneforge Mystic. In fact, if you group the aggressive decks together, you get a total impact of +34% for Abzan.
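The original testing deliberately stopped short of formal statistics. For readers who want a rough sense of which changes stand out from sampling noise, a pooled two-proportion z-test is one standard tool. The Python sketch below is an illustration under loud assumptions, not the method used in this study: it treats every game as an independent trial and ignores the fact that play skill improved over the course of testing. The function names are hypothetical helpers.

```python
import math


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for the difference p_b - p_a.

    Assumes independent games; a positive z means version B won more often.
    """
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Stock vs. Mystic game one wins out of 50, from the comparison list.
    for deck, stock, mystic in [("Burn", 20, 33), ("Merfolk", 18, 22), ("Jeskai", 24, 27)]:
        z = two_proportion_z(stock, 50, mystic, 50)
        print(f"{deck}: z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

With only 50 games per side, the test needs a gap of roughly 10 games before the change clears the conventional 5% threshold, which is one way to read the earlier note about the sample being too small for a proper study.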
That would point towards confirming our hypothesis that Mystic would slow the format down by preying on aggro decks. However, that is not the full story. We must also consider Sheridan’s results with Affinity, and those indicate a worrying trend.
If we group the decks by fairness and look at the game one changes again:
Fair
- Burn: 26%
- Merfolk: 8%
- Jeskai: 6%
Less than Fair
- Death’s Shadow: -2%
- Infect: 2%
- Affinity: 12%, based on Sheridan’s results vs. Frank Karsten’s expectations, reported by Sheridan as low-impact.
- Ad Nauseam: -2%
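The fairness split can be summarized with a simple average per group. In this Python sketch the grouping is the one above; Ad Nauseam uses the -2 point change from the earlier change list (42% to 40%), and Affinity’s +12 comes from Sheridan’s testing against a different baseline, so it is the least comparable number in the set. The one-game threshold of 2 points mirrors the rule used to discount the small changes. `summarize` is an illustrative helper, not part of the original analysis.

```python
from statistics import mean

# Game one win % changes in percentage points, grouped as in the list above.
# Ad Nauseam uses -2 from the change list (42% -> 40%).
# Affinity's +12 is from Sheridan's results, not from this test.
FAIR = {"Burn": 26, "Merfolk": 8, "Jeskai": 6}
LESS_THAN_FAIR = {"Death's Shadow": -2, "Infect": 2, "Affinity": 12, "Ad Nauseam": -2}


def summarize(group: dict[str, int]) -> tuple[float, int]:
    """Return the mean change and how many decks moved by more than one game (2 points)."""
    avg = mean(group.values())
    beyond_one_game = sum(1 for change in group.values() if abs(change) > 2)
    return avg, beyond_one_game


if __name__ == "__main__":
    for name, group in [("Fair", FAIR), ("Less than Fair", LESS_THAN_FAIR)]:
        avg, moved = summarize(group)
        print(f"{name}: mean change {avg:+.1f} points, "
              f"{moved}/{len(group)} decks moved more than one game")
```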
Stoneforge Mystic affected fair decks far more often and more strongly than it did less-than-fair decks. This stands to reason when you consider that Batterskull is just a beater against combo and a much slower one compared to Tarmogoyf. Meanwhile it actively works against the aggro strategy and dominates the mid- to late-game. Even against midrange decks it is a strong, hard-to-kill threat. This split is corroborated, though not confirmed, by investigating the impact on the fair Death’s Shadow games vs. the unfair games.
- Stock vs. Zoo style win %: 48%
- Stock vs. Combo style win %: 42%
- Mystic vs. Zoo style win %: 62%
- Mystic vs. Combo style win %: 19%
Which yields an end result of:
- Total change vs. Zoo style win %: 14%
- Total change vs. Combo style win %: -23%
I doubt that Mystic is the actual reason the combo win rate got so much worse, but the results are the results. Adding Mystic to Abzan dramatically increased its win rate when Death’s Shadow played fair and hurt its chances at beating unfair attacks.
So what does all this mean? If my results accurately model real Modern, then it is fair to say that Stoneforge Mystic would not have an absolutely warping effect on the metagame. It is a powerful card but not truly degenerate, and it ultimately advantages fair midrange decks against aggressive decks.
The problem comes when we consider what kind of aggressive decks will feel the blow. Fair decks will be impacted much more strongly than unfair decks. Infect definitely doesn’t care about Batterskull any more than it does Tarmogoyf, and Affinity can care, but it has plenty of options to get around it and win anyway. When Death’s Shadow is playing fair it cares as much as any Zoo-type deck, but when it assembles its combo kill, Batterskull doesn’t matter. I would therefore expect Death’s Shadow players to try to combo more often.
The only reason that Batterskull would change an unfair combo matchup is by gaining more life than the combo can erase, which is hard considering how slow a clock Skull is compared to a turn two Tarmogoyf.
As a result, I would expect that in the wake of Stoneforge Mystic being unbanned there would in fact be a decrease in the total number of aggressive decks in Modern as Merfolk, Zoo, and Burn take a hit. This would slow things down as more players try slower decks with Mystic. However, after the initial slowdown, the format would accelerate as players noticed that unfair decks weren’t affected. This would push players toward more Infect, Affinity, and combo decks, and aggro players would try to incorporate more unfair elements to fight back against Batterskull.
There is also the effect on other midrange decks to consider. The Jeskai results suggest that those decks that play Mystic will have an advantage over those that don’t. I suspect that had the Mystic deck run Painful Truths, Abzan would have been more strongly favored. The fact that Mystic still pushed it over Jeskai suggests that it would drive the format towards greater homogeneity. If you have to play Stoneforge to win, that does limit your deckbuilding options.
Could we adapt? Possibly. The Burn matchup would have been much more in Abzan’s favor if not for Destructive Revelry. Adding more targeted artifact removal might keep it in check, but I suspect that if players start doing that then Mystic decks will similarly adjust and run extra equipment and protection for it. This also doesn’t consider whether or not decks can afford the space with unfair decks running around.
Based on the results of my testing Stoneforge Mystic in Abzan, I recommend against unbanning. My results partially support the hypothesis, but analysis of the impact suggests that over the long term it would have the opposite effect.
While its power is manageable and it would give players more reason to play white, its impact would not be positive. It negatively impacts the viability of fair aggro decks and non-Stoneforge midrange decks, while having a negligible impact on the less fair decks. The likely outcome would be a shift to more unfair decks and the speed of the format increasing to try to ignore and invalidate Batterskull and Swords. Therefore there is no reason to unban Stoneforge Mystic.
I’m sure that many of you have questions about my conclusions, methods, or more specifics about how matchups played out. As always I am happy to discuss them with you in the comments. Next week, tune in for something completely different.
Edit: I should stress that my conclusions are based on how Stoneforge would actually impact decks rather than on power level. I am less leery of Mystic based purely on power than I was, but after this was written I did some additional tests to confirm my results. The results were consistent: Stoneforge Mystic has a greater impact against fair decks than unfair decks. This takes Mystic out of consideration for unbanning in my mind, since fair decks struggle to gain ground in Modern as is, and I would rather not add another hurdle for them. If the format becomes fairer and stays that way, I could see Mystic coming off the banned list, but right now it is not a serious consideration.