Investigating and Mitigating MTG: Arena Network Errors
Tracking down reasons for MTGA Network Errors using log files and Wireshark. Then mitigating using a VPN server in Azure.
While I was in the process of gathering packet captures for this write-up on 07/03/20 Arena decided to be extremely stable network wise for me. This may mean they fixed the stability issues which is awesome. The packet capture snippets I’ll be posting are from when I did get timeouts on 07/01/20.
If you’re just interested in the mitigation and not the investigation scroll down to the last section.
I’m sure every single MTG: Arena player has experienced this at least once. You’re playing a match (or drafting, or building a sealed deck, etc) and all of a sudden you can’t click on anything. Then it happens.
You scramble to gather a log, maybe check your internet connection and restart the client before you time out and lose the match. Sometimes you’re quick enough, sometimes you’re not. You go and look at MTG Arena bug reporting page and find that this sort of thing is extremely common.
After this happened to me multiple times in one play session I decided to dig into it a little bit more and maybe find a way to mitigate the problem.
The most obvious place to look would be the log files. If you happen to have saved a log from the client those live here: C:\Program Files\Wizards of the Coast\MTGA\MTGA_Data\Logs\Issue Reports
Otherwise the running log files live here (yes it’s actually Logs\Logs): C:\Program Files\Wizards of the Coast\MTGA\MTGA_Data\Logs\Logs
If you’ve ever looked at an MTG: Arena log file you’ve probably noticed that they’re pretty hard to read. Though apparently in the 1.09.01.00 Patch they may have remidaited this, at least by looking at the Patch Notes.
Added more logging to better identify & track down performance problems.
The logs are a mixture of plain text and JSON so I opened it up in Atom and turned on JSON syntax highlighting to make it slightly easier to parse by eye. Any text editor that supports language highlighting will do the trick.
Much better. It’s still a pretty big log file but at least I can pick out the JSON blocks and hopefully find the sections that pertain to the disconnects. Since I grabbed the log right after a disconnect the relevant sections are at the bottom.
Here’s the first one:
That’s still pretty hard to read so I did some magic. By magic I mean copied the JSON section into another document and ran it through jq
. Here’s the first snippet of that. The stack trace section is still pretty hard to read but we’re not trying to debug the program itself so I’m going to ignore that.
Time to pull out the relevant sections and take a look at them. The first section of the error message tells us pretty succinctly what happened. There was an error reading the SSL stream and the remote host closed the connection.
Key word here is remote which points to issues on the server end and not on the local end.
{
"function": "ProcessRead",
"description": "Error reading from SSL stream",
"exception": {
"ClassName": "System.IO.IOException",
"Message": "Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.\r\n.",
"Data": null,
"InnerException": {
"NativeErrorCode": 10054,
"ClassName": "System.Net.Sockets.SocketException",
"Message": "An existing connection was forcibly closed by the remote host.\r\n",
"Data": null,
"InnerException": null,
"HelpURL": null,
"StackTraceString": " at System.Net.Sockets.Socket.EndReceive (System.IAsyncResult asyncResult) [0x00012] in <ae22a4e8f83c41d69684ae7f557133d9>:0 \r\n at System.Net.Sockets.NetworkStream.EndRead (System.IAsyncResult asyncResult) [0x00057] in <ae22a4e8f83c41d69684ae7f557133d9>:0 ",
"RemoteStackTraceString": null,
"RemoteStackIndex": 0,
"ExceptionMethod": null,
"HResult": -2147467259,
"Source": "System"
},
Then skipping over the stack trace we find the MTG: Arena server I was connected to and the port it uses. This will come in handy as we can search through the log file for that. The host that shows up in your log file may be different.
"status": 1,
"reason": "",
"host": "client.arenamatch-b.east.magic-the-gathering-arena.com",
"port": 9505,
"rtTicksRollingAvg": 18.2,
"rtTicksSamples": [
15,
15,
15,
15,
31
Searching through the log file for magic-the-gathering-arena.com
lets us find the related log entries. Most of them are related to the disconnects but there are a couple that are not.
[111] Connecting to Front Door: client.arenagame-b.east.magic-the-gathering-arena.com:9405
[8666] [UnityCrossThreadLogger]<== Event.MatchCreated
[49071] [UnityCrossThreadLogger]Client.TcpConnection.Close
[50469] [UnityCrossThreadLogger]<== Event.MatchCreated
[62548] BIError - UnityCrossThreadLogger.Error: [UnityCrossThreadLogger]TcpConnection.ProcessRead.Exception
[62548] BIError - UnityCrossThreadLogger.Error: [UnityCrossThreadLogger]Client.TcpConnection.Close
[62548] [UnityCrossThreadLogger]GREConnection.HandleWebSocketClosed
[62548] BIError - UnityCrossThreadLogger.Error: [UnityCrossThreadLogger]GREConnection.MatchDoorConnectionError
[62548] [UnityCrossThreadLogger]Client.TcpConnection.Close
[62548] BIError - UnityCrossThreadLogger.Error: [UnityCrossThreadLogger]Client.TcpConnection.Close
From this we can surmise that there are two servers:
client.arenagame-b.east.magic-the-gathering-arena.com
client.arenamatch-b.east.magic-the-gathering-arena.com
It looks like one is for the game as a whole and the other is match specific. When you start the game you connect to the arenagame
endpoint and when you start a match you appear to connect to the arenamatch
endpoint.
In this case a bunch of TCP and SSL related errors started happening on the arenamatch
endpoint and it ended with me getting disconnected from the game. This is about all we can do with the log files but it did give us enough information to move onto the next step of investigation: playing a bunch of matches with Wireshark open.
Before we get to that we need to get a little more info on the MTG: Arena servers so we know what to look for in Wireshark. A few minutes using nslookup
and we have our answer.
> client.arenagame-b.east.magic-the-gathering-arena.com
Server: 1.1.1.2
Address: 1.1.1.2#53
Non-authoritative answer:
client.arenagame-b.east.magic-the-gathering-arena.com canonical name = edge-arena-game-east-sfc-b.eastus2.cloudapp.azure.com.
Name: edge-arena-game-east-sfc-b.eastus2.cloudapp.azure.com
Address: 40.84.40.16
> client.arenamatch-b.east.magic-the-gathering-arena.com
Server: 1.1.1.2
Address: 1.1.1.2#53
Non-authoritative answer:
client.arenamatch-b.east.magic-the-gathering-arena.com canonical name = edge-arena-match-east-sfc-b.eastus2.cloudapp.azure.com.
Name: edge-arena-match-east-sfc-b.eastus2.cloudapp.azure.com
Address: 13.68.115.26
>
If there are b
servers then there also must be a
servers so I checked those too.
> client.arenagame-a.east.magic-the-gathering-arena.com
Server: 1.1.1.2
Address: 1.1.1.2#53
Non-authoritative answer:
client.arenagame-a.east.magic-the-gathering-arena.com canonical name = edge-arena-game-east-sfc-a.eastus2.cloudapp.azure.com.
Name: edge-arena-game-east-sfc-a.eastus2.cloudapp.azure.com
Address: 40.123.53.179
> client.arenamatch-a.east.magic-the-gathering-arena.com
Server: 1.1.1.2
Address: 1.1.1.2#53
Non-authoritative answer:
client.arenamatch-a.east.magic-the-gathering-arena.com canonical name = edge-arena-match-east-sfc-a.eastus2.cloudapp.azure.com.
Name: edge-arena-match-east-sfc-a.eastus2.cloudapp.azure.com
Address: 52.247.14.163
>
So there are a total of four servers that we could connect to all of which are living in the East US 2 region of Azure. Remember this, it’ll be important later.
Now that we have some IPs we can boot up Wireshark and MTG: Arena and hope we get disconnected while Wireshark is running. The safe way to do this would probably be to play against bots but where’s the fun in that?
The process for gathering packet captures for this issue was pretty simple. I played MTG: Arena with Wireshark open and waited until I got disconnected. As I mentioned at the beginning of this article I did manage to capture a disconnect on 07/01/20 but when I tried to replicate the behavior for fresh data on 07/03/20 I didn’t get disconnected.
When going through the packet capture I found that a pattern started to emerge. It would start with a series of three TCP Retransmission errors from my local machine to the server, two packets of normal communication and then a series of TCP Spurious Retransmission, TCP DUP ACK and TCP Retransmission errors between my local machine and the server. Finally after a certain time period (anywhere from 3–4 seconds to 8–10 seconds) it would resolve itself.
Without going too deep into the TCP Protocol basically these errors mean there was some sort of network packet loss going on and my machine and the MTGA server were trying to rectify it. As this article mentions a small number of these errors may be considered normal.
However it appears that if the time delta between the start of the errors and the end of the errors reached a certain threshold I would get a Network Error and have to restart the MTGA client to reconnect.
After looking at a number of these “fights” my computer and the Arena server had I found that if time delta between the start and the end of the errors reached a certain threshold I would get a Network Error and have to restart the MTGA client to reconnect.
Most of the errors (like the one above) cleared themselves up within 3–4 seconds and if they happened on my opponent’s turn I didn’t even notice any lag. However if the errors lasted between 8–10 seconds (see below) before they recovered I got the dreaded Network Error message.
I also found that the errors affect both the arenagame
and arenamatch
servers at around the the same time though only the arenamatch
server errors get put into the log.
It doesn’t appear to put anything in the log if you don’t actually get disconnected. I found this out after I had the client freeze on me and a bunch of errors spawned in Wireshark but I didn’t get kicked off and the client unfroze.
Basically this is a lot of words to say that these TCP Retranmission errors happen every so often but if they last longer than ~10 seconds you’re probably going to get kicked off.
Update 07/07/20: I am no longer able to connect to MTG: Arena while I’m behind the VPN. Not sure exactly what changed between last week and today but I hope the network stability issues are fixed so I don’t need to rely on the VPN.
Now that I had a hypothesis about why I was getting Network Disconnect errors I wanted to figure out if I could mitigate it somehow. Since the MTGA servers are hosted in Azure I decided to spin up an OpenVPN server in Azure in the same region.
I wrote up a guide for how to do this and you can find it below. If you know your way around Azure they have a Quickstart template for OpenVPN.
After getting the server up and running I connected to it from my machine, verified my IP was showing up as coming from Azure in the right region and started up Arena. I played for a couple of hours and didn’t have a single disconnect or any lag that I noticed.
When I looked at the packet captures I found that the TCP Retransmission errors were still happening but they were being resolved extremely quickly, in under a second in almost every case.
This was a huge improvement over my experience when not playing over a VPN and I don’t live that far away from the Azure East US 2 region relatively speaking.
So what’s the takeaway here?