This article also has a Chinese version.
This article mainly analyzes the currently popular Trojan protocol and proposes a better solution based on the characteristics of current man-in-the-middle (MITM) attacks.
The implementation of this solution is ShadowTLS, for which you can find the complete code and pre-compiled binaries on GitHub.
To hide traffic characteristics, one way is to not expose any features, as with shadowsocks: this type of protocol encrypts the protocol headers for transmission, so no obvious features are observed. The second way is to hide oneself among the crowd, with the simplest method being to masquerade as HTTP or TLS traffic, corresponding to the approaches of simple-obfs and Trojan, respectively.
The first method is now relatively easy to identify: traffic that matches no known protocol but whose timing characteristics resemble web traffic can simply be assumed to be this kind of proxy traffic. The second approach has become increasingly mainstream in recent years, with the Trojan protocol being the most widely used (simple-obfs, which just adds an HTTP protocol header at the beginning, is too easy to identify and will not be analyzed here).
Trojan aims to encapsulate traffic into normal TLS traffic. Since TLS traffic is encrypted, it’s difficult for a MITM to identify whether the traffic is ordinary web traffic or proxy traffic encapsulated by another layer. To make it more convincing, Trojan also defends against active probing by properly responding when a browser directly accesses the corresponding webpage.
The problems it aims to solve are mainly:
- Proxy request carriage: It must be able to encode the proxy request into binary, and the server side must be able to decode this request to establish a remote connection and relay traffic.
- Differentiating traffic from clients and active probes: It needs a way to distinguish between a client’s request and that of an active probe and to treat them differently.
- Handling traffic from clients and active probes: Different treatment is needed after distinguishing the traffic. Client traffic needs to be carried inside TLS, and active probe traffic must be answered the way a real HTTP server would answer it.
The official protocol specification covers this in The Trojan Protocol. Solving problem 1 is simple: the client side is exposed to applications as a SOCKS5 proxy, so the SOCKS5 request header can be packed directly (similar to shadowsocks).
The focus is on problems 2 and 3. The method is to first establish a TLS session and then authenticate with the first 56 bytes sent over the TLS connection. If these 56 bytes match a specific hash of the preshared key, the traffic is considered to come from our client.
An obvious issue arises here: as an attacker, after establishing a TLS session, if I send an HTTP request less than 56 bytes, I can determine whether it is a Trojan server by seeing if it gets stuck because routing cannot be done before the data reaches 56 bytes.
In fact, this issue does not exist. Let’s look at the details of the protocol design: these 56 bytes are hex(SHA224(password)), followed by CRLF. Isn’t it strange? Why would a binary protocol use something like CRLF, which is only used in text protocols? And wouldn’t it be more efficient to send the binary result of SHA224 than to send hex? This is where the subtlety of the protocol design lies.
This CRLF exists to match HTTP traffic. The server reads until the first CRLF and only then routes the connection, because an HTTP request likewise cannot be processed until its first line, terminated by CRLF, has arrived.
So after reading the first CRLF, either the hex(SHA224(password)) is finished or the first line of the HTTP request is completed. In either case, we can now distinguish for routing. For example, if we find that the data is less than 56 bytes, then we can directly determine it as active probe traffic without having to wait to receive the full 56 bytes. The reason for using hex is to avoid accidentally including CRLF in the hash result, which would affect our determination.
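The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from any Trojan implementation; the function names `trojan_token` and `route` and the three return labels are hypothetical.

```python
import hashlib
import hmac

CRLF = b"\r\n"

def trojan_token(password: str) -> bytes:
    """hex(SHA224(password)): 56 ASCII bytes that can never contain CR or LF."""
    return hashlib.sha224(password.encode()).hexdigest().encode()

def route(buffered: bytes, password: str) -> str:
    """Classify a connection from the bytes read so far (after TLS decryption)."""
    end = buffered.find(CRLF)
    if end == -1:
        # No complete first line yet. A real client sends exactly 56 bytes
        # before CRLF, so 58 or more bytes without a CRLF cannot be a client.
        return "probe" if len(buffered) > 57 else "need_more"
    # The first line is complete: it is either the token or an HTTP request line.
    if hmac.compare_digest(buffered[:end], trojan_token(password)):
        return "client"
    return "probe"
```

A short first line such as `GET / HTTP/1.1` is routed as a probe as soon as its CRLF arrives, without waiting for 56 bytes.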
Incidentally, while researching this protocol, I read both the C and Go implementations of Trojan. The Go version has an issue; perhaps its author didn’t grasp all the tricks in the protocol design and simply issues a direct read: if the data is insufficient or the hash does not match, the connection is identified as active probe traffic. However, we cannot take it for granted that all 56 bytes arrive in a single read. TCP is a stream protocol, and a read that returns only 1 byte at a time is still POSIX-compliant.
Correction: After being corrected by the netizen RPRX, the statement here is indeed wrong. The reading and writing here are not over the bare TCP stream but over TLS traffic. TLS traffic has frames and theoretically has the guarantee that reading and writing correspond one to one.
However, the official Go TLS library exposes `io.Writer` interfaces, which are streaming interfaces, and its implementation does not guarantee this one-to-one correspondence. The statement above should therefore be corrected to: the Go implementation relies on specific behavior under specific conditions rather than on what the interface itself guarantees.
Everything seems normal? We package all data into TLS, so outsiders can’t distinguish what the encrypted data is about, as if we’re always requesting some web service, and if we browse web pages, the timing features of proxy traffic also resemble web traffic.
If we don’t consider some specific implementation traits, the only thing exposed here is the SNI (Server Name Indication) and the corresponding certificate. In the TLS Client Hello, the target domain name we request is exposed, and continuously requesting a niche domain with large traffic might not look normal.
Is there a better way to camouflage? If we use TLS, we need to handle handshakes ourselves; if we want handshakes, we need to issue certificates for our domain names. It seems like an intractable problem…
Wait a minute! We’re just disguising as TLS traffic, who said we really need to use TLS?
So, can we perform a “TLS performance” for the MITM to see? The server can directly proxy this performance data to some large companies or institutions on the whitelist, and the MITM sees a handshake that is legal and matches the whitelist domain’s handshake. After the handshake is over, the client and server switch modes and use the established connection to transfer custom data.
To switch modes, both sides need to sense the end of the handshake. Here we force the use of TLS1.2, and the handshake is marked complete once a Change Cipher Spec record has been observed and another Handshake record has been read after it.
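As a rough sketch of this detection, the relay only has to walk the 5-byte TLS record headers of one direction of the stream and watch the content types. The helper names below are hypothetical, and a real relay would work incrementally on a socket rather than on a complete byte string.

```python
import struct

CHANGE_CIPHER_SPEC = 20  # TLS record content types
HANDSHAKE = 22

def iter_records(stream: bytes):
    """Yield (content_type, payload) for every complete TLS record in `stream`."""
    pos = 0
    while pos + 5 <= len(stream):
        ctype, _version, length = struct.unpack_from("!BHH", stream, pos)
        if pos + 5 + length > len(stream):
            break  # the last record is still incomplete
        yield ctype, stream[pos + 5 : pos + 5 + length]
        pos += 5 + length

def handshake_finished(stream: bytes) -> bool:
    """True once a ChangeCipherSpec record is directly followed by a Handshake record."""
    prev = None
    for ctype, _payload in iter_records(stream):
        if prev == CHANGE_CIPHER_SPEC and ctype == HANDSHAKE:
            return True
        prev = ctype
    return False
```

Only record headers are inspected, so the relay never needs to understand the handshake messages themselves.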
We do not want to implement data encryption or proxy protocol packaging ourselves, so the custom data here is directly handled by shadowsocks. Our ShadowTLS works as a wrapper for shadowsocks traffic; for the client, it adds a layer of handshake data to the traffic, and for the server, it removes this layer of handshake data.
Up to this point, if we assume that the MITM:
- Does not analyze traffic after the handshake.
- Does not carry out active probes.
Then our protocol can work very effectively. Packet capture shows that from the perspective of the MITM, we’re really communicating with a trusted domain over TLS. According to feedback, this version has helped some people avoid QoS issues targeted at domains from the end of August to the beginning of October 2022.
We’ve only performed a simple performance so far and made two assumptions, but in reality, these assumptions do not hold. We need to be able to address these two problems.
After the handshake, normal TLS traffic is carried in Application Data records. Forwarding the raw shadowsocks data stream does not conform to the TLS protocol at all, and even Wireshark would highlight the subsequent packets to indicate there’s a problem. Resolving this issue is not difficult; we only need to perform encapsulation and decapsulation on both sides.
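A minimal sketch of such encapsulation, assuming the usual TLS record header (content type 0x17, version 0x0303, 2-byte length). The chunk size of 16384 bytes is the TLS plaintext limit and is only an assumption here, not the buffer size ShadowTLS actually uses; the function names are hypothetical.

```python
import struct
from typing import Tuple

APPLICATION_DATA = 0x17
TLS12_VERSION = 0x0303
MAX_PAYLOAD = 16384  # assumed chunk size (TLS plaintext record limit)

def encapsulate(data: bytes, chunk: int = MAX_PAYLOAD) -> bytes:
    """Wrap raw bytes into one or more Application Data records."""
    out = bytearray()
    for i in range(0, len(data), chunk):
        part = data[i : i + chunk]
        out += struct.pack("!BHH", APPLICATION_DATA, TLS12_VERSION, len(part))
        out += part
    return bytes(out)

def decapsulate(stream: bytes) -> Tuple[bytes, bytes]:
    """Return (payload of all complete records, unconsumed trailing bytes)."""
    payload = bytearray()
    pos = 0
    while pos + 5 <= len(stream):
        ctype, _version, length = struct.unpack_from("!BHH", stream, pos)
        if ctype != APPLICATION_DATA:
            raise ValueError("unexpected record type %d" % ctype)
        if pos + 5 + length > len(stream):
            break  # wait for the rest of this record
        payload += stream[pos + 5 : pos + 5 + length]
        pos += 5 + length
    return bytes(payload), stream[pos:]
```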
To be able to deal with active probes, we need to be able to do two things (the same as Trojan needs to do):
- Differentiate client traffic from active probe traffic.
- Respond correctly to active probe traffic.
We need the client to provide something special to determine that this is our client’s traffic. To avoid active probing, we must introduce a preshared key. But how?
In the Trojan protocol, just sending the password hash will suffice. But here, we only have a plaintext channel to use, so sending the password hash directly exposes the password, meaning the password becomes meaningless and completely undefendable against data replay.
Given the plaintext channel, we can only use a challenge-response form of authentication. Normally, to authenticate the client, the server side would send a challenge. But we can’t actually do this because a normal HTTPS server wouldn’t send back a challenge after the TLS handshake.
So, can we hide the challenge in the normal handshake? The requirements for the challenge are simple: random and uncontrolled by the client. My thought is that the handshake process itself contains data sent by the server that can serve as a challenge: it has random data, like the server random, and it is not client-controllable.
Here, I treat all data sent by the server during the handshake process as a challenge (of course, server random can also be used, but that requires parsing TLS packets, which is a bit cumbersome to implement and may introduce detailed feature distinctions), which weakens reliance on the details of the TLS protocol as much as possible.
With a challenge, how should the response be made? Obviously, we need to authenticate the preshared key, so we use `hmac(data, key)` as the response (which can be simply understood as `hash(data+key)`, but with better security, and both can be computed in a streaming manner without data caching).
How is this Response data sent back? If it is sent as a separate data packet, it will introduce new distinct features. So here, we put this Response at the head of the first Application Data packet sent to the server side.
I use the first 8 bytes of hmac-sha1 for this hmac, which is secure enough.
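The challenge-response idea can be sketched with Python's standard `hmac` module: the server's handshake bytes are streamed into an HMAC-SHA1 keyed with the preshared key, and the first 8 bytes of the digest are prepended to the first Application Data payload. The function names are hypothetical and this is not the ShadowTLS source.

```python
import hashlib
import hmac
from typing import Iterable, Optional

def response_for(server_handshake_chunks: Iterable[bytes], password: bytes) -> bytes:
    """Stream everything the server sent during the handshake into HMAC-SHA1."""
    mac = hmac.new(password, digestmod=hashlib.sha1)
    for chunk in server_handshake_chunks:
        mac.update(chunk)  # no need to buffer the whole handshake
    return mac.digest()[:8]

def strip_response(first_payload: bytes, expected: bytes) -> Optional[bytes]:
    """Server side: return the real data if the 8-byte response matches, else None."""
    if hmac.compare_digest(first_payload[:8], expected):
        return first_payload[8:]
    return None
```

Both sides observe the same server handshake bytes, so each can compute the expected value independently; the client sends `response + data` and the server removes the 8-byte prefix before handing the rest to shadowsocks.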
During data forwarding, Application Data encapsulation and decapsulation will be done. One issue to consider here is how large an individual Application Data packet is under normal circumstances? The current implementation has directly decided on a buffer size, but to prevent this packet size from becoming a feature, it is necessary to research TLS library implementations and observe packet captures to decide on a reasonable maximum value.
We can simplify the server model: default connection to handshake server; if hmac authentication passes, then switch to data server.
For active probe traffic, it is impossible to guess the 8-byte hmac correctly, so it will never switch to the data server. To avoid unnecessary hash calculations, when the hmac verification does not pass for the first N Application Data packets (choosing N instead of 1 because it is uncertain whether sending Application Data marks the end of the handshake), it will switch to direct proxying immediately, and subsequent hmac calculations and verifications will no longer be attempted.
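The simplified server model can be written as a small state machine. This is a sketch only: the class name, the destination labels, and the default `max_attempts=3` (the N in the text) are assumptions for illustration.

```python
import hmac
from typing import Tuple

class Router:
    """Decides where each Application Data payload from the client is sent."""

    def __init__(self, expected: bytes, max_attempts: int = 3):
        self.expected = expected          # 8-byte HMAC the server computed itself
        self.max_attempts = max_attempts  # N: assumed value, not from the source
        self.attempts = 0
        self.state = "probing"            # probing -> data | direct

    def on_application_data(self, payload: bytes) -> Tuple[str, bytes]:
        if self.state == "data":
            return "data_server", payload
        if self.state == "direct":
            return "handshake_server", payload  # no more HMAC checks
        if hmac.compare_digest(payload[:8], self.expected):
            self.state = "data"
            return "data_server", payload[8:]   # strip the response prefix
        self.attempts += 1
        if self.attempts >= self.max_attempts:
            self.state = "direct"
        return "handshake_server", payload
```

Once the router reaches the `direct` state, even a payload carrying the correct prefix is relayed to the handshake server unchanged.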
The detailed protocol design is written here for those interested.
Compared to Trojan, ShadowTLS does not need to issue its certificates (it can directly use the trusted domain names of large companies or institutions) and does not need to start a disguised HTTP service (because the data is directly forwarded to the trusted domain’s corresponding website). Using a trusted domain name can further weaken features and hide in plain sight.
ShadowTLS and Trojan can both handle active probes; when using a browser to open directly, they can both access the HTTP page normally.
UPDATED AT 2022-11-13
More than a month has passed since the release of v2, and ShadowTLS has seen good results: during a period when Trojan was massively blocked, ShadowTLS remained usable. Currently, ShadowRocket and Surge both support this protocol (although I still can’t afford Surge).
But there are actually many areas for improvement:
For the Server, we directly forward the traffic, so there’s no fingerprint issue; but the Client is our own implementation, and we expect it to look like a browser or other normal clients, but in fact, it might not be quite right. If you capture the Client Hello packet sent by Chrome, you can clearly see it contains many Extension fields, which are not automatically attached when using rustls; and the default choices of Cipher, Hash lists, etc., differ between clients.
So one area of improvement is to provide multiple Client TLS Profiles for users to choose from.
This issue mentions a real problem: what if someone hijacks the traffic on the Client side to the handshake server?
Firstly, who does the Client trust? Before it completes the TLS handshake, its behavior is the same as that of a normal TLS Client. To gain the Client’s trust, first, the certificate verification must be passed, and the TLS handshake must be completed. The people who can do this include our Server, the handshake server itself, and other MITMs proxying the handshake.
I think we can assume that the handshake server is uncontrollable by the MITM, and its certificate cannot be held by the MITM. So now the focus is on MITMs who proxy handshakes like us. The MITM does not need to decrypt the traffic; its goal is to identify if we are a normal connection. So although it has not obtained the decryption key, it can still hijack and replay the traffic for its purpose:
- Directly hijack the entire connection to the handshake server (the attack method mentioned in the issue): After the Client completes the protocol switch, it will give itself away, and the handshake server will return an Encrypted Alert.
- Normally proxy traffic, but secretly drop or shuffle an Application Data packet: It should return an Encrypted Alert, but because we do not authenticate the message nor encrypt it, we are actually not aware of it, and this is passed on to the underlying service. We depend on the underlying service to disconnect and return an Encrypted Alert.
- Observing disconnection: As mentioned in 2, we need to properly handle the disconnection problem, whether it’s a normal shutdown or an abnormal one. But the current implementation does not send an Encrypted Alert.
- Merging adjacent Application Data: Normally, there are sequence numbers and MACs within the TLS protocol, but our current encapsulation does not have these, so if the hijacker merges the adjacent Application Data and the connection still operates normally, then it can also be discovered as disguised TLS.
However, these problems (except for problem 3) require traffic to be hijacked on the main link to be effective. Without other hints, hijacking all outbound TLS traffic is still very risky. So I think the issue is actually not that big.
We can simply fix some of the issues mentioned earlier (these are some implementation issues, not protocol problems): provide Client TLS Profile, send Encrypted Alert when the connection is closed. But how to deal with the remaining traffic hijacking issues?
We can see that the crux of the problem is that the Client does not authenticate the Server (it only verifies the certificate). The Server needs to prove its identity. If the proof is placed in some Extension in the Server Hello, it may become a clear feature; if it is carried directly in the subsequent traffic, it will also confuse the prober, who cannot decrypt normally.
We need such a place: it is sent by the Server, inherently a random number, and we can modify it without any impact, best to be sent after the Server Random (this way we can use Server Random to defend against replay attacks). Actually, we can hide things on IP packets, but this requires system administrator privileges, which introduces stronger environmental restrictions, so we look for such a place above TCP as much as possible.
Thus, we find a place that meets the criteria: the Session ID (only for TLS 1.3’s Session ID). Since we completely trust the Server Random is random, we can do something even simpler here: if a relayed Server Hello packet contains a 32-byte Session ID, replace that ID with the HMAC of the Server Random. Since the Session ID field is empty by default in TLS 1.2, we cannot rashly insert this value for TLS 1.2 to avoid becoming a feature.
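To make the idea concrete, here is a hypothetical sketch of how a relay could rewrite a ServerHello record. The text does not say which HMAC is meant; HMAC-SHA256 keyed with the preshared key is assumed here because its 32-byte output fills the Session ID exactly. The offsets follow the standard record layout (5-byte record header, 4-byte handshake header, 2-byte version, 32-byte random, 1-byte Session ID length).

```python
import hashlib
import hmac

RANDOM = slice(11, 43)      # ServerRandom inside the record
SID_LEN_AT = 43             # Session ID length byte
SESSION_ID = slice(44, 76)  # 32-byte Session ID

def sign_server_hello(record: bytes, key: bytes) -> bytes:
    """Replace a 32-byte Session ID with HMAC(key, ServerRandom); otherwise pass through."""
    is_server_hello = len(record) >= 76 and record[0] == 22 and record[5] == 2
    if not is_server_hello or record[SID_LEN_AT] != 32:
        return record  # e.g. an empty Session ID: leave the record untouched
    tag = hmac.new(key, record[RANDOM], hashlib.sha256).digest()
    return record[:44] + tag + record[76:]

def server_is_ours(record: bytes, key: bytes) -> bool:
    """Client side: check whether the Session ID proves knowledge of the key."""
    if len(record) < 76 or record[SID_LEN_AT] != 32:
        return False
    expected = hmac.new(key, record[RANDOM], hashlib.sha256).digest()
    return hmac.compare_digest(record[SESSION_ID], expected)
```

Because the record length does not change, no length field has to be patched.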
Our Application Data encapsulation can be reshaped, but real TLS traffic can’t, so we need to carry some data for verification within this encapsulation layer. Such verification helps to detect the problems mentioned earlier in points 2 and 4, after which only an Alert response and disconnection are needed.
Of course, none of these have been implemented yet (as of November 13, 2022). If you’re interested, you’re welcome to file an issue to claim a contribution!
The fix implementation and the defense against direct hijacking to the handshake server can remain compatible with old Client versions, but adding a MAC to Application Data requires protocol changes (possibly v3).
This section was updated in February 2023; for the complete content, please see the link.
In August 2022 I implemented the first version of the ShadowTLS protocol. The goal of the V1 protocol was simple: to evade man-in-the-middle traffic discrimination by simply proxying the TLS handshake. v1 assumed that the man-in-the-middle would only observe handshake traffic, not subsequent traffic, not active probes, and not traffic hijacking.
However, this assumption does not hold true. In order to defend against active probing, the V2 version of the protocol added a mechanism to verify the identity of the client by challenge-response; and added Application Data encapsulation to better disguise the traffic.
The V2 version works well so far, and I have not encountered any problem of being blocked in daily use. After implementing support for multiple SNIs, it can even work as an SNI Proxy, which doesn’t look like a proxy for data smuggling at all.
But the V2 protocol still assumes that the middleman will not do traffic hijacking (refer to issue). The cost of traffic hijacking is relatively high, and it is not widely used at present. The middleman’s main tools are still passive observation, injection, and active probing. However, this does not mean that traffic hijacking will not be used on a large scale in the future, and a protocol designed to resist traffic hijacking is clearly the better solution. One of the biggest problems faced is that it is difficult for the server side to identify itself covertly. The V3 protocol is therefore designed with the following goals:
- Capable of defending against traffic signature detection, active detection and traffic hijacking.
- Easier to implement correctly.
- Be as weakly aware of the TLS protocol itself as possible, so implementers do not need to hack the TLS library, let alone implement the TLS protocol themselves.
- Keep it simple: only act as a TCP flow proxy, no duplicate wheel building.
The V3 protocol only supports handshake servers using TLS1.3 in strict mode. You can use `openssl s_client -tls1_3 -connect example.com:443` to detect whether a server supports TLS1.3.
Supporting TLS1.2 would require the implementation to be aware of more details of the TLS protocol and would make it more complicated; since many vendors already use TLS1.3, we decided to only support TLS1.3 in strict mode.
Considering compatibility and some scenarios that require less protection against connection hijacking (such as using a specific SNI to bypass the billing system), TLS1.2 is allowed in non-strict mode.
This part of the protocol design is based on restls, but there are some differences: it is less aware of the details of TLS and easier to implement.
The client’s TLS Client constructs the ClientHello with a custom SessionID. The SessionID must be 32 bytes long: the first 28 bytes are random values, and the last 4 bytes are the HMAC signature of the ClientHello frame (computed without the 5-byte TLS frame header, and with the last 4 bytes of the SessionID filled with 0). The HMAC instance is for one-time use only and is created directly from the password. A Read Wrapper is also needed to extract the ServerRandom from the ServerHello and forward the subsequent stream.
When the server receives the packet, it authenticates the ClientHello. If authentication fails, it simply continues the TCP relay with the handshake server. If authentication succeeds, it also forwards the ClientHello to the handshake server and keeps intercepting the return stream from the handshake server. The server side will:
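A sketch of the SessionID signing, under two assumptions that are not stated in the text: the 4-byte signature occupies the last 4 bytes of the 32-byte SessionID (zeroed while the HMAC is computed), and the HMAC is HMAC-SHA1 keyed with the password. The input is the ClientHello handshake message without the 5-byte record header, so the SessionID sits at offset 39. The function names are hypothetical.

```python
import hashlib
import hmac
import os

SID_LEN_AT = 38               # after type(1) + length(3) + version(2) + random(32)
SID_START, SID_END = 39, 71   # 32-byte SessionID
TAG_START = SID_END - 4       # last 4 bytes carry the signature

def sign_client_hello(hello: bytes, password: bytes) -> bytes:
    """Fill the SessionID with 28 random bytes plus a 4-byte HMAC tag."""
    if len(hello) < SID_END or hello[0] != 1 or hello[SID_LEN_AT] != 32:
        raise ValueError("not a ClientHello with a 32-byte SessionID")
    buf = bytearray(hello)
    buf[SID_START:TAG_START] = os.urandom(28)
    buf[TAG_START:SID_END] = bytes(4)  # tag field is zero while signing
    tag = hmac.new(password, bytes(buf), hashlib.sha1).digest()[:4]
    buf[TAG_START:SID_END] = tag
    return bytes(buf)

def verify_client_hello(hello: bytes, password: bytes) -> bool:
    """Server side: recompute the tag over the frame with the tag field zeroed."""
    if len(hello) < SID_END or hello[0] != 1 or hello[SID_LEN_AT] != 32:
        return False
    zeroed = hello[:TAG_START] + bytes(4) + hello[SID_END:]
    expected = hmac.new(password, zeroed, hashlib.sha1).digest()[:4]
    return hmac.compare_digest(hello[TAG_START:SID_END], expected)
```

Since the tag covers the whole ClientHello, changing any other field of the message invalidates it.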
- Log the ServerRandom in the forwarded ServerHello.
- Do the following with the content portion of all ApplicationData frames:
  1. Transform the data by XORing it with SHA256(PreSharedKey + ServerRandom).
  2. Add the 4 byte prefix `HMAC_ServerRandom(processed frame data)`. The HMAC instance should be filled with ServerRandom as the initial value, and this HMAC instance should be reused for subsequent ApplicationData forwarded from the handshake server. Note that the frame length needs to be increased by 4 at the same time.
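The two server-side steps can be sketched as follows. Several details are assumptions made for illustration: the HMAC is HMAC-SHA1 keyed with the preshared key, the 32-byte XOR key is repeated cyclically and restarted for each frame, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import itertools
import struct

class HandshakeRelay:
    """Rewrites ApplicationData payloads coming back from the handshake server."""

    def __init__(self, psk: bytes, server_random: bytes):
        self.xor_key = hashlib.sha256(psk + server_random).digest()
        # HMAC instance seeded with ServerRandom and reused across frames.
        self.mac = hmac.new(psk, server_random, hashlib.sha1)

    def rewrite(self, payload: bytes) -> bytes:
        # Step 1: XOR the content with SHA256(PreSharedKey + ServerRandom).
        masked = bytes(b ^ k for b, k in zip(payload, itertools.cycle(self.xor_key)))
        # Step 2: prefix the 4-byte HMAC of the processed data; length grows by 4.
        self.mac.update(masked)
        tag = self.mac.digest()[:4]
        header = struct.pack("!BHH", 0x17, 0x0303, len(masked) + 4)
        return header + tag + masked
```

Because the HMAC instance is reused, two identical payloads produce different tags, which ties every frame to its position in the stream.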
The client’s ReadWrapper needs to parse each ApplicationData frame and check its first 4 byte HMAC:
- If `HMAC_ServerRandom(frame data)` is met, the server is proven to be reliable. These frames need to be filtered out after the handshake is complete.
- If `HMAC_ServerRandomS(frame data)` is met, it proves that the data has finished switching. The content part needs to be forwarded to the user side.
- If neither matches, the traffic may have been hijacked. The client needs to continue the handshake (or stop if the handshake fails), send a random-length HTTP request (a muddled request) after a successful handshake, and close the connection properly after reading the response.
- When traffic is hijacked, Server will return data without doing XOR and Client will go straight to the muddling process.
- ClientHello may be replayed but cannot use its correct handshake (discussion of restls), so there is no way to identify whether the XOR data we return with a prefix is decodable.
- If Client pretends the data is decrypted successfully and sends the data directly, it will not be able to pass because of the data frame checksum.
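The client-side checks on ApplicationData frames described above can be sketched as a small classifier. As before, HMAC-SHA1 keyed with the preshared key is an assumption, and the class name and the labels `handshake`, `data`, and `hijacked` are hypothetical. Feeding the tag back into the data-phase HMAC follows the chaining rule of the data encapsulation described later in the article.

```python
import hashlib
import hmac
from typing import Tuple

class ReadWrapper:
    """Classifies the payload of each ApplicationData frame received by the client."""

    def __init__(self, psk: bytes, server_random: bytes):
        self.hs_mac = hmac.new(psk, server_random, hashlib.sha1)           # HMAC_ServerRandom
        self.data_mac = hmac.new(psk, server_random + b"S", hashlib.sha1)  # HMAC_ServerRandomS

    def classify(self, payload: bytes) -> Tuple[str, bytes]:
        tag, body = payload[:4], payload[4:]

        trial = self.hs_mac.copy()  # verify on a copy, commit only on success
        trial.update(body)
        if hmac.compare_digest(trial.digest()[:4], tag):
            self.hs_mac = trial
            return "handshake", body  # server proven; frame is filtered out later

        trial = self.data_mac.copy()
        trial.update(body)
        if hmac.compare_digest(trial.digest()[:4], tag):
            trial.update(tag)         # chain the tag into the data-phase HMAC
            self.data_mac = trial
            return "data", body       # switch finished; forward to the user side

        return "hijacked", b""        # continue the handshake, then send a muddled request
```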
The V2 version of the data encapsulation protocol is in fact not resistant to traffic hijacking. For example, the middleman may tamper with this part of the data after the handshake is completed, and we need to be able to respond with an Alert; the middleman may also split one ApplicationData record into two, and if the connection keeps working normally, as it does under the V2 protocol, this can also be used to identify the protocol.
To deal with traffic hijacking, in addition to optimizing the handshake process, the data encapsulation part also needs to be redesigned. We need to be able to authenticate the data stream and resist attacks such as replay, data tampering, data slicing, and data disorder.
In addition to continuing to use ApplicationData encapsulation for the outermost layer of data, we added a 4 byte HMAC value to the inner layer. After creating the HMAC instance with the preshared key, we fill in `ServerRandom+"C"` or `ServerRandom+"S"` as the initial value: the former corresponds to the data stream sent by the Client, the latter to the data stream sent by the Server (the purpose is to prevent the man-in-the-middle from sending back the data we sent, or replaying data from different connections). During forwarding, the pure data is first fed into the HMAC instance, and then the 4 byte value is computed and placed in front of the pure data. The encapsulated data frame format is: (5B tls frame header)(4B HMAC)(data). After encapsulation, the 4 byte value is also fed into the HMAC instance (to prevent the man-in-the-middle from cutting and splicing requests).
When the data checksum fails, we need to send a TLS Alert immediately to close the connection properly. We also need to be able to close the connection correctly when it is broken.
- For man-in-the-middle data tampering, the HMAC check fails directly and we respond with an Alert.
- For a man-in-the-middle reordering attack, the HMAC check fails directly and we respond with an Alert.
- For a cut-and-splice attack (merging two AppData frames), although the HMAC processes the data as a stream, feeding in the additional 4 byte value after each frame separates consecutive frames, which defends against this attack.
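The frame format and the three defenses above can be illustrated with a short sketch. It assumes HMAC-SHA1 keyed with the preshared key and a client-side seed of ServerRandom+"C" (the counterpart of the ServerRandom+"S" seed); the class name `FrameCodec` and its methods are hypothetical.

```python
import hashlib
import hmac
import struct

class FrameCodec:
    """One direction of the data stream; direction is b"C" (client) or b"S" (server)."""

    def __init__(self, psk: bytes, server_random: bytes, direction: bytes):
        self.mac = hmac.new(psk, server_random + direction, hashlib.sha1)

    def seal(self, data: bytes) -> bytes:
        self.mac.update(data)
        tag = self.mac.digest()[:4]
        self.mac.update(tag)  # chaining the tag separates consecutive frames
        return struct.pack("!BHH", 0x17, 0x0303, len(data) + 4) + tag + data

    def open(self, frame: bytes) -> bytes:
        tag, data = frame[5:9], frame[9:]
        trial = self.mac.copy()
        trial.update(data)
        if not hmac.compare_digest(trial.digest()[:4], tag):
            raise ValueError("HMAC mismatch: send a TLS Alert and close")
        trial.update(tag)
        self.mac = trial
        return data
```

A receiver created with the same seed accepts frames only in the order they were sealed. A tampered frame, a reordered frame, two frames merged into one, or a frame reflected back to its sender all fail the check in `open`.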
ShadowTLS is implemented in Rust using Monoio, which, based on io_uring and the thread-per-core model, can bring better IO performance (however, since Monoio does not currently support Windows, Windows users are temporarily unable to use it, and it is recommended to use WSL instead).
In conclusion, this article analyzes mainstream TLS-based proxy protocols, proposes a better protocol design that addresses their potential flaws, and provides the corresponding implementation. You can find the code here.