GoVPN, secure censorship resistant VPN daemon history and implementation decisions (Sergey Matveev’s homepage)


GoVPN, secure censorship resistant VPN daemon history and implementation decisions

This article describes GoVPN, a free software VPN daemon: why it was born, what tasks it aims to solve, and a technical overview of how it works.

Birth and aimed targets.

There are plenty of protocols and implementations for securing data transport. If you just want to connect either two computers or two networks, you can use TLS, SSH, IPsec, MPPE, OpenVPN, tinc and many others. All of them can provide confidentiality and authenticity of the transmitted data and authentication of both sides.

But I, being an ordinary user, found the lack of strong password authentication very inconvenient. Without strong password-based authentication I always have to carry high-entropy private keys with me. But, being human, I am able to memorize long passphrases that have enough entropy to authenticate myself and establish a secure channel.

Probably the best known strong password authentication protocol is Secure Remote Password (SRP). Apart from various JavaScript-based implementations, I know only of the lsh SSHv2 daemon supporting SRP and GnuTLS supporting TLS-SRP. Replacing OpenSSH with lsh is troublesome. TLS-SRP must be supported not only by the underlying library, but also by the application that uses it. So in practice SRP can hardly be used in most cases.

My first target: strong password authentication and state-of-the-art robust cryptography.

The next problem is protocol and code complexity. It is the strongest enemy of security and especially of all cryptography-related solutions. TLS is badly designed (remember at least MAC-then-Encrypt) and the most popular library, OpenSSL, is hated by all of humanity. OpenSSH gained -etm MAC modes not so long ago. IPsec is a good protocol, but its configuration is not so easy. OpenVPN is a working and relatively simple solution, but it is not aware of modern fast encryption and authentication algorithms. And the codebase of all those projects is too big to review, so you just trust it and hope that no more serious bugs will be found. OpenSSL demonstrates that a huge open-source community is not enough for finding critical bugs.

My second target: KISS, small codebase and simple, very simple reviewable code and protocol. Without unnecessary complexity, without explicit compatibility with previous solutions.

The next question that bothers me is: why are all those existing protocols so easy to distinguish from one another and to filter on DPI-level state firewalls? Basically I do not have much against censorship, because it is necessary anyway, but DPI solutions, as a rule, are so crude and clumsy that they do great harm to innocent servers and users, destroying the Internet as such and leaving only huge Faceboogle corporations alive. I like all-or-nothing solutions: either I have a working, routed, paid data channel through the ISP, or I have nothing, because having access only to Facebook, YouTube, Gmail and VKontakte is completely useless to me.

My third target: make a more or less censorship-resistant protocol that nobody can distinguish from, for example, cat /dev/urandom | nc remote.

And of course, zero target: make it free software, without any doubts, so everyone can benefit from its existence.

Daemon overview.

GoVPN does not use any brand-new technologies or protocols. It does not use cryptographic solutions that are not well studied. I do not violate the rule: do not create and implement crypto by yourself. Well, more or less. All critical low-level algorithms, except for some simple ones, are included as written by true crypto gurus. All cryptography must be proven by time.

I decided to use the Go programming language. It is mature enough for this kind of task and very easy to read and support. Simplicity, reviewability and supportability can easily be achieved with it.

From the VPN daemon point of view, here is its current state:

Works with layer 2 TAP virtual network interfaces.
A single server can work with multiple clients, each with its own configuration and optional up/down hooks.
Works over either UDP, TCP, or HTTP proxies with CONNECT method. IPv4/IPv6 supported.
Client is a single executable binary with a few command line options. Server is a single executable binary with a single YAML configuration file.
Built-in rehandshaking and heartbeating.

Client authentication tokens.

Each client is identified by a 128-bit random number. It is not transmitted in the clear, so others can not distinguish one client’s session from another. Mutual client-server authentication is performed using a so-called pre-shared verifier. The client’s identity, the verifier and a memorable passphrase are everything you need. An example client id with verifier:

$argon2d$m=4096,t=128,p=1$4lG67PhgB0qCh7xB+3a+eA$NjUo1kV/L19wP2+htdJA4qIVNlS72riT3E8wfse4jJM

Transport protocol.

Let’s dive deeper into its protocol. It consists of two parts: a transport protocol and a handshake protocol.

The transport protocol is very straightforward from a modern cryptographic point of view. It is similar (but not identical) to Bernstein’s NaCl solution:

TAG || ENCRYPTED || NONCE

Tag is the Poly1305 authentication tag computed over the whole data packet. Nonce is an incrementing counter (odd values are the server’s, even ones are the client’s). Encryption is done over the padded payload with the Salsa20 symmetric encryption algorithm.
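
To make the layout concrete, here is a minimal Python sketch that splits a received packet into its three fields and tells which side a counter value belongs to. The 16-byte tag size follows from Poly1305; the 8-byte nonce size and both function names are assumptions made for illustration, not GoVPN's actual API.

```python
TAG_SIZE = 16    # Poly1305 tags are 128 bits long
NONCE_SIZE = 8   # assumed: one 64-bit counter value


def split_packet(packet: bytes):
    """Split TAG || ENCRYPTED || NONCE into (tag, encrypted, nonce)."""
    if len(packet) < TAG_SIZE + NONCE_SIZE:
        raise ValueError("packet too short")
    tag = packet[:TAG_SIZE]
    encrypted = packet[TAG_SIZE:-NONCE_SIZE]
    nonce = packet[-NONCE_SIZE:]
    return tag, encrypted, nonce


def sender_of(counter: int) -> str:
    """Odd counter values belong to the server, even ones to the client."""
    return "server" if counter % 2 else "client"
```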

The nonce is not secret information, so it could be sent in the clear. But then it would be easily detected and censored: an observer would know that this is some kind of nonce-encrypted traffic. So I decided to obfuscate it using the XTEA PRP (pseudo-random permutation) function. It is very simple to implement and fast enough for short (8 byte) payloads. It does not add any security, but randomizes the data, making DPI censorship a hard task. The nonce encryption key is derived from the session key after the handshake stage.

The nonce is used for replay attack detection and prevention. We remember the previous ones and check whether they are seen again. In TCP mode all messages have a guaranteed delivery order, so any desynchronization leads to immediate disconnection. In UDP mode messages can arrive with varying delay, so we keep a small bucket of recently seen nonces.
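
A bucket of recently seen nonces for UDP mode could look roughly like the following Python sketch. The class name, the bucket size and the rule of rejecting everything at or below the highest forgotten value are assumptions chosen for illustration; the real daemon may organize its storage differently.

```python
from collections import deque


class NonceBucket:
    """Remember recent nonces and reject replays (hypothetical helper)."""

    def __init__(self, size: int = 256):
        self._size = size
        self._order = deque()   # nonces in arrival order
        self._seen = set()      # same nonces, for fast lookup
        self._floor = -1        # highest nonce already forgotten

    def accept(self, nonce: int) -> bool:
        # Replayed, or too old to be checked against the bucket: reject.
        if nonce in self._seen or nonce <= self._floor:
            return False
        self._order.append(nonce)
        self._seen.add(nonce)
        if len(self._order) > self._size:
            oldest = self._order.popleft()
            self._seen.discard(oldest)
            self._floor = max(self._floor, oldest)
        return True
```

A packet whose nonce is rejected is simply dropped; slightly reordered packets are still accepted as long as they fit into the bucket.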

Most protocols do not hide the lengths of the underlying messages. Data can stay confidential, but its size and time of appearance can tell much about the traffic inside the VPN. For example, you can relatively easily tell that DHCP is passing through the tunnel. Moreover, you can correlate data transmission inside the tunnel with the behaviour of external systems. This is a metainformation leak.

Noise can be used to hide message length. GoVPN pads the payload before encryption by appending 0x80 and the necessary number of zeros. After encryption they will look like pseudo-random noise anyway. Heartbeat packets have zero payload length and consist only of padding. All packets then have the same (maximal) size. Of course this consumes traffic, so it can be rather expensive.

PAYLOAD || 0x80 || 00 || ...
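
The padding rule is easy to express in code. The Python sketch below pads a payload to a fixed size and strips the padding again; the function names and the explicit `size` parameter are illustrative assumptions, not taken from GoVPN's sources.

```python
def pad(payload: bytes, size: int) -> bytes:
    """Append 0x80 and as many zero bytes as needed to reach `size`."""
    if len(payload) + 1 > size:
        raise ValueError("payload does not fit into the padded size")
    return payload + b"\x80" + b"\x00" * (size - len(payload) - 1)


def unpad(padded: bytes) -> bytes:
    """Cut everything from the last 0x80 marker; only zeros may follow it."""
    end = padded.rfind(b"\x80")
    if end == -1 or any(padded[end + 1:]):
        raise ValueError("bad padding")
    return padded[:end]
```

A heartbeat is then just `pad(b"", size)`: a packet that carries nothing but the marker and zeros.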

The authentication tag looks like noise that never repeats among all sessions (the probability is negligible), the nonce encrypted with an ephemeral session key also repeats with negligible probability, and the encrypted payload also looks like noise. An adversary does not see any structure.

GoVPN can also hide message timestamps, that is, the time of their appearance. The idea is pretty simple and similar to the noise: constant packet rate traffic. Your tunnel will have a fixed transmission speed. A large amount of data will be transmitted slowly, while the absence of real payload will be hidden with zero-sized (but padded) packets. One can not distinguish an "empty" channel from a loaded one.

Why is the nonce located at the end of the packet? Because in TCP mode, unlike UDP, messages are not already separated from one another: we have just a stream of pseudo-random bytes. But TCP guarantees the order of delivery, so we can predict the next nonce value. As we know the nonce PRP encryption key, we can also predict the value that will actually appear on the wire. So we just wait for that expected value to determine the border of the transmitted message. We can not add a clearly visible structure, because it would also be visible to the DPI system and thus could be censored.
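
As a rough illustration of constant packet rate, the following Python sketch decides what goes into each fixed time slot: a chunk of queued data while there is some, an empty payload otherwise. The function name and parameters are hypothetical, and padding every payload to the same size is assumed to happen afterwards.

```python
def fill_slots(queue: bytes, chunk: int, slots: int):
    """Return (payload for each time slot, data still waiting in the queue).

    Slots with no real data get an empty payload, which on the wire
    would be indistinguishable from a full one after padding.
    """
    sent = [queue[i * chunk:(i + 1) * chunk] for i in range(slots)]
    return sent, queue[slots * chunk:]
```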

A Salsa20 encryption key is generated for each session during the handshake procedure. It is ephemeral, so compromise of your passphrase can not reveal the encryption and authentication keys of past sessions. This property is called perfect forward secrecy (PFS). Poly1305 uses one-time authentication keys derived from Salsa20’s keystream, similarly to NaCl. Unlike many block-cipher based modes and implementations, Salsa20+Poly1305 does not consume entropy for any kind of initialization vectors.

Handshake protocol.

The most complex part is the handshake procedure.

First of all, you need the Diffie-Hellman protocol. It is simple, well studied and the de facto standard for establishing ephemeral session keys. Our choice is curve25519. The exchange could be trivial:

┌─┐          ┌─┐
│C│          │S│
└┬┘          └┬┘
 │  CDHPub    │
 │───────────>│
 │            │
 │  SDHPub    │
 │<───────────│
 │            │

Peers send each other their curve25519 public keys and perform a computation that gives the same result on both sides. That result is not random data ready to be used as a key, but an elliptic curve point. We can, for example, hash it to turn it into a uniform pseudo-random string: the session key.

SessionKey = H(curve25519(ourPrivate, remotePublic))
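
The formula above can be tried out with a small pure-Python X25519 that follows the Montgomery ladder from RFC 7748. This is only a sketch: it is not constant-time and must not be used in production, and the choice of BLAKE2b with a 32-byte digest for H as well as the function names are assumptions for illustration.

```python
import hashlib
import os

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """Scalar multiplication on curve25519 (RFC 7748 ladder, NOT constant-time)."""
    kb = bytearray(k)
    kb[0] &= 248            # clamp the scalar as RFC 7748 prescribes
    kb[31] &= 127
    kb[31] |= 64
    scalar = int.from_bytes(kb, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def keypair():
    """Generate an ephemeral (private, public) pair."""
    private = os.urandom(32)
    return private, x25519(private, BASE_POINT)


def session_key(our_private: bytes, remote_public: bytes) -> bytes:
    """SessionKey = H(curve25519(ourPrivate, remotePublic)), H assumed BLAKE2b."""
    shared = x25519(our_private, remote_public)
    return hashlib.blake2b(shared, digest_size=32).digest()
```

Both peers call `session_key` with their own private key and the other side's public key and end up with the same 32 bytes.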

This scheme of course can not be used as is, because it lacks peer authentication. We can use the encrypted key exchange (EKE) technique: encrypt the Diffie-Hellman packets with a pre-shared symmetric secret. That way we provide indirect authentication: if a peer does not know the shared symmetric secret, it won’t decipher the public key correctly and won’t derive the same session key. For symmetric encryption we could use Salsa20:

┌─┐                     ┌─┐
│C│                     │S│
└┬┘                     └┬┘
 │enc(SharedKey, CDHPub) │
 │──────────────────────>│
 │                       │
 │enc(SharedKey, SDHPub) │
 │<──────────────────────│
 │                       │

Salsa20 is a stream cipher, so it is fatal if the same encryption parameters are used twice. Our shared secret is a constant value, so we have to provide a random nonce R each time. It is not secret information, so we can send it in the clear. The response packet from the server can increment it to derive another usable nonce value:

┌─┐                           ┌─┐
│C│                           │S│
└┬┘                           └┬┘
 │R, enc(SharedKey, R, CDHPub) │
 │────────────────────────────>│
 │                             │
 │enc(SharedKey, R+1, SDHPub)  │
 │<────────────────────────────│
 │                             │

We can not use low-entropy passwords for SharedKey in the scheme above. One can intercept our packets and brute-force the password (dictionary attack), checking on each attempt whether the deciphered message contains an elliptic curve point. The problem here is that the adversary is able to tell whether he decrypted the message successfully.

Thank goodness for the Elligator encoding algorithm! This encoding is able to encode some elliptic curve points as uniform strings and vice versa. Not all points can be converted, only about half on average, so we may have to generate the ephemeral curve25519 keypair more than once for a single session. By applying this encoding we remove the adversary’s ability to distinguish a successful decryption from a failed one: any plaintext will look like a uniform pseudo-random string. That solution is commonly called password authenticated key agreement (PAKE).

┌─┐                              ┌─┐
│C│                              │S│
└┬┘                              └┬┘
 │R, enc(Password, R, El(CDHPub)) │
 │───────────────────────────────>│
 │                                │
 │enc(Password, R+1, El(SDHPub))  │
 │<───────────────────────────────│
 │                                │

But we still do not authenticate peers explicitly. Of course if our passwords are not equal, then derived session key will be wrong and transport layer authentication will fail immediately, but nobody guarantees us that transport layer will transmit packets immediately after handshake is completed.

For that task the server sends a random number encrypted with the session key K and waits for the same number to come back from the remote side. So client authentication will look like this (RS is the server’s random number):

┌─┐                                            ┌─┐
│C│                                            │S│
└┬┘                                            └┬┘
 │       R, enc(Password, R, El(CDHPub))        │
 │─────────────────────────────────────────────>│
 │                                              │
 │enc(Password, R+1, El(SDHPub)), enc(K, R, RS) │
 │<─────────────────────────────────────────────│
 │                                              ────┐
 │                                                  │ compare(RS)
 │                                              <───┘
 │                                              │

And to perform mutual authentication we do the same (RC is client’s random number):

┌─┐                                            ┌─┐
│C│                                            │S│
└┬┘                                            └┬┘
 │       R, enc(Password, R, El(CDHPub))        │
 │─────────────────────────────────────────────>│
 │                                              │
 │enc(Password, R+1, El(SDHPub)), enc(K, R, RS) │
 │<─────────────────────────────────────────────│
 │                                              ────┐
 │                                                  │ compare(RS)
 │                                              <───┘
 │                                              │
 │               enc(K, R+2, RC)                │
 │<─────────────────────────────────────────────│
 │                                              │
 ────┐                                          │
     │ compare(RC)                              │
 <───┘                                          │

It is debatable whether this is needed, but some protocols provide explicit pre-master keys as master key sources. Diffie-Hellman derived keys may not contain enough entropy for long-term usage. So we additionally transmit pre-master secrets (this terminology is taken from TLS) from both sides: 256-bit random strings. The resulting master session key that will be used in the transport protocol is just a XOR of the two pre-master keys. If one communicating party does not behave honestly and does not generate ephemeral keys every time, XORing its permanent keys with the random ones of the honest party will give you perfect forward secrecy anyway. SC and SS are the pre-master keys of the client and server sides.

┌─┐                                               ┌─┐
│C│                                               │S│
└┬┘                                               └┬┘
 │        R, enc(Password, R, El(CDHPub))          │
 │────────────────────────────────────────────────>│
 │                                                 │
 │enc(Password, R+1, El(SDHPub)), enc(K, R, RS+SS) │
 │<────────────────────────────────────────────────│
 │                                                 ────┐
 │                                                     │ compare(RS)
 │                                                 <───┘
 │                                                 │
 │                enc(K, R+2, RC)                  │
 │<────────────────────────────────────────────────│
 │                                                 │
 ────┐                                             │
     │ compare(RC)                                 │
 <───┘                                             │
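
Combining the two pre-master secrets is a one-liner. The Python sketch below XORs SC and SS into the master session key; the function name is an assumption for illustration.

```python
def master_key(sc: bytes, ss: bytes) -> bytes:
    """XOR the client's (SC) and server's (SS) 256-bit pre-master secrets."""
    if len(sc) != 32 or len(ss) != 32:
        raise ValueError("pre-master secrets must be 256 bits long")
    return bytes(a ^ b for a, b in zip(sc, ss))
```

As long as at least one of the two inputs is fresh and random, the result is fresh and random too, which is exactly the property the paragraph above relies on.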

Augmented EKE.

Are we satisfied now? Not yet! Our password is known both to the client and to the server. If the latter is compromised, then the adversary gets our secret. There are so-called augmented encrypted key exchange protocols. The actual secret is kept only on the client’s side. The server side keeps a so-called verifier: something that can confirm the client’s knowledge of the secret.

That kind of proof can be achieved using asymmetric digital signatures. So we use the passphrase as an entropy source for creating digital signature keypair. Its public key is exactly that kind of verifier that will be stored on the server’s side. For convenience we use hash of that public key as a key for symmetric encryption in EKE protocol.

For proving the knowledge of the secret key we have to make a signature with it. We just sign our handshake ephemeral symmetric key. H() is the hash function (BLAKE2b algorithm), DSAPub is the public key derived from user’s passphrase (ed25519 algorithm).

┌─┐                                                ┌─┐
│C│                                                │S│
└┬┘                                                └┬┘
 │        R, enc(H(DSAPub), R, El(CDHPub))          │
 │─────────────────────────────────────────────────>│
 │                                                  │
 │enc(H(DSAPub), R+1, El(SDHPub)), enc(K, R, RS+SS) │
 │<─────────────────────────────────────────────────│
 │                                                  ────┐
 │                                                      │ compare(RS)
 │                                                  <───┘
 │                                                  │
 │                                                  ────┐
 │                                                      │ Verify(DSAPub, Sign(DSAPriv, K), K)
 │                                                  <───┘
 │                                                  │
 │                 enc(K, R+2, RC)                  │
 │<─────────────────────────────────────────────────│
 │                                                  │
 ────┐                                              │
     │ compare(RC)                                  │
 <───┘                                              │

I want to note again: R, El(...) and all sent ciphertexts look like random strings to a third party. They never repeat and do not have any visible structure. So DPI can hardly determine whether these are GoVPN’s handshake messages.

Elligator encoding of curve25519 public keys provides zero-knowledge strong password authentication that is immune to offline dictionary attacks. Even if our password is "1234", you can not check offline whether a guess is right while having all the intercepted ciphertexts.

Server does not know our cleartext secret passphrase – it knows only its derivative in the form of public key. But it still can be dictionary attacked. If server’s verifiers are compromised, then you can quickly check if public key (verifier) corresponds for example to "1234" password.

We can not protect ourselves from this kind of attack. Strong passphrases are still important. But at least we can make the dictionary attack harder by strengthening those passwords. This is a well-known practice: PBKDF2, bcrypt, scrypt and similar technologies. As a rule they contain some very slow function (to decrease the attack rate) and a "salt" for increasing the entropy and randomizing equal passwords.

We use the password hashing competition winner: the Argon2 algorithm. The client’s identity is used as a salt. The ed25519 keypair is generated from the strengthened password. It is computed only once on the client side, during session initialization.

PrivateKey    Verifier -----> Server storage
    ^         ^
    |        /
    |       /
    |      /
ed25519Generate(strongpass)
                     ^
                     |
                     |
                  Argon2(Password, salt=ClientId)
                                           ^
                                           |
                                           |
                                        ClientId = random(128bit)
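
The first two steps of that chain can be sketched in Python. Note the loud substitution: Argon2 is not in Python's standard library, so PBKDF2-HMAC-SHA256 stands in for it here, and the iteration count and function names are arbitrary assumptions. The resulting 32 bytes would then seed ed25519 key generation, which is also outside the standard library and therefore omitted.

```python
import hashlib
import os


def new_client_id() -> bytes:
    """ClientId = random(128bit)."""
    return os.urandom(16)


def strengthen(passphrase: str, client_id: bytes) -> bytes:
    """Slow, salted derivation of a 32-byte seed from the passphrase.

    Stand-in only: PBKDF2 replaces Argon2, which the article actually uses.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        client_id,           # the client's identity acts as the salt
        50_000,              # assumed work factor, tune to taste
        dklen=32,
    )
```

Because the salt is the client's identity, two clients with the same passphrase still end up with different seeds and therefore different verifiers.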

DPI resistant handshake packets.

And again there is still another problem: we have not yet transmitted our client’s identity. Server does not know what verifier must be used for handshake processing. If we transmit it in clear, then third party will see the same repeated string during each handshake. It does not harm confidentiality and security, but it is the leakage of deanonymization metainformation.

Moreover all handshake packets have the same size and behaviour: 48 bytes from client to server, 80 bytes response, 120 bytes again, 16 bytes response. Handshake behaviour still differs from the transport one.

Each handshake packet is padded similarly to transport messages:

HANDSHAKE MSG = [R] || enc(PAYLOAD || 0x80 || 0x00 || ...)

After its encryption we have got pseudo-random noise with maximal size indistinguishable from other packets.

Each handshake packet also has a so-called IDtag appended. This tag is the XTEA encryption of the first 8 bytes of the message using the client’s identity as a key. When the server gets a handshake message, it takes all known client identities in turn, tries to decrypt the last 8 bytes and compares the result with the first 8 bytes of the message. Of course this search time grows linearly with the number of clients, but XTEA is pretty fast and the search is needed only while processing handshake messages.

HANDSHAKE MSG = [R] || enc(PAYLOAD || 0x80 || 0x00 || ...) ||
                XTEA(ClientId, 8bytes([R] || enc(PAYLOAD || ...)))

This feature is also good at saving server’s resources: it won’t try to participate in handshake with unknown clients. So adversary can send any random data and receive nothing in response.

But an adversary can intercept the client’s first handshake message and replay it. It is valid from the server’s point of view, so the server will respond to it. The adversary can not finish that handshake session, but at least learns that a GoVPN server is sitting on that port and that it knows that client’s identity.

To mitigate this kind of attack, we use synchronized clocks. Well, dependency on time is an awful thing. It complicates things very much. So this is only an option. To randomize client identities we just take the current time, round it to a specified interval, for example ten seconds, and XOR it with the client’s identity: every ten seconds the encryption key for IDtag changes.

HANDSHAKE MSG = [R] || enc(PAYLOAD || 0x80 || 0x00 || ...) ||
                XTEA(TIME XOR ClientId, 8bytes([R] || enc(PAYLOAD || ...)))
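
The IDtag mechanism, including the optional time component, can be sketched in Python as follows. XTEA is written out in full (32 rounds, big-endian words) because it is not in the standard library. How exactly the rounded time is laid out before being XORed with the 16-byte identity is an assumption, as are all function names.

```python
import struct

_MASK = 0xFFFFFFFF
_DELTA = 0x9E3779B9


def xtea_encrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    k = struct.unpack(">4I", key)
    v0, v1 = struct.unpack(">2I", block)
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & _MASK
        s = (s + _DELTA) & _MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & _MASK
    return struct.pack(">2I", v0, v1)


def xtea_decrypt(key: bytes, block: bytes, rounds: int = 32) -> bytes:
    k = struct.unpack(">4I", key)
    v0, v1 = struct.unpack(">2I", block)
    s = (_DELTA * rounds) & _MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + k[(s >> 11) & 3]))) & _MASK
        s = (s - _DELTA) & _MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + k[s & 3]))) & _MASK
    return struct.pack(">2I", v0, v1)


def tag_key(client_id: bytes, now: float, step: int = 10) -> bytes:
    """TIME XOR ClientId, with the time rounded down to `step` seconds."""
    rounded = int(now) // step * step
    return bytes(a ^ b for a, b in zip(client_id, rounded.to_bytes(16, "big")))


def add_idtag(msg: bytes, client_id: bytes, now: float) -> bytes:
    """Append XTEA(key, first 8 bytes of the message) to the message."""
    return msg + xtea_encrypt(tag_key(client_id, now), msg[:8])


def identify(packet: bytes, known_ids, now: float):
    """Try every known identity; return the matching one or None."""
    head, tag = packet[:8], packet[-8:]
    for client_id in known_ids:
        if xtea_decrypt(tag_key(client_id, now), tag) == head:
            return client_id
    return None
```

A server using this sketch would silently drop any packet for which `identify` returns `None`, including a captured handshake message replayed after the time window has moved on.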

At last we are quite satisfied with that protocol. Of course you must use strong passphrase and high quality entropy source for ephemeral keys and random numbers generation.

Additional remarks.

Not all operating systems provide a good PRNG out of the box. GoVPN is able to use entropy sources other than /dev/urandom through an Entropy Gathering Daemon compatible protocol.

GoVPN is only a layer 2 VPN daemon. It knows nothing about layer 3 IP addresses, routes and anything close to that subject. It uses layer 2 TAP interfaces, and you have to manually configure and control how your clients work with routing and addresses. There is support for convenient up and down scripts executed after session initialization or termination.

I thought about making some kind of stunnel replacement from it, for example tunneling either a single TCP connection or the stdin/stdout of an externally executed command. But all of this is a much more complicated task compared to the VPN. I decided that you should use specialized tools for all of this. Anyway, you can use GoVPN for creating small IPv6 link-local only networks where all your socat, stunnel, SSH, whatever works.

Encryptionless mode.

GoVPN also includes so called encryptionless mode of operation. Its necessity is under question and mainly theoretical.

Assume that you operate under a jurisdiction where the use of encryption functions is illegal. This mode uses only authentication functions (actually, XTEA PRP encryption of the nonce is still performed). Unfortunately it is much more resource and traffic hungry.

This mode is based on Ronald L. Rivest’s relatively old work on "chaffing and winnowing". Additionally it uses another well known technique, an all-or-nothing transformation (AONT): Optimal Asymmetric Encryption Padding (OAEP). Actually OAEP is slightly changed: the length field is replaced with hash-based checksumming taken from SAEP+.

The chaffing-and-winnowing idea is pretty simple in our context: instead of sending just a single bit of the required data, you always send two bits, always a 0 and always a 1. But you also provide authentication information for each of them, so the receiver can distinguish the bit that is really needed from the junk (chaff).

For each input byte (8 bits) you send 16 MACs, one pair per bit. The first MAC of each pair (MAC00, MAC02, ...) stands for bit value 0, the second one (MAC01, MAC03, ...) for bit value 1. Exactly one MAC in each pair is valid.

   VALID    INVLD    INVLD    VALID    INVLD    VALID    INVLD    VALID
   MAC00 || MAC01 || MAC02 || MAC03 || MAC04 || MAC05 || MAC06 || MAC07 ||

   INVLD    VALID    VALID    INVLD    VALID    INVLD    VALID    INVLD
|| MAC08 || MAC09 || MAC10 || MAC11 || MAC12 || MAC13 || MAC14 || MAC15

In that example we have 0, 1, 1, 1, 1, 0, 0, 0 valid bits and byte 01111000.

GoVPN uses Poly1305 as a MAC. So for transmitting a single byte we spend 256 bytes of real traffic: 16 128-bit MACs. Each Poly1305 computation requires a one-time authentication key. We take them from the XSalsa20 output stream. XSalsa20 differs from Salsa20 in that it uses longer 192-bit nonces.

MAC00Key, MAC01Key, ... = XSalsa20(
    encryptionKey=SessionKey,
    nonce=PacketNum || 0x00 ... || ByteNum,
    plaintext=0x00 ...
)

As session key is unique for each session and packet numbers do not repeat, we guarantee that one-time authentication keys won’t repeat too.
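
The whole chaffing and winnowing cycle for one byte can be sketched in Python. Neither Poly1305 nor XSalsa20 is in the standard library, so keyed BLAKE2b stands in for both here (as the MAC and as the source of per-slot one-time keys); that substitution, the MAC input (a single byte holding the bit value) and the function names are assumptions made for illustration. Bits are taken most significant first, matching the 01111000 example above.

```python
import hashlib
import hmac
import os


def slot_key(session_key: bytes, packet_num: int, byte_num: int, slot: int) -> bytes:
    """One-time key for MAC slot 0..15 (stand-in for the XSalsa20 stream)."""
    info = packet_num.to_bytes(8, "big") + byte_num.to_bytes(8, "big") + bytes([slot])
    return hashlib.blake2b(info, key=session_key, digest_size=32).digest()


def mac(key: bytes, bit: int) -> bytes:
    """128-bit MAC over one bit value (stand-in for Poly1305)."""
    return hashlib.blake2b(bytes([bit]), key=key, digest_size=16).digest()


def chaff_byte(session_key: bytes, packet_num: int, byte_num: int, value: int) -> bytes:
    """Turn one byte into 16 MACs: a valid one for each real bit, junk for the other."""
    out = []
    for i in range(8):
        bit = (value >> (7 - i)) & 1
        for candidate in (0, 1):
            if candidate == bit:
                key = slot_key(session_key, packet_num, byte_num, 2 * i + candidate)
                out.append(mac(key, candidate))
            else:
                out.append(os.urandom(16))      # chaff: an invalid MAC
    return b"".join(out)


def winnow_byte(session_key: bytes, packet_num: int, byte_num: int, macs: bytes) -> int:
    """Recover the byte by checking which MAC of every pair is valid."""
    value = 0
    for i in range(8):
        valid = []
        for candidate in (0, 1):
            slot = 2 * i + candidate
            expected = mac(slot_key(session_key, packet_num, byte_num, slot), candidate)
            valid.append(hmac.compare_digest(macs[16 * slot:16 * slot + 16], expected))
        if valid[0] == valid[1]:
            raise ValueError("exactly one MAC per pair must be valid")
        value = (value << 1) | int(valid[1])
    return value
```

Someone without the session key sees 256 random-looking bytes per input byte and has no way to tell wheat from chaff.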

Sending 256 times more traffic is really very expensive. So AONT can help us here. Its idea is simple: either provide all bits of the message to retrieve it, or you won’t recover anything from it. The main difference of AONT from the encryption: it is keyless. It is just a transformation.

AONT takes message M and some random number r. AONT package consists of two parts P1, P2:

PKG = P1 || P2
 P1 = expand(r) XOR (M || H(r || M))
 P2 = H(P1) XOR r

+-----------------------+-----------+
|         M             | H(r || M) |
+-----------------------+-----------+
          |                  ^
          |                   \
          .                    \
         XOR <-- expand(r)      XOR
          |                         \
          |                          \
          .                           .
+-----------------------------------+----+
|        P1                         | P2 |
+-----------------------------------+----+

If any bit in either P1 or P2 is tampered with, you will detect it. We use BLAKE2b as the hash function H() and Salsa20 as the expander for the random number: r is used as the Salsa20 key.

Only 16 bytes (128-bit security margin) of this AONT package are chaffed-and-winnowed during transmission. We use a 256-bit random number during AONT packaging. So each transmitted packet requires 16 * 256 + 32 = 4128 bytes of overhead. Compared to a 1500-byte MTU, this is not as huge as the 256-fold increase of pure chaffing-and-winnowing.
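
The package construction and its inverse can be sketched in Python. BLAKE2b with a 32-byte digest plays H(); Salsa20 is not in the standard library, so SHAKE-256 is used here as a stand-in expander. That substitution, the 32-byte sizes and the function names are assumptions for illustration.

```python
import hashlib
import hmac

H_SIZE = 32   # assumed digest size of H()
R_SIZE = 32   # 256-bit random number r


def _h(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=H_SIZE).digest()


def _expand(r: bytes, length: int) -> bytes:
    """Stretch r to `length` bytes (SHAKE-256 stands in for Salsa20)."""
    return hashlib.shake_256(r).digest(length)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_encode(r: bytes, message: bytes) -> bytes:
    """PKG = P1 || P2, P1 = expand(r) XOR (M || H(r || M)), P2 = H(P1) XOR r."""
    if len(r) != R_SIZE:
        raise ValueError("r must be 256 bits long")
    body = message + _h(r + message)
    p1 = _xor(_expand(r, len(body)), body)
    p2 = _xor(_h(p1), r)
    return p1 + p2


def aont_decode(package: bytes) -> bytes:
    """Invert the transformation; raise ValueError if any bit was altered."""
    if len(package) < H_SIZE + R_SIZE:
        raise ValueError("package too short")
    p1, p2 = package[:-R_SIZE], package[-R_SIZE:]
    r = _xor(_h(p1), p2)
    body = _xor(_expand(r, len(p1)), p1)
    message, check = body[:-H_SIZE], body[-H_SIZE:]
    if not hmac.compare_digest(check, _h(r + message)):
        raise ValueError("AONT package is damaged")
    return message
```

No key is involved anywhere: whoever holds the complete package can decode it, while losing or altering any part of it makes recovery fail.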

Conclusions.

We have got a strong password authenticated augmented key agreement protocol with zero-knowledge mutual peer authentication.
Authentication tokens are hardened against offline dictionary attacks even if the server’s database/hard drive is compromised.
Replay attack protection, perfect forward secrecy.
DPI resistance: all transport and handshake messages look like random data without any repeating structure. Message lengths and timestamps can be hidden with noise.
Relatively small codebase:
- 6 screens of transport protocol;
- 7 screens of handshake protocol;
- 2 screens of verifier related code;
- 2 screens of chaffing-and-winnowing related code;
- 1 screen of AONT related code;
- 3+3 screens (UDP and TCP) of server related main code;
- 2+2 screens (UDP and TCP) of client related main code.
Sufficient throughput: my Intel i5 notebook CPU under Go 1.5 gives 786 Mbps of UDP packet throughput.