\begin{code} {-# LANGUAGE StrictData #-} module Tox.Network.TCP.Server where \end{code} \chapter{TCP server} The TCP server in tox has the goal of acting like a TCP relay between clients who cannot connect directly to each other or who for some reason are limited to using the TCP protocol to connect to each other. \texttt{TCP\_server} is typically run only on actual server machines but any Tox client could host one as the api to run one is exposed through the tox.h api. To connect to a hosted TCP server toxcore uses the TCP client module. The TCP server implementation in toxcore can currently either work on epoll on linux or using unoptimized but portable socket polling. TCP connections between the TCP client and the server are encrypted to prevent an outsider from knowing information like who is connecting to whom just be looking at someones connection to a TCP server. This is useful when someone connects though something like Tor for example. It also prevents someone from injecting data in the stream and makes it so we can assume that any data received was not tampered with and is exactly what was sent by the client. When a client first connects to a TCP server he opens up a TCP connection to the ip and port the TCP server is listening on. Once the connection is established he then sends a handshake packet, the server then responds with his own and a secure connection is established. The connection is then said to be unconfirmed and the client must then send some encrypted data to the server before the server can mark the connection as confirmed. The reason it works like this is to prevent a type of attack where a peer would send a handshake packet and then time out right away. To prevent this the server must wait a few seconds for a sign that the client received his handshake packet before confirming the connection. The both can then communicate with each other using the encrypted connection. The TCP server essentially acts as just a relay between 2 peers. When a TCP client connects to the server he tells the server which clients he wants the server to connect him to. The server will only let two clients connect to each other if both have indicated to the server that they want to connect to each other. This is to prevent non friends from checking if someone is connected to a TCP server. The TCP server supports sending packets blindly through it to clients with a client with public key X (OOB packets) however the TCP server does not give any feedback or anything to say if the packet arrived or not and as such it is only useful to send data to friends who may not know that we are connected to the current TCP server while we know they are. This occurs when one peer discovers the TCP relay and DHT public key of the other peer before the other peer discovers its DHT public key. In that case OOB packets would be used until the other peer knows that the peer is connected to the relay and establishes a connection through it. In order to make toxcore work on TCP only the TCP server supports relaying onion packets from TCP clients and sending any responses from them to TCP clients. To establish a secure connection with a TCP server send the following 128 bytes of data or handshake packet to the server: \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{32} & DHT public key of client \\ \texttt{24} & Nonce for the encrypted data \\ \texttt{72} & Payload (plus MAC) \\ \end{tabular} Payload is encrypted with the DHT private key of the client and public key of the server and the nonce: \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{32} & Public key \\ \texttt{24} & Base nonce \\ \end{tabular} The base nonce is the one TCP client wants the TCP server to use to decrypt the packets received from the TCP client. The first 32 bytes are the public key (DHT public key) that the TCP client is announcing itself to the server with. The next 24 bytes are a nonce which the TCP client uses along with the secret key associated with the public key in the first 32 bytes of the packet to encrypt the rest of this 'packet'. The encrypted part of this packet contains a temporary public key that will be used for encryption during the connection and will be discarded after. It also contains a base nonce which will be used later for decrypting packets received from the TCP client. If the server decrypts successfully the encrypted data in the handshake packet and responds with the following handshake response of length 96 bytes: \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{24} & Nonce for the encrypted data \\ \texttt{72} & Payload (plus MAC) \\ \end{tabular} Payload is encrypted with the private key of the server and the DHT public key of the client and the nonce: \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{32} & Public key \\ \texttt{24} & Base nonce \\ \end{tabular} The base nonce is the one the TCP server wants the TCP client to use to decrypt the packets received from the TCP server. The client already knows the long term public key of the server so it is omitted in the response, instead only a nonce is present in the unencrypted part. The encrypted part of the response has the same elements as the encrypted part of the request: a temporary public key tied to this connection and a base nonce which will be used later when decrypting packets received from the TCP client both unique for the connection. In toxcore the base nonce is generated randomly like all the other nonces, it must be randomly generated to prevent nonce reuse. For example if the nonce used was 0 for both sides since both sides use the same keys to encrypt packets they send to each other, two packets would be encrypted with the same nonce. These packets could then be possibly replayed back to the sender which would cause issues. A similar mechanism is used in \texttt{net\_crypto}. After this the client will know the connection temporary public key and base nonce of the server and the server will know the connection base nonce and temporary public key of the client. The client will then send an encrypted packet to the server, the contents of the packet do not matter and it must be handled normally by the server (ex: if it was a ping send a pong response. The first packet must be any valid encrypted data packet), the only thing that does matter is that the packet was encrypted correctly by the client because it means that the client has correctly received the handshake response the server sent to it and that the handshake the client sent to the server really came from the client and not from an attacker replaying packets. The server must prevent resource consuming attacks by timing out clients if they do not send any encrypted packets so the server to prove to the server that the connection was established correctly. Toxcore does not have a timeout for clients, instead it stores connecting clients in large circular lists and times them out if their entry in the list gets replaced by a newer connection. The reasoning behind this is that it prevents TCP flood attacks from having a negative impact on the currently connected nodes. There are however much better ways to do this and the only reason toxcore does it this way is because writing it was very simple. When connections are confirmed they are moved somewhere else. When the server confirms the connection he must look in the list of connected peers to see if he is already connected to a client with the same announced public key. If this is the case the server must kill the previous connection because this means that the client previously timed out and is reconnecting. Because of Toxcore design it is very unlikely to happen that two legitimate different peers will have the same public key so this is the correct behavior. Encrypted data packets look like this to outsiders: \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{2} & \texttt{uint16\_t} length of data \\ variable & encrypted data \\ \end{tabular} In a TCP stream they would look like: \texttt{[[length][data]][[length][data]][[length][data]]...}. Both the client and server use the following (temp public and private (client and server) connection keys) which are each generated for the connection and then sent to the other in the handshake and sent to the other. They are then used like the next diagram shows to generate a shared key which is equal on both sides. \begin{verbatim} Client: Server: generate_shared_key( generate_shared_key( [temp connection public key of server], [temp connection public key of client], [temp connection private key of client]) [temp connection private key of server]) = = [shared key] [shared key] \end{verbatim} The generated shared key is equal on both sides and is used to encrypt and decrypt the encrypted data packets. each encrypted data packet sent to the client will be encrypted with the shared key and with a nonce equal to: (client base nonce + number of packets sent so for the first packet it is (starting at 0) nonce + 0, the second is nonce + 1 and so on. Note that nonces like all other numbers sent over the network in toxcore are numbers in big endian format so when increasing them by 1 the least significant byte is the last one) each packet received from the client will be decrypted with the shared key and with a nonce equal to: (server base nonce + number of packets sent so for the first packet it is (starting at 0) nonce + 0, the second is nonce + 1 and so on. Note that nonces like all other numbers sent over the network in toxcore are numbers in big endian format so when increasing them by 1 the least significant byte is the last one) Encrypted data packets have a hard maximum size of 2 + 2048 bytes in the toxcore TCP server implementation, 2048 bytes is big enough to make sure that all toxcore packets can go through and leaves some extra space just in case the protocol needs to be changed in the future. The 2 bytes represents the size of the data length and the 2048 bytes the max size of the encrypted part. This means the maximum size is 2050 bytes. In current toxcore, the largest encrypted data packets sent will be of size 2 + 1417 which is 1419 total. The logic behind the format of the handshake is that we: \begin{enumerate} \item need to prove to the server that we own the private key related to the public key we are announcing ourselves with. \item need to establish a secure connection that has perfect forward secrecy \item prevent any replay, impersonation or other attacks \end{enumerate} How it accomplishes each of those points: \begin{enumerate} \item If the client does not own the private key related to the public key they will not be able to create the handshake packet. \item Temporary session keys generated by the client and server in the encrypted part of the handshake packets are used to encrypt/decrypt packets during the session. \item The following attacks are prevented: \begin{itemize} \item Attacker modifies any byte of the handshake packets: Decryption fail, no attacks possible. \item Attacker captures the handshake packet from the client and replays it later to the server: Attacker will never get the server to confirm the connection (no effect). \item Attacker captures a server response and sends it to the client next time they try to connect to the server: Client will never confirm the connection. (See: \texttt{TCP\_client}) \item Attacker tries to impersonate a server: They won't be able to decrypt the handshake and won't be able to respond. \item Attacker tries to impersonate a client: Server won't be able to decrypt the handshake. \end{itemize} \end{enumerate} The logic behind the format of the encrypted packets is that: \begin{enumerate} \item TCP is a stream protocol, we need packets. \item Any attacks must be prevented \end{enumerate} How it accomplishes each of those points: \begin{enumerate} \item 2 bytes before each packet of encrypted data denote the length. We assume a functioning TCP will deliver bytes in order which makes it work. If the TCP doesn't it most likely means it is under attack and for that see the next point. \item The following attacks are prevented: \begin{itemize} \item Modifying the length bytes will either make the connection time out and/or decryption fail. \item Modifying any encrypted bytes will make decryption fail. \item Injecting any bytes will make decryption fail. \item Trying to re order the packets will make decryption fail because of the ordered nonce. \item Removing any packets from the stream will make decryption fail because of the ordered nonce. \end{itemize} \end{enumerate} \section{Encrypted payload types} The folowing represents the various types of data that can be sent inside encrypted data packets. \subsection{Routing request (0x00)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x00) \\ \texttt{32} & Public key \\ \end{tabular} \subsection{Routing request response (0x01)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x01) \\ \texttt{1} & \texttt{uint8\_t} rpid \\ \texttt{32} & Public key \\ \end{tabular} rpid is invalid \texttt{connection\_id} (0) if refused, \texttt{connection\_id} if accepted. \subsection{Connect notification (0x02)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x02) \\ \texttt{1} & \texttt{uint8\_t} \texttt{connection\_id} of connection that got connected \\ \end{tabular} \subsection{Disconnect notification (0x03)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x03) \\ \texttt{1} & \texttt{uint8\_t} \texttt{connection\_id} of connection that got disconnected \\ \end{tabular} \subsection{Ping packet (0x04)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x04) \\ \texttt{8} & \texttt{uint64\_t} \texttt{ping\_id} (0 is invalid) \\ \end{tabular} \subsection{Ping response (pong) (0x05)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x05) \\ \texttt{8} & \texttt{uint64\_t} \texttt{ping\_id} (0 is invalid) \\ \end{tabular} \subsection{OOB send (0x06)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x06) \\ \texttt{32} & Destination public key \\ variable & Data \\ \end{tabular} \subsection{OOB recv (0x07)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} (0x07) \\ \texttt{32} & Sender public key \\ variable & Data \\ \end{tabular} \subsection{Onion packet (0x08)} Same format as initial onion packet but packet id is 0x08 instead of 0x80. \subsection{Onion packet response (0x09)} Same format as onion packet but packet id is 0x09 instead of 0x8e. \subsection{Data (0x10 and up)} \begin{tabular}{l|l} Length & Contents \\ \hline \texttt{1} & \texttt{uint8\_t} packet id \\ \texttt{1} & \texttt{uint8\_t} connection id \\ variable & data \\ \end{tabular} The TCP server is set up in a way to minimize waste while relaying the many packets that might go between two tox peers hence clients must create connections to other clients on the relay. The connection number is a \texttt{uint8\_t} and must be equal or greater to 16 in order to be valid. Because a \texttt{uint8\_t} has a maximum value of 256 it means that the maximum number of different connections to other clients that each connection can have is 240. The reason valid \texttt{connection\_ids} are bigger than 16 is because they are the first byte of data packets. Currently only number 0 to 9 are taken however we keep a few extras in case we need to extend the protocol without breaking it completely. Routing request (Sent by client to server): Send a routing request to the server that we want to connect to peer with public key where the public key is the public the peer announced themselves as. The server must respond to this with a Routing response. Routing response (Sent by server to client): The response to the routing request, tell the client if the routing request succeeded (valid \texttt{connection\_id}) and if it did, tell them the id of the connection (\texttt{connection\_id}). The public key sent in the routing request is also sent in the response so that the client can send many requests at the same time to the server without having code to track which response belongs to which public key. The only reason a routing request should fail is if the connection has reached the maximum number of simultaneous connections. In case the routing request fails the public key in the response will be the public key in the failed request. Connect notification (Sent by server to client): Tell the client that \texttt{connection\_id} is now connected meaning the other is online and data can be sent using this \texttt{connection\_id}. Disconnect notification (Sent by client to server): Sent when client wants the server to forget about the connection related to the \texttt{connection\_id} in the notification. Server must remove this connection and must be able to reuse the \texttt{connection\_id} for another connection. If the connection was connected the server must send a disconnect notification to the other client. The other client must think that this client has simply disconnected from the TCP server. Disconnect notification (Sent by server to client): Sent by the server to the client to tell them that the connection with \texttt{connection\_id} that was connected is now disconnected. It is sent either when the other client of the connection disconnect or when they tell the server to kill the connection (see above). Ping and Pong packets (can be sent by both client and server, both will respond): ping packets are used to know if the other side of the connection is still live. TCP when established doesn't have any sane timeouts (1 week isn't sane) so we are obliged to have our own way to check if the other side is still live. Ping ids can be anything except 0, this is because of how toxcore sets the variable storing the \texttt{ping\_id} that was sent to 0 when it receives a pong response which means 0 is invalid. The server should send ping packets every X seconds (toxcore \texttt{TCP\_server} sends them every 30 seconds and times out the peer if it doesn't get a response in 10). The server should respond immediately to ping packets with pong packets. The server should respond to ping packets with pong packets with the same \texttt{ping\_id} as was in the ping packet. The server should check that each pong packet contains the same \texttt{ping\_id} as was in the ping, if not the pong packet must be ignored. OOB send (Sent by client to server): If a peer with private key equal to the key they announced themselves with is connected, the data in the OOB send packet will be sent to that peer as an OOB recv packet. If no such peer is connected, the packet is discarded. The toxcore \texttt{TCP\_server} implementation has a hard maximum OOB data length of 1024. 1024 was picked because it is big enough for the \texttt{net\_crypto} packets related to the handshake and is large enough that any changes to the protocol would not require breaking TCP server. It is however not large enough for the biggest \texttt{net\_crypto} packets sent with an established \texttt{net\_crypto} connection to prevent sending those via OOB packets. OOB recv (Sent by server to client): OOB recv are sent with the announced public key of the peer that sent the OOB send packet and the exact data. OOB packets can be used just like normal data packets however the extra size makes sending data only through them less efficient than data packets. Data: Data packets can only be sent and received if the corresponding \texttt{connection\_id} is connection (a Connect notification has been received from it) if the server receives a Data packet for a non connected or existent connection it will discard it. Why did I use different packet ids for all packets when some are only sent by the client and some only by the server? It's less confusing.