Middleboxes can mess up with TCP flows, e.g. intercepting the connections and dropping MPTCP options. Using an TCP-in-UDP tunnel will force such middleboxes not to modify such TCP connections. The idea here is inspired by an old IETF draft.
This "tunnel" is done in eBPF, from the TC hooks. For more details about why it has been created, and its particularities, please check this blog post.
UDP:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
TCP:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |C|E|U|A|P|R|S|F| |
| Offset| Reser |R|C|R|C|S|S|Y|I| Window |
| | |W|E|G|K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Checksum | Urgent Pointer |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Optional) Options |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | Destination Port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data | |C|E| |A|P|R|S|F| |
| Offset| Reser |R|C|0|C|S|S|Y|I| Window |
| | |W|E| |K|H|T|N|N| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| (Optional) Options |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Modifications:
URG
set to 0,Urgent Pointer
is supposed to be zero (not used).- Switch
Sequence Number
andAcknowledgment Number
withUrgent Pointer
andChecksum
. - Replace
Urgent Pointer
by theLength
: Checksum needs to be recomputed.
Checksum:
-
No need to recompute it from scratch, it can be derived from the previous values, by just changing the protocol.
-
UDP Checksum computed from:
- Source and destination address: from upper layer
- Protocol (1B): UDP (17)
- Length (2B): Data (variable) + UDP header (8 octets) lengths
- TCP header
- Data
-
TCP Checksum computed from:
- Source and destination address: from upper layer
- Protocol (1B): TCP (6)
- Length (2B): Data (variable) + TCP header (Between 20 and 56 octets) lengths
- TCP header
- Data
-
Differences:
- Source and destination address: not changed
- Protocol: changed: UDP/TCP.
- Data length: not changed
- L4 header: changed:
UDP Length
vsTCP Urgent Pointer
- Data: not changed
Build the binary using make
. CLang and libbpf
is required, e.g.
sudo apt install clang llvm libelf-dev build-essential libc6-dev-i386 libbpf-dev
Load it with tc
commands:
- Client:
tc qdisc add dev "${IFACE}" clsact tc filter add dev "${IFACE}" egress bpf obj tcp_in_udp_tc.o sec tc_client_egress action csum udp tc filter add dev "${IFACE}" ingress bpf da obj tcp_in_udp_tc.o sec tc_client_ingress
- Server:
tc qdisc add dev "${IFACE}" clsact tc filter add dev "${IFACE}" egress bpf obj tcp_in_udp_tc.o sec tc_server_egress action csum udp tc filter add dev "${IFACE}" ingress bpf da obj tcp_in_udp_tc.o sec tc_server_ingress
GRO/TSO cannot be used on this interface, because each UDP packet will carry a part of the TCP headers as part of the data: this is specific to one packet, and it cannot be merged with the next data. Please use this:
ethtool -K "${IFACE}" gro off lro off gso off tso off ufo off sg off
ip link set ${IFACE} gso_max_segs 1
(to be checked: maybe it is enough to disable gro
and gso/tso
.)
Note: to get some stats, in egress, it is possible to use:
tc -s action show action csum
tc -s -j action show action csum | jq
It might be interesting to monitor the tracing ring buffer for warnings and other messages generated by the eBPF programs:
cat /sys/kernel/debug/tracing/trace_pipe
To stop the eBPF programs:
tc filter del dev "${IFACE}" egress
tc filter del dev "${IFACE}" ingress
Because the packets will be in UDP and not TCP, any MSS clamping will have no effects here. It is important to avoid IP fragmentation. In other words, it might be required to adapt the MTU (or the MSS).
- Ingress: From a specific destination IP and port in UDP
- Egress: To a specific destination IP and port in TCP
- Ingress: To a specific destination IP and port in UDP
- Egress: From a previously used
sk
: use ConnMark to set a specificSO_MARK