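While running a Linux proxy traffic replay experiment, a mistake in my ip route configuration made the experiment fail. I spent a week troubleshooting and finally spotted the mistake while reading the Inline on a Linux router blog post.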
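Proxy traffic replay experiment
In the server namespace, policy routing steers every IP packet sent by the client program, whatever its destination IP and TCP port, to the server program. The policy routing configuration in the server namespace is: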
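# Packets that already belong to a local socket go through the DIVERT chain: mark them and accept.
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
# All other TCP packets: hand them to the socket listening on ${LISTEN_PORT} and set mark 1.
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --tproxy-mark 0x1/0x1 --on-port ${LISTEN_PORT}
# Packets carrying mark 1 are routed using table 100.
ip rule add fwmark 1 lookup 100
# Intended to deliver marked packets locally (this is the faulty line).
ip route add default dev ${VETH_SERVER_INNER} scope host table 100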
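The final ip route line in that configuration is wrong. What follows is how I tracked the problem down.
Learning the Linux packet receive path
Study materials:
- linux TCP/IP协议栈-IP层 (Linux TCP/IP protocol stack: the IP layer)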
- It’s crowded in here! - The Cloudflare Blog
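The Cloudflare post describes the stages a network packet passes through when Linux receives it.
Since bpf comes up there, let's use bpf to troubleshoot.
Troubleshooting with bpf
I ran the skbdrop.bt tool that performance expert Brendan Gregg provides in his book BPF Performance Tools. The output was: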
bpftrace --unsafe skbdrop.bt
Attaching 3 probes...
Tracing unusual skb drop stacks. Hit Ctrl-C to end.
^C#kernel
IpInReceives 17 0.0
IpInDelivers 17 0.0
IpOutRequests 22 0.0
TcpInSegs 17 0.0
TcpOutSegs 21 0.0
TcpRetransSegs 1 0.0
TcpExtTCPHPHits 1 0.0
TcpExtTCPPureAcks 8 0.0
TcpExtTCPHPAcks 5 0.0
TcpExtTCPTimeouts 1 0.0
TcpExtTCPSynRetrans 1 0.0
TcpExtTCPOrigDataSent 20 0.0
TcpExtTCPDelivered 20 0.0
TcpExtTcpTimeoutRehash 1 0.0
IpExtInOctets 880 0.0
IpExtOutOctets 2239 0.0
IpExtInNoECTPkts 17 0.0
@[
kfree_skb+118
kfree_skb+118
ip_error+134
ip_rcv_finish+135
ip_rcv+188
__netif_receive_skb_one_core+136
__netif_receive_skb+24
process_backlog+169
]: 2
@[
kfree_skb+118
kfree_skb+118
unix_stream_connect+1919
__sys_connect_file+95
__sys_connect+161
__x64_sys_connect+26
do_syscall_64+73
entry_SYSCALL_64_after_hwframe+68
]: 2
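I looked up the ip_rcv_finish source code for the kernel version reported by uname -r and found no call to ip_error inside ip_rcv_finish.
Troubleshooting by following the protocol stack stalled at this point, because I did not know how to keep reading the code that comes after ip_rcv_finish.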
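One approach that avoids reading more kernel code is to ask the routing subsystem how it would treat such a packet. The sketch below builds an ip route get query for a packet that arrives on the server-side veth carrying fwmark 1. The addresses and the interface name are assumptions for illustration, not values from the experiment. The script only prints the command by default; setting RUN=1 executes it, which is expected to need root inside the server namespace.

```shell
#!/bin/sh
# All values below are assumptions for illustration; substitute your own.
CLIENT_IP="10.0.0.1"       # hypothetical source address of the client
TARGET_IP="198.51.100.7"   # hypothetical arbitrary destination the client connects to
IN_DEV="veth-server"       # hypothetical name standing in for ${VETH_SERVER_INNER}
FWMARK=1                   # the mark set by the TPROXY rule

# Build the query: "how would a packet arriving on IN_DEV with this mark be routed?"
route_lookup_cmd() {
    printf 'ip route get %s from %s iif %s mark %s\n' \
        "$TARGET_IP" "$CLIENT_IP" "$IN_DEV" "$FWMARK"
}

# Print the command by default; run it only when RUN=1 is set.
if [ "${RUN:-0}" = 1 ]; then
    eval "$(route_lookup_cmd)"
else
    route_lookup_cmd
fi
```

If the reply does not begin with the word local, or the lookup fails with an error, that would suggest the kernel is not treating the destination as belonging to this host.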
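Reading more TProxy blog posts
While reading the Inline on a Linux router post, one command caught my eye: ip route add local 0.0.0.0/0 dev lo table 1. Why is there a local in it?
Never mind the reason for now; I added it first to see what would happen.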
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --tproxy-mark 0x1/0x1 --on-port ${LISTEN_PORT}
ip rule add fwmark 1 lookup 100
# Same route as before, now with type local.
ip route add local default dev ${VETH_SERVER_INNER} scope host table 100
At last, the experiment worked.
What is local?
When in doubt, ask man: man ip-route (online documentation: iproute(8)). It contains the following passage:
Route types:
unicast - the route entry describes real paths to the destinations covered by the route prefix.
unreachable - these destinations are unreachable. Packets are discarded and the ICMP message host unreachable is generated. The local senders get an EHOSTUNREACH error.
blackhole - these destinations are unreachable. Packets are discarded silently. The local senders get an EINVAL error.
prohibit - these destinations are unreachable. Packets are discarded and the ICMP message communication administratively prohibited is generated. The local senders get an EACCES error.
local - the destinations are assigned to this host. The packets are looped back and delivered locally.
broadcast - the destinations are broadcast addresses. The packets are sent as link broadcasts.
throw - a special control route used together with policy rules. If such a route is selected, lookup in this table is terminated pretending that no route was found. Without policy routing it is equivalent to the absence of the route in the routing table. The packets are dropped and the ICMP message net unreachable is generated. The local senders get an ENETUNREACH error.
nat - a special NAT route. Destinations covered by the prefix are considered to be dummy (or external) addresses which require translation to real (or internal) ones before forwarding. The addresses to translate to are selected with the attribute via. Warning: Route NAT is no longer supported in Linux 2.6.
anycast - not implemented the destinations are anycast addresses assigned to this host. They are mainly equivalent to local with one difference: such addresses are invalid when used as the source address of any packet.
multicast - a special type used for multicast routing. It is not present in normal routing tables.
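local is a route type: packets that match such a route are delivered to the local protocol stack of the system. When ip route add is given no type, the route is created as unicast, which is what happened with the original command.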
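To check which type a route ended up with, the first word of each line printed by ip route show is enough: iproute2 prints the type there, except for unicast routes, where the type word is left out. The helper below is a minimal sketch of that check. The function name route_type and the interface name veth0 are made up for illustration.

```shell
#!/bin/sh
# route_type LINE: print the route type of one line of `ip route show` output.
# The type is the first word of the line; a line without a type word is unicast.
route_type() {
    case "$1" in
        local\ *|broadcast\ *|unreachable\ *|blackhole\ *|prohibit\ *|throw\ *|anycast\ *|multicast\ *|nat\ *)
            printf '%s\n' "${1%% *}" ;;
        *)
            printf '%s\n' unicast ;;
    esac
}

# Example lines in the format of `ip route show`; veth0 is a hypothetical device name.
route_type "default dev veth0 scope host"        # prints: unicast
route_type "local default dev veth0 scope host"  # prints: local
```

On a live system the same helper can be fed from the table used in this experiment, for example: ip route show table 100 | while IFS= read -r line; do route_type "$line"; done. The kernel's own entries in ip route show table local use the same local type for the addresses assigned to the host.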
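Summary
One "tiny" deviation cost me a week, which only shows that my knowledge in this area is not yet deep enough.
I need to read the kernel network stack source code properly to better understand how Linux receives and sends network packets.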