eBPF Talk: 耗时 10 个月,修复了又一个 tailcall 的 bug 之后,再下一城,再次修复与 tailcall 有关的一个 BUG:

BUG 危害

这个 BUG 会导致内核 panic:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
[309049.036402] BUG: kernel NULL pointer dereference, address: 0000000000000004
[309049.036419] #PF: supervisor read access in kernel mode
[309049.036426] #PF: error_code(0x0000) - not-present page
[309049.036432] PGD 0 P4D 0
[309049.036437] Oops: 0000 [#1] PREEMPT SMP NOPTI
[309049.036444] CPU: 2 PID: 788148 Comm: test_progs Not tainted 6.8.0-31-generic #31-Ubuntu
[309049.036465] Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023
[309049.036477] RIP: 0010:bpf_prog_map_compatible+0x2a/0x140
[309049.036488] Code: 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 41 54 53 44 8b 6e 04 48 89 f3 41 83 fd 1c 75 0c 48 8b 46 38 48 8b 40 70 <44> 8b 68 04 f6 43 03 01 75 1c 48 8b 43 38 44 0f b6 a0 89 00 00 00
[309049.036505] RSP: 0018:ffffb2e080fd7ce0 EFLAGS: 00010246
[309049.036513] RAX: 0000000000000000 RBX: ffffb2e0807c1000 RCX: 0000000000000000
[309049.036521] RDX: 0000000000000000 RSI: ffffb2e0807c1000 RDI: ffff990290259e00
[309049.036528] RBP: ffffb2e080fd7d08 R08: 0000000000000000 R09: 0000000000000000
[309049.036536] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990290259e00
[309049.036543] R13: 000000000000001c R14: ffff990290259e00 R15: ffff99028e29c400
[309049.036551] FS:  00007b82cbc28140(0000) GS:ffff9903b3f00000(0000) knlGS:0000000000000000
[309049.036559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[309049.036566] CR2: 0000000000000004 CR3: 0000000101286002 CR4: 00000000003706f0
[309049.036573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[309049.036581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[309049.036588] Call Trace:
[309049.036592]  <TASK>
[309049.036597]  ? show_regs+0x6d/0x80
[309049.036604]  ? __die+0x24/0x80
[309049.036619]  ? page_fault_oops+0x99/0x1b0
[309049.036628]  ? do_user_addr_fault+0x2ee/0x6b0
[309049.036634]  ? exc_page_fault+0x83/0x1b0
[309049.036641]  ? asm_exc_page_fault+0x27/0x30
[309049.036649]  ? bpf_prog_map_compatible+0x2a/0x140
[309049.036656]  prog_fd_array_get_ptr+0x2c/0x70
[309049.036664]  bpf_fd_array_map_update_elem+0x37/0x130
[309049.036671]  bpf_map_update_value+0x1d3/0x260
[309049.036677]  map_update_elem+0x1fa/0x360
[309049.036683]  __sys_bpf+0x54c/0xa10
[309049.036689]  __x64_sys_bpf+0x1a/0x30
[309049.036694]  x64_sys_call+0x1936/0x25c0
[309049.036700]  do_syscall_64+0x7f/0x180
[309049.036706]  ? do_syscall_64+0x8c/0x180
[309049.036712]  ? do_syscall_64+0x8c/0x180
[309049.036717]  ? irqentry_exit+0x43/0x50
[309049.036723]  ? common_interrupt+0x54/0xb0
[309049.036729]  entry_SYSCALL_64_after_hwframe+0x73/0x7b

BUG 根因分析

由以下 2 个 commit 组合起来导致的:

在 commit bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 里,freplace prog 在 attach 完成之后,prog->aux->dst_prog = NULL;

然而,在 commit bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT 里,如果 prog->type == BPF_PROG_TYPE_EXT 则把 prog->aux->dst_prog->type 当作当前的 prog 类型。

所以,如果在 freplace prog attach 之后将 freplace 添加到 prog_array map 里,则会导致上面的 panic。

BUG 修复

修复这个 BUG 不难,将 resolve_prog_type() 改成如下样子即可:

1
2
3
4
5
static inline enum bpf_prog_type resolve_prog_type(const struct bpf_prog *prog)
{
	return (prog->type == BPF_PROG_TYPE_EXT && prog->aux->saved_dst_prog_type) ?
		prog->aux->saved_dst_prog_type : prog->type;
}

总结

修复这个 BUG 的 patch 已合并到 6.11 内核。

即,如果在 6.1/6.6/6.8/6.10 等常见内核版本里使用 freplace+tailcall,则需要规避一下这个 BUG:需要在将 freplace 程序添加到 prog_array map 后再进行 attach。