eBPF Talk: trampoline stack on x86 [assembly ahead]

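This series covers the trampoline implementation on the x86 architecture, explaining both the principles and the implementation in detail.

A previous post covered how the trampoline works.

This post digs further into the trampoline's low-level implementation to answer the following 2 questions: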

How does a fentry bpf prog get the arguments of the traced function?
Why can a fexit bpf prog get both the arguments and the return value of the traced function?

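TL;DR The trampoline saves both the arguments and the return value of the traced function on the stack of the current trampoline.

P.S. By comparison, a kretprobe can get the return value, but not necessarily the arguments.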

trampoline stack on x86

The answer is all in the source code.

// ${KERNEL}/arch/x86/net/bpf_jit_comp.c

int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end,
                                const struct btf_func_model *m, u32 flags,
                                struct bpf_tramp_links *tlinks,
                                void *func_addr)
{
    // ...

    /* Generated trampoline stack layout:
     *
     * RBP + 8         [ return address  ]
     * RBP + 0         [ RBP             ]
     *
     * RBP - 8         [ return value    ]  BPF_TRAMP_F_CALL_ORIG or
     *                                      BPF_TRAMP_F_RET_FENTRY_RET flags
     *
     *                 [ reg_argN        ]  always
     *                 [ ...             ]
     * RBP - regs_off  [ reg_arg1        ]  program's ctx pointer
     *
     * RBP - args_off  [ arg regs count  ]  always
     *
     * RBP - ip_off    [ traced function ]  BPF_TRAMP_F_IP_ARG flag
     *
     * RBP - run_ctx_off [ bpf_tramp_run_ctx ]
     */

    // ...
}
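The layout comment above implies simple offset arithmetic: every slot is 8 bytes wide and the slots are stacked downwards from RBP. The following is a minimal userspace sketch of that arithmetic, not the kernel's code. The struct `tramp_layout` and the function `tramp_layout_for()` are hypothetical names introduced for illustration, and the slots for `ip_off` and `run_ctx_off` are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the offsets named in the layout comment.
 * All values are distances below RBP, in bytes. */
struct tramp_layout {
    int ret_off;   /* slot of the return value, 0 if there is none */
    int regs_off;  /* slot of reg_arg1, which is where ctx points */
    int args_off;  /* slot holding the argument register count */
};

/* Stack the 8-byte slots downwards from RBP in the order of the comment:
 * optional return value, then the argument registers, then their count. */
struct tramp_layout tramp_layout_for(int nr_regs, bool save_ret)
{
    struct tramp_layout l = {0};
    int stack_size = 0;

    assert(nr_regs >= 0);

    if (save_ret) {
        stack_size += 8;
        l.ret_off = stack_size;
    }

    stack_size += nr_regs * 8;
    l.regs_off = stack_size;

    stack_size += 8;
    l.args_off = stack_size;

    return l;
}
```

Under this model, a function with 4 register arguments and a saved return value gets `regs_off == 0x28`, which agrees with the `-0x28(%rbp)` used for the first argument in the assembly of this post.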

`fexit`

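As the stack layout generated by the trampoline shows, fentry and fexit get the arguments of the traced function in the same way, so understanding how fexit gets them also explains how fentry does.

With fexit, the trampoline is compiled into assembly (shown as a figure in the original post).

Three blocks of that assembly, highlighted in blue in the figure, are worth noting:

Save the arguments onto the stack.
Save the return value onto the stack.
mov the return value from the stack into %rax (the return-value register).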
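The three steps can be modelled in plain userspace C. This is only a sketch under simplifying assumptions: the traced function takes two 64-bit arguments, there is a single fexit program, and `traced_add()`, `record_fexit()` and `trampoline()` are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for a traced function and a fexit program. */
typedef uint64_t (*traced_fn)(uint64_t a, uint64_t b);
typedef int (*fexit_prog)(uint64_t *ctx);

uint64_t seen_args[2];
uint64_t seen_ret;

uint64_t traced_add(uint64_t a, uint64_t b)
{
    return a + b;
}

/* The fexit program sees arguments and return value through ctx. */
int record_fexit(uint64_t *ctx)
{
    seen_args[0] = ctx[0];
    seen_args[1] = ctx[1];
    seen_ret = ctx[2];
    return 0;
}

uint64_t trampoline(traced_fn orig, fexit_prog prog, uint64_t a, uint64_t b)
{
    /* [arg1][arg2][return value], lowest address first */
    uint64_t slots[3];

    assert(orig && prog);

    /* 1. save the arguments onto the stack */
    slots[0] = a;
    slots[1] = b;

    /* 2. call the traced function and save its return value */
    slots[2] = orig(slots[0], slots[1]);

    /* the fexit program receives ctx = address of the first argument slot */
    prog(slots);

    /* 3. reload the return value for the caller (%rax on x86-64) */
    return slots[2];
}
```

Because the return value sits directly above the argument slots, one pointer is enough to hand both to the fexit program.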

The signature of inet_csk_complete_hashdance() is as follows:

// ${KERNEL}/net/ipv4/inet_connection_sock.c

struct sock *inet_csk_complete_hashdance(struct sock *sk, struct sock *child,
                     struct request_sock *req, bool own_req)
{
    // ...
}

inet_csk_complete_hashdance() takes 4 arguments, which are saved onto the stack in order.

   0xffffffffc044a012:        mov    %rdi,-0x28(%rbp)
   0xffffffffc044a016:        mov    %rsi,-0x20(%rbp)
   0xffffffffc044a01a:        mov    %rdx,-0x18(%rbp)
   0xffffffffc044a01e:        mov    %cl,-0x10(%rbp)

x86-64	Purpose
rax	return value
rdi	1st argument
rsi	2nd argument
rdx	3rd argument
rcx	4th argument
r8	5th argument
r9	6th argument

Reference: bpf-docs/bpf-internals

P.S. %cl is the low byte of %rcx; it is used here because the fourth argument, own_req, is a one-byte bool.
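One consequence of the one-byte store is worth spelling out: `mov %cl,-0x10(%rbp)` writes a single byte into an 8-byte slot. The sketch below models that in userspace C, assuming nothing else clears the other seven bytes of the slot. `store_low_byte()` and `load_bool()` are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Model of `mov %cl,-0x10(%rbp)`: write one byte at the start of an
 * 8-byte slot and leave the remaining seven bytes untouched. */
void store_low_byte(uint64_t *slot, uint64_t reg)
{
    uint8_t low = (uint8_t)reg;   /* %cl is the low 8 bits of %rcx */

    assert(slot != NULL);
    memcpy(slot, &low, sizeof(low));
}

/* Read the slot the way a bool argument should be read: one byte only. */
bool load_bool(const uint64_t *slot)
{
    uint8_t b;

    assert(slot != NULL);
    memcpy(&b, slot, sizeof(b));
    return b != 0;
}
```

Reading the slot as a bool gives the expected value, while reading all 8 bytes may also pick up stale stack contents.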

How many registers does an x86_64 CPU actually have?

`fentry` `tcp_connect()`

eBPF source code:

SEC("fentry/tcp_connect")
int BPF_PROG(tcp_connect, struct sock *sk)
{
    handle_new_connection(ctx, sk);

    return 0;
}

The assembly on x86:

   0xffffffffc02276cc:  nopl   0x0(%rax,%rax,1)
   0xffffffffc02276d1:  xchg   %ax,%ax
   0xffffffffc02276d3:  push   %rbp
   0xffffffffc02276d4:  mov    %rsp,%rbp
   0xffffffffc02276d7:  mov    0x0(%rdi),%rsi
   0xffffffffc02276db:  call   0xffffffffc0227950
   0xffffffffc02276e0:  xor    %eax,%eax
   0xffffffffc02276e2:  leave
   0xffffffffc02276e3:  ret

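The key is the single instruction mov 0x0(%rdi),%rsi.

%rdi holds the first argument, i.e. ctx, the first parameter of the bpf prog; in fentry/fexit bpf progs, ctx points to the stack slot that holds the traced function's first argument.
0x0(%rdi) loads the 8 bytes at the address in %rdi into the %rsi register (the second argument).

One instruction is enough to prepare the arguments of handle_new_connection(), because %rdi already holds ctx and is reused as its first argument.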
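In C terms, the load corresponds to reading `ctx[0]`. The following userspace sketch shows roughly what the `BPF_PROG` wrapper does for this program: it receives `ctx` as an array of 64-bit slots and casts slot 0 to the declared parameter type. `struct fake_sock`, `last_sk` and this version of `handle_new_connection()` are stand-ins introduced for illustration, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's struct sock. */
struct fake_sock {
    int id;
};

const struct fake_sock *last_sk;

/* Stand-in for the handler in the post: it only records its argument. */
void handle_new_connection(void *ctx, const struct fake_sock *sk)
{
    (void)ctx;
    last_sk = sk;
}

/* Roughly what BPF_PROG(tcp_connect, struct sock *sk) boils down to. */
int tcp_connect_prog(unsigned long long *ctx)
{
    struct fake_sock *sk;

    assert(ctx != 0);

    /* mov 0x0(%rdi),%rsi : load the first saved argument from ctx */
    sk = (struct fake_sock *)(uintptr_t)ctx[0];

    /* ctx is still in the first argument register, so it is passed on as-is */
    handle_new_connection(ctx, sk);
    return 0;
}
```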

`fexit` `inet_csk_complete_hashdance()`

eBPF source code:

SEC("fexit/inet_csk_complete_hashdance")
int BPF_PROG(inet_csk_complete_hashdance, struct sock *sk, struct sock *child,
             struct request_sock *req, bool own_req, struct sock *ret)
{
    if (ret)
        handle_new_connection(ctx, ret);

    return 0;
}

The assembly on x86:

   0xffffffffc022769c:  nopl   0x0(%rax,%rax,1)
   0xffffffffc02276a1:  xchg   %ax,%ax
   0xffffffffc02276a3:  push   %rbp
   0xffffffffc02276a4:  mov    %rsp,%rbp
   0xffffffffc02276a7:  mov    0x20(%rdi),%rsi
   0xffffffffc02276ab:  test   %rsi,%rsi
   0xffffffffc02276ae:  je     0xffffffffc02276b5
   0xffffffffc02276b0:  call   0xffffffffc022785c
   0xffffffffc02276b5:  xor    %eax,%eax
   0xffffffffc02276b7:  leave
   0xffffffffc02276b8:  ret

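The key is the single instruction mov 0x20(%rdi),%rsi.

%rdi holds the first argument, i.e. ctx, the first parameter of the bpf prog; in fentry/fexit bpf progs, ctx points to the stack slot that holds the traced function's first argument.
0x20(%rdi) loads the 8 bytes located 0x20 bytes above the address in %rdi into the %rsi register (the second argument). The 4 arguments occupy 4 × 8 = 0x20 bytes, so those 8 bytes are exactly the return value: ctx is RBP - 0x28, and RBP - 0x28 + 0x20 = RBP - 8.

One instruction is enough to prepare the arguments of handle_new_connection(), because %rdi already holds ctx and is reused as its first argument.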
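The 0x20 offset can be checked with a small userspace sketch. `struct hashdance_ctx` is a hypothetical view of the slots that ctx points at for this 4-argument function; `ret_offset()` and `load_ret()` are illustrative helpers, not kernel or libbpf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack slots seen through ctx. */
struct hashdance_ctx {
    uint64_t sk;       /* -0x28(%rbp) */
    uint64_t child;    /* -0x20(%rbp) */
    uint64_t req;      /* -0x18(%rbp) */
    uint64_t own_req;  /* -0x10(%rbp), only the low byte is written */
    uint64_t ret;      /* -0x08(%rbp) */
};

/* Byte offset of the return value when nr_args 8-byte slots precede it. */
size_t ret_offset(size_t nr_args)
{
    return nr_args * sizeof(uint64_t);
}

/* Equivalent of `mov 0x20(%rdi),%rsi` when nr_args is 4. */
uint64_t load_ret(const uint64_t *ctx, size_t nr_args)
{
    assert(ctx != NULL);
    return ctx[nr_args];
}
```

With 4 arguments, `ret_offset(4)` and `offsetof(struct hashdance_ctx, ret)` are both 0x20, the displacement used in the instruction above.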

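Summary

After all this assembly and stack analysis, the following 2 questions are finally answered:

How does a fentry bpf prog get the arguments of the traced function?
Why can a fexit bpf prog get both the arguments and the return value of the traced function?

The trampoline saves both the arguments and the return value of the traced function on the stack of the current trampoline, and the fentry/fexit/fmod_ret bpf progs then read whichever of them they need from that stack.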
