Linux bpfjit (2): Usage Analysis and a mini-tcpdump Implementation

Published: 2024-04-16



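Author: Once Day Date: April 13, 2024

A rookie passionate about learning and developing on Linux, trying to write an adventure whose destination may turn out to be nothing more than a daydream…

On the long road, has anyone ever smiled at you…

For the full series, see the column: Linux基础知识_Once-Day的博客-CSDN博客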



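1. Overview
1.1 BPF (Berkeley Packet Filter)

BPF (Berkeley Packet Filter) was originally designed for packet filtering, that is, capturing and selecting network packets. Over time it has grown into a more general and more powerful technology; in the Linux kernel in particular it has been extended into eBPF (extended Berkeley Packet Filter).

Basic concepts:

  • BPF: originally designed to make packet processing more efficient by providing a flexible packet-filtering mechanism between user space and kernel space.
  • eBPF: the extended version of BPF. Beyond packet filtering it supports performance monitoring, network monitoring, security auditing, and more. eBPF makes it possible to dynamically add custom code (mainly monitoring and tracing code) to the Linux kernel without changing the kernel source.

In FastPath, the packet-filtering technology currently in use is mainly BPF. eBPF requires kernel support, and for now there is no suitable mechanism for running it in user space.

BPF is essentially a set of predefined bytecode instructions that model arithmetic (add, subtract, multiply, divide), conditional branches, jumps, and register loads and stores. Combining these basic operations yields complex filtering logic.

For example, here are some of the BPF arithmetic and logic opcode definitions: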

//(NetBSD) - sys/net/bpf.h
/* alu/jmp fields */
#define BPF_OP(code)	((code) & 0xf0)
#define		BPF_ADD		0x00
#define		BPF_SUB		0x10
#define		BPF_MUL		0x20
#define		BPF_DIV		0x30
#define		BPF_OR		0x40
#define		BPF_AND		0x50
#define		BPF_LSH		0x60
#define		BPF_RSH		0x70
#define		BPF_NEG		0x80
#define		BPF_MOD		0x90
#define		BPF_XOR		0xa0
......
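
To make the field layout concrete, here is a minimal sketch that splits an opcode into its instruction class and ALU operation with the masks shown above. BPF_CLASS and BPF_ALU come from the same header; the helper name alu_op_name is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Field masks and values as defined in sys/net/bpf.h. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_ALU         0x04
#define BPF_OP(code)    ((code) & 0xf0)

/* Hypothetical helper: map an ALU opcode to its mnemonic.
 * Returns NULL if the opcode is not an ALU instruction. */
static const char *alu_op_name(uint16_t code)
{
    static const char *const names[] = {
        "add", "sub", "mul", "div", "or", "and",
        "lsh", "rsh", "neg", "mod", "xor"
    };
    size_t idx = (size_t)(BPF_OP(code) >> 4);

    if (BPF_CLASS(code) != BPF_ALU)
        return NULL;
    if (idx >= sizeof(names) / sizeof(names[0]))
        return NULL;
    return names[idx];
}
```

For instance, opcode 0x5c (which tcpdump prints as "and x") has class 0x04 and operation 0x50, so the helper returns "and".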
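1.2 BPFJIT (Berkeley Packet Filter Just In Time)

BPFJIT applies JIT compilation to BPF (Berkeley Packet Filter). BPF was designed for efficient packet filtering: filter rules are written in user space and executed in kernel space, which greatly improves packet-processing efficiency.

BPFJIT optimizes this further by compiling BPF bytecode into machine code just in time, so that the kernel can execute it directly instead of interpreting it, which improves filtering efficiency again.

BPF tools produce a sequence of BPF instructions, but that sequence can be executed in several ways. For example:

// Typical bpf_filter bytecode generated by the (tcpdump tcp) command, used to filter TCP packets.
// It checks the protocol field for both IPv4 and IPv6; for IPv6 it also handles the fragment-header case.
// If the packet matches, it returns 8192; otherwise it returns 0.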
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 8
(002) ldb      [20]
(003) jeq      #0x6             jt 12   jf 4
(004) ldb      [20]
(005) jeq      #0x2c            jt 6    jf 8
(006) ldb      [54]
(007) jeq      #0x6             jt 12   jf 8
(008) ldh      [12]
(009) jeq      #0x800           jt 10   jf 13
(010) ldb      [23]
(011) jeq      #0x6             jt 12   jf 13
(012) ret      #8192
(013) ret      #0

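For the BPF instructions above, the BSD kernel interprets them directly in a C function that loops over the instructions and processes the packet in place in its mbuf.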

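//(NetBSD) - sys/net/bpf_filter.c
// (excerpt: the preprocessor branch that precedes this signature is omitted, so only its closing #endif appears below)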
u_int
bpf_filter(const struct bpf_insn *pc, const u_char *p, u_int wirelen,
    u_int buflen)
#endif
{
	uint32_t A, X, k;
#ifndef _KERNEL
	uint32_t mem[BPF_MEMWORDS];
	bpf_args_t args_store = {
		.pkt = p,
		.wirelen = wirelen,
		.buflen = buflen,
		.mem = mem,
		.arg = NULL
	};
	bpf_args_t * const args = &args_store;
#else
	const uint8_t * const p = args->pkt;
#endif
	if (pc == 0) {
		/*
		 * No filter means accept all.
		 */
		return (u_int)-1;
	}

	/*
	 * Note: safe to leave memwords uninitialised, as the validation
	 * step ensures that it will not be read, if it was not written.
	 */
	A = 0;
	X = 0;
	--pc;

	for (;;) {
		++pc;
		switch (pc->code) {

		default:
#ifdef _KERNEL
			return 0;
#else
			abort();
			/*NOTREACHED*/
#endif
		case BPF_RET|BPF_K:
			return (u_int)pc->k;

		case BPF_RET|BPF_A:
			return (u_int)A;

		case BPF_LD|BPF_W|BPF_ABS:
			k = pc->k;
			if (k > args->buflen ||
			    sizeof(int32_t) > args->buflen - k) {
#ifdef _KERNEL
				int merr;

				if (args->buflen != 0)
					return 0;
				A = xword(args->pkt, k, &merr);
				if (merr != 0)
					return 0;
				continue;
#else
				return 0;
#endif
			}
			A = EXTRACT_LONG(&p[k]);
			continue;

		case BPF_LD|BPF_H|BPF_ABS:
// ... (a large amount of code omitted) ...

FastPath executes the BPF instructions of its packet filter with exactly this interpreter-style implementation.
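
The interpreter loop can be boiled down to a few lines. The following is a minimal sketch, not the NetBSD code: it handles only the four opcodes that appear in the simplest tcpdump filters (0x28 ldh, 0x30 ldb, 0x15 jeq, 0x06 ret). The names mini_insn, mini_filter and mini_icmp_filter are hypothetical, and the embedded program is a six-instruction IPv4 ICMP filter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction layout mirroring struct bpf_insn. */
struct mini_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

enum {
    OP_LDH_ABS = 0x28, /* BPF_LD|BPF_H|BPF_ABS:  A <- 16 bits at pkt[k] */
    OP_LDB_ABS = 0x30, /* BPF_LD|BPF_B|BPF_ABS:  A <- pkt[k]            */
    OP_JEQ_K   = 0x15, /* BPF_JMP|BPF_JEQ|BPF_K: pc += (A == k) ? jt : jf */
    OP_RET_K   = 0x06  /* BPF_RET|BPF_K:         return k               */
};

/* Interpret prog against pkt; returns the filter result, 0 on any error. */
static uint32_t mini_filter(const struct mini_insn *prog, size_t n,
                            const uint8_t *pkt, size_t len)
{
    uint32_t A = 0;
    size_t   pc = 0;

    while (pc < n) {
        const struct mini_insn *in = &prog[pc++];

        switch (in->code) {
        case OP_LDH_ABS:
            if (in->k > len || len - in->k < 2)
                return 0;           /* read past the end: reject */
            A = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case OP_LDB_ABS:
            if (in->k >= len)
                return 0;
            A = pkt[in->k];
            break;
        case OP_JEQ_K:
            pc += (A == in->k) ? in->jt : in->jf;
            break;
        case OP_RET_K:
            return in->k;
        default:
            return 0;               /* unsupported opcode in this sketch */
        }
    }
    return 0;
}

/* IPv4 ICMP filter: EtherType at offset 12, IP protocol at offset 23. */
static uint32_t mini_icmp_filter(const uint8_t *pkt, size_t len)
{
    static const struct mini_insn prog[] = {
        { OP_LDH_ABS, 0, 0, 12     },
        { OP_JEQ_K,   0, 3, 0x0800 },
        { OP_LDB_ABS, 0, 0, 23     },
        { OP_JEQ_K,   0, 1, 1      },
        { OP_RET_K,   0, 0, 8192   },
        { OP_RET_K,   0, 0, 0      },
    };

    return mini_filter(prog, sizeof(prog) / sizeof(prog[0]), pkt, len);
}
```

Jump offsets are relative to the instruction after the jump, which is why the interpreter adds jt or jf to a program counter that has already been incremented.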

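Besides interpreting the bytecode in a C loop, processing can be made faster with JIT (just-in-time) compilation.

The NetBSD implementation uses SLJIT: each BPF instruction is translated into the corresponding SLJIT instructions, JIT compilation then produces machine-specific code, and the result can be invoked directly through a C function pointer.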

//(NetBSD) - sys/net/bpfjit.c - generate_insn_code
// ... (a large amount of code omitted) ...
case BPF_LD:
    /* BPF_LD+BPF_IMM          A <- k */
    if (pc->code == (BPF_LD|BPF_IMM)) {
        status = sljit_emit_op1(compiler,
            SLJIT_MOV,
            BJ_AREG, 0,
            SLJIT_IMM, (uint32_t)pc->k);
        if (status != SLJIT_SUCCESS)
            goto fail;

        continue;
    }

    /* BPF_LD+BPF_MEM          A <- M[k] */
    if (pc->code == (BPF_LD|BPF_MEM)) {
        if ((uint32_t)pc->k >= memwords)
            goto fail;
        status = emit_memload(compiler,
            BJ_AREG, pc->k, extwords);
        if (status != SLJIT_SUCCESS)
            goto fail;

        continue;
    }

    /* BPF_LD+BPF_W+BPF_LEN    A <- len */
    if (pc->code == (BPF_LD|BPF_W|BPF_LEN)) {
        status = sljit_emit_op1(compiler,
            SLJIT_MOV, /* size_t source */
            BJ_AREG, 0,
            SLJIT_MEM1(BJ_ARGS),
            offsetof(struct bpf_args, wirelen));
        if (status != SLJIT_SUCCESS)
            goto fail;

        continue;
    }

    mode = BPF_MODE(pc->code);
    if (mode != BPF_ABS && mode != BPF_IND)
        goto fail;

    if (unconditional_ret)
        continue;

    status = emit_pkt_read(compiler, hints, pc,
        to_mchain_jump, &ret0, &ret0_size, &ret0_maxsize);
    if (status != SLJIT_SUCCESS)
        goto fail;

    continue;
// ... (a large amount of code omitted) ...

SLJIT's JIT compilation normally yields a function; BPFJIT fixes the signature of that function as follows:

/*
 * Return value of a function generated by sljit have sljit_uw type
 * which can have a greater width. In such cases, we rely on the fact
 * that calling conventions use same registers for smaller types.
 * SLJIT_MOV_UI is passed to sljit_emit_return() to make sure that the
 * return value is truncated to unsigned int.
 */
typedef unsigned int (*bpfjit_func_t)(const bpf_ctx_t *, bpf_args_t *);

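Through this function, BPF can carry out more complex operations while remaining efficient.

1.3 SLJIT (Simple Just-In-Time)

SLJIT is a standalone, general-purpose JIT compilation library. It is not tied to any particular domain and can be used wherever JIT functionality is needed. Its design philosophy is simplicity and generality: it offers a low-level API with which developers generate machine code for their own needs. For example, SLJIT can be used for fast regular-expression matching or for JIT-compiling scripting languages.

SLJIT belongs to the same family of tools as the following JIT technologies:

  • Libjit/liblightning, the backend of GNU.net.
  • Libgccjit, introduced in GCC 5. It differs from the other JIT libraries in that using it resembles constructing C code, and it uses GCC as its backend.
  • AsmJIT, described as a branch of the well-known V8 project (the JavaScript engine in Chrome); originally x86/x86_64 only.
  • DynASM, used in LuaJIT.

The CPU architectures and instruction sets that SLJIT supports are listed below: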

Intel-x86 32
AMD-x86 64
ARM 32 (ARM-v5, ARM-v7 and Thumb2 instruction sets)
ARM 64
PowerPC 32
PowerPC 64
MIPS 32 (III, R1)
MIPS 64 (III, R1)
RISC-V 32
RISC-V 64
s390x (64)
LoongArch 	# commits adding LoongArch support are currently visible in the repository

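Using SLJIT resembles assembly programming; its intermediate layer hides the architecture-specific details. Below is an example written directly against the low-level API: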

typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);
static int branch(sljit_sw a, sljit_sw b, sljit_sw c)
{
    void    *code;
    sljit_uw len;
    func3_t  func;

    struct sljit_jump *ret_c;
    struct sljit_jump *out;

    /* Create a SLJIT compiler */
    struct sljit_compiler *C = sljit_create_compiler(NULL);

    /* 3 arguments, 1 scratch register, 3 saved registers */
    sljit_emit_enter(C, 0, SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW), 1, 3, 0, 0, 0);

    /* R0 = a & 1, S0 is argument a */
    sljit_emit_op2(C, SLJIT_AND, SLJIT_R0, 0, SLJIT_S0, 0, SLJIT_IMM, 1);

    /* if R0 == 0, jump to ret_c; the target label of ret_c is bound later */
    ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);

    /* R0 = b, S1 is argument b */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);

    /* jump to out */
    out = sljit_emit_jump(C, SLJIT_JUMP);

    /* ret_c lands here: emit a label and bind it to ret_c */
    sljit_set_label(ret_c, sljit_emit_label(C));

    /* R0 = c, S2 is argument c */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S2, 0);

    /* out lands here */
    sljit_set_label(out, sljit_emit_label(C));

    /* end of function */
    sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);

    /* Generate machine code */
    code = sljit_generate_code(C);
    len  = sljit_get_generated_code_size(C);

    /* Execute code */
    func = (func3_t)code;
    printf("func return %ld\n", func(a, b, c));

    dump_code(code, len);

    /* Clean up */
    sljit_free_compiler(C);
    sljit_free_code(code);
    return 0;
}

This SLJIT sequence generates a simple function, equivalent to the following C:

sljit_sw func(sljit_sw a, sljit_sw b, sljit_sw c)
{
    if ((a & 1) == 0)
        return c;
    return b;
}

Disassembling the machine code produced by the SLJIT JIT compiler gives:

Disassembly of section .data:

0000000000000000 <.data>:
   0:   f3 0f 1e fa                     endbr64 
   4:   53                              push   %rbx
   5:   41 57                           push   %r15
   7:   41 56                           push   %r14
   9:   48 8b df                        mov    %rdi,%rbx
   c:   4c 8b fe                        mov    %rsi,%r15
   f:   4c 8b f2                        mov    %rdx,%r14
  12:   48 89 d8                        mov    %rbx,%rax
  15:   48 83 e0 01                     and    $0x1,%rax
  19:   48 83 f8 00                     cmp    $0x0,%rax
  1d:   74 05                           je     0x24
  1f:   4c 89 f8                        mov    %r15,%rax
  22:   eb 03                           jmp    0x27
  24:   4c 89 f0                        mov    %r14,%rax
  27:   41 5e                           pop    %r14
  29:   41 5f                           pop    %r15
  2b:   5b                              pop    %rbx
  2c:   c3                              ret  

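This machine code is not particularly efficient, because it contains several unnecessary register saves on the stack. This is a general pain point of compiler optimization, though: even when GCC compiles C code, the generated instructions can be just as verbose without a good optimization model and careful coding.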

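1.4 Compatibility between BPF and eBPF

eBPF (extended Berkeley Packet Filter) is an extension of BPF (Berkeley Packet Filter). The two share the same core concepts, but eBPF offers more functionality and more flexibility. Some details on how they relate:

Basic compatibility

  • eBPF is a functional superset of BPF: every valid classic BPF program can be expressed as an eBPF program. The instruction encodings differ, so the Linux kernel translates classic BPF programs into eBPF internally.
  • eBPF extends the BPF instruction set with new instructions and capabilities while keeping every classic instruction expressible.

Instruction set and features

  • eBPF introduces more registers and 64-bit register support, whereas classic BPF is 32-bit.
  • eBPF supports richer data structures (such as maps), while classic BPF is mostly limited to packet filtering.
  • eBPF programs can attach to many kinds of kernel hook points, while BPF is mainly used for capturing and filtering network packets.

Backward compatibility

  • The Linux kernel keeps backward compatibility for classic BPF, so old BPF programs run on new kernels, although they may not benefit from all the new eBPF features.

eBPF programs can be written in a C-like language, which is far easier to use than raw BPF bytecode. The overall framework is also more complex, however, and requires dedicated tooling such as Clang for compilation and development.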
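
The relationship between the two instruction formats can be sketched in code. The struct layouts below follow the classic and extended encodings (both are 8 bytes wide); the converter is a hypothetical illustration that handles one opcode only, and mapping the accumulator A to eBPF register R0 is an assumption of this sketch, not something stated in the article.

```c
#include <stdint.h>

/* Classic BPF: 16-bit opcode, two 8-bit jump offsets, 32-bit operand. */
struct cbpf_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* eBPF: 8-bit opcode, two 4-bit register numbers, 16-bit offset,
 * 32-bit immediate. The register pair is stored as one byte here. */
struct ebpf_insn {
    uint8_t code;
    uint8_t regs;   /* dst register in the low nibble, src in the high nibble */
    int16_t off;
    int32_t imm;
};

#define CBPF_LD_IMM  0x00  /* BPF_LD|BPF_W|BPF_IMM: A <- k             */
#define EBPF_MOV32_K 0xb4  /* BPF_ALU|BPF_MOV|BPF_K: dst <- imm, 32-bit */
#define EBPF_REG_A   0     /* assumption: accumulator A lives in R0     */

/* Hypothetical single-opcode converter.
 * Returns 0 on success, -1 if the opcode is not handled by this sketch. */
static int convert_ld_imm(const struct cbpf_insn *in, struct ebpf_insn *out)
{
    if (in->code != CBPF_LD_IMM)
        return -1;
    out->code = EBPF_MOV32_K;
    out->regs = EBPF_REG_A;     /* dst = R0, src = 0 */
    out->off  = 0;
    out->imm  = (int32_t)in->k;
    return 0;
}
```

A complete translator would also have to rewrite jump offsets, because one classic instruction can expand into several eBPF instructions.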

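1.5 Differences and connections between common BPF technologies

BPFJIT is usually implemented in the kernel and supports three kinds of operations:

  • bpf_filter: the standard Unix packet-filtering operation, which interprets BPF bytecode directly.
  • bpf_validate: verifies that BPF bytecode is well formed, guarding against infinite loops and invalid logic.
  • bpf_jit_generate: uses SLJIT to compile BPF bytecode into machine code just in time.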
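
Why validation rules out infinite loops can be shown with a small sketch. Classic BPF jump offsets are unsigned, so every jump goes forward; if each target stays inside the program and the last instruction is a return, execution must terminate. The code below checks only these two properties and is not the real bpf_validate, which also checks things such as scratch-memory indices. The names chk_insn and mini_validate are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction layout mirroring struct bpf_insn. */
struct chk_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

#define CHK_CLASS(code) ((code) & 0x07)
#define CHK_JMP         0x05   /* BPF_JMP */
#define CHK_RET         0x06   /* BPF_RET */
#define CHK_OP(code)    ((code) & 0xf0)
#define CHK_JA          0x00   /* BPF_JA: unconditional jump by k */

/* Returns 1 if all jump targets are in bounds and the program ends
 * with a return instruction, 0 otherwise. */
static int mini_validate(const struct chk_insn *prog, size_t n)
{
    size_t i;

    if (prog == NULL || n == 0)
        return 0;

    for (i = 0; i < n; i++) {
        size_t room = n - i - 1;   /* instructions after this one */

        if (CHK_CLASS(prog[i].code) != CHK_JMP)
            continue;
        if (CHK_OP(prog[i].code) == CHK_JA) {
            if (prog[i].k >= room)
                return 0;
        } else if (prog[i].jt >= room || prog[i].jf >= room) {
            return 0;
        }
    }
    return CHK_CLASS(prog[n - 1].code) == CHK_RET;
}
```

A jump placed in the last slot is rejected automatically, because no instruction follows it.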

The overall processing flow is as follows:

[Flowchart; nodes in order: shell command (tcpdump), SDK API (pcap), bpf language, bpf_validate, bpf_filter, bpf_jit_generate, SLJIT]

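The libpcap library can translate common tcpdump expressions into BPF instructions, which is what makes flexible packet capture possible. Programs developed in user space can use the same technique. Chapter 2 uses BPF to build a Tcpdump-mini program, exercises the three BPF operations above in it, and presents comparison data.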

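2. A simple packet-capture program (Tcpdump-mini)
2.1 Getting the SLJIT and BPFJIT sources

SLJIT source: zherczeg/sljit: Platform independent low-level JIT compiler (github.com)

BPFJIT source: alnsn/bpfjit: Just-in-Time compilation of bpf (github.com)

Create a clean directory on Ubuntu. The GNU development toolchain must be installed (install whatever is missing with apt):

# For example, install mk-configure
sudo apt install mk-configure

Download the BPFJIT source first, then the SLJIT source: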

ubuntu->bpf-sop:$ git clone https://github.com/alnsn/bpfjit.git
Cloning into 'bpfjit'...
remote: Enumerating objects: 1092, done.
remote: Total 1092 (delta 0), reused 0 (delta 0), pack-reused 1092
Receiving objects: 100% (1092/1092), 215.03 KiB | 78.00 KiB/s, done.
Resolving deltas: 100% (666/666), done.

ubuntu->bpf-sop:$ git clone https://github.com/zherczeg/sljit.git
Cloning into 'sljit'...
remote: Enumerating objects: 6679, done.
remote: Counting objects: 100% (6679/6679), done.
remote: Compressing objects: 100% (1310/1310), done.
remote: Total 6679 (delta 5411), reused 6545 (delta 5330), pack-reused 0
Receiving objects: 100% (6679/6679), 3.99 MiB | 58.00 KiB/s, done.
Resolving deltas: 100% (5411/5411), done.

ubuntu->bpf-sop:$ ll
drwxrwxr-x 9 ubuntu ubuntu 4096 Mar 28 22:45 bpfjit/
drwxrwxr-x 7 ubuntu ubuntu 4096 Mar 28 22:50 sljit/

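SLJIT is updated continuously and its API has changed incompatibly over time, so a matching SLJIT version has to be checked out first. The version that BPFJIT specifies can be used as-is.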

ubuntu->bpf-sop:$ cd sljit/
ubuntu->sljit:$ git checkout 8d536bf7c334f9e31a5cc366e5d5d8cd1cd431b1
Note: switching to '8d536bf7c334f9e31a5cc366e5d5d8cd1cd431b1'.
# a version from 2020
ubuntu->sljit:$ git log
commit 8d536bf7c334f9e31a5cc366e5d5d8cd1cd431b1 (HEAD)
Author: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Date:   Tue Aug 11 03:17:47 2020 -0700

    config: detect gcc support for fastcall (#75)
    
    haiku x86 uses gcc 2.95.2 as the system compiler and fails to build,
    because support for the fastcall calling convention was added with 3.4.
    
    detect the gcc version before enabling the attribute  and while at it
    reverse the condition and refactor the surrrounding code.

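Archive this version of the sljit source, copy it into the bpfjit directory, and unpack it there. Take care not to overwrite the existing Makefile, otherwise the build fails.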

ubuntu->sljit:$ git archive --format=tar --output=sljit.tar HEAD
ubuntu->sljit:$ cd ../bpfjit/sljit/
ubuntu->sljit:$ mv Makefile bpf-sljit.mk 
ubuntu->sljit:$ tar -xf ../../sljit/sljit.tar 
ubuntu->sljit:$ ll
total 56
drwxrwxr-x 6 ubuntu ubuntu 4096 Mar 28 23:01 ./
drwxrwxr-x 9 ubuntu ubuntu 4096 Mar 28 22:45 ../
-rw-rw-r-- 1 ubuntu ubuntu 5627 Aug 11  2020 API_CHANGES
-rw-rw-r-- 1 ubuntu ubuntu   44 Mar 28 22:45 bpf-sljit.mk
drwxrwxr-x 3 ubuntu ubuntu 4096 Aug 11  2020 doc/
-rw-rw-r-- 1 ubuntu ubuntu   11 Aug 11  2020 .gitignore
-rw-rw-r-- 1 ubuntu ubuntu  245 Aug 11  2020 INTERNAL_CHANGES
-rw-rw-r-- 1 ubuntu ubuntu 4290 Aug 11  2020 Makefile
-rw-rw-r-- 1 ubuntu ubuntu 1033 Aug 11  2020 README
drwxrwxr-x 2 ubuntu ubuntu 4096 Aug 11  2020 regex_src/
drwxrwxr-x 2 ubuntu ubuntu 4096 Aug 11  2020 sljit_src/
drwxrwxr-x 2 ubuntu ubuntu 4096 Aug 11  2020 test_src/

Rename SLJIT's own Makefile to sljit-self.mk and restore bpf-sljit.mk as Makefile. bpfjit has its own build flow, so the two have to be built separately.

ubuntu->sljit:$ mv Makefile sljit-self.mk
ubuntu->sljit:$ mv bpf-sljit.mk Makefile

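Adjust the Makefile so that the test programs are built separately through a child make process. By default SLJIT is distributed as source and does not build shared or static libraries.

First build SLJIT and check that it works: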

ubuntu->sljit:$ make -f sljit-self.mk 
mkdir -p bin
cc  -DSLJIT_CONFIG_AUTO=1 -Isljit_src -O2 -Wall -c -o bin/sljitMain.o test_src/sljitMain.c
cc  -DSLJIT_CONFIG_AUTO=1 -Isljit_src -O2 -Wall -c -o bin/sljitTest.o test_src/sljitTest.c
cc  -DSLJIT_CONFIG_AUTO=1 -Isljit_src -O2 -Wall -c -o bin/sljitLir.o sljit_src/sljitLir.c
cc -O2 -Wall  bin/sljitMain.o bin/sljitTest.o bin/sljitLir.o -o bin/sljit_test -lm -lpthread
cc  -DSLJIT_CONFIG_AUTO=1 -Isljit_src -O2 -Wall -fshort-wchar -c -o bin/regexMain.o regex_src/regexMain.c
cc  -DSLJIT_CONFIG_AUTO=1 -Isljit_src -O2 -Wall -fshort-wchar -c -o bin/regexJIT.o regex_src/regexJIT.c
cc -O2 -Wall  bin/regexMain.o bin/regexJIT.o bin/sljitLir.o -o bin/regex_test -lm -lpthread

ubuntu->sljit:$ export LD_LIBRARY_PATH=./bin
ubuntu->sljit:$ ./bin/sljit_test
Pass -v to enable verbose, -s to disable this hint.

SLJIT tests: all tests are PASSED on x86 64bit (little endian + unaligned) (with fpu)
ubuntu->sljit:$ ./bin/regex_test 
Pass -v to enable verbose, -s to disable this hint.

REGEX tests: all tests are PASSED on x86 64bit (little endian + unaligned)

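The tests show that everything works, so continue by building bpfjit. The following code needs to be changed to keep the build from aborting with an error:

# bpfjit/test/test_empty.c, line 39: add an initializer
	struct bpf_insn dummy = {0};

Build directly with mkcmake. If all is well the build succeeds right away; if there are problems, fix them as the compiler messages suggest (different compiler versions raise new errors, which is quite normal).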

ubuntu->bpfjit:$ mkcmake
==================================================
all ===> sljit
==================================================
all ===> sljit/sljit_src
==================================================
all ===> src
==================================================
all ===> test
==================================================
all ===> benchmark
cc   -I ../src -I ../sljit/sljit_src/ -DSLJIT_CONFIG_AUTO=1       -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wno-unused-parameter    -Werror       -c -o benchmark.o -O2 -g  benchmark.c
cc   -I ../src -I ../sljit/sljit_src/ -DSLJIT_CONFIG_AUTO=1       -Wall -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wswitch -Wshadow -Wcast-qual -Wwrite-strings -Wno-unused-parameter    -Werror       -c -o c.o -O2 -g  c.c
cc -o bpfjit_benchmark benchmark.o c.o     -L /home/ubuntu/NetBSD/bpf-sop/bpfjit/benchmark/../src -L /home/ubuntu/NetBSD/bpf-sop/bpfjit/benchmark/../sljit/sljit_src   -lpcap -lbpfjit -lsljit 

Install into a fixed directory:

ubuntu->bpfjit:$ export DESTDIR=/home/ubuntu/NetBSD/bpf-sop
ubuntu->bpfjit:$ env PREFIX=/ mkcmake install
==================================================
install ===> sljit
==================================================
install ===> sljit/sljit_src
if test -n "/home/ubuntu/NetBSD/bpf-sop//lib"; then mkc_install -c -d -m 755 /home/ubuntu/NetBSD/bpf-sop//lib; fi
mkc_install  -c -o ubuntu  -g ubuntu -m 644 libsljit.a /home/ubuntu/NetBSD/bpf-sop//lib/libsljit.a
mkc_install  -c -o ubuntu  -g ubuntu -m 644 libsljit_pic.a /home/ubuntu/NetBSD/bpf-sop//lib/libsljit_pic.a
mkc_install  -c -o ubuntu  -g ubuntu -m 644 libsljit.so.1.0.0 /home/ubuntu/NetBSD/bpf-sop//lib/libsljit.so.1.0.0
ln -s -f libsljit.so.1.0.0  /home/ubuntu/NetBSD/bpf-sop//lib/libsljit.so.1
ln -s -f libsljit.so.1.0.0  /home/ubuntu/NetBSD/bpf-sop//lib/libsljit.so
==================================================
......

The bpfjit unit tests can then be run as follows:

ubuntu->bpf-sop:$ export LD_LIBRARY_PATH=./lib
ubuntu->bpf-sop:$ export PATH=$PATH:./bin
ubuntu->bpf-sop:$ ./bin/bpfjit_test 
bpfjit_test: test_copx.c:139 (in test_copx_ret_A): code(&ctx, &args) == 13
bpfjit_test: test_copx_extmem.c:96 (in test_copx_ret_mem): code(&ctx, &args) == 13
bpfjit_test: test_copx_extmem.c:138 (in test_copx_ret_preinited_mem): code(&ctx, &args) == 3

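Three test messages are printed here, which means three bpfjit unit tests fail. They can be ignored for now.

The lib directory now contains the complete shared and static libraries. The corresponding header files are also needed; they are not installed by this step, so the installation is incomplete in that respect.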

ubuntu->bpf-sop:$ ll lib/ -h
-rw-r--r-- 1 ubuntu ubuntu  65K Mar 28 23:30 libbpfjit.a
-rw-r--r-- 1 ubuntu ubuntu  65K Mar 28 23:30 libbpfjit_pic.a
lrwxrwxrwx 1 ubuntu ubuntu   18 Mar 28 23:30 libbpfjit.so -> libbpfjit.so.1.0.0
lrwxrwxrwx 1 ubuntu ubuntu   18 Mar 28 23:30 libbpfjit.so.1 -> libbpfjit.so.1.0.0
-rw-r--r-- 1 ubuntu ubuntu  55K Mar 28 23:30 libbpfjit.so.1.0.0
-rw-r--r-- 1 ubuntu ubuntu 471K Mar 28 23:30 libsljit.a
-rw-r--r-- 1 ubuntu ubuntu 474K Mar 28 23:30 libsljit_pic.a
lrwxrwxrwx 1 ubuntu ubuntu   17 Mar 28 23:30 libsljit.so -> libsljit.so.1.0.0
lrwxrwxrwx 1 ubuntu ubuntu   17 Mar 28 23:30 libsljit.so.1 -> libsljit.so.1.0.0
-rw-r--r-- 1 ubuntu ubuntu 275K Mar 28 23:30 libsljit.so.1.0.0
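2.2 Writing the tcpdump-mini program

The first file, ether-input.c, initializes a raw socket and reads raw Ethernet frames from the network interface.

// ether-input.c: initialize the raw socket and receive raw packets.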
extern int32_t ether_sock_init(const char *if_name);
extern int32_t ether_recv_packet(int32_t sock, char *buffer, int32_t len);
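
The article shows only the declarations of these two functions. The following is one possible Linux implementation based on an AF_PACKET raw socket; it is a sketch under that assumption, not the author's code. Opening such a socket normally requires root or CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a raw socket bound to if_name; returns the descriptor or -1. */
int32_t ether_sock_init(const char *if_name)
{
    struct sockaddr_ll addr;
    unsigned int       if_index;
    int                sock;

    if (if_name == NULL)
        return -1;

    /* Resolve the interface first so a bad name never opens a socket. */
    if_index = if_nametoindex(if_name);
    if (if_index == 0)
        return -1;

    /* ETH_P_ALL: receive frames of every EtherType. */
    sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sock < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex  = (int)if_index;
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Receive one frame into buffer; returns its length or -1 on error. */
int32_t ether_recv_packet(int32_t sock, char *buffer, int32_t len)
{
    if (buffer == NULL || len <= 0)
        return -1;
    return (int32_t)recv(sock, buffer, (size_t)len, 0);
}
```

Binding to one interface index keeps frames from other interfaces out of the capture, which matches the single-interface command line of the program.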

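The second file, mini-tcpdump.c, implements the capture and filtering logic: it converts the tcpdump filter argument into BPF bytecode and machine code, and prints information about the packets that match.

// mini-tcpdump.c: handle arguments and filtering, print information about matching packets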
int32_t deal_tcpdump_code(pcap_t **handle, struct bpf_program *fp, const char *filter_exp);
void print_packet_info(const char *packet);
int32_t capture_packets(int32_t sock, struct bpf_program *fp, bpfjit_func_t code);

int32_t main(int32_t argc, char *argv[])
{
    int32_t            sock;
    pcap_t            *handle;
    struct bpf_program fp;
    bpfjit_func_t      code;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s <interface> <tcpdump code>\n", argv[0]);
        return -1;
    }

    sock   = 0;
    handle = NULL;
    code   = NULL;
    memset(&fp, 0, sizeof(struct bpf_program));

    /* Perhaps use a pseudo-lambda function? At least it avoids global variables */
    lambda (void, free_source, int32_t signo) {
        /* Print a notice and release resources */
        if (signo != -1) {
            printf("\nCtrl+C is pressed(sig %d), exit with 0.\n", signo);
        }
        if (fp.bf_insns) {
            pcap_freecode(&fp);
        }
        if (handle) {
            pcap_close(handle);
        }
        if (code) {
            bpfjit_free_code(code);
        }
        if (sock) {
            close(sock);
        }
        exit(0);
    }

    /* Register the Ctrl+C signal handler */
    signal(SIGINT, free_source);

    printf("Try to dump packet from interface(%s) with filter(%s)\n", argv[1], argv[2]);
    sock = ether_sock_init(argv[1]);
    if (sock < 0) {
        fprintf(stderr, "Failed to init socket\n");
        free_source(-1);
        return -1;
    }

    /* Compile the tcpdump expression into BPF bytecode */
    if (deal_tcpdump_code(&handle, &fp, argv[2]) != 0) {
        fprintf(stderr, "Failed to deal tcpdump code\n");
        free_source(-1);
        return -1;
    }

    /* Validate the BPF bytecode */
    if (bpf_validate(fp.bf_insns, fp.bf_len) == 0) {
        fprintf(stderr, "Failed to validate bpf code\n");
        free_source(-1);
        return -1;
    }

    /* Compile the BPF bytecode into machine code */
    code = bpfjit_generate_code(NULL, fp.bf_insns, fp.bf_len);
    if (code == 0) {
        fprintf(stderr, "Failed to compile bpf code\n");
        free_source(-1);
        return -1;
    }

    /* Capture packets in a loop until the end */
    if (capture_packets(sock, &fp, code) != 0) {
        fprintf(stderr, "Failed to capture packets\n");
        free_source(-1);
        return -1;
    }

    return 0;
}

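The main function of the mini-tcpdump program performs the following steps:

  1. The function takes two parameters, argc and argv: argc is the number of command-line arguments, and argv is an array of pointers to the argument strings.

  2. It checks whether there are exactly 3 command-line arguments. If not, it prints a usage message and returns -1 to indicate failure.

  3. It initializes the variables used later: the integer sock, the pcap_t pointer handle, the struct bpf_program variable fp, and code, a function pointer of type bpfjit_func_t.

  4. It defines a pseudo-lambda function free_source that releases resources. This function is called when the program receives SIGINT (the user presses Ctrl+C). It prints a notice and frees what was allocated earlier: the bytecode in fp, the pcap handle, the machine code in code, and the file descriptor sock.

  5. It registers free_source as the SIGINT handler so that resources are released when the user presses Ctrl+C.

  6. It prints a message saying that packets will be captured from the given interface with the given filter.

  7. It calls ether_sock_init to initialize a socket and stores the returned file descriptor in sock. On failure it prints an error, calls free_source to release resources, and is written to return -1. Note that free_source itself ends the process with exit(0), so the return -1 that follows it is never reached.

  8. It calls deal_tcpdump_code to compile the given tcpdump expression into BPF bytecode, storing the results in handle and fp. On failure it prints an error and takes the same free_source path.

  9. It calls bpf_validate to check the BPF bytecode. On failure it prints an error and takes the same free_source path.

  10. It calls bpfjit_generate_code to compile the BPF bytecode into machine code and stores the result in code. On failure it prints an error and takes the same free_source path.

  11. Finally, it calls capture_packets to capture packets in a loop until the end. If capturing fails, it prints an error and takes the same free_source path.

  12. If every step succeeds, the function returns 0.

The main purpose of this function is to capture packets from the given interface and filter them according to the given tcpdump expression. It combines library functions with custom helpers and handles the error cases along the way.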
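
The handler in the listing calls printf, pcap_close and exit from signal context; none of these is on the POSIX list of async-signal-safe functions. A more conservative pattern, sketched below as an alternative rather than as the author's design, lets the handler set a flag and leaves cleanup to the main loop. The names stop_requested, on_sigint and install_sigint_handler are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the capture loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;   /* storing to a sig_atomic_t is safe in a handler */
}

/* Install the SIGINT handler; returns 0 on success, -1 on failure. */
static int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART is deliberately not set: a blocking recv() then fails
     * with EINTR, so the loop gets a chance to look at the flag. */
    return sigaction(SIGINT, &sa, NULL);
}
```

With this pattern the capture loop would test stop_requested after each receive, break out, and free fp, handle, code and sock in ordinary code, which also removes the need for a nested function.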

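2.3 Comparing the overhead of bpf_filter and bpfjit

Besides verifying that BPF filtering works, this section also makes a simple performance comparison between interpreting the BPF bytecode in a function and executing the JIT-compiled code:

/* Get the current monotonic time in nanoseconds */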
static inline int64_t get_current_time(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    /* Cast first so the multiplication is done in 64 bits even where time_t is 32-bit */
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int32_t capture_packets(int32_t sock, struct bpf_program *fp, bpfjit_func_t code)
{
    int32_t ret, temp;
    int64_t start_time, end_time;
    int64_t filtered_count, captured_count;
    int64_t func_cost_time, jit_cost_time;
    char    buffer[PACKET_SIZE];

    func_cost_time = jit_cost_time = 0;
    filtered_count = captured_count = 0;
    printf("Start to capture packets...\n");
    while (1) {
        int32_t len = ether_recv_packet(sock, buffer, PACKET_SIZE);
        if (len < 0) {
            perror("Failed to receive packet");
            break;
        }

        /* Run the interpreted BPF filter function */
        start_time = get_current_time();
        ret        = bpf_filter(fp->bf_insns, (const u_char *)buffer, len, len);
        end_time   = get_current_time();
        func_cost_time += end_time - start_time;

        /* Run the JIT-compiled BPF filter */
        start_time = get_current_time();
        temp       = jitcall(code, (const u_char *)buffer, len, len);
        end_time   = get_current_time();
        jit_cost_time += end_time - start_time;
        if (temp != ret) {
            fprintf(stderr,
                "Warning, Result of executing bpf jit code is not equal to filter func: %d -> "
                "%d.\n",
                ret, temp);
            return -1;
        }

        if (ret == 0) {
            filtered_count++;
            continue;
        }
        captured_count++;
        /* Print information about the captured packet */
        printf("[%ld]Packet captured ! Bypass %ld, Time avg cost: %ld ns(func) - %ld ns(jit).\n",
            captured_count, filtered_count, func_cost_time / (captured_count + filtered_count),
            jit_cost_time / (captured_count + filtered_count));
        print_packet_info(buffer);
    }

    return -1;
}

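This function captures network packets and runs both the interpreted BPF filter and the JIT-compiled code on each of them:

  1. The function declares its variables: ret, temp, start_time, end_time, filtered_count, captured_count, func_cost_time, jit_cost_time and buffer.

  2. It initializes func_cost_time and jit_cost_time to 0; they accumulate the time spent in the interpreted filter and in the JIT-compiled code. The two packet counters are initialized to 0 as well.

  3. It prints a message saying that packet capture has started.

  4. It enters an infinite loop that keeps receiving and processing packets.

  5. Inside the loop it calls ether_recv_packet to receive one packet into buffer. If receiving fails, it prints an error and leaves the loop.

  6. It runs the interpreted filter by calling bpf_filter with the filter instructions, the packet and the packet length, and records the time before and after the call to compute the cost.

  7. It runs the JIT-compiled filter by calling jitcall with the compiled code, the packet and the packet length, again recording the time before and after the call.

  8. It compares the two return values. If they differ, it prints a warning and returns -1.

  9. If the return value is 0, the packet was filtered out: the function increments filtered_count and continues with the next iteration.

  10. If the return value is non-zero, the packet matches the filter: the function increments captured_count and prints information about the captured packet.

  11. The loop runs until an error occurs or it is interrupted manually.

  12. Finally, the function returns -1, meaning that capturing ended with an error.

The main purpose of this function is to capture packets and run both the interpreted BPF filter and the JIT-compiled code, which provides filtering and a basis for comparing the two.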
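
Each per-packet sample above also includes the cost of two clock_gettime calls, which may be a noticeable share of a sub-microsecond measurement. One common way to reduce that effect is to time many back-to-back calls and divide. The sketch below shows the idea with a stand-in filter; filter_fn, avg_cost_ns and demo_filter are hypothetical names, and in the real program the callback would wrap bpf_filter or the JIT-compiled function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: any filter over a packet buffer. */
typedef uint32_t (*filter_fn)(const uint8_t *pkt, uint32_t len);

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost in ns of one call to fn, measured over `rounds`
 * consecutive calls so the timer overhead is paid only once.
 * Returns -1 on invalid arguments. */
static int64_t avg_cost_ns(filter_fn fn, const uint8_t *pkt,
                           uint32_t len, int64_t rounds)
{
    volatile uint32_t sink = 0;   /* keeps the calls from being optimized away */
    int64_t start, end, i;

    if (fn == NULL || rounds <= 0)
        return -1;

    start = now_ns();
    for (i = 0; i < rounds; i++)
        sink += fn(pkt, len);
    end = now_ns();
    (void)sink;

    return (end - start) / rounds;
}

/* Stand-in filter for demonstration only: accept IPv4 frames. */
static uint32_t demo_filter(const uint8_t *pkt, uint32_t len)
{
    return (len >= 14 && pkt[12] == 0x08 && pkt[13] == 0x00) ? 8192 : 0;
}
```

Repeating the same packet keeps caches warm, so such a figure is a lower bound rather than a replacement for the per-packet averages measured on live traffic.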

2.4 Demonstration

First, capture ICMP packets:

onceday->bpf-sop:# export LD_LIBRARY_PATH=./lib
onceday->bpf-sop:# ./mini-tcpdump eth0 "icmp"
Try to dump packet from interface(eth0) with filter(icmp)
BPF bytecode length: 6
BPF bytecode: 
28, 00, 00, 0c
15, 00, 03, 800
30, 00, 00, 17
15, 00, 01, 01
06, 00, 00, 2000
06, 00, 00, 00
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 5
(002) ldb      [23]
(003) jeq      #0x1             jt 4    jf 5
(004) ret      #8192
(005) ret      #0
Start to capture packets...
[1]Packet captured ! Bypass 20, Time avg cost: 932 ns(func) - 204 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.128.17 -> 10.0.4.17, Proto: 1, Total Length: 28.
[2]Packet captured ! Bypass 20, Time avg cost: 895 ns(func) - 197 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17 -> 169.254.128.17, Proto: 1, Total Length: 28.
^C
Ctrl+C is pressed(sig 2), exit with 0.
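
The raw dump and the disassembly in this output describe the same six instructions: each row is code, jt, jf, k in hexadecimal, and the jump targets in the listing are computed relative to the next instruction. The sketch below shows that mapping for the four opcodes used here. The names dis_insn, jump_target and disasm_insn are hypothetical, and the output uses single spaces instead of tcpdump's column padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical instruction layout mirroring struct bpf_insn. */
struct dis_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Absolute target of a conditional jump at index idx with offset off. */
static size_t jump_target(size_t idx, uint8_t off)
{
    return idx + 1 + off;
}

/* Format one instruction into out. Returns the snprintf result,
 * or -1 for opcodes this sketch does not know. */
static int disasm_insn(char *out, size_t cap, size_t idx,
                       const struct dis_insn *in)
{
    switch (in->code) {
    case 0x28:  /* BPF_LD|BPF_H|BPF_ABS */
        return snprintf(out, cap, "(%03zu) ldh [%u]", idx, (unsigned)in->k);
    case 0x30:  /* BPF_LD|BPF_B|BPF_ABS */
        return snprintf(out, cap, "(%03zu) ldb [%u]", idx, (unsigned)in->k);
    case 0x15:  /* BPF_JMP|BPF_JEQ|BPF_K */
        return snprintf(out, cap, "(%03zu) jeq #0x%x jt %zu jf %zu", idx,
                        (unsigned)in->k, jump_target(idx, in->jt),
                        jump_target(idx, in->jf));
    case 0x06:  /* BPF_RET|BPF_K */
        return snprintf(out, cap, "(%03zu) ret #%u", idx, (unsigned)in->k);
    default:
        return -1;
    }
}
```

For the row "15, 00, 03, 800" at index 1 this gives jt 2 and jf 5, the same targets as in the listing above.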

The expression can be made more complex (capture TCP port 80 packets to or from host 169.254.0.4, or ICMP packets):

onceday->bpf-sop:# export LD_LIBRARY_PATH=./lib
onceday->bpf-sop:# ./mini-tcpdump eth0 "(tcp port 80 and host 169.254.0.4) or icmp"
Try to dump packet from interface(eth0) with filter((tcp port 80 and host 169.254.0.4) or icmp)
BPF bytecode length: 50
BPF bytecode: 
28, 00, 00, 0c
15, 00, 06, 86dd
30, 00, 00, 14
15, 00, 04, 06
28, 00, 00, 36
15, 0e, 00, 50
28, 00, 00, 38
15, 0c, 00, 50
28, 00, 00, 0c
15, 00, 22, 800
30, 00, 00, 17
15, 00, 20, 06
28, 00, 00, 14
45, 1e, 00, 1fff
b1, 00, 00, 0e
48, 00, 00, 0e
15, 03, 00, 50
b1, 00, 00, 0e
48, 00, 00, 10
15, 00, 18, 50
28, 00, 00, 0c
15, 00, 02, 800
20, 00, 00, 1a
15, 18, 00, a9fe0004
28, 00, 00, 0c
15, 00, 02, 800
20, 00, 00, 1e
15, 14, 00, a9fe0004
28, 00, 00, 0c
15, 00, 02, 806
20, 00, 00, 1c
15, 10, 00, a9fe0004
28, 00, 00, 0c
15, 00, 02, 806
20, 00, 00, 26
15, 0c, 00, a9fe0004
28, 00, 00, 0c
15, 00, 02, 8035
20, 00, 00, 1c
15, 08, 00, a9fe0004
28, 00, 00, 0c
15, 00, 02, 8035
20, 00, 00, 26
15, 04, 00, a9fe0004
28, 00, 00, 0c
15, 00, 03, 800
30, 00, 00, 17
15, 00, 01, 01
06, 00, 00, 2000
06, 00, 00, 00
(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 8
(002) ldb      [20]
(003) jeq      #0x6             jt 4    jf 8
(004) ldh      [54]
(005) jeq      #0x50            jt 20   jf 6
(006) ldh      [56]
(007) jeq      #0x50            jt 20   jf 8
(008) ldh      [12]
(009) jeq      #0x800           jt 10   jf 44
(010) ldb      [23]
(011) jeq      #0x6             jt 12   jf 44
(012) ldh      [20]
(013) jset     #0x1fff          jt 44   jf 14
(014) ldxb     4*([14]&0xf)
(015) ldh      [x + 14]
(016) jeq      #0x50            jt 20   jf 17
(017) ldxb     4*([14]&0xf)
(018) ldh      [x + 16]
(019) jeq      #0x50            jt 20   jf 44
(020) ldh      [12]
(021) jeq      #0x800           jt 22   jf 24
(022) ld       [26]
(023) jeq      #0xa9fe0004      jt 48   jf 24
(024) ldh      [12]
(025) jeq      #0x800           jt 26   jf 28
(026) ld       [30]
(027) jeq      #0xa9fe0004      jt 48   jf 28
(028) ldh      [12]
(029) jeq      #0x806           jt 30   jf 32
(030) ld       [28]
(031) jeq      #0xa9fe0004      jt 48   jf 32
(032) ldh      [12]
(033) jeq      #0x806           jt 34   jf 36
(034) ld       [38]
(035) jeq      #0xa9fe0004      jt 48   jf 36
(036) ldh      [12]
(037) jeq      #0x8035          jt 38   jf 40
(038) ld       [28]
(039) jeq      #0xa9fe0004      jt 48   jf 40
(040) ldh      [12]
(041) jeq      #0x8035          jt 42   jf 44
(042) ld       [38]
(043) jeq      #0xa9fe0004      jt 48   jf 44
(044) ldh      [12]
(045) jeq      #0x800           jt 46   jf 49
(046) ldb      [23]
(047) jeq      #0x1             jt 48   jf 49
(048) ret      #8192
(049) ret      #0
Start to capture packets...
[1]Packet captured ! Bypass 7, Time avg cost: 368 ns(func) - 95 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.128.6 -> 10.0.4.17, Proto: 1, Total Length: 28.
[2]Packet captured ! Bypass 7, Time avg cost: 353 ns(func) - 94 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17 -> 169.254.128.6, Proto: 1, Total Length: 28.
[3]Packet captured ! Bypass 27, Time avg cost: 1170 ns(func) - 284 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.128.17 -> 10.0.4.17, Proto: 1, Total Length: 28.
[4]Packet captured ! Bypass 27, Time avg cost: 1138 ns(func) - 277 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17 -> 169.254.128.17, Proto: 1, Total Length: 28.
[5]Packet captured ! Bypass 55, Time avg cost: 1283 ns(func) - 302 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.128.6 -> 10.0.4.17, Proto: 1, Total Length: 28.
[6]Packet captured ! Bypass 55, Time avg cost: 1264 ns(func) - 298 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17 -> 169.254.128.6, Proto: 1, Total Length: 28.
[7]Packet captured ! Bypass 68, Time avg cost: 1345 ns(func) - 324 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[8]Packet captured ! Bypass 69, Time avg cost: 1338 ns(func) - 322 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 52.
[9]Packet captured ! Bypass 69, Time avg cost: 1325 ns(func) - 319 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 40.
[10]Packet captured ! Bypass 69, Time avg cost: 1312 ns(func) - 315 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 221.
[11]Packet captured ! Bypass 70, Time avg cost: 1302 ns(func) - 312 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 40.
[12]Packet captured ! Bypass 70, Time avg cost: 1290 ns(func) - 310 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 835.
[13]Packet captured ! Bypass 71, Time avg cost: 1278 ns(func) - 308 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 40.
[14]Packet captured ! Bypass 72, Time avg cost: 1272 ns(func) - 306 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 256.
[15]Packet captured ! Bypass 72, Time avg cost: 1261 ns(func) - 304 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 40.
[16]Packet captured ! Bypass 72, Time avg cost: 1249 ns(func) - 301 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 40.
[17]Packet captured ! Bypass 73, Time avg cost: 1227 ns(func) - 296 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/50252 -> 169.254.0.4/80, TCP(6), Total Length: 40.
[18]Packet captured ! Bypass 74, Time avg cost: 1220 ns(func) - 295 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/50252, TCP(6), Total Length: 40.
[19]Packet captured ! Bypass 84, Time avg cost: 1248 ns(func) - 302 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.128.17 -> 10.0.4.17, Proto: 1, Total Length: 28.
[20]Packet captured ! Bypass 84, Time avg cost: 1238 ns(func) - 299 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17 -> 169.254.128.17, Proto: 1, Total Length: 28.
^C
Ctrl+C is pressed(sig 2), exit with 0.

We can take the capture a step further. The following tcpdump expression captures every TCP packet that has the SYN flag set:

onceday->bpf-sop:# export LD_LIBRARY_PATH=./lib
onceday->bpf-sop:# ./mini-tcpdump eth0 "tcp[tcpflags] & (tcp-syn) != 0"
Try to dump packet from interface(eth0) with filter(tcp[tcpflags] & (tcp-syn) != 0)
BPF bytecode length: 38
BPF bytecode: 
28, 00, 00, 0c
15, 00, 23, 800
28, 00, 00, 0c
15, 00, 06, 86dd
30, 00, 00, 14
15, 08, 00, 06
30, 00, 00, 14
15, 00, 02, 2c
30, 00, 00, 36
15, 04, 00, 06
28, 00, 00, 0c
15, 00, 19, 800
30, 00, 00, 17
15, 00, 17, 06
28, 00, 00, 14
45, 15, 00, 1fff
00, 00, 00, 0d
02, 00, 00, 00
b1, 00, 00, 0e
60, 00, 00, 00
0c, 00, 00, 00
07, 00, 00, 00
50, 00, 00, 0e
02, 00, 00, 01
00, 00, 00, 02
02, 00, 00, 02
61, 00, 00, 02
60, 00, 00, 01
5c, 00, 00, 00
02, 00, 00, 02
00, 00, 00, 00
02, 00, 00, 03
61, 00, 00, 03
60, 00, 00, 02
1c, 00, 00, 00
15, 01, 00, 00
06, 00, 00, 2000
06, 00, 00, 00
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 37
(002) ldh      [12]
(003) jeq      #0x86dd          jt 4    jf 10
(004) ldb      [20]
(005) jeq      #0x6             jt 14   jf 6
(006) ldb      [20]
(007) jeq      #0x2c            jt 8    jf 10
(008) ldb      [54]
(009) jeq      #0x6             jt 14   jf 10
(010) ldh      [12]
(011) jeq      #0x800           jt 12   jf 37
(012) ldb      [23]
(013) jeq      #0x6             jt 14   jf 37
(014) ldh      [20]
(015) jset     #0x1fff          jt 37   jf 16
(016) ld       #0xd
(017) st       M[0]
(018) ldxb     4*([14]&0xf)
(019) ld       M[0]
(020) add      x
(021) tax      
(022) ldb      [x + 14]
(023) st       M[1]
(024) ld       #0x2
(025) st       M[2]
(026) ldx      M[2]
(027) ld       M[1]
(028) and      x
(029) st       M[2]
(030) ld       #0x0
(031) st       M[3]
(032) ldx      M[3]
(033) ld       M[2]
(034) sub      x
(035) jeq      #0x0             jt 37   jf 36
(036) ret      #8192
(037) ret      #0
Start to capture packets...
[1]Packet captured ! Bypass 33, Time avg cost: 990 ns(func) - 208 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/33886 -> 13.107.5.93/443, TCP(6), Total Length: 60.
[2]Packet captured ! Bypass 34, Time avg cost: 1009 ns(func) - 211 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/41920 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[3]Packet captured ! Bypass 35, Time avg cost: 1035 ns(func) - 216 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/41920, TCP(6), Total Length: 52.
[4]Packet captured ! Bypass 41, Time avg cost: 1183 ns(func) - 214 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 13.107.5.93/443 -> 10.0.4.17/33886, TCP(6), Total Length: 52.
[5]Packet captured ! Bypass 85, Time avg cost: 1273 ns(func) - 236 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/41926 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[6]Packet captured ! Bypass 86, Time avg cost: 1274 ns(func) - 236 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/41926, TCP(6), Total Length: 52.
[7]Packet captured ! Bypass 311, Time avg cost: 610 ns(func) - 134 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/33902 -> 13.107.5.93/443, TCP(6), Total Length: 60.
[8]Packet captured ! Bypass 312, Time avg cost: 612 ns(func) - 134 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 13.107.5.93/443 -> 10.0.4.17/33902, TCP(6), Total Length: 52.
[9]Packet captured ! Bypass 343, Time avg cost: 673 ns(func) - 145 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/33906 -> 13.107.5.93/443, TCP(6), Total Length: 60.
[10]Packet captured ! Bypass 344, Time avg cost: 677 ns(func) - 145 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 13.107.5.93/443 -> 10.0.4.17/33906, TCP(6), Total Length: 52.
[11]Packet captured ! Bypass 400, Time avg cost: 775 ns(func) - 162 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/41930 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[12]Packet captured ! Bypass 401, Time avg cost: 779 ns(func) - 163 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/41930, TCP(6), Total Length: 52.
[13]Packet captured ! Bypass 422, Time avg cost: 807 ns(func) - 168 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/42400 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[14]Packet captured ! Bypass 423, Time avg cost: 808 ns(func) - 168 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/42400, TCP(6), Total Length: 52.
[15]Packet captured ! Bypass 447, Time avg cost: 835 ns(func) - 173 ns(jit).
        Ethernet: 52:54:00:85:f0:22 -> fe:ee:8f:bf:86:99, Type : 0x0800 
        IP: 10.0.4.17/42402 -> 169.254.0.4/80, TCP(6), Total Length: 60.
[16]Packet captured ! Bypass 448, Time avg cost: 837 ns(func) - 173 ns(jit).
        Ethernet: fe:ee:8f:bf:86:99 -> 52:54:00:85:f0:22, Type : 0x0800 
        IP: 169.254.0.4/80 -> 10.0.4.17/42402, TCP(6), Total Length: 52.
^C
Ctrl+C is pressed(sig 2), exit with 0.
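
The unoptimized disassembly above is easier to follow when written out as plain C. The following is a minimal sketch of the same checks, assuming an untagged Ethernet frame (no VLAN header). `match_ipv4_tcp_syn` is a hypothetical helper used only for illustration; it is not part of mini-tcpdump. Like the bytecode, it rejects IPv6 at instruction (001), because the `tcp[...]` index syntax only applies to IPv4 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in mini-tcpdump): mirrors the bytecode for
 * "tcp[tcpflags] & (tcp-syn) != 0". Returns 8192 on match, 0 otherwise. */
static uint32_t match_ipv4_tcp_syn(const uint8_t *pkt, size_t len)
{
    size_t ihl, flags_off;

    assert(pkt != NULL);

    /* A load beyond the packet makes a BPF program return 0 */
    if (len < 24)
        return 0;
    /* (000)-(001), (010)-(011): EtherType must be IPv4 */
    if ((((uint32_t)pkt[12] << 8) | pkt[13]) != 0x0800)
        return 0;
    /* (012)-(013): IP protocol must be TCP (6) */
    if (pkt[23] != 6)
        return 0;
    /* (014)-(015): fragment offset must be 0 (first or only fragment) */
    if (((((uint32_t)pkt[20] << 8) | pkt[21]) & 0x1fff) != 0)
        return 0;
    /* (018): X = 4 * (IHL), the IPv4 header length in bytes */
    ihl = 4u * (pkt[14] & 0x0f);
    /* (016)-(022): TCP flags live 13 bytes into the TCP header */
    flags_off = 14 + ihl + 13;
    if (flags_off >= len)
        return 0;
    /* (024)-(037): flags & 0x02 (SYN) must be non-zero */
    return (pkt[flags_off] & 0x02) != 0 ? 8192 : 0;
}
```

This also explains why both the SYN and the SYN+ACK packets of each handshake appear in the capture output: the test only requires the SYN bit to be set.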
2.5 mini-tcpdump source files
2.5.1 ether-input.c
#define _GNU_SOURCE

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>

#define BUFFER_SIZE 65536

extern int32_t ether_sock_init(const char *if_name);
extern int32_t ether_recv_packet(int32_t sock, char *buffer, int32_t len);

/**
 * @description: Initialize the raw socket
 * @param {char} *if_name interface name
 * @return {sock} the socket descriptor
 */
int32_t ether_sock_init(const char *if_name)
{
    int32_t            sock;
    struct ifreq       ifr;
    struct sockaddr_ll sll;

    /* Create a raw socket that receives frames of every layer-2 protocol (ETH_P_ALL), not just one EtherType */
    sock = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (sock < 0) {
        perror("Failed to create socket");
        exit(1);
    }

    /* Look up the index of the network interface */
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, if_name, IFNAMSIZ - 1);
    if (ioctl(sock, SIOCGIFINDEX, &ifr) < 0) {
        perror("Failed to get interface index by ioctl");
        close(sock);
        exit(1);
    }

    /* Bind the socket to the given network interface */
    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_ifindex  = ifr.ifr_ifindex;
    sll.sll_protocol = htons(ETH_P_ALL);
    if (bind(sock, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        perror("Failed to bind to interface");
        close(sock);
        exit(1);
    }

    return sock;
}

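/**
 * @description: Receive one packet
 * @param {int32_t} sock socket descriptor
 * @param {char} *buffer receive buffer
 * @param {int32_t} len size of the receive buffer
 * @return {int32_t} number of bytes received, or -1 on error
 */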
int32_t ether_recv_packet(int32_t sock, char *buffer, int32_t len)
{
    int32_t ret;

    ret = recvfrom(sock, buffer, len, 0, NULL, NULL);
    if (ret < 0) {
        perror("Failed to receive packet");
        return -1;
    }
    return ret;
}
2.5.2 mini-tcpdump.c
#define _GNU_SOURCE

#include <pcap.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <time.h>
#include <unistd.h> /* close() */

#include <arpa/inet.h> /* inet_ntop(), ntohs() */
#include <netpacket/packet.h>
#include <net/ethernet.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <netinet/udp.h>

#include "bpfjit.h"
#include "bpf-compat.h"

/* clang-format off */
#ifndef __COMPILING
/* Keeps the IDE quiet: it may not recognize the nested-function syntax */
#define lambda(ret, name, arg)   ret (*name)(arg); name = NULL ; for (arg; 0;)
#else
#define lambda(ret, name, ...)   ret name(__VA_ARGS__)
#endif
/* clang-format on */

/* Invoke the JIT-compiled machine code */
#define jitcall(func, _pkt, _wirelen, _buflen) \
    (func(NULL, &((bpf_args_t){.pkt = _pkt, .wirelen = _wirelen, .buflen = _buflen})))

extern int32_t ether_sock_init(const char *if_name);
extern int32_t ether_recv_packet(int32_t sock, char *buffer, int32_t len);

#define PACKET_SIZE 65536

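/**
 * @description: Compile a tcpdump filter expression into BPF bytecode
 * @param {pcap_t} **handle receives the pcap handle
 * @param {bpf_program *} fp receives the BPF program
 * @param {char} *filter_exp filter expression
 * @return {int32_t} 0 on success, -1 on failure
 */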
int32_t deal_tcpdump_code(pcap_t **handle, struct bpf_program *fp, const char *filter_exp)
{
    int         i;
    bpf_u_int32 net;
    pcap_t     *temp_handle;

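    /* IPv4 netmask of the captured network; only used for broadcast tests, so 0 is fine here */
    net = 0;
    /* pcap_open_dead() creates a handle with no capture behind it, used only to compile the filter */
    temp_handle = pcap_open_dead(DLT_EN10MB, BUFSIZ);
    if (temp_handle == NULL) {
        /* pcap_open_dead() takes no error buffer, so there is no message to print */
        fprintf(stderr, "Couldn't create dead pcap session\n");
        return -1;
    }

    /* Compile the BPF filter without attaching it to any capture session */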
    if (pcap_compile(temp_handle, fp, filter_exp, 0, net) == -1) {
        fprintf(stderr, "Couldn't parse filter %s: %s\n", filter_exp, pcap_geterr(temp_handle));
        pcap_close(temp_handle);
        return -1;
    }

    printf("BPF bytecode length: %d\n", fp->bf_len);
    printf("BPF bytecode: \n");
    for (i = 0; i < fp->bf_len; i++) {
        printf("%02x, %02x, %02x, %02x\n", fp->bf_insns[i].code, fp->bf_insns[i].jt,
            fp->bf_insns[i].jf, fp->bf_insns[i].k);
    }

    /* Print the BPF bytecode in human-readable form */
    bpf_dump(fp, 1);

    *handle = temp_handle;
    return 0;
}

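/* Return the current monotonic time in nanoseconds */
static inline int64_t get_current_time(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    /* Widen before multiplying so the result cannot overflow a 32-bit time_t */
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}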

/**
 * @description: Print packet information
 * @param {char} *packet start of the Ethernet frame
 * @return {void}
 */
void print_packet_info(const char *packet)
{
    struct ether_header *eth_header;
    struct ip           *ip_header;
    char                 src_ip[INET_ADDRSTRLEN], dst_ip[INET_ADDRSTRLEN];
    struct tcphdr       *tcp_header;
    struct udphdr       *udp_header;

    /* Ethernet header */
    eth_header = (struct ether_header *)packet;
    printf(
        "\tEthernet: %02x:%02x:%02x:%02x:%02x:%02x -> %02x:%02x:%02x:%02x:%02x:%02x, Type : 0x%04x "
        "\n",
        eth_header->ether_shost[0], eth_header->ether_shost[1], eth_header->ether_shost[2],
        eth_header->ether_shost[3], eth_header->ether_shost[4], eth_header->ether_shost[5],
        eth_header->ether_dhost[0], eth_header->ether_dhost[1], eth_header->ether_dhost[2],
        eth_header->ether_dhost[3], eth_header->ether_dhost[4], eth_header->ether_dhost[5],
        ntohs(eth_header->ether_type));

    /* Skip anything that is not IPv4 */
    if (ntohs(eth_header->ether_type) != ETHERTYPE_IP) {
        printf("Not an IP packet\n");
        return;
    }

    ip_header = (struct ip *)(packet + sizeof(struct ether_header));
    inet_ntop(AF_INET, &ip_header->ip_src, src_ip, INET_ADDRSTRLEN);
    inet_ntop(AF_INET, &ip_header->ip_dst, dst_ip, INET_ADDRSTRLEN);

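    /* Print details according to the IP payload protocol */
    switch (ip_header->ip_p) {
    case IPPROTO_TCP:
        /* Skip the real IP header length (ip_hl words) so IP options are handled */
        tcp_header = (struct tcphdr *)((char *)ip_header + ip_header->ip_hl * 4);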
        printf("\tIP: %s/%d -> %s/%d, TCP(%d), Total Length: %d.\n", src_ip,
            ntohs(tcp_header->th_sport), dst_ip, ntohs(tcp_header->th_dport), ip_header->ip_p,
            ntohs(ip_header->ip_len));
        break;
    case IPPROTO_UDP:
        udp_header = (struct udphdr *)((char *)ip_header + ip_header->ip_hl * 4);
        printf("\tIP: %s/%d -> %s/%d, UDP(%d), Total Length: %d.\n", src_ip,
            ntohs(udp_header->uh_sport), dst_ip, ntohs(udp_header->uh_dport), ip_header->ip_p,
            ntohs(ip_header->ip_len));
        break;
    default:
        printf("\tIP: %s -> %s, Proto: %d, Total Length: %d.\n", src_ip, dst_ip, ip_header->ip_p,
            ntohs(ip_header->ip_len));
        break;
    }

    return;
}

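/**
 * @description: Capture loop
 * @param {int32_t} sock socket descriptor
 * @param {struct bpf_program} *fp BPF program (interpreted by bpf_filter)
 * @param {bpfjit_func_t} code JIT-compiled filter
 * @return {int32_t} -1 when the loop ends
 */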
int32_t capture_packets(int32_t sock, struct bpf_program *fp, bpfjit_func_t code)
{
    int32_t ret, temp;
    int64_t start_time, end_time;
    int64_t filtered_count, captured_count;
    int64_t func_cost_time, jit_cost_time;
    char    buffer[PACKET_SIZE];

    func_cost_time = jit_cost_time = 0;
    filtered_count = captured_count = 0;
    printf("Start to capture packets...\n");
    while (1) {
        int32_t len = ether_recv_packet(sock, buffer, PACKET_SIZE);
        if (len < 0) {
            /* ether_recv_packet() has already reported the error */
            break;
        }

        /* Run the interpreted BPF filter function */
        start_time = get_current_time();
        ret        = bpf_filter(fp->bf_insns, (const u_char *)buffer, len, len);
        end_time   = get_current_time();
        func_cost_time += end_time - start_time;

        /* Run the JIT-compiled BPF filter */
        start_time = get_current_time();
        temp       = jitcall(code, (const u_char *)buffer, len, len);
        end_time   = get_current_time();
        jit_cost_time += end_time - start_time;
        if (temp != ret) {
            fprintf(stderr,
                "Warning, Result of executing bpf jit code is not equal to filter func: %d -> "
                "%d.\n",
                ret, temp);
            return -1;
        }

        if (ret == 0) {
            filtered_count++;
            continue;
        }
        captured_count++;
        /* Print information about the captured packet */
        printf("[%ld]Packet captured ! Bypass %ld, Time avg cost: %ld ns(func) - %ld ns(jit).\n",
            captured_count, filtered_count, func_cost_time / (captured_count + filtered_count),
            jit_cost_time / (captured_count + filtered_count));
        print_packet_info(buffer);
    }

    return -1;
}

int32_t main(int32_t argc, char *argv[])
{
    int32_t            sock;
    pcap_t            *handle;
    struct bpf_program fp;
    bpfjit_func_t      code;

    if (argc != 3) {
        fprintf(stderr, "Usage: %s <interface> <tcpdump code>\n", argv[0]);
        return -1;
    }

    sock   = 0;
    handle = NULL;
    code   = NULL;
    memset(&fp, 0, sizeof(struct bpf_program));

    /* Try a pseudo-lambda (a GCC nested function); at least it avoids global variables */
    lambda (void, free_source, int32_t signo) {
        /* Print a notice and release resources */
        if (signo != -1) {
            printf("\nCtrl+C is pressed(sig %d), exit with 0.\n", signo);
        }
        if (fp.bf_insns) {
            pcap_freecode(&fp);
        }
        if (handle) {
            pcap_close(handle);
        }
        if (code) {
            bpfjit_free_code(code);
        }
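        if (sock > 0) {
            close(sock);
        }
        /* Ctrl+C is a normal exit; free_source(-1) is only called on failure paths */
        exit(signo == -1 ? 1 : 0);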
    }

    /* Register the Ctrl+C signal handler */
    signal(SIGINT, free_source);

    printf("Try to dump packet from interface(%s) with filter(%s)\n", argv[1], argv[2]);
    sock = ether_sock_init(argv[1]);
    if (sock < 0) {
        fprintf(stderr, "Failed to init socket\n");
        free_source(-1);
        return -1;
    }

    /* Compile the tcpdump expression into BPF bytecode */
    if (deal_tcpdump_code(&handle, &fp, argv[2]) != 0) {
        fprintf(stderr, "Failed to deal tcpdump code\n");
        free_source(-1);
        return -1;
    }

    /* Validate the BPF bytecode */
    if (bpf_validate(fp.bf_insns, fp.bf_len) == 0) {
        fprintf(stderr, "Failed to validate bpf code\n");
        free_source(-1);
        return -1;
    }

    /* JIT-compile the BPF bytecode into machine code */
    code = bpfjit_generate_code(NULL, fp.bf_insns, fp.bf_len);
    if (code == NULL) {
        fprintf(stderr, "Failed to compile bpf code\n");
        free_source(-1);
        return -1;
    }

    /* Capture packets in a loop until interrupted */
    if (capture_packets(sock, &fp, code) != 0) {
        fprintf(stderr, "Failed to capture packets\n");
        free_source(-1);
        return -1;
    }

    return 0;
}
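
Each row of the hex dump printed by deal_tcpdump_code is one instruction in the order `code, jt, jf, k`, and the jump fields are relative to the next instruction. To make that concrete, here is a minimal sketch of an interpreter for just the four opcodes that dominate the dumps above (0x28 `ldh [k]`, 0x30 `ldb [k]`, 0x15 `jeq #k`, 0x06 `ret #k`). The names `mini_insn`, `mini_bpf_run` and `ipv4_tcp_prog` are hypothetical, and this is not how `bpf_filter` or bpfjit are actually implemented; it only illustrates the decoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the (code, jt, jf, k) layout shown in the dump */
struct mini_insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

enum { OP_LDH_ABS = 0x28, OP_LDB_ABS = 0x30, OP_JEQ_K = 0x15, OP_RET_K = 0x06 };

/* Interpret a program made only of the four opcodes above.
 * Any out-of-range load or unknown opcode rejects the packet (returns 0). */
static uint32_t mini_bpf_run(const struct mini_insn *prog, size_t n,
                             const uint8_t *pkt, uint32_t len)
{
    uint32_t a  = 0; /* accumulator register A */
    size_t   pc = 0;

    assert(prog != NULL && pkt != NULL);

    while (pc < n) {
        const struct mini_insn *in = &prog[pc++];

        switch (in->code) {
        case OP_LDH_ABS: /* A = 16-bit big-endian value at pkt[k] */
            if (in->k > len || len - in->k < 2)
                return 0;
            a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case OP_LDB_ABS: /* A = byte at pkt[k] */
            if (in->k >= len)
                return 0;
            a = pkt[in->k];
            break;
        case OP_JEQ_K: /* jt/jf are offsets from the next instruction */
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case OP_RET_K:
            return in->k;
        default:
            return 0;
        }
    }
    return 0;
}

/* Hand-assembled "IPv4 and TCP" check, same shape as (010)-(013) above */
static const struct mini_insn ipv4_tcp_prog[] = {
    {0x28, 0, 0, 12},     /* ldh [12]                    */
    {0x15, 0, 3, 0x0800}, /* jeq #0x800  jt 2  jf 5      */
    {0x30, 0, 0, 23},     /* ldb [23]                    */
    {0x15, 0, 1, 6},      /* jeq #0x6    jt 4  jf 5      */
    {0x06, 0, 0, 8192},   /* ret #8192                   */
    {0x06, 0, 0, 0},      /* ret #0                      */
};
```

Calling `mini_bpf_run(ipv4_tcp_prog, 6, frame, frame_len)` on an Ethernet frame returns the snapshot length 8192 for IPv4 TCP and 0 for everything else, which is the same accept/reject convention that capture_packets relies on.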
2.5.3 Makefile
CC=gcc
TARGET=mini-tcpdump

.PHONY: all clean build
all: clean build

clean:
	rm -f $(TARGET)
build: $(TARGET)

SOURCE=mini-tcpdump.c ether-input.c

CFLAGS=-Wall -Werror -O0 -g -D__COMPILING=1
INCLUDE=-I./bpfjit/src -I./bpfjit/sljit/sljit_src
LDFLAGS= -L./lib -lsljit -lbpfjit -lpcap

mini-tcpdump: $(SOURCE)
	$(CC) -o $@ $^ $(CFLAGS) $(INCLUDE) $(LDFLAGS)

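3. Summary (mini-tcpdump demo GIF)

Judging by this minimum viable product (mini-tcpdump), bpfjit filters efficiently: even for the more complex expressions the JIT-compiled filter costs well under 1 µs per packet, while the interpreted bpf_filter takes roughly 0.8 to 1.2 µs. Approximate average cost per packet: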

Scenario                        bpf_filter    bpfjit
ICMP filter                     900 ns        200 ns
ICMP and TCP port 80 filter     1200 ns       300 ns
TCP SYN filter                  800 ns        160 ns

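During testing, the measured average cost tended to drop as more packets were processed, so these figures are only a rough reference and should not be used directly as a benchmark. Each sample also includes the overhead of the two clock_gettime calls that bracket the filter call.

Even so, the data clearly show that bpfjit is faster than bpf_filter: it takes only about 20-30% of the time.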
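
Because capture_packets times every filter call individually, the timer overhead is a noticeable part of each sample. One common way to reduce that effect is to time a batch of calls and divide by the batch size. The sketch below assumes a generic filter callback; `filter_fn`, `avg_cost_ns` and the stand-in filter `accept_ipv4` are hypothetical names for illustration and are not part of mini-tcpdump. In the real program the callback would wrap `bpf_filter` or `jitcall`.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() under strict -std= modes */

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: returns non-zero when the packet matches */
typedef uint32_t (*filter_fn)(const uint8_t *pkt, uint32_t len);

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one fn() call in ns, measured over `rounds` calls so the
 * two clock_gettime() calls are amortized across the whole batch. */
static int64_t avg_cost_ns(filter_fn fn, const uint8_t *pkt, uint32_t len, int rounds)
{
    volatile uint32_t sink = 0; /* keeps the compiler from dropping the calls */
    int64_t           start, end;
    int               i;

    assert(fn != NULL && rounds > 0);

    start = now_ns();
    for (i = 0; i < rounds; i++) {
        sink += fn(pkt, len);
    }
    end = now_ns();
    (void)sink;

    return (end - start) / rounds;
}

/* Stand-in filter for the example: accept IPv4 frames only */
static uint32_t accept_ipv4(const uint8_t *pkt, uint32_t len)
{
    return (len >= 14 && pkt[12] == 0x08 && pkt[13] == 0x00) ? 8192 : 0;
}
```

A call such as `avg_cost_ns(accept_ipv4, frame, frame_len, 100000)` replays the same frame many times, so it measures a warm-cache cost; that is a different quantity from the per-packet averages in the table above and the two should not be compared directly.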

[Figure: mini-tcpdump demo GIF]








Once Day

I too accept that beauty turns to dust in the end, yet cannot bear how swiftly the quiet dream slipped by......
