GSoC: Support for BTF and other BPF decoding improvements proposal rough draft

SuHsueyu anolasc13 at gmail.com
Thu Apr 7 14:45:30 UTC 2022


Hello, I'd like to contribute to the strace project in GSoC. I wrote a
proposal for it. Suggestions for improvement are always welcome!
---
Support for BTF and other BPF decoding improvements

SuHsueyu, anolasc13 at gmail.com

Abstract
BTF information is important for debugging. The purpose of this
project is to extract BTF information for eBPF maps, process it in a
human-readable manner, and improve decoding for the bpf() syscall's
map manipulation sub-calls. This proposal also includes a small
prototype for testing feasibility as well as a description of the
prototype.

Project Description
Retrieve the BTF information for the eBPF maps and use it to enhance
decoding of keys and values of map elements passed to/from the kernel
in map-manipulation-related `bpf` syscalls.

Motivation
Debugging eBPF programs and maps benefits greatly from BTF
information. BTF (BPF Type Format) is a metadata format for eBPF that
encodes debug information, such as the types of BPF data. So, with BTF,
we can obtain a deeper understanding of eBPF and the bpf() syscall. The
bpf() syscall not only allows you to load an eBPF program into the
kernel, but also allows you to get meta information about it. This
project will extract BTF data via the bpf() syscall, decode it, and use
the result to improve the decoding of the syscall itself.

Previous work
bpftool: bpftool is a tool that allows you to view and manipulate
eBPF programs and maps. To inspect the BTF information associated with
a loaded program or map, we can use 'bpftool btf dump prog id <prog id>'
or 'bpftool map dump id <map id>'. bpftool uses libbpf to retrieve the
BTF objects loaded in the kernel.
The bpftool source code is included in the Linux source tree. bpftool
is a great example of a tool for retrieving BTF information. It
provides the user with a deep interpretation of BPF data, which is also
this project's purpose, so I think bpftool is a useful and valuable
reference.
link to bpftool:
[bpftool](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/bpf/bpftool)

Prototype
For this proposal, I wrote a prototype:
[RetriveBTFDemo](https://github.com/ANOLASC/RetriveBTFDemo). It has
the basic ability to retrieve BTF data that has been loaded into the
kernel. I propose adding several crucial features in GSoC 2022 to make
it more complete.

Result of my demo:
sudo ./rt 6
fd: 3
type: 2
id: 6
key_size: 4
value_size: 8
max_entries: 2
map_flags: 0
name: cnt
ifindex: 0
btf_value_type_id: 10
netns_dev: 0
netns_ino: 0
btf_id: 93
btf_key_type_id: 7
btf_value_type_id: 10

Implementation of prototype
BTF information that has been loaded into the kernel can be inspected
using the Linux syscall bpf(). The prototype extracts information
loaded in the kernel and prints it to the console, based on a command
line argument provided by the user, which is currently a map id. The
prototype's workflow is as follows:
1. The prototype attempts to obtain the map file descriptor that
corresponds to the map id provided by the user on the command line.
To retrieve the map file descriptor, it uses the bpf syscall with the
'BPF_MAP_GET_FD_BY_ID' command. The map file descriptor is part of
the BPF virtual file system.
2. After getting the corresponding map file descriptor, the prototype
retrieves the map information using the bpf syscall with the
'BPF_OBJ_GET_INFO_BY_FD' command.
3. Finally, the prototype decodes the map information and prints
each field to the console.
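
The three steps above can be sketched in C roughly as follows. This is
a minimal illustration written for this description, not the code of
the prototype itself: the helper names (sys_bpf, map_fd_by_id,
map_info_by_fd, print_map_info) are made up here, and it assumes kernel
headers recent enough for struct bpf_map_info to carry the btf_* fields.

```c
#include <linux/bpf.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc does not export bpf(), so go through syscall(2) directly. */
static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return (int) syscall(__NR_bpf, cmd, attr, size);
}

/* Step 1: map id -> map file descriptor (-1 on error, errno is set). */
static int map_fd_by_id(unsigned int id)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_id = id;
	return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
}

/* Step 2: ask the kernel to fill struct bpf_map_info for that fd. */
static int map_info_by_fd(int fd, struct bpf_map_info *info)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	memset(info, 0, sizeof(*info));
	attr.info.bpf_fd = fd;
	attr.info.info_len = sizeof(*info);
	attr.info.info = (__u64) (unsigned long) info;
	return sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
}

/* Step 3: print a subset of the returned fields, one per line. */
static void print_map_info(const struct bpf_map_info *info)
{
	printf("type: %u\n", info->type);
	printf("id: %u\n", info->id);
	printf("key_size: %u\n", info->key_size);
	printf("value_size: %u\n", info->value_size);
	printf("max_entries: %u\n", info->max_entries);
	printf("name: %.*s\n", (int) sizeof(info->name), info->name);
	printf("btf_id: %u\n", info->btf_id);
	printf("btf_key_type_id: %u\n", info->btf_key_type_id);
	printf("btf_value_type_id: %u\n", info->btf_value_type_id);
}
```

A caller would pass the id from the command line to map_fd_by_id(),
hand the resulting descriptor to map_info_by_fd(), and print the result
only if both calls succeed. BPF_MAP_GET_FD_BY_ID normally requires
elevated privileges, which is why the demo is run with sudo.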

The prototype is simple and leaves room for improvement. For example,
it could retrieve BTF information from the .BTF section of an ELF file,
and it could classify maps in more detail. BPF has many map types, such
as hash table maps, program array maps, and stack trace maps, and it
would be beneficial to distinguish between them in the output.
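
As a small illustration of such a classification, the numeric type
reported by the kernel (the "type: 2" line in the demo output) could be
turned into its symbolic name. The helper below is hypothetical and
covers only the map types named above; the constants come from
<linux/bpf.h>.

```c
#include <linux/bpf.h>
#include <stddef.h>

/*
 * Hypothetical helper: translate a bpf_map_info.type value into the
 * name of the corresponding enum bpf_map_type constant.
 * Returns NULL for types this sketch does not know about.
 */
static const char *map_type_name(unsigned int type)
{
	switch (type) {
	case BPF_MAP_TYPE_HASH:
		return "BPF_MAP_TYPE_HASH";
	case BPF_MAP_TYPE_ARRAY:
		return "BPF_MAP_TYPE_ARRAY";
	case BPF_MAP_TYPE_PROG_ARRAY:
		return "BPF_MAP_TYPE_PROG_ARRAY";
	case BPF_MAP_TYPE_STACK_TRACE:
		return "BPF_MAP_TYPE_STACK_TRACE";
	default:
		return NULL;
	}
}
```

With this helper, the map from the demo output (type 2) would be
reported as BPF_MAP_TYPE_ARRAY. A complete version would need an entry
for every map type the kernel defines.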

Why it is innovative and what it will contribute
It can provide a new view of eBPF programs and the bpf() syscall by
retrieving BTF for eBPF programs and BPF data and performing more
extensive analysis on the returned BTF information. This can give
strace users much more detailed information when tracing the bpf()
syscall.

Timeline
Community Bonding    May 20 - June 12
During the community bonding phase, I will delve into eBPF, set up a
coding and debugging environment, learn about the community workflow,
and get a sense of how things function here.

Phase 1 June 1 - June 29
Week 1 June 1 - June 7
I will follow TDD in development, so I will write test cases first

Week 2 June 8 - June 14
Week for writing complete test cases with full coverage

Week 3 June 15 - June 21
Week for adding the ability to retrieve map info to the decoder

Week 4 June 22 - June 28
Week for classifying the map types

Phase 2 June 29 - July 27
Week 5 June 29 - July 5
Week for reading the bpf source code and improving the decoded
information for the bpf syscall

Week 6 July 6 - July 12
Week for improving BPF_MAP_CREATE

Week 7 July 13 - July 19
Week for improving BPF_MAP_LOOKUP_ELEM

Week 8 July 20 - July 26
Week for improving BPF_MAP_UPDATE_ELEM

Phase 3 July 27 - August 24

Week 9 July 27 - August 2
Week for improving BPF_MAP_DELETE_ELEM

Week 10 August 3 - August 9
Week for improving BPF_MAP_GET_NEXT_KEY

Week 11 August 10 - August 16
Week for code review and bug fixing

Week 12 August 17 - August 23
Week for code review and bug fixing. Buffer time in case something
cannot be done on time.

About me
I am an undergraduate student of software engineering.

Relevant skills that will help to achieve the goal
I'm new to the open-source community. I'm conversant with C/C++ and
the fundamentals of git. Through MIT 6.S081, I have gained a
fundamental understanding of operating systems. I believe these
abilities will assist me in completing this project.

Open-source experience
I don't have any open-source projects; this is the first open-source
project I have been involved in, and I'm hoping it will serve as a
springboard for future open-source activity. During the GSoC 2022
period, I am available full-time to work on this project.

Personal Info
- Name: SuHsueyu
- Email: anolasc13 at gmail.com
- Github: github.com/ANOLASC

