[GSoC] Draft Proposal: Improve Statistics Handling

Fri Mar 31 22:28:12 UTC 2023

Hi,

I have got the first draft of my GSoC proposal ready. The project is on
"Improving Statistics Handling in strace". I have uploaded the markdown
version of the proposal on GitHub.

https://github.com/valdaarhun/Strace-GSoC-Proposal/blob/main/proposal.md

I have also appended the plain text version of the draft below.

I would love to hear back from you. Please let me know if you need more
information or if you would like to have more features added or modified
in the proposal.

Thanks,
Sahil

---------------------------------

## Name

Sahil Siddiq

## Title

Improve Statistics Handling in strace

## Abstract

strace is a popular syscall and signal tracing tool. One can
get valuable information about a process and how it works by
investigating the syscalls it invokes and the signals it sends and
receives. When dealing with long-running processes, daemons
and multi-threaded applications, concise and readily-available
syscall and signal statistics are invaluable. Currently, strace has
some statistics handling facilities but its capabilities are limited.

The objective of this project is to identify and implement a set of
features and capabilities that improves strace's handling of syscall
and signal statistics.

While tracing tools such as perf and LTTng offer more comprehensive
system-wide performance analysis, some of their features and design
patterns can serve as good benchmarks when implementing the new
features of interest in strace.

## Description

While strace has some facilities for statistics handling, there is scope
for improvement. Some features that can be implemented in strace
are described below:

### Per-tracee statistics

strace currently provides only aggregate totals when summarizing
the traced output for the following cases:
- a multi-threaded tracee
- a tracee that spawns children
- a comma-separated list of pids passed through -p

Having access to per-tracee statistics provides more fine-grained
information about the tracees' syscall activity. This can help in identifying
bottlenecks in individual tracees (e.g. the number of syscalls invoked by a
tracee, or the time spent in specific syscalls). It can also give an insight
into resource utilization per tracee (e.g. the number of I/O operations).
This could be useful when optimizing or debugging a tracee.

One possible implementation of this feature could be to add a pid column
in the summary table. This also allows for per-tracee sorting (eg: sorting on
the "pid" column), or sorting over all pids (eg: sorting on the "number of
syscalls" column irrespective of the pid).

With this, it would also be possible to enable the -c and -ff options at the
same time. This would allow each tracee's summary table to be saved in
"filename.pid".
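
The bookkeeping behind a per-tracee summary could be sketched as below. This is
only an illustration under stated assumptions: the names `tracee_stats`,
`stats_for_pid`, `count_syscall`, `MAX_TRACEES` and `MAX_SYSCALLS` are hypothetical
and are not taken from strace's source, and a real implementation would likely hang
the counters off strace's existing per-tracee state instead of a fixed table.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_TRACEES  64   /* hypothetical limit for this sketch */
#define MAX_SYSCALLS 512  /* hypothetical upper bound on syscall numbers */

struct tracee_stats {
    pid_t pid;                          /* 0 marks an unused slot */
    unsigned long calls[MAX_SYSCALLS];  /* call count per syscall number */
};

static struct tracee_stats table[MAX_TRACEES];

/* Find the slot for pid, claiming the first free one on first use.
 * Slots are never released, so used slots are always contiguous.
 * Returns NULL when the table is full. */
static struct tracee_stats *stats_for_pid(pid_t pid)
{
    assert(pid > 0);
    for (size_t i = 0; i < MAX_TRACEES; i++) {
        if (table[i].pid == pid)
            return &table[i];
        if (table[i].pid == 0) {
            table[i].pid = pid;
            return &table[i];
        }
    }
    return NULL;
}

/* Record one invocation of syscall number scno by tracee pid. */
static int count_syscall(pid_t pid, unsigned int scno)
{
    struct tracee_stats *st = stats_for_pid(pid);
    if (st == NULL || scno >= MAX_SYSCALLS)
        return -1;
    st->calls[scno]++;
    return 0;
}

/* Aggregate over all tracees, i.e. the kind of total reported today. */
static unsigned long total_calls(unsigned int scno)
{
    unsigned long sum = 0;
    if (scno >= MAX_SYSCALLS)
        return 0;
    for (size_t i = 0; i < MAX_TRACEES && table[i].pid != 0; i++)
        sum += table[i].calls[scno];
    return sum;
}
```

Keeping the counters per pid means the aggregate view is still available by summing,
while the pid column and per-pid sorting fall out of the same data.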

### Signal statistics

Currently, strace does not gather signal-related statistics. This project aims
to provide a facility in strace to display a summary of intercepted signals per
tracee. For every intercepted signal, it is possible to extract fields such as
si_signo, si_code, si_pid and si_uid, amongst others.

For example, consider the following C program:

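    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* printf() and exit() are not async-signal-safe; used here for brevity. */
    void handler(int num){
      printf("Received %d\n", num);
      exit(1);
    }

    int main(void){
      pid_t pid = fork();
      if (pid < 0) return 1;  /* fork failed: never call kill(-1, ...) */
      if (pid == 0){
        /* child: wait for SIGUSR1 from the parent */
        signal(SIGUSR1, handler);
        sleep(10);
        return 0;
      }
      /* parent: signal the child, then reap it */
      sleep(5);
      kill(pid, SIGUSR1);
      wait(NULL);
      return 0;
    }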

The child process and parent process receive SIGUSR1 and SIGCHLD
respectively.

    Child : --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3345, si_uid=1000} ---
    Parent: --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3346, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---

For tracees that receive a considerable number of signals, it would be convenient
to collate signal statistics and display them in a structured format.
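
A minimal sketch of such collation is shown below. It assumes that a `siginfo_t`
is available for each intercepted signal and counts occurrences by `si_signo`.
The names `sig_counts`, `count_signal`, `print_signal_summary` and `SIG_SLOTS` are
hypothetical, and the two-column output layout is an assumption, not an existing
strace format.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

#define SIG_SLOTS 65  /* assumption: Linux signal numbers run from 1 to 64 */

static unsigned long sig_counts[SIG_SLOTS];

/* Tally one intercepted signal using the si_signo field of its siginfo. */
static int count_signal(const siginfo_t *si)
{
    assert(si != NULL);
    if (si->si_signo <= 0 || si->si_signo >= SIG_SLOTS)
        return -1;
    sig_counts[si->si_signo]++;
    return 0;
}

/* Print "signal number, count" for every signal seen at least once.
 * Returns the number of rows printed. */
static int print_signal_summary(FILE *out)
{
    int rows = 0;
    for (int sig = 1; sig < SIG_SLOTS; sig++) {
        if (sig_counts[sig] == 0)
            continue;
        fprintf(out, "%6d %10lu\n", sig, sig_counts[sig]);
        rows++;
    }
    return rows;
}
```

The same idea extends to other fields: keying the table on (si_signo, si_code) or
(si_signo, si_pid) would show how signals were generated or who sent them.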

### Structured Output

Currently, it is possible to print a process' syscall statistics to a file. This can be done
by using the -c option with the -o option. As mentioned above, it should be possible
to print each tracee's syscall statistics to "filename.pid" once per-tracee statistics are
available. Apart from this, it would also be nice to have strace output statistics in a
particular format to a file which could later be used for further processing by other
tools.

For example, strace could write the syscall statistics of a tracee to a file in
JSON or CSV format. Post-processing tools in the style of Brendan Gregg's
"flamegraph.pl" could then take this data as input and generate a visualization
that is relevant to the user.

A typical syscall summary table has the following columns:

    %time | seconds | usecs/call | calls | errors | syscall

This can be stored in a json file as shown:

    {
       "syscall": "name",
       "%time": "%time",
       "seconds": "seconds",
       "usecs_per_call": "usecs/call",
       "calls": "calls",
       "errors": "errors"
    }
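
As a rough sketch, one summary row could be serialized to such an object as follows.
The struct `summary_row` and the function `row_to_json` are hypothetical names, the
choice to emit numbers (not strings) as values is an assumption, and the sketch
assumes syscall names contain no characters that need JSON escaping.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the summary table; fields mirror the column titles. */
struct summary_row {
    const char *syscall;
    double pct_time;
    double seconds;
    unsigned long usecs_per_call;
    unsigned long calls;
    unsigned long errors;
};

/* Format one row as a JSON object into buf.
 * Returns what snprintf returns: the length needed, excluding the NUL. */
static int row_to_json(char *buf, size_t len, const struct summary_row *r)
{
    assert(r != NULL && r->syscall != NULL);
    return snprintf(buf, len,
        "{\"syscall\": \"%s\", \"%%time\": %.2f, \"seconds\": %.6f, "
        "\"usecs_per_call\": %lu, \"calls\": %lu, \"errors\": %lu}",
        r->syscall, r->pct_time, r->seconds,
        r->usecs_per_call, r->calls, r->errors);
}
```

A caller would check that the return value is smaller than the buffer size before
writing the object out, and would emit the rows as elements of a JSON array.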

### Provide statistics periodically or on demand

For processes with reasonably short output, it is adequate to print a single syscall
summary table. However, for long-running processes, daemons or processes that
may never halt, it would make sense to have a facility in place that periodically prints
a summary of the output. Alternatively, a facility can be put in place that allows the
user to request strace to print the statistics collected so far.

#### Periodic printing

Periodic printing could be based on two metrics. Statistics could be printed:
- at equal intervals of time from the beginning of the trace. This could probably be
  done by keeping track of the wall clock time.
- after a specified number of syscalls have been traced.
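
Both triggers could be checked in one place, as in the sketch below. The names
`report_policy`, `report_due` and `monotonic_now` are hypothetical. Note that this
sketch reads CLOCK_MONOTONIC instead of the wall clock, so that clock adjustments do
not skew the intervals; that is a choice made here, not something strace does today.
The check only runs when a syscall is traced, so a tracee that is blocked for a long
time would not produce a report until its next syscall.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct report_policy {
    double interval_sec;          /* 0 disables the time-based trigger */
    unsigned long every_n_calls;  /* 0 disables the count-based trigger */
    double last_report;           /* timestamp of the previous report */
    unsigned long calls_since;    /* syscalls traced since that report */
};

/* Current monotonic time in seconds. */
static double monotonic_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call once per traced syscall. Returns true when a summary is due
 * and resets both triggers in that case. */
static bool report_due(struct report_policy *p, double now)
{
    assert(p != NULL);
    p->calls_since++;
    bool by_time = p->interval_sec > 0 &&
                   now - p->last_report >= p->interval_sec;
    bool by_count = p->every_n_calls > 0 &&
                    p->calls_since >= p->every_n_calls;
    if (!by_time && !by_count)
        return false;
    p->last_report = now;
    p->calls_since = 0;
    return true;
}
```

In a tracing loop this would be called as `report_due(&policy, monotonic_now())`
after each syscall has been accounted for.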

#### On-demand printing

This is based on issue #47 (https://github.com/strace/strace/issues/47) on GitHub:
a facility could be provided in strace to print statistics whenever it receives a
SIGUSR1 signal.
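
One common pattern for this is to have the signal handler only set a flag, and to
print the summary from the main loop. The sketch below shows that pattern in
isolation. The names `summary_requested`, `install_summary_handler` and
`take_summary_request` are hypothetical, and how this would fit in with strace's
existing signal handling is not addressed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t summary_requested;

/* Async-signal-safe handler: only records that a summary was requested. */
static void on_sigusr1(int signo)
{
    (void)signo;
    summary_requested = 1;
}

/* Install the SIGUSR1 handler. Returns 0 on success, -1 on failure.
 * SA_RESTART is deliberately not set, so that a blocking wait in the
 * main loop is interrupted and the flag can be checked promptly. */
static int install_summary_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop: returns 1 once per request, else 0. */
static int take_summary_request(void)
{
    if (!summary_requested)
        return 0;
    summary_requested = 0;
    return 1;
}

/* Small self-check of the flow; raise() stands in for an external kill. */
static void demo_on_demand(void)
{
    int rc = install_summary_handler();
    assert(rc == 0);
    (void)rc;
    raise(SIGUSR1);
    assert(take_summary_request() == 1);
}
```

Doing the printing outside the handler avoids calling functions that are not
async-signal-safe, such as printf, from signal context.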

### Visualization

When a large volume of syscall statistics is collected, the data is easier to comprehend
once it is visualized. Hence, this project also aims to implement features that
enable strace to plot visualizations of syscall statistics. The data to be used can be
retrieved from the output of strace's -c option. Visualizations that might especially be
of use include:

- histograms of the number of each type of syscall
- histograms of the time spent in each type of syscall
- histograms of the time spent in each instance of a specific syscall. As an example,
  to plot a histogram for various instances of the "read" syscall, one could run:

      strace --plot -e histogram=read [tracee]

  This could give an insight into which specific syscalls are responsible for time-consuming
  operations. This could be helpful when optimizing the performance of individual syscalls
  is a priority.
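
A histogram of per-instance durations could, for instance, use power-of-two buckets
and be drawn with plain ASCII. The sketch below is one possible approach and not a
proposed final design: `bucket_of`, `histogram_add`, `histogram_print` and `NBUCKETS`
are hypothetical names, and both the bucket scheme and the bar width of 40 columns are
assumptions.

```c
#include <assert.h>
#include <stdio.h>

#define NBUCKETS 32

/* Bucket index for a duration in microseconds: bucket 0 holds 0 and 1,
 * bucket i (i >= 1) holds durations in [2^i, 2^(i+1)). */
static unsigned int bucket_of(unsigned long usecs)
{
    unsigned int b = 0;
    while (usecs > 1 && b < NBUCKETS - 1) {
        usecs >>= 1;
        b++;
    }
    return b;
}

/* Account for one syscall instance that took usecs microseconds. */
static void histogram_add(unsigned long hist[NBUCKETS], unsigned long usecs)
{
    unsigned int b = bucket_of(usecs);
    assert(b < NBUCKETS);
    hist[b]++;
}

/* Print every non-empty bucket as a bar of '#' scaled to the largest
 * bucket. Returns the number of rows printed. */
static int histogram_print(FILE *out, const unsigned long hist[NBUCKETS])
{
    unsigned long max = 0;
    int rows = 0;
    for (int i = 0; i < NBUCKETS; i++)
        if (hist[i] > max)
            max = hist[i];
    for (int i = 0; i < NBUCKETS; i++) {
        if (hist[i] == 0)
            continue;
        int width = (int)(hist[i] * 40 / max);
        fprintf(out, ">= %10lu us | ", i == 0 ? 0UL : 1UL << i);
        for (int j = 0; j < width; j++)
            fputc('#', out);
        fprintf(out, " %lu\n", hist[i]);
        rows++;
    }
    return rows;
}
```

Logarithmic buckets keep the table short even when durations range from a few
microseconds to several seconds.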

#### Visualizing per-tracee stats

When visualizing per-tracee statistics, heatmaps could be helpful. The heatmap could
be modeled on GitHub's contribution graph. One design of such a graph could be to
have syscalls across the x axis, pids across the y axis. The colour intensity of an (x, y)
cell would represent the metric that we wish to visualize.

Design problems: Since colour is involved in this feature, strace will not be able to
support this feature for terminals that do not support colour.

Alternative: Instead of printing such a graph to the terminal using coloured ascii-art,
an option can be provided to generate an svg or ppm file of the graph similar to
Brendan Gregg's "flamegraph.pl" tool.
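
To give an idea of the ppm variant, the sketch below writes a grid of counts as a
plain-text PPM (P3) image with one pixel per (pid, syscall) cell. The names
`heat_level` and `write_heatmap_ppm`, the row-major layout of `cells` and the
white-to-red colour ramp are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Map a count onto 0..255 relative to the largest count in the grid. */
static int heat_level(unsigned long count, unsigned long max)
{
    return max == 0 ? 0 : (int)(count * 255 / max);
}

/* Write a rows x cols grid of counts as a P3 image. Rows stand for pids
 * and columns for syscalls; cells is stored in row-major order.
 * Returns 0 on success, -1 on error. */
static int write_heatmap_ppm(FILE *out, const unsigned long *cells,
                             int rows, int cols)
{
    unsigned long max = 0;
    if (rows <= 0 || cols <= 0)
        return -1;
    for (int i = 0; i < rows * cols; i++)
        if (cells[i] > max)
            max = cells[i];

    /* P3 header: magic number, width, height, maximum colour value */
    if (fprintf(out, "P3\n%d %d\n255\n", cols, rows) < 0)
        return -1;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            int level = heat_level(cells[r * cols + c], max);
            assert(level >= 0 && level <= 255);
            /* white for no calls, fading to saturated red for the maximum */
            fprintf(out, "255 %d %d\n", 255 - level, 255 - level);
        }
    }
    return ferror(out) ? -1 : 0;
}
```

A real graph would scale each cell up to a block of pixels and add axis labels, which
is where an svg backend would be more convenient than ppm.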

### Efficient statistics collection using eBPF

Currently, strace collates syscall statistics by maintaining a direct address table of
"call_counts". When strace is run with the -c option, it still stops twice at every
syscall. During the "syscall-exit" stage, the array of "call_counts" is updated. For
programs that invoke a large number of syscalls, this overhead can be significant.

As explained in Paul Chaignon's talk "strace --seccomp-bpf: a look under the hood"
(https://www.youtube.com/watch?v=fAcI3NErQw0), eBPF can be leveraged to
reduce this overhead. It can provide an efficient mechanism to collect and aggregate
syscall statistics in kernel space, which can then be sent back to strace. An eBPF
program attached to the relevant kernel tracepoint would run on every syscall
invocation, while eBPF maps would be used to collect the data.

## Why is this innovative and what will it contribute

This project aims to implement facilities for more detailed and fine-grained syscall
and signal statistics. This can help users find bottlenecks in tracees and analyze their
performance. The proposed features will also allow statistics to be printed in a format
that can be used by other tools for further processing, as well as enable collected
information to be graphed for visualization and ease of comprehension.

## Existing Work

"perf" and "LTTng" are both tracing and profiling tools/frameworks for the Linux kernel.
The "perf-trace" tool is very similar to strace and has a few features that this project
aims to implement. For example, perf-trace can be used to collect per-thread statistics.
However, "perf-trace" does not have a facility to sort the collected statistics.

Brendan Gregg's "flamegraph.pl" tool can be used to generate flame graphs for statistics
related to resource usage. While this project does not aim to provide statistics or their
visualizations for usage of resources or hardware events, it can still be used as inspiration
for developing histograms and heat maps for syscall statistics.

LTTng is another popular tracer for Linux applications that can trace syscalls with the
"--kernel --syscall" options.

## Tentative Timeline

### Community Bonding

May 4 - May 28

- Understand in detail the current implementation of statistics handling in strace.
- Delve into the documentation and implementation of tools such as perf-trace and
  LTTng that could help with implementing this project.
- Break down each feature into more granular bite-sized tasks.

### Coding Period Begins

Week 1: May 29 - June 4
- Implement per-tracee statistics handling.

Week 2: June 5 - June 11
- Write tests for the newly implemented features.
- Implement signal statistics handling.

Week 3: June 12 - June 18
- Continuation of week 2.
- Write tests for newly implemented features.

Week 4: June 19 - June 25
- Implement periodic display of statistics.

Week 5: June 26 - July 2
- Continuation of Week 4.
- Write tests for newly implemented features.

Week 6: July 3 - July 9
- Finish pending work from previous weeks, if any.

Week 7: July 10 - July 16
- Implement structured output for statistics.

Week 8: July 17 - July 23
- Continuation of Week 7.
- Write tests for newly implemented features.

Week 9: July 24 - July 30
- Implement histogram and heat map generation.

Week 10: July 31 - August 6
- Continuation of Week 9.
- Write tests for newly implemented features.

Week 11: August 7 - August 13
- Use eBPF to make statistics handling more efficient.

Week 12: August 14 - August 20
- Write tests for newly implemented features.

Week 13: August 21 - August 28
- Finish pending work from previous weeks, if any.

## About Me

I am a final year computer science undergraduate with an interest in systems
programming and binary analysis. I was introduced to the strace tool as an end
user while working on the pwn.college (https://pwn.college/) CTF challenges.

GitHub: https://github.com/valdaarhun

## Skills and Relevant Work Experience

I am comfortable with C, C++ and Python and am currently working on my Rust
skills. I have completed a few systems courses (eg: operating systems, compiler
construction) as part of my college curriculum. I have also been working through
MIT's operating systems lab sheets to get my hands dirty in OS development.
Additionally, I have read the book "Dynamic Binary Modification: Tools, Techniques,
and Applications" by Kim Hazelwood while learning about dynamic binary modification
systems.

## Previous open source contributions

I have worked on a few bite-sized tasks in some open source organizations in the
past. Some of them are:

- htop:
  - https://github.com/htop-dev/htop/pull/1062
  - https://github.com/htop-dev/htop/pull/1181 (still open)
- openssl:
  - https://github.com/openssl/openssl/pull/18865
  - https://github.com/openssl/openssl/pull/18982
- shadow: https://github.com/shadow/shadow/pull/2474
- nix-rust: https://github.com/nix-rust/nix/pull/1768

## Personal projects

- Tiny Debugger: https://github.com/valdaarhun/Tiny-Debugger
- Ray Tracer: https://github.com/valdaarhun/Ray-Tracer
- Distributed Key Generation over a P2P network:
  https://github.com/valdaarhun/Distributed-Key-Generation-Algorithm

## Other commitments

I have end-semester exams in the first half of May. However, since I do not
have too many courses, I should still be able to meet my commitments during
this period. Thereafter, I'll be able to work on this project for the entire summer.