strace's performance overhead on multithreaded processes

Fotios Valasiadis fvalasiad at
Thu Apr 13 14:24:56 UTC 2023

Hello everyone,

I was wondering how important strace's performance is to you, the 
project maintainers. In other words, do you care about the overhead 
that strace adds to the traced process's execution? Would you be 
interested in exploring ways to reduce it?

I am a contributor to build-recorder, a project that relies heavily on 
ptrace, and performance is of great importance to us. While 
investigating ways to improve our relatively new project, I found a 
hacky way to achieve our goal (regarding ptrace's API). The 
optimizations I have in mind all concern multi-threaded processes 
(strace -f, in your case).

Out of pure curiosity, since strace served as a guide for me to learn 
more about ptrace as I was developing the tool, I wanted to measure how 
it performs under the same tests. I was surprised to see that it 
performs worse than build-recorder (post optimizations) in terms of 
making the most of the CPU.

You can find a detailed explanation of what the problem with 
build-recorder was, which I believe strace also suffers from, as well 
as the proposed solutions (many of which led to dead ends, since I am 
still learning about ptrace) here:

You can also check the optimized branch of my fork here:
Before testing, please change the threadpool_new() call in src/tracer.c 
to match the number of hardware threads of your CPU, minus one.

Compare it with the main branch of the main repository. The code is 
badly structured, since I only cared about seeing the potential of 
build-recorder. We are planning to do these optimizations properly in 
build-recorder this summer.

Regarding the program and the test cases: build-recorder traces build 
processes to create a record of all their dependencies, which can also 
be included in SBOMs. The primary test case is therefore tracing build 
processes. Builds are, generally speaking, embarrassingly parallel, so 
they suffer a lot from these issues regarding ptrace(2) and its effect 
on multithreaded processes. You can run strace and build-recorder 
against a build to see how well strace utilizes the multithreaded 
nature of modern CPUs, compared to the main and optimized branches of 
build-recorder.

I understand that strace and build-recorder aren't the same, in the 
sense that the problems build-recorder tries to solve don't 
necessarily align with those of strace, and the two may not share the 
same goals. Excuse me if you are well aware of these issues and have 
already taken the best measures possible to deal with them; I haven't 
studied the entirety of strace's source. You are welcome to ignore 
this message if that's the case.

Nevertheless, I'd be up to help with this if you consider my proposal 

Best regards,
Fotios Valasiadis

More information about the Strace-devel mailing list