strace's performance overhead on multithreaded processes
Fotios Valasiadis
fvalasiad at gmail.com
Thu Apr 13 14:24:56 UTC 2023
Hello everyone,
I was wondering how important strace's performance is to you, the
project maintainers. More specifically: do you care about the overhead
that strace adds to the traced process's execution? Would you be
interested in exploring ways to reduce it?
I am a contributor to build-recorder
(https://github.com/eellak/build-recorder), a project that relies
heavily on ptrace, and performance is very important to us. While
investigating ways to improve this relatively new project, I found a
somewhat hacky way to achieve our goal using ptrace's API. The
optimizations I have in mind all concern multi-threaded processes
(strace -f, in your case).
Out of pure curiosity, since strace served as my guide for learning
about ptrace while I was developing the tool, I wanted to measure how it
performs under the same tests. I was surprised to see that it performs
worse than build-recorder (after the optimizations) in terms of making
full use of the available CPU cores.
You can find a detailed explanation of the problem build-recorder had,
which I believe strace also suffers from, along with the proposed
solutions (many of which led to dead ends, since I am still learning
about ptrace), here:
https://github.com/eellak/build-recorder/issues/151
You can also check out the optimized branch of my fork here:
https://github.com/fvalasiad/build-recorder/tree/test-concurrent-ca-hashmap
Before testing, please change the threadpool_new() call in src/tracer.c
to use the number of hardware threads of your CPU, minus one.
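If you are unsure which value to use, Linux reports the number of online
hardware threads through sysconf(_SC_NPROCESSORS_ONLN) (the `nproc`
command gives a similar figure from a shell). The following is only a
minimal sketch, not code from build-recorder: pool_size_for_this_machine()
is a hypothetical helper name.

```c
#include <unistd.h>

/*
 * Hypothetical helper, not part of build-recorder: returns the number of
 * online hardware threads minus one, and never less than 1.
 * sysconf(_SC_NPROCESSORS_ONLN) is supported by glibc and musl on Linux.
 */
long pool_size_for_this_machine(void)
{
    long online = sysconf(_SC_NPROCESSORS_ONLN);

    /* sysconf() returns -1 on error; with a single CPU, "minus one" would be 0. */
    if (online < 2)
        return 1;

    return online - 1;
}
```

Assuming threadpool_new() takes the thread count as its argument, the
result could be passed there directly, e.g.
threadpool_new(pool_size_for_this_machine()).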
Compare it with the main branch of the upstream repository. The code is
poorly structured, since I only wanted to see what build-recorder could
achieve; we are planning to implement these optimizations properly in
build-recorder this summer.
Regarding the program and the test cases: build-recorder traces build
processes to create a record of all their dependencies, which can also
be included in SBOMs. The primary test case is therefore tracing builds,
which are, generally speaking, embarrassingly parallel and so suffer
heavily from these ptrace(2) issues with multithreaded processes. You
can run strace and both the main and optimized branches of
build-recorder on the same build to compare how well each one makes use
of the multiple cores of a modern CPU.
I understand that strace and build-recorder are different tools: the
problems build-recorder tries to solve don't necessarily align with
strace's, and the two projects may not share the same goals. Apologies
if you are already well aware of these issues and have taken the best
measures possible to deal with them; I haven't studied all of strace's
source. Feel free to ignore this message if that's the case.
Nevertheless, I'd be happy to help with this if you consider my
proposal worthwhile!
Best regards,
Fotios Valasiadis
<fvalasiad>