Proliferation of hundreds of tiny files

Mon Oct 26 12:08:52 UTC 2015

On 10/25/2015 03:12 AM, Dmitry V. Levin wrote:
> On Sat, Oct 24, 2015 at 07:49:43PM +0200, Denys Vlasenko wrote:
>>>>> Ideally, each parser should have been in a separate file from the
>>>>> beginning, that would make maintenance easier.
>>>>> I'm working in this direction.
>>>>
>>>> It's easier in what way?
>>>
>>> You must be joking.  Smaller translation unit has less context to care
>>> about.
>>
>> Translation unit is not a human, it's a piece of code.
>> It does not feel hurt when it, say, has a bunch extra includes
>> it does not need, or a function is being compiled after
>> 100 other functions were preceding it. Even if they were unrelated
>> to it. What's the difference?
> 
> Translation units are maintained by humans, and humans usually do better
> with smaller ones.

I don't see why.
When I work on the function foo(), I couldn't care less about
functions above it in the same file. Be there three, or 300 of them.

> Every extra include takes extra attention, too.
> I certainly prefer chdir.c over old file.c
> 
>>>  For example, there used to be unwanted side effects when headers
>>> included for one parser affected other parsers.
>>
>> Can you be more specific? What side effects?
>>
>> If two headers can't live together, then there is a bug in headers.
>> It should be reported, and effects worked around while it is
>> fixed upstream. Been there, done that.
>>
>> All headers together (for example, all libc headers) define the API
>> of the corresponding package (in my example, glibc). So it is legal,
>> and should not lead to any bugs, to include *every* glibc header
>> file at the beginning of a C program.
> 
> Unlike regular C programs, strace utilizes linux headers a lot, and libc
> headers do not live well together with linux headers.

How this situation becomes better after splitting?
You still need to include linux headers, don't you?

>>>> Let's get back to linux/x86_64/get_scno.c
>>>> Here's the entire beginning of the file, I'm skipping nothing:
>>>>
>>>>
>>>> #ifndef __X32_SYSCALL_BIT
>>>> # define __X32_SYSCALL_BIT      0x40000000
>>>> #endif
>>>>
>>>> unsigned int currpers;
>>>>
>>>> #if 1
>>>> /* ... */
>>>> if (x86_io.iov_len == sizeof(i386_regs)) {
>>>>         scno = i386_regs.orig_eax;
>>>>         currpers = 1;
>>>> } else {
>>>>
>>>>
>>>> So. What is "scno"? Is is a global variable? A local one?
>>>> A function parameter?
>>>>
>>>> Is it a "long"? Or "unsigned long long"?
>>>>
>>>> To learn answers to these questions, I need to go
>>>> to *another file* and look there!
>>>>
>>>> Usually, in C language people don't have to do that
>>>> to find out such things.
>>>
>>> Do you mean that an arch_get_scno function called by generic get_scno
>>> would be easier to read than current #include "get_scno.c"?
>>> Yes, I think it would.
>>>
>>>> This is not easier. This is more obscure. IMHO, of course.
>>>> You probably have logical reasons why I'm not right in my opinion.
>>>> Can you voice these reasons?
>>>
>>> Mixing numerous arch specific ifdefs with generic code is much harder to
>>> read
>>
>> It's harder to read an ifdef than to look at half a dozen files to
>> see one function?
> 
> It is quite hard to follow the code with ~20 arch ifdefs per function.
> Why do you need to look at half a dozen files to see one function is
> beyond my understanding.

/* Return -1 on error or 1 on success (never 0!) */
static int
get_syscall_args(struct tcb *tcp)
{
#include "get_syscall_args.c"
        return 1;
}

static void
get_error(struct tcb *tcp)
{
        const bool check_errno = !(tcp->s_ent->sys_flags & SYSCALL_NEVER_FAILS);
        tcp->u_error = 0;

#include "get_error.c"
}

/* Returns:
 * 1: ok, continue in trace_syscall_exiting().
 * -1: error, trace_syscall_exiting() should print error indicator
 *    ("????" etc) and bail out.
 */
static int
get_syscall_result(struct tcb *tcp)
{
#if defined ARCH_REGS_FOR_GETREGSET || defined ARCH_REGS_FOR_GETREGS
        /* already done by get_regs */
#else
# include "get_syscall_result.c"
#endif
        get_error(tcp);
        return 1;
}

To understand the above mere screenful of text, I need to look at four files.

>> Where's the problem? What would I gain if I split it up into 20 files?
>> Will code become smaller? Faster? Will it compile faster?
>> I suspect all three are answered with "no".
>> Maybe whatever bugs and non-elegant bits it has now will get fixed by
>> a split? Again, no.
> 
> Speaking of myself, splitting the strace code made my life as a maintainer
> easier.

Only if you somehow fear having more than one function per source file.

The goal of my email was to bring to your attention that maybe
it is going too far.

I was hesitating to talk about that because I do realize that people
are different and attempts to make other people do exactly what *I*
would consider to be best is counterproductive and at times actively harmful.

However, I realized that if I don't speak up, the entire source
is on its merry way to be split into ~2000 source files.