[RFC v2] Add basic syscalls fault injection
Nahim El Atmani
nahim at naam.me
Wed May 18 01:59:09 UTC 2016
Hi,
On Mon, 16 May 2016 14:46:38 +0300, Dmitry V. Levin wrote:
> > Since the syscall is directly passed to qual_syscall() everything you used
> > before is still available (syscall number, name, classes...). The occurrence is
> > actually an integer
>
> Any integer? Can it be negative or zero?
Nope, proper checks have to be added. Since this patch is more like a PoC there
is not much correctness checks as the user is still myself.
> > diff --git a/syscall.c b/syscall.c
> > index d71ead3..580660c 100644
> > --- a/syscall.c
> > +++ b/syscall.c
> > @@ -266,6 +266,8 @@ enum {
> > MIN_QUALS = MAX_NSYSCALLS > 255 ? MAX_NSYSCALLS : 255
> > };
> >
> > +struct_fail fails_vec[MAX_NSYSCALLS];
>
> Shouldn't there be not just one vector but SUPPORTED_PERSONALITIES
> vectors, like qual_vec?
Yes, I misunderstood why we needed the support for each personalities while
tracing a program.
> > + for (i = 0; i < nsyscall_vec[current_personality]; ++i)
> > + if (sysent_vec[current_personality][i].sys_name
> > + && !strcmp(ss, sysent_vec[current_personality][i].sys_name))
> > + break;
>
> Why limit this to current_personality?
For the same reason as above.
>
> > + if (i == nsyscall_vec[current_personality])
> > + goto bad_format;
> > + qual_syscall(ss, bitflag, not);
> > +
> > + ss = strtok_r(NULL, ":", &saveptr);
> > + if (!ss)
> > + goto bad_format;
> > +
> > + occ = strtol(ss, &end, 10);
> > + if (end == ss || (*end != '\0' && *end != '%') || errno == ERANGE)
> > + goto bad_format;
>
> What if occ is negative or zero?
As I said first it's a rough PoC, thanks for pointing this out though.
>
> > + fails_vec[i].occ = occ;
> > + if (*end == '%')
> > + fails_vec[i].cnt = -1;
> > +
> > + ss = strtok_r(NULL, ":", &saveptr);
> > + if (!ss)
> > + goto bad_format;
> > +
> > + if (*ss >= '0' && *ss <= '9') {
> > + err = strtol(ss, &end, 10);
> > + if (end == ss || *end != '\0' || errno == ERANGE)
> > + goto bad_format;
>
> What if err is negative or zero?
Same, thanks.
>
> > + }
> > + else {
> > + /* TODO: support string form. */
> > + }
> > + fails_vec[i].err = err;
> > + fails_vec[i].fail = 0;
> > +
> > + free(ms);
> > +
> > + return 0;
> > +
> > +bad_format:
> > + free(ms);
> > +
> > + return -1;
> > +}
>
> I suggest a simple change to this procedure: parse the string passed to
> the function and handle errors, then loop over personalities initializing
> qual_vec and fails_vec.
The only reason I see for duplicating code of qual_syscall() and qualify_one()
is a little speed improvement. Is that what you aim at?
>
> > +static int
> > qual_signal(const char *s, const unsigned int bitflag, const int not)
> > {
> > unsigned int i;
> > @@ -789,6 +851,35 @@ static void get_error(struct tcb *, const bool);
> > static int getregs_old(pid_t);
> > #endif
> >
> > +static long
> > +fail_syscall_enter(struct tcb *tcp)
> > +{
> > + if (fails_vec[tcp->scno].cnt != -1) {
> > + if (fails_vec[tcp->scno].cnt++ % fails_vec[tcp->scno].occ)
>
> What of fails_vec[tcp->scno].occ is zero?
Already discussed, thanks
>
> > + return 0;
> > + }
> > + else { /* TODO: Support the non deterministic way */
> > + return 0;
> > + }
> > +
> > + fails_vec[tcp->scno].fail = 1;
> > + return ptrace(PTRACE_POKEUSER, tcp->pid,
> > + offsetof(struct user, regs.orig_rax),
> > + (unsigned long)-1);
> > +}
>
> As this ptrace call is arch specific, I suggest moving it to a separate
> function, e.g. arch_set_scno.
Yes, definitely. This version was just a x86_64 PoC to ensure everything work
before diving into portability, error checking, testing...
>
> > +
> > +static inline long
> > +fail_syscall_exit(struct tcb *tcp)
> > +{
> > + if (!fails_vec[tcp->scno].fail)
> > + return 0;
> > +
> > + tcp->u_error = fails_vec[tcp->scno].err;
> > + return ptrace(PTRACE_POKEUSER, tcp->pid,
> > + offsetof(struct user, regs.rax),
> > + (unsigned long)-fails_vec[tcp->scno].err);
> > +}
>
> Likewise, I suggest moving arch specific code to a separate function,
> e.g. arch_set_error.
Sure, thanks.
>
> > +
> > static int
> > trace_syscall_entering(struct tcb *tcp)
> > {
> > @@ -842,6 +933,8 @@ trace_syscall_entering(struct tcb *tcp)
> > # endif
> > }
> > #endif
> > + if (tcp->qual_flg & QUAL_FAIL)
> > + fail_syscall_enter(tcp);
> >
> > if (!(tcp->qual_flg & QUAL_TRACE)
> > || (tracing_paths && !pathtrace_match(tcp))
> > @@ -908,6 +1001,9 @@ trace_syscall_exiting(struct tcb *tcp)
> > update_personality(tcp, tcp->currpers);
> > #endif
> > res = (get_regs_error ? -1 : get_syscall_result(tcp));
> > +
> > + if (tcp->qual_flg & QUAL_FAIL && fails_vec[tcp->scno].fail)
> > + fail_syscall_exit(tcp);
>
> The check for hide_log_until_execve should go first.
True, however, and correct me if I'm wrong but one can have both a filtered tcp
because syscall have not been selected for logging and a failing rule on that
syscall. That situation can be reflected as:
strace -e '!write' -e failwith=write:1:EINVAL /bin/ls
As in « don't output write syscalls since I already know they will fail ».
If we switch position the goto ret will occur and syscall will *not* be
discarded.
>
> It's important to make fail flag not per-scno but per-tcb, otherwise
> force-failing syscalls when tracing mulitple processes will be unreliable.
>
> struct_fail.fail is probably not needed at all.
> struct call_counts is better place for accounting purposes.
Yes struct_fail.fail should be replaced by a modulus check on call_counts.calls
I guess. However we may want the struct nevertheless to store information about
the error code and the fail frequency.
> > + if (tcp->qual_flg & QUAL_FAIL && fails_vec[tcp->scno].fail) {
> > + if ((unsigned int)u_error < nerrnos && errnoent[u_error])
> > + tprintf("= -1 %s ", errnoent[tcp->u_error]);
> > + else
> > + tprintf("= -1 %ld ", u_error);
> > + tprints("(DISCARDED)");
>
> Why DISCARDED?
It is just a simple way for one to know that this particular call failed on
purpose, hence it makes greping & dump parsing easier. Do you see any other
naming for that?
>
> > + fails_vec[tcp->scno].fail = 0;
> > + }
> > + else if (tcp->qual_flg & QUAL_RAW) {
> > if (u_error)
> > tprintf("= -1 (errno %ld)", u_error);
> > else
>
> What if both QUAL_FAIL and QUAL_RAW flags are set?
The case was not handled, thanks!
--
Nahim El Atmani <nahim+dev at naam.me>
http://naam.me/
More information about the Strace-devel
mailing list