[PATCH 3/4] Introduce syscall fault injection feature

Nahim El Atmani nahim+dev at naam.me
Wed Aug 10 20:31:48 UTC 2016


Hi,

I'm finalizing the new patchset with the previous comments addressed. However
there is some points that I would like to clarify.

On Fri, 29 Jul 2016 04:04:02 +0300, Dmitry V. Levin wrote:
> Date: Fri, 29 Jul 2016 04:04:02 +0300
> From: "Dmitry V. Levin" <ldv at altlinux.org>
> To: strace-devel at lists.sourceforge.net
> Subject: Re: [PATCH 3/4] Introduce syscall fault injection feature
> Message-ID: <20160729010402.GC16977 at altlinux.org>
> In-Reply-To: <20160727184117.20233-4-nahim+dev at naam.me>
> List-Id: strace development list <strace-devel.lists.sourceforge.net>
> 
> > * Concerning the struct qual_options, the name 'fault' and short name 'f' are
> >   not fixed, feel free to comment if something better comes to your mind.
> > * Regarding the error number checks in qual_fault(), they're light on purpose
> >   to allow one to return any kind of return value.
> 
> Even if they are light, they should be correct.

Maybe the word permissive is more appropriate than lite. The idea here is to
allow one to return any numeric value. The string_to_int() function is
correctly used in the next patchset. Let me know if you still find anything
that does not make sense to you on the incoming v2.

> > +/* Fault injection qualifiers concerning syscalls:
> > + * FAULT_ENTER: already discarded, but error is not yet propagated
> > + * FAULT_DONE: already discarded and we were using FAULT_AT (prevent overflow)
> > + * FAULT_AT: discard syscall at the nth time
> > + * FAULT_EVERY: discard syscall every nth time
> > + * FAULT_FUZZY: discard syscall on a random basis every nth percent of the time
> > + */
> > +#define FAULT_ENTER 1
> > +#define FAULT_EVERY (1 << 2)
> > +#define FAULT_FUZZY (1 << 3)
> > +#define FAULT_AT (1 << 4)
> > +#define FAULT_DONE (1 << 5)
> 
> I prefer enums in new code.  BTW, (1 << 1) is missing.

Are you sure about using an enum in this case? These defines are flags used in
the structure fault_opts. Using enum would ends up wrapping flags in a quite
strange fashion like:

enum fault_flag {
	FAULT_ENTER = (1 << 1),
	FAULT_EVERY = (1 << 2),
	FAULT_FUZZY = (1 << 3),
	FAULT_AT = (1 << 4),
	FAULT_DONE = (1 << 5)
};

> > +static inline long
> > +fault_set_sc_err(struct tcb *tcp, int error)
> > +{
> > +	return ptrace(PTRACE_POKEUSER, tcp->pid,
> > +		      offsetof(struct user, regs.eax),
> > +		      (unsigned long long int)error);
> > +}
> 
> It might be a good idea to omit the second argument and just set
> the error register from tcp->u_error.

Thanks, removed.

> 
> > +static inline long
> > +fault_discard_sc(struct tcb *tcp)
> > +{
> > +	return ptrace(PTRACE_POKEUSER, tcp->pid,
> > +		      offsetof(struct user, regs.orig_eax),
> > +		      (unsigned long long int)-1);
> > +}
> 
> Likewise, arch_set_scno and $arch/get_scno.c
> 
> get_error.c and get_scno.c could be renamed if necessary.

I feel wrong about putting set_error() in get_error.c. Does the following
renaming suit you?
- set_error() in arch_sc_error.c instead of get_error.c. Actually I wanted to
  go for arch_set_error() but set_error() seems more appropriate to keep
  consistency with the existing get_error() function.
- arch_set_scno() in arch_scno.c instead of get_scno.c

> 
> > --- a/syscall.c
> > +++ b/syscall.c
> > @@ -266,6 +266,14 @@ enum {
> >  	MIN_QUALS = MAX_NSYSCALLS > 255 ? MAX_NSYSCALLS : 255
> >  };
> >  
> > +#if ENABLE_FAULT_INJECTION
> > +#include "fault.h"
> > +struct fault_opts *faults_vec[SUPPORTED_PERSONALITIES];
> > +#define syscall_failed(tcp)					\
> > +	(((tcp)->qual_flg & QUAL_FAULT) &&			\
> > +	 (faults_vec[current_personality][(tcp)->scno].flag & FAULT_ENTER))
> > +#endif
> 
> This is wrong: faults_vec[current_personality][(tcp)->scno].flag
> is a global flag, it cannot be used as a state of syscall parsing
> for a tracee.
> 
> Use tcp->flags to store this information.

True, I moved all accounting informations in a pointer to an array of
struct fault_opts in struct tcb. To reduce memory footprint the array is
not lacunar anymore and only contains relevant syscalls informations. We don't
have O(1) access anymore but since the array size can not exceed nsyscalls
I'll go for a binary search to retrieve fault informations.

 
> > +	if (end == ss || (*end != '\0' && *end != '%' && *end != '.')
> > +	    || errno == ERANGE || errno == EINVAL || opts.occ < 1
> 
> errno is not cleared before the call.

uh, thanks!

-- 
Nahim El Atmani <nahim+dev at naam.me>
http://naam.me/




More information about the Strace-devel mailing list