[PATCH 1/2] powerpc: properly negate error in syscall_set_return_value() in sc case

Eugene Syromyatnikov evgsyr at gmail.com
Tue Jan 28 15:54:17 UTC 2025


On Tue, Jan 28, 2025 at 3:59 PM Christophe Leroy
<christophe.leroy at csgroup.eu> wrote:
>
>
> Le 27/01/2025 à 19:13, Dmitry V. Levin a écrit :
> > According to the Power Architecture Linux system call ABI documented in
> > [1], when the syscall is made with the sc instruction, both a value and an
> > error condition are returned, where r3 register contains the return value,
> > and cr0.SO bit specifies the error condition.  When cr0.SO is clear, the
> > syscall succeeded and r3 is the return value.  When cr0.SO is set, the
> > syscall failed and r3 is the error value.  This syscall return semantics
> > was implemented from the very beginning of Power Architecture on Linux,
> > and syscall tracers and debuggers like strace that read or modify syscall
> > return information also rely on this ABI.
>
> I see a quite similar ABI on microblaze, mips, nios2 and sparc. Do they
> behave all the same ?

In terms of ABI, yes:
https://gitlab.com/strace/strace/-/blob/master/src/linux/sparc/get_error.c
https://gitlab.com/strace/strace/-/blob/master/src/linux/mips/get_error.c
https://gitlab.com/strace/strace/-/blob/master/src/linux/nios2/get_error.c

(cf. https://gitlab.com/strace/strace/-/blob/master/src/linux/powerpc/get_error.c
)

I can't find microblaze SyscallV ABI, but at least both strace and
glibc expect negated errno in r3:
https://gitlab.com/strace/strace/-/blob/master/src/linux/microblaze/get_error.c
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/microblaze/syscall.S;h=6ebf688d800e19c14a140f0cd5a0ba9344fa95a5;hb=HEAD

> > r3 and cr0.SO are exposed directly via struct pt_regs where gpr[3] and
> > (ccr & 0x10000000) correspond to r3 and cr0.SO, respectively.
> > For example, here is an excerpt from check_syscall_restart() that assigns
> > these members of struct pt_regs:
> >          regs->result = -EINTR;
> >          regs->gpr[3] = EINTR;
> >          regs->ccr |= 0x10000000;
> > In this example, the semantics of negative ERRORCODE that's being used
> > virtually everywhere in generic kernel code is translated to powerpc sc
> > syscall return ABI which uses positive ERRORCODE and cr0.SO bit.
>
> At what point are they exposed really ? At what point do they need to
> comply with the ABI ?

On ptrace stops via pt_regs, for example?

> I'm also a bit lost between regs->orig_r3, regs->gpr[3] and regs->result.
>
> The comment added by commit 1b1a3702a65c ("powerpc: Don't negate error
> in syscall_set_return_value()") says that CCR needs to be set because of
> signal code. But signal code is invoked by syscall_exit_prepare()
> through call to interrupt_exit_user_prepare_main() after setting CR[SO]
> and negating syscall result.
>
> >
> > Also, r3 and cr0.SO are exposed indirectly via helpers.
> > For example, here is an excerpt from syscall_get_error():
> >          /*
> >           * If the system call failed,
> >           * regs->gpr[3] contains a positive ERRORCODE.
> >           */
> >          return (regs->ccr & 0x10000000UL) ? -regs->gpr[3] : 0;
> > and here is another example, from regs_return_value():
> >          if (is_syscall_success(regs))
> >                  return regs->gpr[3];
> >          else
> >                  return -regs->gpr[3];
> > In these examples, the powerpc sc syscall return ABI which uses positive
> > ERRORCODE and cr0.SO bit is translated to the semantics of negative
> > ERRORCODE that's being used virtually everywhere in generic kernel code.
> >
> > Up to a certain point in time the kernel managed to implement the powerpc
> > sc syscall return ABI in all cases where struct pt_regs was exposed to user
> > space.
> >
> > The situation changed when SECCOMP_RET_TRACE support was introduced.
> > At this point the -ERRORCODE semantics that was used under the hood to
> > implement seccomp on powerpc became exposed to user space.  The tracer
> > handling PTRACE_EVENT_SECCOMP is not just able to observe -ENOSYS in gpr[3]
> > - this is relatively harmless as at this stage there is no syscall return
> > yet so the powerpc sc syscall return ABI does not apply.  What's important
> > is that the tracer can change the syscall number to -1 thus making the
> > syscall fail, and at this point the tracer is also able to specify the
> > error value.  This has to be done in accordance with the syscall return
> > ABI, however, the current implementation of do_seccomp() supports both the
> > generic kernel -ERRORCODE return value ABI and the powerpc sc syscall
> > return ABI, thanks to syscall_exit_prepare() that converts the former to
> > the latter.  Consequently, seccomp_bpf selftest passes both with and
> > without this change.
> >
> > Now comes the moment when PTRACE_SET_SYSCALL_INFO is going to be
> > introduced.  PTRACE_SET_SYSCALL_INFO is a generic ptrace API that
> > complements PTRACE_GET_SYSCALL_INFO by letting the ptracer modify
> > the details of the system calls the tracee is blocked in.
> >
> > One of the helpers that have to be used to implement
> > PTRACE_SET_SYSCALL_INFO is syscall_set_return_value().
> > This helper complements other two helpers, syscall_get_error() and
> > syscall_get_return_value(), that are currently used to implement
> > PTRACE_GET_SYSCALL_INFO on syscall return.  When syscall_set_return_value()
> > is used to set an error code, the caller specifies it as a negative value
> > in -ERRORCODE format.
> >
> > Unfortunately, this does not work well on powerpc since commit 1b1a3702a65c
> > ("powerpc: Don't negate error in syscall_set_return_value()") because
> > syscall_set_return_value() does not follow the powerpc sc syscall return
> > ABI:
> >       /*
> >        * In the general case it's not obvious that we must deal with
> >        * CCR here, as the syscall exit path will also do that for us.
> >        * However there are some places, eg. the signal code, which
> >        * check ccr to decide if the value in r3 is actually an error.
> >        */
> >       if (error) {
> >               regs->ccr |= 0x10000000L;
> >               regs->gpr[3] = error;
> >       } else {
> >               regs->ccr &= ~0x10000000L;
> >               regs->gpr[3] = val;
> >       }
> >
> > The reason why this syscall_set_return_value() implementation was able to
> > get away with violating the powerpc sc syscall return ABI is the following:
> > Up to now, syscall_set_return_value() on powerpc could be called only from
> > do_syscall_trace_enter() via do_seccomp(), there was no way it could be
> > called from do_syscall_trace_leave() which is the point where tracers on
> > syscall return are activated and the powerpc sc syscall return ABI has
> > to be respected.
> >
> > Introduction of PTRACE_SET_SYSCALL_INFO necessitates a change of
> > syscall_set_return_value() to comply with the powerpc sc syscall return
> > ABI.  Without the change, the upcoming ptrace/set_syscall_info selftest
> > fails with the following diagnostics:
> >
> >    # set_syscall_info.c:119:set_syscall_info:Expected exp_exit->rval (-38) == info->exit.rval (38)
> >    # set_syscall_info.c:120:set_syscall_info:wait #4: PTRACE_GET_SYSCALL_INFO #2: exit stop mismatch
> >
> > Note that since backwards compatibility with the current implementation has
> > to be provided, the kernel has to continue supporting simultaneously both
> > the generic kernel -ERRORCODE return value ABI and the powerpc sc syscall
> > return ABI at least for PTRACE_EVENT_SECCOMP tracers.  Consequently, since
> > the point of __secure_computing() invocation and up to the point of
> > conversion in syscall_exit_prepare(), gpr[3] may be set according to either
> > of these two ABIs.  An attempt to address code inconsistencies in syscall
> > error return handling that were introduced as a side effect of the dual
> > ABI support follows in a separate patch.
>
> What do you mean by "backwards compatibility" here ? backwards
> compatibility applies only to userspace API doesn't it ? So if there was
> no way to trigger the problem previously, what does it mean ?

ptrace and seccomp users are pretty much userspace, aren't they?

> >
> > Link: https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.kernel.org%2Fdoc%2Fhtml%2Flatest%2Farch%2Fpowerpc%2Fsyscall64-abi.html%23return-value&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7Cc2cf590281c24fe1478408dd3efe4a3e%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638735984085033893%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=EEP9s6k%2Fs5VfqWgrs6VXi879HEfJ8BYOOJ8InmmVTQA%3D&reserved=0 [1]
> > Signed-off-by: Dmitry V. Levin <ldv at strace.io>
> > Reviewed-by: Alexey Gladkov <legion at kernel.org>
> > ---
> >   arch/powerpc/include/asm/syscall.h | 6 +++++-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> > index 3dd36c5e334a..422d7735ace6 100644
> > --- a/arch/powerpc/include/asm/syscall.h
> > +++ b/arch/powerpc/include/asm/syscall.h
> > @@ -82,7 +82,11 @@ static inline void syscall_set_return_value(struct task_struct *task,
> >                */
> >               if (error) {
> >                       regs->ccr |= 0x10000000L;
> > -                     regs->gpr[3] = error;
> > +                     /*
> > +                      * In case of an error regs->gpr[3] contains
> > +                      * a positive ERRORCODE.
> > +                      */
> > +                     regs->gpr[3] = -error;
> >               } else {
> >                       regs->ccr &= ~0x10000000L;
> >                       regs->gpr[3] = val;
>


-- 
Eugene Syromyatnikov
mailto:evgsyr at gmail.com
xmpp:esyr at jabber.{ru|org}


More information about the Strace-devel mailing list