[rfc+] new generic ptrace api for setting syscall info

Renzo Davoli renzo at cs.unibo.it
Fri Jan 5 15:33:50 UTC 2024


On Tue, Aug 30, 2022 at 10:47:16PM +0300, Dmitry V. Levin wrote:
> Hi,
> I wonder has there been any progress with these ideas?
Yes!

Dear Dmitry, Mike, Eugene, Oleg, Davide and strace developers,

After a long while I have resumed work on adding
PTRACE_SET_SYSCALL_INFO support.

I have split the project into a number of patches (tested against
the current master branch of the Linux git tree).

Compared with our previous (ancient ;-) messages, I have defined a
kernel configuration flag (HAVE_ARCH_SET_TRACEHOOK) so that support
for specific architectures can be added one by one.
The patch set adds PTRACE_SET_SYSCALL_INFO for x86_64.

The actual implementation of the PTRACE_SET_SYSCALL_INFO feature is in
patch #4: it contains the code to set:
* system call number
* system call arguments
* instruction pointer
* stack pointer
and the support for seccomp (it is almost the same as for standard ptrace).

We discussed adding IP, SP and seccomp at a later stage: the problem is
that we'd end up with a number of configuration flags to select which
support is available and which is not. In addition, userland programs would
need a way to test which sub-features of PTRACE_SET_SYSCALL_INFO are
available and which are not.
IMO it is simpler to have PTRACE_SET_SYSCALL_INFO with all its
bells and whistles.

Happy {New Year, hacking}

        renzo

The following text is a draft of the messages for the LKML.
---

PTRACE_GET_SYSCALL_INFO retrieves information about a system call in an
architecture-independent way. strace(1) is the most important use
case of PTRACE_GET_SYSCALL_INFO. Before PTRACE_GET_SYSCALL_INFO, strace had
to include arch-specific code to retrieve this information using requests
such as PTRACE_PEEKUSER, PTRACE_GETREGS, PTRACE_GETREGSET.
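
As an illustration of what the architecture-independent record looks like
from userland, here is a minimal sketch in C. It assumes the
struct ptrace_syscall_info layout exported by <linux/ptrace.h>; the helper
name describe_syscall_info is made up for this example and is not part of
strace or of the patch set.

```c
#include <stdio.h>
#include <linux/ptrace.h>	/* struct ptrace_syscall_info, PTRACE_SYSCALL_INFO_* */

/*
 * Render a record returned by PTRACE_GET_SYSCALL_INFO as a short string.
 * No register names appear here: the same code works on every
 * architecture that fills in the record.
 * Returns what snprintf() returns (the length of the full text).
 */
int describe_syscall_info(const struct ptrace_syscall_info *info,
			  char *buf, size_t len)
{
	switch (info->op) {
	case PTRACE_SYSCALL_INFO_ENTRY:
		return snprintf(buf, len, "entry nr=%llu arg0=%#llx",
				(unsigned long long)info->entry.nr,
				(unsigned long long)info->entry.args[0]);
	case PTRACE_SYSCALL_INFO_EXIT:
		return snprintf(buf, len, "exit rval=%lld is_error=%u",
				(long long)info->exit.rval,
				(unsigned int)info->exit.is_error);
	case PTRACE_SYSCALL_INFO_SECCOMP:
		return snprintf(buf, len, "seccomp nr=%llu ret_data=%u",
				(unsigned long long)info->seccomp.nr,
				(unsigned int)info->seccomp.ret_data);
	default:
		/* PTRACE_SYSCALL_INFO_NONE or an op this sketch does not know */
		return snprintf(buf, len, "none");
	}
}
```

A tracer would call this right after a successful PTRACE_GET_SYSCALL_INFO
request on a stopped tracee.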

PTRACE_SET_SYSCALL_INFO is the converse of PTRACE_GET_SYSCALL_INFO.

It is useful for modifying the number and arguments of a system call during
process tracing. Currently the syscall parameters can be modified using
PTRACE_POKEUSER, PTRACE_SETREGS or PTRACE_SETREGSET, which all depend on
the register layout of each architecture.
PTRACE_SET_SYSCALL_INFO is architecture independent and thus makes it
possible to write portable tools.
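
A tracer-side sketch of the intended get/modify/set cycle follows. It is
only an illustration under stated assumptions: the request number 0x4212 is
the value proposed in patch 4/5, and the helper names rewrite_entry and
redirect_syscall are hypothetical. The tracee is assumed to be attached and
stopped at a syscall-entry stop.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/ptrace.h>

#ifndef PTRACE_SET_SYSCALL_INFO
#define PTRACE_SET_SYSCALL_INFO 0x4212	/* value proposed in this patch set */
#endif

/*
 * Replace the syscall number and the first nargs arguments of an entry
 * record. Pure data manipulation, no ptrace call involved.
 * Returns 0 on success, -1 if the record is not an entry record or
 * nargs exceeds the 6 argument slots.
 */
int rewrite_entry(struct ptrace_syscall_info *info, unsigned long long nr,
		  const unsigned long long *args, unsigned int nargs)
{
	unsigned int i;

	if (info->op != PTRACE_SYSCALL_INFO_ENTRY || nargs > 6)
		return -1;
	info->entry.nr = nr;
	for (i = 0; i < nargs; i++)
		info->entry.args[i] = args[i];
	return 0;
}

/*
 * Fetch the record, rewrite it and write it back. Instruction and stack
 * pointer are passed back exactly as the kernel reported them.
 */
long redirect_syscall(pid_t tracee, unsigned long long nr,
		      const unsigned long long *args, unsigned int nargs)
{
	struct ptrace_syscall_info info;

	if (syscall(SYS_ptrace, PTRACE_GET_SYSCALL_INFO, tracee,
		    (unsigned long)sizeof(info), (unsigned long)&info) < 0)
		return -1;
	if (rewrite_entry(&info, nr, args, nargs) < 0)
		return -1;
	return syscall(SYS_ptrace, PTRACE_SET_SYSCALL_INFO, tracee,
		       (unsigned long)sizeof(info), (unsigned long)&info);
}
```

Nothing in this code names a register, which is the whole point: the same
source should build on any architecture that supports both requests.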

My interest in PTRACE_SET_SYSCALL_INFO is to implement a portable version
of vuos: https://github.com/virtualsquare/vuos.

Some postings suggest that other projects may be interested in such an extension:
https://lists.strace.io/pipermail/strace-devel/2021-October/010745.html
https://discourse.llvm.org/t/linux-powerpc-new-system-call-instruction-and-abi/55564/21?page=2
https://lists.openwall.net/linux-kernel/2019/03/27/45

PTRACE_SET_SYSCALL_INFO can also enable new debugging features and tools
able to modify system call parameters during execution, in the same way
it is currently possible to reassign variables or register values.

The following patch set implements the core support for PTRACE_SET_SYSCALL_INFO
and the arch specific support for x86_64.

PTRACE_SET_SYSCALL_INFO requires the following functions:

In asm/syscall.h:
* void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
         const unsigned long *args);
* void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
         int error, long val);
* void syscall_set_nr(struct task_struct *task, struct pt_regs *regs, int sysno);

In asm/ptrace.h:
* void instruction_pointer_set(struct pt_regs *regs, unsigned long val);
* void user_stack_pointer_set(struct pt_regs *regs, unsigned long val);

A new kernel configuration flag named HAVE_ARCH_SET_TRACEHOOK has been
defined. It can be set if the architecture provides all the required
functions and thus PTRACE_SET_SYSCALL_INFO can be compiled in.

The availability of the functions required by HAVE_ARCH_SET_TRACEHOOK on
the architectures currently supporting PTRACE_GET_SYSCALL_INFO
(those that select HAVE_ARCH_TRACEHOOK) is the following, after the
application of the patch set ("ok" = available, "-" = missing,
"?" = needs investigation; numbers refer to the notes below):

            setargs setrv setnr ipset uspset
arc         -(1)    ok    -(3)  -(6)  -(8)
arm         ok      ok    ?(4)  ok    -(9)
arm64       ok      ok    -(3)  ok    -(9)
csky        ok      ok    ok    ok    -(8)
hexagon     -(1)    -(2)  -(3)  -(6)  -(8)
ia64        ok      ok    -(3)  ?(7)  -(9)
mips        -(1)    ok    ?(4)  ok    ok
nds32       ok      ok    -(3)  -(6)  -(8)
nios2       ok      ok    -(3)  -(6)  -(8)
openrisc    ok      ok    -(3)  -(6)  -(8)
parisc      -(1)    ok    -(3)  ok    -(8)
powerpc     ok      ok    -(3)  ok    -(9)
riscv       ok      ok    -(3)  ok    ok
s390        ok      ok    -(5)  ok    -(8)
sh          ok      ok    -(3)  ok    ok
sparc       ok      ok    -(3)  ok    -(8)
x86         ok      ok    ok    ok    ok

(1) setargs can be implemented as symmetric of getargs
(2) setrv can be implemented as in many other archs
    setrv is used by seccomp_filter: CONFIG_HAVE_ARCH_SECCOMP_FILTER
    does not support hexagon
(3) setnr should be implemented as symmetric of getnr
(4) getnr takes the value from the task struct
    mips: mips_syscall_update_nr
    arm: task_thread_info(task)->abi_syscall = data & __NR_SYSCALL_MASK;
    (see: /arch/arm/kernel/ptrace.c)
(5) update 16 bits of regs->int_code
(6) instruction_pointer(regs) is a macro: use it in an inline function ipset
(7) instruction_pointer(regs) sums two values ???
    bundle + instruction: see ia64_increment_ip/ia64_decrement_ip
(8) user_stack_pointer is a macro: write an inline function for uspset
(9) uspset can be implemented as symmetric of user_stack_pointer

Most of the missing functions seem easy to implement.
Further investigation is needed for (4) and (7).
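
One way to read the table: an architecture can select
HAVE_ARCH_SET_TRACEHOOK only when all five columns say "ok". The following
sketch just encodes the table above as data (1 for "ok", 0 for "-" or "?")
and counts what is missing. The type and function names are made up for
this illustration; nothing like this exists in the kernel tree.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table above: 1 means "ok", 0 means "-" or "?". */
struct arch_status {
	const char *name;
	unsigned char setargs, setrv, setnr, ipset, uspset;
};

static const struct arch_status arch_table[] = {
	{ "arc",      0, 1, 0, 0, 0 },
	{ "arm",      1, 1, 0, 1, 0 },
	{ "arm64",    1, 1, 0, 1, 0 },
	{ "csky",     1, 1, 1, 1, 0 },
	{ "hexagon",  0, 0, 0, 0, 0 },
	{ "ia64",     1, 1, 0, 0, 0 },
	{ "mips",     0, 1, 0, 1, 1 },
	{ "nds32",    1, 1, 0, 0, 0 },
	{ "nios2",    1, 1, 0, 0, 0 },
	{ "openrisc", 1, 1, 0, 0, 0 },
	{ "parisc",   0, 1, 0, 1, 0 },
	{ "powerpc",  1, 1, 0, 1, 0 },
	{ "riscv",    1, 1, 0, 1, 1 },
	{ "s390",     1, 1, 0, 1, 0 },
	{ "sh",       1, 1, 0, 1, 1 },
	{ "sparc",    1, 1, 0, 1, 0 },
	{ "x86",      1, 1, 1, 1, 1 },
};

/* Number of helpers still missing for name, or -1 if it is not in the table. */
int missing_helpers(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(arch_table) / sizeof(arch_table[0]); i++) {
		const struct arch_status *a = &arch_table[i];

		if (strcmp(a->name, name) == 0)
			return 5 - (a->setargs + a->setrv + a->setnr +
				    a->ipset + a->uspset);
	}
	return -1;
}

/* Non-zero when every required helper is present. */
int can_select_set_tracehook(const char *name)
{
	return missing_helpers(name) == 0;
}
```

With the table as it stands, only x86 qualifies; csky, riscv and sh are one
helper away.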

[PATCH v0 1/5] add syscall_set_arguments
syscall_set_arguments was removed in Linux 5.16
because it was unused. PTRACE_SET_SYSCALL_INFO needs it, so
this patch reverts commit 7962c2eddbfe7cce879acb06f9b4f205789e57b7.

[PATCH v0 2/5] add HAVE_ARCH_SET_TRACEHOOK config flag
Create the new HAVE_ARCH_SET_TRACEHOOK flag and add comments
listing the required functions.
Add the syscall_set_nr signature to syscall.h.
PTRACE_SET_SYSCALL_INFO is supported only when HAVE_ARCH_SET_TRACEHOOK
is set.

[PATCH v0 3/5] enable HAVE_ARCH_SET_TRACEHOOK on x86_64
This patch fulfills the requirements for HAVE_ARCH_SET_TRACEHOOK
on the x86_64 architecture:
it adds syscall_set_nr for x86_64 and sets the config flag.
PTRACE_SET_SYSCALL_INFO can be enabled for other architectures in a
similar manner: by adding the missing functions and setting the
config flag.

[PATCH v0 4/5] add PTRACE_SET_SYSCALL_INFO support
include/uapi/linux/ptrace.h:
- add the PTRACE_SET_SYSCALL_INFO request and the
  PTRACE_SYSCALL_INFO_SECCOMP_SKIP op
kernel/ptrace.c:
- add ptrace_set_syscall_info* functions
- add handling of the PTRACE_SET_SYSCALL_INFO request.
The new request requires HAVE_ARCH_SET_TRACEHOOK.
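
At a syscall-exit stop the record carries the return value instead of the
arguments. Judging from ptrace_set_syscall_info_exit() in this patch and
from the "return error" case of the selftest, an error is expressed as a
negated errno in exit.rval together with exit.is_error = 1. A hedged sketch
of two tracer-side helpers (the names force_error and force_result are
hypothetical):

```c
#include <errno.h>
#include <linux/ptrace.h>

/*
 * Make an exit record report failure with the given errno value
 * (positive, e.g. ENOMEM). Returns 0 on success, -1 on misuse.
 */
int force_error(struct ptrace_syscall_info *info, int err)
{
	if (info->op != PTRACE_SYSCALL_INFO_EXIT || err <= 0)
		return -1;
	info->exit.rval = -err;		/* negated errno, as in {-ENOMEM, 1} */
	info->exit.is_error = 1;
	return 0;
}

/* Make an exit record report success with return value rval. */
int force_result(struct ptrace_syscall_info *info, long long rval)
{
	if (info->op != PTRACE_SYSCALL_INFO_EXIT)
		return -1;
	info->exit.rval = rval;
	info->exit.is_error = 0;
	return 0;
}
```

The modified record would then be handed to PTRACE_SET_SYSCALL_INFO while
the tracee is still in the exit stop.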

A program can test whether the current architecture supports
PTRACE_SET_SYSCALL_INFO:
when info.op == PTRACE_SYSCALL_INFO_NONE, ptrace/PTRACE_SET_SYSCALL_INFO
returns 0 (success) if PTRACE_SET_SYSCALL_INFO is supported
(i.e. HAVE_ARCH_SET_TRACEHOOK is set).
ptrace/PTRACE_SET_SYSCALL_INFO returns EINVAL otherwise.
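
The probe described above could look like the sketch below. Assumptions:
the request number 0x4212 comes from patch 4/5, the function names are
invented for this example, and the tracee is already attached and stopped.
Any failure is treated as "not supported", so the sketch does not depend on
the exact errno value.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/ptrace.h>

#ifndef PTRACE_SET_SYSCALL_INFO
#define PTRACE_SET_SYSCALL_INFO 0x4212	/* value proposed in this patch set */
#endif

/*
 * Smallest buffer the proposed request accepts: the common header that
 * precedes the entry/exit/seccomp union.
 */
size_t probe_size(void)
{
	return offsetof(struct ptrace_syscall_info, entry);
}

/*
 * Returns 1 if the kernel accepts the no-op request, 0 otherwise.
 * With op == PTRACE_SYSCALL_INFO_NONE nothing in the tracee is modified.
 */
int have_set_syscall_info(pid_t tracee)
{
	struct ptrace_syscall_info info = { .op = PTRACE_SYSCALL_INFO_NONE };

	return syscall(SYS_ptrace, PTRACE_SET_SYSCALL_INFO, tracee,
		       (unsigned long)probe_size(),
		       (unsigned long)&info) == 0;
}
```

A tracer would call have_set_syscall_info() once, at the first stop of the
tracee, and fall back to the register-based requests when it returns 0.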

[PATCH v0 5/5] add selftest for PTRACE_SET_SYSCALL_INFO
The test source code is tools/testing/selftests/ptrace/set_syscall_info.c.
The test succeeds if either:
* all the tests are successful
or:
* PTRACE_SET_SYSCALL_INFO is not supported on the current architecture
(so that the entire sequence of tests also terminates successfully
on architectures that do not currently support PTRACE_SET_SYSCALL_INFO).
-------------- next part --------------
diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index fe4326d938c1..95bf70ebd878 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -80,6 +80,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	memcpy(args, &regs->ARM_r0 + 1, 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	regs->ARM_ORIG_r0 = args[0];
+	args++;
+
+	memcpy(&regs->ARM_r0 + 1, args, 5 * sizeof(args[0]));
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	/* ARM tasks don't change audit architectures on the fly. */
diff --git a/arch/arm64/include/asm/syscall.h b/arch/arm64/include/asm/syscall.h
index ab8e14b96f68..4a850ca5b1ff 100644
--- a/arch/arm64/include/asm/syscall.h
+++ b/arch/arm64/include/asm/syscall.h
@@ -73,6 +73,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	memcpy(args, &regs->regs[1], 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	regs->orig_x0 = args[0];
+	args++;
+
+	memcpy(&regs->regs[1], args, 5 * sizeof(args[0]));
+}
+
 /*
  * We don't care about endianness (__AUDIT_ARCH_LE bit) here because
  * AArch64 has the same system calls both on little- and big- endian.
diff --git a/arch/csky/include/asm/syscall.h b/arch/csky/include/asm/syscall.h
index 0de5734950bf..f624fa3bbc22 100644
--- a/arch/csky/include/asm/syscall.h
+++ b/arch/csky/include/asm/syscall.h
@@ -59,6 +59,15 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
 	memcpy(args, &regs->a1, 5 * sizeof(args[0]));
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+		      const unsigned long *args)
+{
+	regs->orig_a0 = args[0];
+	args++;
+	memcpy(&regs->a1, args, 5 * sizeof(regs->a1));
+}
+
 static inline int
 syscall_get_arch(struct task_struct *task)
 {
diff --git a/arch/microblaze/include/asm/syscall.h b/arch/microblaze/include/asm/syscall.h
index 5eb3f624cc59..3a6924f3cbde 100644
--- a/arch/microblaze/include/asm/syscall.h
+++ b/arch/microblaze/include/asm/syscall.h
@@ -58,6 +58,34 @@ static inline microblaze_reg_t microblaze_get_syscall_arg(struct pt_regs *regs,
 	return ~0;
 }
 
+static inline void microblaze_set_syscall_arg(struct pt_regs *regs,
+					      unsigned int n,
+					      unsigned long val)
+{
+	switch (n) {
+	case 5:
+		regs->r10 = val;
+		break;
+	case 4:
+		regs->r9 = val;
+		break;
+	case 3:
+		regs->r8 = val;
+		break;
+	case 2:
+		regs->r7 = val;
+		break;
+	case 1:
+		regs->r6 = val;
+		break;
+	case 0:
+		regs->r5 = val;
+		break;
+	default:
+		BUG();
+	}
+}
+
 static inline void syscall_get_arguments(struct task_struct *task,
 					 struct pt_regs *regs,
 					 unsigned long *args)
@@ -69,6 +97,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
 		*args++ = microblaze_get_syscall_arg(regs, i++);
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	unsigned int i = 0;
+	unsigned int n = 6;
+
+	while (n--)
+		microblaze_set_syscall_arg(regs, i++, *args++);
+}
+
 asmlinkage unsigned long do_syscall_trace_enter(struct pt_regs *regs);
 asmlinkage void do_syscall_trace_leave(struct pt_regs *regs);
 
diff --git a/arch/nios2/include/asm/syscall.h b/arch/nios2/include/asm/syscall.h
index fff52205fb65..526449edd768 100644
--- a/arch/nios2/include/asm/syscall.h
+++ b/arch/nios2/include/asm/syscall.h
@@ -58,6 +58,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	*args   = regs->r9;
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+	struct pt_regs *regs, const unsigned long *args)
+{
+	regs->r4 = *args++;
+	regs->r5 = *args++;
+	regs->r6 = *args++;
+	regs->r7 = *args++;
+	regs->r8 = *args++;
+	regs->r9 = *args;
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	return AUDIT_ARCH_NIOS2;
diff --git a/arch/openrisc/include/asm/syscall.h b/arch/openrisc/include/asm/syscall.h
index 903ed882bdec..e6383be2a195 100644
--- a/arch/openrisc/include/asm/syscall.h
+++ b/arch/openrisc/include/asm/syscall.h
@@ -57,6 +57,13 @@ syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
 	memcpy(args, &regs->gpr[3], 6 * sizeof(args[0]));
 }
 
+static inline void
+syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+		      const unsigned long *args)
+{
+	memcpy(&regs->gpr[3], args, 6 * sizeof(args[0]));
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	return AUDIT_ARCH_OPENRISC;
diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
index 3dd36c5e334a..b2715448a660 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -110,6 +110,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	}
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	memcpy(&regs->gpr[3], args, 6 * sizeof(args[0]));
+
+	/* Also copy the first argument into orig_gpr3 */
+	regs->orig_gpr3 = args[0];
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	if (is_tsk_32bit_task(task))
diff --git a/arch/riscv/include/asm/syscall.h b/arch/riscv/include/asm/syscall.h
index 121fff429dce..8d389ba995c8 100644
--- a/arch/riscv/include/asm/syscall.h
+++ b/arch/riscv/include/asm/syscall.h
@@ -66,6 +66,15 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	memcpy(args, &regs->a1, 5 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	regs->orig_a0 = args[0];
+	args++;
+	memcpy(&regs->a1, args, 5 * sizeof(regs->a1));
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 #ifdef CONFIG_64BIT
diff --git a/arch/s390/include/asm/syscall.h b/arch/s390/include/asm/syscall.h
index 27e3d804b311..b3dd883699e7 100644
--- a/arch/s390/include/asm/syscall.h
+++ b/arch/s390/include/asm/syscall.h
@@ -78,6 +78,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	args[0] = regs->orig_gpr2 & mask;
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	unsigned int n = 6;
+
+	while (n-- > 0)
+		if (n > 0)
+			regs->gprs[2 + n] = args[n];
+	regs->orig_gpr2 = args[0];
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 #ifdef CONFIG_COMPAT
diff --git a/arch/sh/include/asm/syscall_32.h b/arch/sh/include/asm/syscall_32.h
index d87738eebe30..cb51a7528384 100644
--- a/arch/sh/include/asm/syscall_32.h
+++ b/arch/sh/include/asm/syscall_32.h
@@ -57,6 +57,18 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	args[0] = regs->regs[4];
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	regs->regs[1] = args[5];
+	regs->regs[0] = args[4];
+	regs->regs[7] = args[3];
+	regs->regs[6] = args[2];
+	regs->regs[5] = args[1];
+	regs->regs[4] = args[0];
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	int arch = AUDIT_ARCH_SH;
diff --git a/arch/sparc/include/asm/syscall.h b/arch/sparc/include/asm/syscall.h
index 20c109ac8cc9..62a5a78804c4 100644
--- a/arch/sparc/include/asm/syscall.h
+++ b/arch/sparc/include/asm/syscall.h
@@ -117,6 +117,16 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	}
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	unsigned int i;
+
+	for (i = 0; i < 6; i++)
+		regs->u_regs[UREG_I0 + i] = args[i];
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 #if defined(CONFIG_SPARC64) && defined(CONFIG_COMPAT)
diff --git a/arch/um/include/asm/syscall-generic.h b/arch/um/include/asm/syscall-generic.h
index 172b74143c4b..2984feb9d576 100644
--- a/arch/um/include/asm/syscall-generic.h
+++ b/arch/um/include/asm/syscall-generic.h
@@ -62,6 +62,20 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	*args   = UPT_SYSCALL_ARG6(r);
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	struct uml_pt_regs *r = &regs->regs;
+
+	UPT_SYSCALL_ARG1(r) = *args++;
+	UPT_SYSCALL_ARG2(r) = *args++;
+	UPT_SYSCALL_ARG3(r) = *args++;
+	UPT_SYSCALL_ARG4(r) = *args++;
+	UPT_SYSCALL_ARG5(r) = *args++;
+	UPT_SYSCALL_ARG6(r) = *args;
+}
+
 /* See arch/x86/um/asm/syscall.h for syscall_get_arch() definition. */
 
 #endif	/* __UM_SYSCALL_GENERIC_H */
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index f44e2f9ab65d..3a907701dcc9 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -87,6 +87,13 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	memcpy(args, &regs->bx, 6 * sizeof(args[0]));
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	memcpy(&regs->bx, args, 6 * sizeof(args[0]));
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	return AUDIT_ARCH_I386;
@@ -118,6 +125,30 @@ static inline void syscall_get_arguments(struct task_struct *task,
 	}
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+# ifdef CONFIG_IA32_EMULATION
+	if (task->thread_info.status & TS_COMPAT) {
+		regs->bx = *args++;
+		regs->cx = *args++;
+		regs->dx = *args++;
+		regs->si = *args++;
+		regs->di = *args++;
+		regs->bp = *args;
+	} else
+# endif
+	{
+		regs->di = *args++;
+		regs->si = *args++;
+		regs->dx = *args++;
+		regs->r10 = *args++;
+		regs->r8 = *args++;
+		regs->r9 = *args;
+	}
+}
+
 static inline int syscall_get_arch(struct task_struct *task)
 {
 	/* x32 tasks should be considered AUDIT_ARCH_X86_64. */
diff --git a/arch/xtensa/include/asm/syscall.h b/arch/xtensa/include/asm/syscall.h
index 5ee974bf8330..f9a671cbf933 100644
--- a/arch/xtensa/include/asm/syscall.h
+++ b/arch/xtensa/include/asm/syscall.h
@@ -68,6 +68,17 @@ static inline void syscall_get_arguments(struct task_struct *task,
 		args[i] = regs->areg[reg[i]];
 }
 
+static inline void syscall_set_arguments(struct task_struct *task,
+					 struct pt_regs *regs,
+					 const unsigned long *args)
+{
+	static const unsigned int reg[] = XTENSA_SYSCALL_ARGUMENT_REGS;
+	unsigned int i;
+
+	for (i = 0; i < 6; ++i)
+		regs->areg[reg[i]] = args[i];
+}
+
 asmlinkage long xtensa_rt_sigreturn(void);
 asmlinkage long xtensa_shmat(int, char __user *, int);
 asmlinkage long xtensa_fadvise64_64(int, int,
diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 5a80fe728dc8..0f7b9a493de7 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -117,6 +117,22 @@ void syscall_set_return_value(struct task_struct *task, struct pt_regs *regs,
 void syscall_get_arguments(struct task_struct *task, struct pt_regs *regs,
 			   unsigned long *args);
 
+/**
+ * syscall_set_arguments - change system call parameter value
+ * @task:	task of interest, must be in system call entry tracing
+ * @regs:	task_pt_regs() of @task
+ * @args:	array of argument values to store
+ *
+ * Changes 6 arguments to the system call.
+ * The first argument gets value @args[0], and so on.
+ *
+ * It's only valid to call this when @task is stopped for tracing on
+ * entry to a system call, due to %SYSCALL_WORK_SYSCALL_TRACE or
+ * %SYSCALL_WORK_SYSCALL_AUDIT.
+ */
+void syscall_set_arguments(struct task_struct *task, struct pt_regs *regs,
+			   const unsigned long *args);
+
 /**
  * syscall_get_arch - return the AUDIT_ARCH for the current system call
  * @task:	task of interest, must be blocked
-------------- next part --------------
diff --git a/arch/Kconfig b/arch/Kconfig
index f4b210ab0612..9dda3c8bbb7e 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -253,6 +253,18 @@ config TRACE_IRQFLAGS_NMI_SUPPORT
 config HAVE_ARCH_TRACEHOOK
 	bool
 
+#
+# An arch should select this if it provides all these things:
+#
+# syscall_set_arguments() in asm/syscall.h
+# syscall_set_return_value() in asm/syscall.h
+# syscall_set_nr() in asm/syscall.h
+# instruction_pointer_set() in asm/ptrace.h
+# user_stack_pointer_set() in asm/ptrace.h
+#
+config HAVE_ARCH_SET_TRACEHOOK
+	bool
+
 config HAVE_DMA_CONTIGUOUS
 	bool
 
diff --git a/include/asm-generic/syscall.h b/include/asm-generic/syscall.h
index 0f7b9a493de7..8d374007145c 100644
--- a/include/asm-generic/syscall.h
+++ b/include/asm-generic/syscall.h
@@ -37,6 +37,23 @@ struct pt_regs;
  */
 int syscall_get_nr(struct task_struct *task, struct pt_regs *regs);
 
+/**
+ * syscall_set_nr - change what system call a task is executing
+ * @task: task of interest, must be blocked
+ * @regs: task_pt_regs() of @task
+ * @sysno: system call number
+ *
+ * If @task is executing a system call or is at system call
+ * tracing about to attempt one, change the system call number.
+ *
+ * It's only valid to call this when @task is known to be blocked.
+ *
+ * Architectures which permit CONFIG_HAVE_ARCH_SET_TRACEHOOK must
+ * provide an implementation of this.
+ */
+void syscall_set_nr(struct task_struct *task, struct pt_regs *regs,
+			   int sysno);
+
 /**
  * syscall_rollback - roll back registers after an aborted system call
  * @task:	task of interest, must be in system call exit tracing
-------------- next part --------------
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..6955d32aaaff 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,6 +188,7 @@ config X86
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
 	select HAVE_ARCH_TRACEHOOK
+	select HAVE_ARCH_SET_TRACEHOOK
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
 	select HAVE_ARCH_USERFAULTFD_WP         if X86_64 && USERFAULTFD
diff --git a/arch/x86/include/asm/syscall.h b/arch/x86/include/asm/syscall.h
index 3a907701dcc9..b05174da5b6c 100644
--- a/arch/x86/include/asm/syscall.h
+++ b/arch/x86/include/asm/syscall.h
@@ -40,6 +40,12 @@ static inline int syscall_get_nr(struct task_struct *task, struct pt_regs *regs)
 	return regs->orig_ax;
 }
 
+static inline void syscall_set_nr(struct task_struct *task, struct pt_regs *regs,
+				    int sysno)
+{
+	regs->orig_ax = sysno;
+}
+
 static inline void syscall_rollback(struct task_struct *task,
 				    struct pt_regs *regs)
 {
-------------- next part --------------
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index 72c038fc71d0..52e385b9b1f3 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -74,10 +74,12 @@ struct seccomp_metadata {
 };
 
 #define PTRACE_GET_SYSCALL_INFO		0x420e
+#define PTRACE_SET_SYSCALL_INFO		0x4212
 #define PTRACE_SYSCALL_INFO_NONE	0
 #define PTRACE_SYSCALL_INFO_ENTRY	1
 #define PTRACE_SYSCALL_INFO_EXIT	2
 #define PTRACE_SYSCALL_INFO_SECCOMP	3
+#define PTRACE_SYSCALL_INFO_SECCOMP_SKIP	4
 
 struct ptrace_syscall_info {
 	__u8 op;	/* PTRACE_SYSCALL_INFO_* */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index d8b5e13a2229..0481f182fc9a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1026,6 +1026,113 @@ ptrace_get_syscall_info(struct task_struct *child, unsigned long user_size,
 	write_size = min(actual_size, user_size);
 	return copy_to_user(datavp, &info, write_size) ? -EFAULT : actual_size;
 }
+
+#ifdef CONFIG_HAVE_ARCH_SET_TRACEHOOK
+static void
+ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
+			      struct ptrace_syscall_info *info)
+{
+	unsigned long args[ARRAY_SIZE(info->entry.args)];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(args); i++)
+		args[i] = info->entry.args[i];
+
+	syscall_set_nr(child, regs, info->entry.nr);
+	syscall_set_arguments(child, regs, args);
+}
+
+static void
+ptrace_set_syscall_info_seccomp(struct task_struct *child, struct pt_regs *regs,
+				struct ptrace_syscall_info *info)
+{
+	ptrace_set_syscall_info_entry(child, regs, info);
+}
+
+static void
+ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
+			     struct ptrace_syscall_info *info)
+{
+	if (info->exit.is_error)
+		syscall_set_return_value(child, regs, info->exit.rval, 0);
+	else
+		syscall_set_return_value(child, regs, 0, info->exit.rval);
+}
+
+static void
+ptrace_set_syscall_info_seccomp_skip(struct task_struct *child, struct pt_regs *regs,
+        struct ptrace_syscall_info *info)
+{
+	syscall_set_nr(child, regs, -1);
+	ptrace_set_syscall_info_exit(child, regs, info);
+}
+
+static int
+ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
+      void __user *datavp)
+{
+	struct pt_regs *regs = task_pt_regs(child);
+	struct ptrace_syscall_info info;
+	unsigned long read_size = min(sizeof(info), user_size);
+
+	if (read_size < offsetof(struct ptrace_syscall_info, entry))
+		return -EINVAL;
+
+	if (copy_from_user(&info, datavp, read_size))
+		return -EFAULT;
+
+	if (info.op == PTRACE_SYSCALL_INFO_NONE)
+		return 0;
+
+	if (read_size < sizeof(info))
+		memset(((char *)&info) + read_size, 0, sizeof(info) - read_size);
+
+	switch (child->last_siginfo ? child->last_siginfo->si_code : 0) {
+		case SIGTRAP | 0x80:
+			switch (child->ptrace_message) {
+				case PTRACE_EVENTMSG_SYSCALL_ENTRY:
+					if (info.op == PTRACE_SYSCALL_INFO_ENTRY)
+						break;
+					return -EINVAL;
+				case PTRACE_EVENTMSG_SYSCALL_EXIT:
+					if (info.op == PTRACE_SYSCALL_INFO_EXIT)
+						break;
+					return -EINVAL;
+			}
+			break;
+		case SIGTRAP | (PTRACE_EVENT_SECCOMP << 8):
+			if (info.op == PTRACE_SYSCALL_INFO_SECCOMP ||
+					info.op == PTRACE_SYSCALL_INFO_SECCOMP_SKIP)
+				break;
+			return -EINVAL;
+		default:
+			return -EINVAL;
+	}
+
+	if (info.instruction_pointer != 0)
+		instruction_pointer_set(regs, info.instruction_pointer);
+
+	if (info.stack_pointer != 0)
+		user_stack_pointer_set(regs, info.stack_pointer);
+
+	switch (info.op) {
+		case PTRACE_SYSCALL_INFO_ENTRY:
+			ptrace_set_syscall_info_entry(child, regs, &info);
+			break;
+		case PTRACE_SYSCALL_INFO_EXIT:
+			ptrace_set_syscall_info_exit(child, regs, &info);
+			break;
+		case PTRACE_SYSCALL_INFO_SECCOMP:
+			ptrace_set_syscall_info_seccomp(child, regs, &info);
+			break;
+		case PTRACE_SYSCALL_INFO_SECCOMP_SKIP:
+			ptrace_set_syscall_info_seccomp_skip(child, regs, &info);
+			break;
+	}
+	return 0;
+}
+
+#endif /* CONFIG_HAVE_ARCH_SET_TRACEHOOK */
 #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
 
 int ptrace_request(struct task_struct *child, long request,
@@ -1244,6 +1351,12 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_GET_SYSCALL_INFO:
 		ret = ptrace_get_syscall_info(child, addr, datavp);
 		break;
+
+	case PTRACE_SET_SYSCALL_INFO:
+#ifdef CONFIG_HAVE_ARCH_SET_TRACEHOOK
+		ret = ptrace_set_syscall_info(child, addr, datavp);
+#endif
+		break;
 #endif
 
 	case PTRACE_SECCOMP_GET_FILTER:
-------------- next part --------------
diff --git a/tools/testing/selftests/ptrace/.gitignore b/tools/testing/selftests/ptrace/.gitignore
index b7dde152e75a..308852d264dc 100644
--- a/tools/testing/selftests/ptrace/.gitignore
+++ b/tools/testing/selftests/ptrace/.gitignore
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 get_syscall_info
+set_syscall_info
 get_set_sud
 peeksiginfo
 vmaccess
diff --git a/tools/testing/selftests/ptrace/Makefile b/tools/testing/selftests/ptrace/Makefile
index 1c631740a730..c5e0b76ba6ac 100644
--- a/tools/testing/selftests/ptrace/Makefile
+++ b/tools/testing/selftests/ptrace/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -std=c99 -pthread -Wall $(KHDR_INCLUDES)
 
-TEST_GEN_PROGS := get_syscall_info peeksiginfo vmaccess get_set_sud
+TEST_GEN_PROGS := get_syscall_info set_syscall_info peeksiginfo vmaccess get_set_sud
 
 include ../lib.mk
diff --git a/tools/testing/selftests/ptrace/set_syscall_info.c b/tools/testing/selftests/ptrace/set_syscall_info.c
new file mode 100644
index 000000000000..368ae1e249be
--- /dev/null
+++ b/tools/testing/selftests/ptrace/set_syscall_info.c
@@ -0,0 +1,285 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2021 Renzo Davoli, based on get_syscall_info by:
+ * Copyright (c) 2018 Dmitry V. Levin <ldv at altlinux.org>
+ * All rights reserved.
+ *
+ * Check whether PTRACE_SET_SYSCALL_INFO semantics implemented in the kernel
+ * matches userspace expectations.
+ */
+
+#include "../kselftest_harness.h"
+#include <err.h>
+#include <signal.h>
+#include <asm/unistd.h>
+#include "linux/ptrace.h"
+
+#ifndef PTRACE_SET_SYSCALL_INFO
+#define PTRACE_SET_SYSCALL_INFO 0x4212
+#endif
+#ifndef PTRACE_SYSCALL_INFO_SECCOMP_SKIP
+#define PTRACE_SYSCALL_INFO_SECCOMP_SKIP       4
+#endif
+
+
+static int
+kill_tracee(pid_t pid)
+{
+	if (!pid)
+		return 0;
+
+	int saved_errno = errno;
+
+	int rc = kill(pid, SIGKILL);
+
+	errno = saved_errno;
+	return rc;
+}
+
+static long
+sys_ptrace(int request, pid_t pid, unsigned long addr, unsigned long data)
+{
+	return syscall(__NR_ptrace, request, pid, addr, data);
+}
+
+#define LOG_KILL_TRACEE(fmt, ...)				\
+	do {							\
+		kill_tracee(traceeid);				\
+		TH_LOG("wait #%d: " fmt,			\
+				ptrace_stop, ##__VA_ARGS__);		\
+	} while (0)
+
+typedef struct {    /* op == PTRACE_SYSCALL_INFO_ENTRY */
+	__u64 nr;       /* System call number */
+	__u64 args[6];  /* System call arguments */
+} syscall_info_entry;
+
+typedef struct {    /* op == PTRACE_SYSCALL_INFO_EXIT */
+	__s64 rval;     /* System call return value */
+	__u8 is_error;  /* System call error flag; */
+} syscall_info_exit;
+
+struct sycall_test {
+	char *test;
+	__u8 setentry;                /* boolean: set_syscall_info required at syscall entry */
+	__u8 setexit;                 /* boolean: set_syscall_info required at syscall exit */
+	syscall_info_entry entry;     /* syscall nr and args as requested by the process */
+	syscall_info_entry newentry;  /* new syscall nr and args as modified by the tracer */
+	syscall_info_exit exit;       /* rval/error returned by the kernel */
+	syscall_info_exit newexit;    /* rval/error modified by the tracer/expected by the tracee */
+};
+
+TEST(set_syscall_info)
+{
+	pid_t tracerid = getpid();
+	pid_t traceeid = fork();
+
+	ASSERT_LE(0, traceeid) {
+		TH_LOG("fork: %m");
+	}
+
+	if (traceeid == 0)
+		traceeid = getpid();
+
+	struct sycall_test tests[] = {
+		/* test #0: replace syscall args: 'chdir("")' changed in 'chdir("/")' */
+		{ "change arg", 1, 0,
+			{.nr = __NR_chdir, .args[0] = (__u64) ""}, {.nr = __NR_chdir, .args[0] = (__u64) "/"},
+			{0, 0}, {0, 0}},
+		/* test #1: replace retvalue getpid returns the tracer's pid instead of the tracee's */
+		{ "change retval", 0, 1,
+			{.nr = __NR_getpid}, {.nr = __NR_getpid},
+			{traceeid, 0}, {tracerid, 0}},
+		/* test #2: generate an error: getpid returns ENOMEM instead of the tracee's pid */
+		{ "return error", 0, 1,
+			{.nr = __NR_getpid}, {.nr = __NR_getpid},
+			{traceeid, 0}, {-ENOMEM, 1}},
+		/* test #3: replace syscall nr */
+		{ "replace syscall", 1, 0,
+			{.nr = __NR_getpid}, {.nr = __NR_getppid},
+			{tracerid, 0}, {tracerid, 0}},
+		{ "all tests OK, leave", 0, 0,
+			{.nr = __NR_exit_group, .args[0] = 0}, {.nr = __NR_exit_group, .args[0] = 0}}
+	};
+
+	if (traceeid == getpid()) {
+		ASSERT_EQ(0, sys_ptrace(PTRACE_TRACEME, 0, 0, 0)) {
+			TH_LOG("PTRACE_TRACEME: %m");
+		}
+		ASSERT_EQ(0, kill(traceeid, SIGSTOP)) {
+			/* cannot happen */
+			TH_LOG("kill SIGSTOP: %m");
+		}
+		for (unsigned int i = 0; i < ARRAY_SIZE(tests); ++i) {
+			__u64 *args = tests[i].entry.args;
+			__u8 is_error = 0;
+			/* generate a syscall request */
+			__s64 rval = syscall(tests[i].entry.nr,
+					args[0], args[1], args[2],
+					args[3], args[4], args[5]);
+			/* check expected results */
+			if (rval == -1) {
+				is_error = 1;
+				rval = -errno;
+			}
+			/* check the expected results error/return value */
+			ASSERT_EQ(is_error, tests[i].newexit.is_error) {
+				_exit(2);
+			}
+			ASSERT_EQ(rval, tests[i].newexit.rval) {
+				TH_LOG("rval %d", i);
+				_exit(2);
+			}
+		}
+		/* unreachable */
+		_exit(1);
+	}
+
+	unsigned int ptrace_stop;
+
+	for (ptrace_stop = 0; ; ++ptrace_stop) {
+		struct ptrace_syscall_info info = {
+			.op = 0xff	/* invalid PTRACE_SYSCALL_INFO_* op */
+		};
+		const size_t size = sizeof(info);
+		const int expected_none_size =
+			(void *) &info.entry - (void *) &info;
+		const int expected_entry_size =
+			(void *) &info.entry.args[6] - (void *) &info;
+		const int expected_exit_size =
+			(void *) (&info.exit.is_error + 1) -
+			(void *) &info;
+		int status;
+		long rc;
+
+		ASSERT_EQ(traceeid, wait(&status)) {
+			/* cannot happen */
+			LOG_KILL_TRACEE("wait: %m");
+		}
+		if (WIFEXITED(status)) {
+			traceeid = 0;	/* the tracee is no more */
+			ASSERT_EQ(0, WEXITSTATUS(status));
+			break;
+		}
+		ASSERT_FALSE(WIFSIGNALED(status)) {
+			traceeid = 0;	/* the tracee is no more */
+			LOG_KILL_TRACEE("unexpected signal %u",
+					WTERMSIG(status));
+		}
+		ASSERT_TRUE(WIFSTOPPED(status)) {
+			/* cannot happen */
+			LOG_KILL_TRACEE("unexpected wait status %#x", status);
+		}
+
+		switch (WSTOPSIG(status)) {
+			case SIGSTOP:
+				ASSERT_EQ(0, ptrace_stop) {
+					LOG_KILL_TRACEE("unexpected signal stop");
+				}
+				ASSERT_EQ(0, sys_ptrace(PTRACE_SETOPTIONS, traceeid, 0,
+							PTRACE_O_TRACESYSGOOD)) {
+					LOG_KILL_TRACEE("PTRACE_SETOPTIONS: %m");
+				}
+				ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO,
+								traceeid, size,
+								(unsigned long) &info))) {
+					LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m");
+				}
+				ASSERT_EQ(expected_none_size, rc) {
+					LOG_KILL_TRACEE("signal stop mismatch");
+				}
+				ASSERT_EQ(PTRACE_SYSCALL_INFO_NONE, info.op) {
+					LOG_KILL_TRACEE("signal stop mismatch");
+				}
+				ASSERT_TRUE(info.arch) {
+					LOG_KILL_TRACEE("signal stop mismatch");
+				}
+				ASSERT_TRUE(info.instruction_pointer) {
+					LOG_KILL_TRACEE("signal stop mismatch");
+				}
+				ASSERT_TRUE(info.stack_pointer) {
+					LOG_KILL_TRACEE("signal stop mismatch");
+				}
+				info.op = PTRACE_SYSCALL_INFO_NONE;
+				if (sys_ptrace(PTRACE_SET_SYSCALL_INFO,
+							traceeid, size,
+							(unsigned long) &info) != 0) {
+					LOG_KILL_TRACEE("PTRACE_SET_SYSCALL_INFO not supported by this architecture");
+					return;
+				}
+
+				break;
+
+			case SIGTRAP | 0x80:
+				ASSERT_LT(0, (rc = sys_ptrace(PTRACE_GET_SYSCALL_INFO,
+								traceeid, size,
+								(unsigned long) &info))) {
+					LOG_KILL_TRACEE("PTRACE_GET_SYSCALL_INFO: %m");
+				}
+				if (ptrace_stop & 1) {
+					/* entry: stop 2n+1 belongs to test n */
+					int index = ptrace_stop >> 1;
+					TH_LOG("test: %s", tests[index].test);
+					ASSERT_EQ(expected_entry_size, rc) {
+						LOG_KILL_TRACEE("entry stop mismatch");
+					}
+					ASSERT_EQ(PTRACE_SYSCALL_INFO_ENTRY, info.op) {
+						LOG_KILL_TRACEE("entry stop mismatch");
+					}
+					/* set syscall nr and args if required */
+					if (tests[index].setentry) {
+						info.entry.nr = tests[index].newentry.nr;
+						info.entry.args[0] = tests[index].newentry.args[0];
+						info.entry.args[1] = tests[index].newentry.args[1];
+						info.entry.args[2] = tests[index].newentry.args[2];
+						info.entry.args[3] = tests[index].newentry.args[3];
+						info.entry.args[4] = tests[index].newentry.args[4];
+						info.entry.args[5] = tests[index].newentry.args[5];
+						ASSERT_EQ(0, (rc = sys_ptrace(PTRACE_SET_SYSCALL_INFO,
+										traceeid, size,
+										(unsigned long) &info)))
+							LOG_KILL_TRACEE("PTRACE_SET_SYSCALL_INFO: %m");
+					}
+				} else {
+					/* exit: stop 2n+2 belongs to test n */
+					int index = (ptrace_stop >> 1) - 1;
+					ASSERT_EQ(expected_exit_size, rc) {
+						LOG_KILL_TRACEE("exit stop mismatch");
+					}
+					ASSERT_EQ(PTRACE_SYSCALL_INFO_EXIT, info.op) {
+						LOG_KILL_TRACEE("exit stop mismatch");
+					}
+					ASSERT_EQ(tests[index].exit.is_error,
+							info.exit.is_error) {
+						LOG_KILL_TRACEE("exit stop mismatch");
+					}
+					ASSERT_EQ(tests[index].exit.rval, info.exit.rval) {
+						LOG_KILL_TRACEE("exit stop mismatch");
+					}
+					/* set return value/error if required */
+					if (tests[index].setexit) {
+						info.exit.rval = tests[index].newexit.rval;
+						info.exit.is_error = tests[index].newexit.is_error;
+						ASSERT_EQ(0, (rc = sys_ptrace(PTRACE_SET_SYSCALL_INFO,
+										traceeid, size,
+										(unsigned long) &info)))
+							LOG_KILL_TRACEE("PTRACE_SET_SYSCALL_INFO: %m");
+					}
+				}
+				break;
+
+			default:
+				LOG_KILL_TRACEE("unexpected stop signal %#x",
+						WSTOPSIG(status));
+				abort();
+		}
+
+		ASSERT_EQ(0, sys_ptrace(PTRACE_SYSCALL, traceeid, 0, 0)) {
+			LOG_KILL_TRACEE("PTRACE_SYSCALL: %m");
+		}
+	}
+
+	ASSERT_EQ(ARRAY_SIZE(tests) * 2, ptrace_stop);
+}
+
+TEST_HARNESS_MAIN
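
A note on the stop numbering used by the tracer loop above, since it is
easy to misread: stop 0 is the initial SIGSTOP, odd stops are syscall-entry
stops and even stops are syscall-exit stops. The following is only a sketch
of that arithmetic; the helper names (stop_is_entry, stop_to_test_index,
expected_final_stop) are hypothetical and are not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical helpers (not in the patch) restating the stop numbering
 * of the tracer loop. Stop 0 is the initial SIGSTOP. */

/* Odd stops are syscall-entry stops; even stops > 0 are syscall-exit stops. */
static bool stop_is_entry(unsigned int ptrace_stop)
{
	return (ptrace_stop & 1) != 0;
}

/* Index into tests[] for a syscall stop (ptrace_stop >= 1). */
static unsigned int stop_to_test_index(unsigned int ptrace_stop)
{
	if (stop_is_entry(ptrace_stop))
		return ptrace_stop >> 1;
	return (ptrace_stop >> 1) - 1;
}

/* Value of ptrace_stop when the tracee's exit is observed: one SIGSTOP,
 * an entry and an exit stop for every test but the last, and an entry
 * stop only for the final exit_group, which never returns. */
static unsigned int expected_final_stop(unsigned int ntests)
{
	return 1 + 2 * (ntests - 1) + 1;
}
```

With the five entries of tests[], expected_final_stop(5) is 10, which is
the value the final ASSERT_EQ(ARRAY_SIZE(tests) * 2, ptrace_stop) checks.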

