[patch] Hangs and/or multithreaded process left Stopped (T) on CTRL-C of strace

Jan Kratochvil jan.kratochvil at redhat.com
Wed May 23 16:25:14 UTC 2007


Hi,

strace sometimes hangs while detaching from a multithreaded application when
strace itself is interrupted with CTRL-C.
As a side effect, in some cases the multithreaded application is left Stopped
(T, by SIGSTOP) and must be sent `kill -CONT'.  The shell prints:
[1]+  Stopped                 appname args

strace itself prints:
	Process 13968 attached with 64 threads - interrupt to quit
	Process 13907 detached
	...
	Process 13905 detached
	[HANG; many other tasks should have been printed]

The problem occurs because kill() may deliver the signal to an arbitrary task
of the thread group, while we later wait on just one specific TID.  For a
process under ptrace(2), a wait on a PID becomes a wait on the single task
with that TID.
[ Roland McGrath originally provided this useful info. ]
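A minimal Linux-only sketch of the difference between the two kinds of
signals.  It does not involve ptrace, and the helper names raw_tkill,
current_tid and demo_signal_targeting are illustrative, not from strace.
The calling thread blocks SIGUSR1 while a worker thread leaves it unblocked.
A process-directed kill() is then handled by the worker, because the kernel
may pick any thread of the group that does not block the signal, whereas a
thread-directed tkill() stays queued on the exact TID it names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Raw tkill(2) through syscall(2), the same approach as the patch below. */
static long raw_tkill (pid_t tid, int sig)
{
	return syscall (SYS_tkill, tid, sig);
}

static pid_t current_tid (void)
{
	return (pid_t) syscall (SYS_gettid);
}

static atomic_int handled_by;   /* TID of the thread that ran the handler */
static atomic_int worker_tid;   /* published by the worker once ready */
static atomic_int stop_worker;

static void on_usr1 (int sig)
{
	(void) sig;
	atomic_store (&handled_by, (int) current_tid ());
}

/* The worker unblocks SIGUSR1 and then just idles.  */
static void *worker (void *arg)
{
	struct timespec ms = { 0, 1000000 };
	sigset_t usr1;

	(void) arg;
	sigemptyset (&usr1);
	sigaddset (&usr1, SIGUSR1);
	pthread_sigmask (SIG_UNBLOCK, &usr1, NULL);
	atomic_store (&worker_tid, (int) current_tid ());
	while (!atomic_load (&stop_worker))
		nanosleep (&ms, NULL);
	return NULL;
}

/* Returns 1 if kill() was handled by the worker while tkill() stayed
   pending on the calling thread, 0 otherwise.  */
static int demo_signal_targeting (void)
{
	struct sigaction sa, old_sa;
	struct timespec ms = { 0, 1000000 }, one_sec = { 1, 0 };
	sigset_t usr1, old_mask;
	pthread_t th;
	int i, ok;

	memset (&sa, 0, sizeof sa);
	sa.sa_handler = on_usr1;
	sigemptyset (&sa.sa_mask);
	sigaction (SIGUSR1, &sa, &old_sa);

	/* The calling thread blocks SIGUSR1 (the worker inherits the mask
	   but unblocks the signal itself).  */
	sigemptyset (&usr1);
	sigaddset (&usr1, SIGUSR1);
	pthread_sigmask (SIG_BLOCK, &usr1, &old_mask);
	atomic_store (&handled_by, 0);
	atomic_store (&worker_tid, 0);
	atomic_store (&stop_worker, 0);
	if (pthread_create (&th, NULL, worker, NULL) != 0) {
		pthread_sigmask (SIG_SETMASK, &old_mask, NULL);
		sigaction (SIGUSR1, &old_sa, NULL);
		return 0;
	}
	while (atomic_load (&worker_tid) == 0)
		nanosleep (&ms, NULL);

	/* Process-directed: delivered to some thread not blocking it.  */
	kill (getpid (), SIGUSR1);
	for (i = 0; i < 1000 && atomic_load (&handled_by) == 0; i++)
		nanosleep (&ms, NULL);
	ok = atomic_load (&handled_by) == atomic_load (&worker_tid);

	/* Thread-directed: stays queued on this TID although it is blocked. */
	atomic_store (&handled_by, 0);
	raw_tkill (current_tid (), SIGUSR1);
	ok = ok && sigtimedwait (&usr1, NULL, &one_sec) == SIGUSR1
		&& atomic_load (&handled_by) == 0;

	atomic_store (&stop_worker, 1);
	pthread_join (th, NULL);
	pthread_sigmask (SIG_SETMASK, &old_mask, NULL);
	sigaction (SIGUSR1, &old_sa, NULL);
	return ok;
}
```

demo_signal_targeting () is expected to return 1 on Linux; with older
glibc versions, link with -pthread.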

Unfortunately the POSIX specification does not seem to mention this behavior:
	http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
This paragraph talks only about kill (getpid (), ...):
	If the value of pid causes sig to be generated for the sending process,
	and if sig is not blocked for the calling thread and if no other thread
	has sig unblocked or is waiting in a sigwait() function for sig, either
	sig or at least one pending unblocked signal shall be delivered to the
	sending thread before kill() returns.


Description of the hung state:
The traced process ends up in this state:
/proc/12664/task/12664/status:State:	S (sleeping)
/proc/12664/task/12665/status:State:	T (tracing stop)
...
/proc/12664/task/12730/status:State:	T (tracing stop)
with strace in this state:
#0  0xa000000000010641 in __kernel_syscall_via_break ()
#1  0x2000000000162fe0 in wait4 () from /lib/tls/libc.so.6.1
#2  0x4000000000008420 in detach (tcp=0x600000000001c050, sig=0) at strace.c:1337
1337	  if (wait4(tcp->pid, &status, __WALL, NULL) < 0) {
#3  0x40000000000093b0 in cleanup () at strace.c:1516
#4  0x4000000000006ea0 in main (argc=6, argv=0x60000fffffff9f18) at strace.c:803

- strace sends SIGSTOP to the thread group leader but never gets the
  corresponding stop reported back by wait4().
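For illustration, a simplified sketch of the detach sequence for one running
traced task, following the steps visible in the backtrace above: send
SIGSTOP, wait4() with __WALL on the same TID, then PTRACE_DETACH.  The name
detach_task is hypothetical and this is not strace's actual detach() code.
The point is that tkill() makes the SIGSTOP land on the very TID that
wait4() is waiting for, so the wait cannot block on a stop that went to a
sibling thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: detach from the traced task TID.
   Returns 0 on success (or if the task is already gone), -1 on error.  */
static int detach_task (pid_t tid)
{
	int status;

	/* Succeeds only if the task is already in a ptrace stop.  */
	if (ptrace (PTRACE_DETACH, tid, NULL, NULL) == 0)
		return 0;
	if (errno != ESRCH)
		return -1;

	/* Stop exactly this TID; kill() could hand SIGSTOP to a sibling.  */
	if (syscall (SYS_tkill, tid, SIGSTOP) < 0)
		return errno == ESRCH ? 0 : -1;

	for (;;) {
		/* __WALL: also wait for clone()d (thread) tracees.  */
		if (wait4 (tid, &status, __WALL, NULL) < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (!WIFSTOPPED (status))
			return 0;	/* the task exited meanwhile */
		if (WSTOPSIG (status) == SIGSTOP)
			/* Detach with signal 0, discarding our SIGSTOP.  */
			return ptrace (PTRACE_DETACH, tid, NULL, NULL) == 0 ? 0 : -1;
		/* Another stop: resume it, re-injecting non-SIGTRAP signals.  */
		if (ptrace (PTRACE_CONT, tid, NULL,
			    (void *) (long) (WSTOPSIG (status) == SIGTRAP
					     ? 0 : WSTOPSIG (status))) < 0)
			return -1;
	}
}
```

The sketch assumes the caller is the tracer of tid and that tid is
currently running, which is the case in which PTRACE_DETACH fails with ESRCH.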


The testcase contains a workaround for a Linux kernel bug that leaks
ERESTARTNOINTR to userland; that bug is present in some older Linux kernel
variants around 2.6.9.  The kernel problem does not otherwise affect this bug.

For the attached testcase to pass reliably, it also needs the next patch, from the mail:
	[patch] Multithreaded process left Stopped (T) on CTRL-C of strace


Regards,
Jan

Problem tracked at:
	https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=240962
-------------- next part --------------
2007-05-23  Jan Kratochvil  <jan.kratochvil at redhat.com>

	* strace.c [LINUX] (tkill): New definition.
	[LINUX] (detach): Use tkill(2) instead of kill(2).

diff -u -rup strace-4.5.15-detach-zombie/strace.c strace-4.5.15/strace.c
--- strace-4.5.15-detach-zombie/strace.c	2007-05-23 16:09:24.000000000 +0200
+++ strace-4.5.15/strace.c	2007-05-23 16:32:55.000000000 +0200
@@ -46,6 +46,11 @@
 #include <limits.h>
 #include <dirent.h>
 
+#ifdef LINUX
+#include <sys/syscall.h>
+#define tkill(tid, sig) syscall (SYS_tkill, (tid), (sig))
+#endif
+
 #if defined(IA64) && defined(LINUX)
 # include <asm/ptrace_offsets.h>
 #endif
@@ -1323,11 +1328,14 @@ int sig;
 		/* Shouldn't happen. */
 		perror("detach: ptrace(PTRACE_DETACH, ...)");
 	}
-	else if (kill(tcp->pid, 0) < 0) {
+	/* kill() may choose arbitrarily the target task of the process group
+	   while we wait on a specific TID below.  PID process waits become TID
+	   task specific waits for process under ptrace(2).  */
+	else if (tkill(tcp->pid, 0) < 0) {
 		if (errno != ESRCH)
 			perror("detach: checking sanity");
 	}
-	else if (kill(tcp->pid, SIGSTOP) < 0) {
+	else if (tkill(tcp->pid, SIGSTOP) < 0) {
 		if (errno != ESRCH)
 			perror("detach: stopping child");
 	}
-------------- next part --------------
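#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <pthread.h>
#include <signal.h>
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

#define NTHREADS 64

/* Kernel-internal error code; some older Linux kernels (around 2.6.9)
   leak it to userland.  */
#ifndef ERESTARTNOINTR
#define ERESTARTNOINTR  513
#endif

/* Each thread loops forever, querying the SIGCHLD disposition.  */
void *start (void *arg)
{
	(void) arg;
	for (;;) {
		int i;
		struct sigaction oact;

		i = sigaction (SIGCHLD, (struct sigaction *) NULL, &oact);
		if (i == -1 && errno == ERESTARTNOINTR) {
			/* Kernel bug workaround, unrelated to the strace bug.  */
#if 0
			fprintf (stderr, "sigaction(): ERESTARTNOINTR\n");
			abort ();
#endif
		} else {
			assert (i == 0);
			assert (oact.sa_handler == SIG_DFL);
		}
	}
}

int main (void)
{
	pthread_t thread_id[NTHREADS];
	pid_t p_pid;
	size_t i;
	int status;
	void *result;

	p_pid = getpid ();
	printf ("%d\n", (int) p_pid);
	/* Show the PID right away, even when stdout is not a terminal.  */
	fflush (stdout);

	for (i = 0; i < NTHREADS; i++) {
		status = pthread_create (&thread_id[i], NULL, start, NULL);
		if (status != 0) {
			fprintf (stderr, "pthread_create: %s\n", strerror (status));
			return EXIT_FAILURE;
		}
	}

	/* The threads never return; the process runs until it is killed.  */
	for (i = 0; i < NTHREADS; i++)
		pthread_join (thread_id[i], &result);

	printf ("[%d]end\n", (int) p_pid);

	return EXIT_SUCCESS;
}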


More information about the Strace-devel mailing list