[GSOC][namespace Support]

Eugene Syromiatnikov esyr at redhat.com
Mon Mar 26 12:00:09 UTC 2018


On Sun, Mar 18, 2018 at 01:43:43PM +0800, WeiFeng Lai wrote:
> > How do you expect to do this, taking into account the fact that strace
> >process doesn't normally have CAP_SYS_ADMIN?
> >Note that stable upstream kernels do not normally accept new features.
> >And downstream kernels are also quite hesitant in doing so.
> 
> Thank you for the reply. Let me summarize my previous misunderstandings:
> 1. strace does not have CAP_SYS_ADMIN privileges in most cases, and
> mounting /proc requires root privileges.
>     Therefore, mounting /proc whenever a new PID namespace is entered is
>     not an option.
> 2. A new system call would be hard to get accepted into either the
> upstream or the downstream kernels, so my previous idea is unrealistic.
> 
> >There are NSFS_* ioctls present that can be used for (PID) namespace
> >tree traversal[3]. Along with inspection of *id fields in
> >/proc/<pid>/status, the available information is sufficient
> >for deriving the needed PID in strace's PID NS (having /proc mounted
> >with different PID NS quite complicates things but still manageable).
> >[1] https://lkml.org/lkml/2018/3/13/1544
> >[2] https://lkml.org/lkml/2017/10/13/177
> >[3] http://blog.man7.org/2016/12/introspecting-namespace-relationships.html
> 
> Thank you for the excellent documents, which contain many elegant ideas.
> After reading through them, I think starting with /proc/[pid]/ns/* is the
> better approach.
> I have read a lot of articles to understand the structure of namespaces,
> the operations on them, and the new features of kernel 4.9 (I had only
> worked with kernels 3.x and 4.1 before, but I'm downloading kernel 4.9
> now), and I have learned that there are many mechanisms that can be used
> to display PIDs in different namespaces.
> 
> Some of my thoughts are as follows:
> The article by Michael Kerrisk (link:
> http://blog.man7.org/2016/12/introspecting-namespace-relationships.html)
> describes the relationships between different namespaces and their
> features. It also describes an important feature of Linux kernel 4.9:
> ioctl() operations on a namespace file descriptor (fd) that return a new
> file descriptor referring to a related namespace, such as the parent PID
> namespace.
> With this feature, we can inspect the /proc/[pid]/ns/* files of all
> processes and build a map that places every process in its pid_namespace
> and reflects the pid_namespace hierarchy; with such a map, the PID and
> user namespace hierarchy of a live system can be discovered for all of
> its processes.
> Using the Go source code from the same article as a reference, I think
> what I can do at the moment is apply what I have learned from
> "Introduction to Algorithms" and other data-structure material to
> optimize the lookup process.
> Is there anything wrong with my understanding of these documents?
> Do you have any better suggestions?

Yes, a PID NS tree should be built (at least to the point where the desired
information is obtained) in order to perform the translation. As I said,
the endeavor can be complicated by the fact that /proc can be mounted from
an alien PID namespace, but in that case we can just bail out early, as
it is not a normal setup (although a perfectly possible one).
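A minimal sketch of a first step towards such a tree, assuming Linux with /proc mounted from strace's own PID namespace: stat() /proc/<pid>/ns/pid for every process and sort the results by namespace inode, so that processes sharing a PID namespace end up adjacent. The names scan_pidns(), pid_ns_entry and cmp_by_ns() are hypothetical, not existing strace code; the parent links between the namespaces found this way would still have to be obtained with NS_GET_PARENT.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One process and the inode number of its PID namespace (hypothetical). */
struct pid_ns_entry {
	pid_t pid;
	ino_t ns;
};

/* Order by namespace inode first, then by PID. */
static int cmp_by_ns(const void *a, const void *b)
{
	const struct pid_ns_entry *x = a, *y = b;

	if (x->ns != y->ns)
		return x->ns < y->ns ? -1 : 1;
	return (x->pid > y->pid) - (x->pid < y->pid);
}

/*
 * Hypothetical helper: stat <proc_root>/<pid>/ns/pid for every numeric
 * entry of proc_root (normally "/proc") and sort the results by namespace
 * inode.  Processes that exit during the scan, or that we are not allowed
 * to inspect, are skipped.  Returns the number of entries stored in out,
 * or -1 if proc_root cannot be opened.
 */
static int scan_pidns(const char *proc_root, struct pid_ns_entry *out, int max)
{
	DIR *dir = opendir(proc_root);
	struct dirent *de;
	int n = 0;

	if (!dir)
		return -1;

	while (n < max && (de = readdir(dir)) != NULL) {
		char path[512];
		struct stat st;

		if (!isdigit((unsigned char) de->d_name[0]))
			continue;	/* not a PID directory */
		if (snprintf(path, sizeof(path), "%s/%s/ns/pid",
			     proc_root, de->d_name) >= (int) sizeof(path))
			continue;	/* path too long, skip it */
		if (stat(path, &st) < 0)
			continue;	/* raced with exit, or no access */
		out[n].pid = (pid_t) atoi(de->d_name);
		out[n].ns = st.st_ino;	/* same number as in "pid:[...]" */
		n++;
	}
	closedir(dir);

	qsort(out, n, sizeof(*out), cmp_by_ns);
	return n;
}
```

For example, scan_pidns("/proc", buf, 4096) fills buf with (PID, namespace inode) pairs. Dereferencing these links is subject to a ptrace access mode check, so when running unprivileged, processes of other users are silently skipped.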

Note that since this involves quite a lot of syscalls, some form of
caching should be implemented. This is further complicated by the fact
that processes can come and go between queries, so we should account for
that somehow ({i,fa}notify?).
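One possible shape for such a cache, sketched under the assumption that a PID namespace is identified by its nsfs inode number: the parent of a namespace does not change while the namespace exists, so an inode-to-parent mapping can be kept between queries, whereas the PID-to-namespace mapping is deliberately not cached here because processes come and go. The fixed-size linear-probing table and the names ns_cache_put() and ns_cache_get() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical cache of "PID namespace inode -> parent namespace inode".
 * Caveat: inode numbers may be reused after a namespace is freed, so a
 * real implementation would still need some invalidation; omitted here.
 */
#define NS_CACHE_SIZE 64	/* must be a power of two */

struct ns_cache_entry {
	ino_t ns;	/* PID namespace inode; 0 marks an empty slot */
	ino_t parent;	/* parent's inode, or 0 if it is not visible to us */
};

static struct ns_cache_entry ns_cache[NS_CACHE_SIZE];

/* Linear probing: return the slot holding ns, or the first empty one. */
static struct ns_cache_entry *ns_cache_slot(ino_t ns)
{
	size_t start = (size_t) ns & (NS_CACHE_SIZE - 1);

	for (size_t i = 0; i < NS_CACHE_SIZE; i++) {
		struct ns_cache_entry *e =
			&ns_cache[(start + i) & (NS_CACHE_SIZE - 1)];

		if (e->ns == ns || e->ns == 0)
			return e;
	}
	return NULL;	/* table is full and ns is not in it */
}

/* Remember that parent is the parent PID namespace of ns. */
static bool ns_cache_put(ino_t ns, ino_t parent)
{
	struct ns_cache_entry *e;

	if (ns == 0 || !(e = ns_cache_slot(ns)))
		return false;
	e->ns = ns;
	e->parent = parent;
	return true;
}

/* Look up the cached parent of ns; returns false on a cache miss. */
static bool ns_cache_get(ino_t ns, ino_t *parent)
{
	struct ns_cache_entry *e;

	if (ns == 0 || !(e = ns_cache_slot(ns)) || e->ns != ns)
		return false;
	*parent = e->parent;
	return true;
}
```

On a miss, the caller would fall back to NS_GET_PARENT and store the result with ns_cache_put(), so repeated lookups for processes in the same namespace cost no extra syscalls.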

> BTW, there is another problem I don't know how to solve: checking the
> contents of /proc/[pid]/ns/* requires CAP_SYS_ADMIN, which means that
> strace still needs CAP_SYS_ADMIN privileges. Is there a better way to
> solve this problem?

Why are you saying that CAP_SYS_ADMIN is needed? It works perfectly without it.

pts/15, esyr at asgard: /tmp % sudo unshare -p --fork su - esyr -c 'sleep 100' &
[2] 18281
pts/15, esyr at asgard: /tmp % cat ns.c
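#define _GNU_SOURCE	/* for asprintf() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/ptrace.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same values as in <linux/nsfs.h> (Linux 4.9+). */
#define NSIO	0xb7
#define NS_GET_PARENT		_IO(NSIO, 0x2)

/* The checks rely on assert(), so do not build this with -DNDEBUG. */
int main(int argc, char **argv)
{
	int target_pid;
	char *path;
	struct stat st;
	int pidns_fd;
	int pidns_fd_parent;

	if (argc < 2) {
		fprintf(stderr, "usage: %s PID\n", argv[0]);
		return 1;
	}
	target_pid = strtol(argv[1], NULL, 0);

	if (asprintf(&path, "/proc/%d/ns/pid", target_pid) < 0)
		return 1;

	/* Attach with PTRACE_SEIZE, as strace does; no extra privileges. */
	assert(!ptrace(PTRACE_SEIZE, target_pid, NULL, NULL));

	/* Opening the target's PID namespace file needs no CAP_SYS_ADMIN. */
	pidns_fd = open(path, O_RDONLY);
	assert(pidns_fd >= 0);
	printf("pidns_fd = %d\n", pidns_fd);

	assert(!fstat(pidns_fd, &st));
	printf("pid ns inode: %llu\n", (unsigned long long) st.st_ino);

	/* Get a file descriptor referring to the parent PID namespace. */
	pidns_fd_parent = ioctl(pidns_fd, NS_GET_PARENT);
	assert(pidns_fd_parent >= 0);
	printf("pidns_fd_parent = %d\n", pidns_fd_parent);

	assert(!fstat(pidns_fd_parent, &st));
	printf("parent pid ns inode: %llu\n", (unsigned long long) st.st_ino);

	free(path);
	return 0;
}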
pts/15, esyr at asgard: /tmp % gcc ns.c -o ns
pts/15, esyr at asgard: /tmp % ./ns $(pgrep -f '^sleep 100$')
pidns_fd = 3
pid ns inode: 4026532513
pidns_fd_parent = 4
parent pid ns inode: 4026531836
pts/15, esyr at asgard: /tmp % ls -la /proc/$(pgrep -f '^sleep 100$')/ns/pid
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:54 /proc/18284/ns/pid -> pid:[4026532513]
pts/15, esyr at asgard: /tmp % ls -la /proc/self/ns/pid 
lrwxrwxrwx 1 esyr esyr 0 Mar 26 13:55 /proc/self/ns/pid -> pid:[4026531836]
pts/15, esyr at asgard: /tmp %

> 2018-03-17 14:56 GMT+08:00 Eugene Syromiatnikov <esyr at redhat.com>:
> 
> > On Sat, Mar 17, 2018 at 10:52:39AM +0800, WeiDeng Lai wrote:
> > > mounting /proc whenever we enter the new namespace.
> >
> > How do you expect to do this, taking into account the fact that strace
> > process doesn't normally have CAP_SYS_ADMIN?
> >
> > > To meet this requirement, we could try adding a new kernel API for
> > > PID translation (trans_pid) between different pid_namespaces, such as
> > > the patch at https://lkml.org/lkml/2018/3/6/593 .
> >
> > Note Eric Biederman's comments there[1]. Please also refer to the
> > discussion related to the previous version of the patch[2]. How do you
> > expect to address the objections raised there in order to have the API
> > accepted in the kernel's upstream?
> >
> > > A few days ago, I talked with some senior members of the community,
> > > and we agreed that adding a new kernel API may be a good idea: we can
> > > apply the patch to later kernel versions and modify it so that it also
> > > applies to kernels from 3.x to the present. If it makes sense, I'll do
> > > this.
> >
> > Note that stable upstream kernels do not normally accept new features.
> > And downstream kernels are also quite hesitant in doing so.
> >
> > > I haven't come up with other methods; can someone provide some
> > > information or documents for my reference?
> >
> > There are NSFS_* ioctls present that can be used for (PID) namespace
> > tree traversal[3]. Along with inspection of *id fields in
> > /proc/<pid>/status, the available information is sufficient
> > for deriving the needed PID in strace's PID NS (having /proc mounted
> > with different PID NS quite complicates things but still manageable).
> >
> > [1] https://lkml.org/lkml/2018/3/13/1544
> > [2] https://lkml.org/lkml/2017/10/13/177
> > [3] http://blog.man7.org/2016/12/introspecting-namespace-relationships.html
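The *id fields of /proc/<pid>/status mentioned in the quoted message can be read without any ioctl. A minimal sketch, assuming Linux 4.1 or later (where NSpid appeared): parse the NSpid line into the list of PIDs, where the leftmost value is relative to the PID namespace of the process that mounted that procfs, followed by the PID in each nested namespace. When /proc was mounted from strace's own PID namespace, the first value is therefore the PID as strace sees it. parse_nspid() and read_nspid() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse an "NSpid:" line from /proc/<pid>/status into
 * out[], outermost namespace first.  Returns the number of values stored,
 * or -1 if line is not an NSpid line.
 */
static int parse_nspid(const char *line, long *out, int max)
{
	const char *p;
	char *end;
	int n = 0;

	if (strncmp(line, "NSpid:", 6) != 0)
		return -1;

	p = line + 6;
	while (n < max) {
		long v = strtol(p, &end, 10);	/* skips leading tabs */

		if (end == p)
			break;	/* no more numbers on the line */
		out[n++] = v;
		p = end;
	}
	return n;
}

/* Hypothetical helper: NSpid values of pid; returns the count or -1. */
static int read_nspid(int pid, long *out, int max)
{
	char path[64];
	char line[256];
	FILE *f;
	int n = -1;

	snprintf(path, sizeof(path), "/proc/%d/status", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* process is gone, or no /proc */
	while (fgets(line, sizeof(line), f)) {
		n = parse_nspid(line, out, max);
		if (n >= 0)
			break;
	}
	fclose(f);
	return n;
}
```

For a process in a single nested PID namespace, read_nspid() would return 2: its PID in the outer namespace and its PID inside the nested one.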

