strace of io_uring events?

Jens Axboe axboe at
Tue Jul 21 19:48:06 UTC 2020

On 7/21/20 1:44 PM, Andy Lutomirski wrote:
> On Tue, Jul 21, 2020 at 11:39 AM Jens Axboe <axboe at> wrote:
>> On 7/21/20 11:44 AM, Andy Lutomirski wrote:
>>> On Tue, Jul 21, 2020 at 10:30 AM Jens Axboe <axboe at> wrote:
>>>> On 7/21/20 11:23 AM, Andy Lutomirski wrote:
>>>>> On Tue, Jul 21, 2020 at 8:31 AM Jens Axboe <axboe at> wrote:
>>>>>> On 7/21/20 9:27 AM, Andy Lutomirski wrote:
>>>>>>> On Fri, Jul 17, 2020 at 1:02 AM Stefano Garzarella <sgarzare at> wrote:
>>>>>>>> On Thu, Jul 16, 2020 at 08:12:35AM -0700, Kees Cook wrote:
>>>>>>>>> On Thu, Jul 16, 2020 at 03:14:04PM +0200, Stefano Garzarella wrote:
>>>>>>>>> access (IIUC) is possible without actually calling any of the io_uring
>>>>>>>>> syscalls. Is that correct? A process would receive an fd (via SCM_RIGHTS,
>>>>>>>>> pidfd_getfd, or soon seccomp addfd), and then call mmap() on it to gain
>>>>>>>>> access to the SQ and CQ, and off it goes? (The only glitch I see is
>>>>>>>>> waking up the worker thread?)
>>>>>>>> It is true only if the io_uring instance is created with the SQPOLL flag (not the
>>>>>>>> default behaviour, and it requires CAP_SYS_ADMIN). In this case the
>>>>>>>> kthread is created and you can also set a higher idle time for it, so
>>>>>>>> the wake-up syscall can also be avoided.
>>>>>>> I stared at the io_uring code for a while, and I'm wondering if we're
>>>>>>> approaching this the wrong way. It seems to me that most of the
>>>>>>> complications here come from the fact that io_uring SQEs don't clearly
>>>>>>>> belong to any particular security principal.  (We have struct creds,
>>>>>>> but we don't really have a task or mm.)  But I'm also not convinced
>>>>>>> that io_uring actually supports cross-mm submission except by accident
>>>>>>> -- as it stands, unless a user is very careful to only submit SQEs
>>>>>>> that don't use user pointers, the results will be unpredictable.
>>>>>> How so?
>>>>> Unless I've missed something, either current->mm or sqo_mm will be
>>>>> used depending on which thread ends up doing the IO.  (And there might
>>>>> be similar issues with threads.)  Having the user memory references
>>>>> end up somewhere that is an implementation detail seems suboptimal.
>>>> current->mm is always used from the entering task - obviously if done
>>>> synchronously, but also if it needs to go async. The only exception is a
>>>> setup with SQPOLL, in which case ctx->sqo_mm is the task that set up the
>>>> ring. SQPOLL requires root privileges to set up, and there's no task
>>>> entering the io_uring at all necessarily. It'll just submit sqes with
>>>> the credentials that are registered with the ring.
>>> Really?  I admit I haven't fully followed how the code works, but it
>>> looks like anything that goes through the io_queue_async_work() path
>>> will use sqo_mm, and can't most requests that end up blocking end up
>>> there?  It looks like, even if SQPOLL is not set, the mm used will
>>> depend on whether the request ends up blocking and thus getting queued
>>> for later completion.
>>> Or does some magic I missed make this a nonissue?
>> No, you are wrong. The logic works as I described it.
> Can you enlighten me?  I don't see any iov_iter_get_pages() calls or
> equivalents.  If an IO is punted, how does the data end up in the
> io_uring_enter() caller's mm?

If the SQE needs to be punted to an io-wq worker, then
io_prep_async_work() is ultimately called before it's queued with the
io-wq worker. That grabs anything we need to successfully process this
request, user access and all. io-wq then assumes the right "context" to
perform that request. As the async punt is always done on behalf of the
task that is submitting the IO (via io_uring_enter()), that is the
context that we grab and use for that particular request.
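A toy user-space model of the rule described above may help make it concrete. It is not kernel code: `struct task`, `struct ring`, `struct work`, `prep_async_work()`, `worker_run()` and `request_mm()` are hypothetical stand-ins, and plain integers stand in for address spaces. Only the rule itself comes from this thread: without SQPOLL, the submitting task's context is captured when a request is prepared for io-wq (the role io_prep_async_work() plays), so inline and punted requests resolve user pointers against the same mm, while ctx->sqo_mm only comes into play when an SQPOLL thread does the submitting.

```c
/* Toy model only: these types are hypothetical stand-ins, not kernel
 * structures. An int identifies an address space (an "mm"). */
struct task { int mm; };                   /* a task calling io_uring_enter() */
struct ring { int sqpoll; int sqo_mm; };   /* sqo_mm: mm of the ring's creator */
struct work { int mm; };                   /* context captured for a punted request */

/* Plays the role of io_prep_async_work(): record the submitter's
 * context before the request is handed to a worker. */
static void prep_async_work(struct work *w, const struct task *submitter)
{
    w->mm = submitter->mm;
}

/* Plays the role of an io-wq worker: it adopts the captured context
 * instead of using its own or the ring's. */
static int worker_run(const struct work *w)
{
    return w->mm;
}

/* Which mm a request's user pointers are resolved against. */
int request_mm(const struct ring *r, const struct task *submitter, int punted)
{
    if (r->sqpoll)
        return r->sqo_mm;          /* SQPOLL thread submits on the creator's behalf */
    if (!punted)
        return submitter->mm;      /* inline: current->mm of the entering task */
    struct work w;
    prep_async_work(&w, submitter);
    return worker_run(&w);         /* punted: same mm, captured at prep time */
}
```

In this model, a ring without SQPOLL yields the submitter's mm whether or not the request is punted, and an SQPOLL ring yields the creator's mm. Capturing the context at enqueue time, rather than looking it up when the worker runs, is what keeps the result independent of which thread ends up executing the request.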

You keep looking at ctx->sqo_mm, and I've told you several times that
it's only related to the SQPOLL thread. If you don't use SQPOLL, no
request will ever use it.
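For the SQPOLL case, here is a minimal sketch of the user-space side of what Stefano and Kees describe earlier in the thread: requesting an SQPOLL ring with a longer idle time, and mapping the SQ ring from an io_uring fd. It assumes the UAPI definitions in <linux/io_uring.h> (struct io_uring_params, IORING_SETUP_SQPOLL, IORING_OFF_SQ_RING); the helper names are hypothetical. A process that merely receives the fd (for example via SCM_RIGHTS) would also need the sq_off/cq_off offsets and entry counts that io_uring_setup(2) returned to the ring's creator; the sketch assumes those are passed along with the fd. The filled-in params would be handed to io_uring_setup(2), which, as noted above, required CAP_SYS_ADMIN for SQPOLL at the time of this thread.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <linux/io_uring.h>

/* Parameters for an SQPOLL ring: the kernel thread polls the SQ and
 * only goes idle after idle_ms milliseconds without new entries. */
void sqpoll_params(struct io_uring_params *p, unsigned idle_ms)
{
    memset(p, 0, sizeof(*p));
    p->flags = IORING_SETUP_SQPOLL;
    p->sq_thread_idle = idle_ms;
}

/* Bytes to map for the SQ ring (head, tail, flags, ... and the index
 * array), using the offsets the kernel reported in p->sq_off. */
size_t sq_ring_bytes(const struct io_uring_params *p)
{
    return p->sq_off.array + p->sq_entries * sizeof(__u32);
}

/* Bytes to map for the CQ ring, which ends with the CQE array. */
size_t cq_ring_bytes(const struct io_uring_params *p)
{
    return p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
}

/* Map the SQ ring of ring_fd; returns MAP_FAILED on error. The CQ ring
 * (IORING_OFF_CQ_RING) and the SQE array (IORING_OFF_SQES) are mapped
 * the same way with their own offsets and sizes. */
void *map_sq_ring(int ring_fd, const struct io_uring_params *p)
{
    return mmap(NULL, sq_ring_bytes(p), PROT_READ | PROT_WRITE,
                MAP_SHARED, ring_fd, IORING_OFF_SQ_RING);
}
```

Once the rings are mapped and the SQPOLL thread is awake, new SQEs can be queued purely by writing to shared memory, which is why the thread above points out that such a ring can be driven without calling the io_uring syscalls.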

Jens Axboe

More information about the Strace-devel mailing list