bugfix for strace for less-aligned kernel memory

Dmitry V. Levin ldv at altlinux.org
Tue Mar 31 22:04:50 UTC 2015


On Thu, Mar 19, 2015 at 03:48:35PM -0500, Bolo wrote:
> This bugfix / patch consists of two portions
> 
> 1) A 'Z' option which forces strace to use ptrace() memory access
> routines all the time instead of vm-based memory access routines, even if
> the latter are present.  This allows a fallback if you find that VM based
> requests are failing or producing incorrect output.  The existing
> ptrace based methods are quite robust and tested :-)

As there are no known issues with the kernel implementation of
process_vm_readv, why would anybody want to disable the fast
process_vm_readv based code in favour of the slow PTRACE_PEEKDATA based one?
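
To give a rough sense of the cost difference, here is a minimal sketch that counts how many PTRACE_PEEKDATA calls a read would take. It assumes each call fetches one aligned machine word, which is how a word-by-word peek loop is usually written; the helper name peekdata_calls and its parameters are hypothetical and are not taken from strace. A process_vm_readv based read can cover the same range with a single call per chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not strace code: number of PTRACE_PEEKDATA
 * calls needed to read len bytes starting at addr, assuming every
 * call returns exactly one word aligned to word_size bytes.
 */
static size_t
peekdata_calls(uintptr_t addr, size_t len, size_t word_size)
{
	/* word_size must be a non-zero power of two for the masks below. */
	assert(word_size != 0 && (word_size & (word_size - 1)) == 0);

	if (len == 0)
		return 0;

	/* Aligned word holding the first byte and the last byte. */
	uintptr_t first = addr & ~(uintptr_t) (word_size - 1);
	uintptr_t last = (addr + len - 1) & ~(uintptr_t) (word_size - 1);

	return (size_t) ((last - first) / word_size) + 1;
}
```

With 8-byte words, an aligned 256-byte read takes 32 calls under this assumption, and an unaligned 8-byte read takes 2.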

> 2) A correction for a long-term bug in the VM based memory access
> method of strace.  This bug has been in strace in various forms since
> at least 4.5.19.
> 
> The ptrace based code is correct.

Well, it was a genuine bug in the process_vm_readv based code; thank you
for reporting it, although I did not understand it from your description
and patch on the first read.

It's fixed by commit v4.10-56-g4832134 which also contains a regression
test.  There is also a follow-up cleanup commit v4.10-57-gea1fea6.

> What happens is that the vm based access code in strace doesn't deal
> correctly with alignments of data in the kernel.  It assumes that
> everything is at a certain alignment.  This is an incorrect assumption;
> when there are a lot of objects the kernel starts putting string data at
> less-aligned addresses.  I discovered this through debugging
> of strace failures and finding the kernel relaxing alignment constraints.
> This is true of both older kernels (such as rhel 5.9) and newer ones
> (such as rhel 6.6).

I've tried to reproduce alignment issues with process_vm_readv from your
description, but without success.

> There was also an issue with the code not treating page boundaries
> correctly due to incorrect page arithmetic.

The actual bug was the use of the wrong address in the page arithmetic.
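
To illustrate the kind of arithmetic involved, here is a minimal sketch of limiting a read so that it stops at the next page boundary. It is not the code from the commit mentioned above; the name chunk_within_page and its parameters are hypothetical. The point is only that the boundary has to be computed from the address in the traced process (remote_addr here) and from the actual page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not strace code: largest number of bytes,
 * up to want, that can be read starting at remote_addr without
 * crossing into the next page of the traced process.
 */
static size_t
chunk_within_page(uintptr_t remote_addr, size_t want, size_t page_size)
{
	/* page_size must be a non-zero power of two for the mask below. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	/* Bytes left from remote_addr up to the end of its page. */
	size_t to_page_end =
		page_size - (size_t) (remote_addr & (page_size - 1));

	return want < to_page_end ? want : to_page_end;
}
```

A caller could pass the value of sysconf(_SC_PAGESIZE) as page_size rather than a hardcoded 4096. For example, a 256-byte request at 0x1ffa with 4096-byte pages is cut to 6 bytes, so the following read starts exactly at 0x2000.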

> When this alignment constraint is relaxed, or strace incorrectly issues
> reads across page boundaries, strace fails with an error of
> 
> 	"umovestr: short read (%d < %d) @0x%lx"
> 
> due to the incorrect code.  In addition, since the vm-based code doesn't
> correctly update the address and lengths of the region to be accessed,
> the fallback code, which is implemented correctly, fails to work.

There is no fallback to the PTRACE_PEEKDATA based code once
process_vm_readv has read at least one byte.

> The new code also deals with arbitrary page sizes correctly in extracting
> data using the vm mode, instead of relying upon a 4k pagesize.
> 
> 3) Reproducing this error depends upon where the kernel is putting items
> in memory, and may also be based on the load of the kernel.   As such,
> this error can be difficult to reproduce.

Actually, the reproducer is quite simple:
http://sourceforge.net/p/strace/code/ci/HEAD/tree/tests/umovestr.c


-- 
ldv