Q: tests/ioctl_kvm_run_common.c failing with KVM_EXIT_EXCEPTION

Wed Jul 10 23:42:21 UTC 2019

On Thu, 11 Jul 2019 00:19:50 +0300, "Dmitry V. Levin" <ldv at altlinux.org> wrote:
> Hi,
> 
> Somebody has updated rawhide-test.fedorainfracloud.org today, and our
> ioctl_kvm_run* tests started to fail there with the following symptoms:
> 
> $ ./ioctl_kvm_run; echo \$?=$?
> ioctl(3</dev/kvm>, KVM_GET_API_VERSION, 0) = 12
> ioctl(3</dev/kvm>, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY) = 1
> ioctl(3</dev/kvm>, KVM_CREATE_VM, 0) = 4<anon_inode:kvm-vm>
> ioctl(4<anon_inode:kvm-vm>, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=0, guest_phys_addr=0x1000, memory_size=4096, userspace_addr=0x7f72bade1000}) = 0
> ioctl(4<anon_inode:kvm-vm>, KVM_CREATE_VCPU, 0) = 5<anon_inode:kvm-vcpu:0>
> ioctl(3</dev/kvm>, KVM_GET_VCPU_MMAP_SIZE, 0) = 12288
> ioctl(3</dev/kvm>, KVM_GET_SUPPORTED_CPUID, 0x7f72badb1378) = -1 E2BIG (Argument list too long)
> ioctl(3</dev/kvm>, KVM_GET_SUPPORTED_CPUID, {nent=26, entries=[...]}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_SET_CPUID2, {nent=0, entries=[]}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_SET_CPUID2, {nent=26, entries=[...]}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_SET_CPUID2, NULL) = -1 EFAULT (Bad address)
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_GET_SREGS, {cs={base=0xffff0000, limit=65535, selector=61440, type=11, present=1, dpl=0, db=0, s=1, l=0, g=0, avl=0}, ...}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_SET_SREGS, {cs={base=0, limit=65535, selector=0, type=11, present=1, dpl=0, db=0, s=1, l=0, g=0, avl=0}, ...}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_SET_REGS, {rax=0x2, ..., rsp=0, rbp=0, ..., rip=0x1000, rflags=0x2}) = 0
> ioctl(5<anon_inode:kvm-vcpu:0>, KVM_RUN, 0) = 0
> exit_reason = 0x1
> $?=1
> 
> exit_reason 0x1 is KVM_EXIT_EXCEPTION.
> 
> Unlike the similar issue on f30-test.fedorainfracloud.org where these
> tests fail consistently with exit_reason 0x9, this new test failure
> happens with probability around 40%.
> 
> The system where this oddness happens is
> Linux rawhide-test 5.2.0-0.rc3.git3.1.fc31.x86_64 #1 SMP Fri Jun 7 17:09:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> There were no such issues with that system yesterday when uname -a printed this:
> Linux rawhide-test.fedorainfracloud.org 4.19.0-0.rc8.git3.1.fc30.x86_64 #1 SMP Thu Oct 18 13:50:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
> 
> rawhide-test$ cat /proc/cpuinfo
> processor	: 0
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
> stepping	: 2
> microcode	: 0x1
> cpu MHz		: 3466.786
> cache size	: 4096 KB
> physical id	: 0
> siblings	: 1
> core id		: 0
> cpu cores	: 1
> apicid		: 0
> initial apicid	: 0
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm pti tpr_shadow vnmi flexpriority ept vpid tsc_adjust arat
> bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
> bogomips	: 6933.57
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management:
> 
> processor	: 1
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 44
> model name	: Intel(R) Xeon(R) CPU           X5690  @ 3.47GHz
> stepping	: 2
> microcode	: 0x1
> cpu MHz		: 3466.786
> cache size	: 4096 KB
> physical id	: 1
> siblings	: 1
> core id		: 0
> cpu cores	: 1
> apicid		: 1
> initial apicid	: 1
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 11
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm pti tpr_shadow vnmi flexpriority ept vpid tsc_adjust arat
> bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds
> bogomips	: 6933.57
> clflush size	: 64
> cache_alignment	: 64
> address sizes	: 40 bits physical, 48 bits virtual
> power management:
> 
> Any ideas what's going on and how to handle this?

Sorry for the late reply.

It will take time to understand what is happening in KVM and the hardware.

So I think it would be better to extend strace itself and the test
cases to accept this behavior of KVM and the hardware.

I'm writing code to accept "exit_reason 0x9".
I will work on "exit_reason 0x1" next.
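
As a rough illustration of what "accepting" these exit reasons in the test
could look like, here is a minimal sketch. It is not the actual patch. It
assumes the run loop continues on KVM_EXIT_IO and finishes on KVM_EXIT_HLT,
and that the harness treats exit status 77 as "skipped" (the automake
convention). The names classify_exit_reason(), verdict_to_exit_status() and
the SKETCH_* constants are hypothetical; the numeric values mirror KVM_EXIT_*
in <linux/kvm.h>.

```c
#include <stdio.h>

/*
 * Values mirror KVM_EXIT_* from <linux/kvm.h>.  They are redefined here
 * (hypothetical SKETCH_ names) so the sketch builds without kernel headers.
 */
enum {
	SKETCH_KVM_EXIT_EXCEPTION  = 1,	/* the new 0x1 failure */
	SKETCH_KVM_EXIT_IO         = 2,
	SKETCH_KVM_EXIT_HLT        = 5,
	SKETCH_KVM_EXIT_FAIL_ENTRY = 9,	/* the 0x9 failure */
};

enum run_verdict { RUN_CONTINUE, RUN_DONE, RUN_SKIP, RUN_FAIL };

/* Decide what the test should do for a given kvm_run exit_reason. */
static enum run_verdict
classify_exit_reason(unsigned int exit_reason)
{
	switch (exit_reason) {
	case SKETCH_KVM_EXIT_IO:
		return RUN_CONTINUE;	/* assumed: guest I/O, keep running */
	case SKETCH_KVM_EXIT_HLT:
		return RUN_DONE;	/* assumed: guest finished */
	case SKETCH_KVM_EXIT_EXCEPTION:
	case SKETCH_KVM_EXIT_FAIL_ENTRY:
		/* Environment-dependent behavior: skip rather than fail. */
		fprintf(stderr, "exit_reason = %#x, skipping\n", exit_reason);
		return RUN_SKIP;
	default:
		return RUN_FAIL;
	}
}

/* Map a verdict to a process exit status; -1 means "not finished yet". */
static int
verdict_to_exit_status(enum run_verdict v)
{
	switch (v) {
	case RUN_DONE:
		return 0;
	case RUN_SKIP:
		return 77;	/* automake "skipped" status */
	case RUN_FAIL:
		return 1;
	default:
		return -1;
	}
}
```

With this shape, the KVM_RUN loop would call classify_exit_reason() on each
exit and stop with verdict_to_exit_status() once the verdict is no longer
RUN_CONTINUE.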

Masatake YAMATO

> 
> -- 
> ldv