[cfarm-admins] gcc202 is occasionally returning EIO from fdatasync(2)

Anatoly Pugachev matorola at gmail.com
Sat May 8 08:06:00 UTC 2021


So, I took a retrospective view on logs from my running ldoms...

Summary:
- it does not tied to specific filesystem, as gcc202 /home was ext4
earlier and currently live on xfs (migrated recently)
- strace test suite (futex in logs) seems a good candidate to try to
catch the issue with sunvdc
- it does not tied to specific LDOM configuration, some ldoms live on
raw zfs volumes (gcc202), some ldoms have its disk devices living in a
files on zfs  filesystem (ttip, buildd)

I'm still going to make tests/checks with xfstests, strace and
stress-ng trying to catch the issue.

Detailed logs:

deimos == management / control LDOM
cfgcc == gcc202
ttip == my test sparc64 ldom

sysadmin at deimos:/var/log/vntsd/cfgcc$ grep -c blk_update *
console-log:8
console-log.0:20
console-log.1:13
console-log.2:1
console-log.3:0
console-log.4:1
console-log.5:1
sysadmin at deimos:/var/log/vntsd/cfgcc$ ls -l
total 70103
-rw-r-----   1 root     root       90002 May  7 23:05 console-log
-rw-r-----   1 root     root      185171 May  3 19:26 console-log.0
-rw-r-----   1 root     root     28291781 Apr  3 23:19 console-log.1
-rw-r-----   1 root     root     4157903 Mar  4 23:03 console-log.2
-rw-r-----   1 root     root      605083 Jan 29 20:51 console-log.3
-rw-r-----   1 root     root      522874 Dec 30 22:40 console-log.4
-rw-r-----   1 root     root     1641315 Dec  4 23:09 console-log.5

from console-log.5:
[    0.000056] Linux version 5.9.0-3-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.0-17) 10.2.0,
GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP Debian 5.9.9-1
(2020-11-19)
[101790.136672] sunvdc: vdc_tx_trigger() failure, err=-11
[101790.136746] blk_update_request: I/O error, dev vdiska, sector
2174003304 op 0x1:(WRITE) flags 0x107000 phys_seg 18 prio class 0
[101790.136764] Buffer I/O error on dev dm-0, logical block 268449293,
lost async page write
[101790.136786] Buffer I/O error on dev dm-0, logical block 268449294,
lost async page write
[101790.136796] Buffer I/O error on dev dm-0, logical block 268449295,
lost async page write
[101790.136806] Buffer I/O error on dev dm-0, logical block 268449296,
lost async page write
[101790.136816] Buffer I/O error on dev dm-0, logical block 268449297,
lost async page write
[101790.136826] Buffer I/O error on dev dm-0, logical block 268449298,
lost async page write
[101790.136835] Buffer I/O error on dev dm-0, logical block 268449299,
lost async page write
[101790.136845] Buffer I/O error on dev dm-0, logical block 268449300,
lost async page write
[101790.136854] Buffer I/O error on dev dm-0, logical block 268449301,
lost async page write
[101790.136864] Buffer I/O error on dev dm-0, logical block 268449302,
lost async page write
[101790.136917] EXT4-fs error (device dm-0):
ext4_check_bdev_write_error:215: comm systemd-journal: Error while
async write back metadata
[101790.200384] Aborting journal on device dm-0-8.
[101790.257119] EXT4-fs (dm-0): Remounting filesystem read-only
[101790.257142] EXT4-fs error (device dm-0) in
ext4_reserve_inode_write:5667: Journal has aborted
[101790.257240] EXT4-fs error (device dm-0):
ext4_journal_check_start:83: Detected aborted journal
[101790.259239] EXT4-fs error (device dm-0):
ext4_journal_check_start:83: Detected aborted journal
[101790.287445] EXT4-fs error (device dm-0): ext4_dirty_inode:5869:
inode #8782187: comm systemd-journal: mark_inode_dirty error

from console-log.4:
[    0.000057] Linux version 5.9.0-4-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.0-19) 10.2.0,
GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP Debian 5.9.11-1
(2020-11-27)
[493284.878035] futex_wake_op: futex tries to shift op by -518; fix this program
[493287.765631] sunvdc: vdc_tx_trigger() failure, err=-11
[493287.765704] blk_update_request: I/O error, dev vdiska, sector
1049830096 op 0x1:(WRITE) flags 0x3000 phys_seg 2 prio class 0
[493287.765736] Buffer I/O error on dev dm-0, logical block 127927642,
lost async page write
[493287.765760] Buffer I/O error on dev dm-0, logical block 127927643,
lost async page write
[493287.765787] EXT4-fs error (device dm-0):
ext4_check_bdev_write_error:215: comm strace: Error while async write
back metadata
[493289.006197] Aborting journal on device dm-0-8.
[493289.006244] EXT4-fs error (device dm-0) in ext4_evict_inode:249:
Journal has aborted
[493289.006248] EXT4-fs error (device dm-0) in ext4_evict_inode:249:
Journal has aborted
[493289.222338] EXT4-fs error (device dm-0):
ext4_journal_check_start:83: Detected aborted journal
[493289.222409] EXT4-fs (dm-0): Remounting filesystem read-only

from console-log.2:
[    0.000060] Linux version 5.10.0-3-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP Debian
5.10.13-1 (2021-02-06)
[220118.794680] sunvdc: vdc_tx_trigger() failure, err=-11
[220118.794744] blk_update_request: I/O error, dev vdiskc, sector
789205200 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0

from console-log.1:
[    0.000061] Linux version 5.10.0-4-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian
5.10.19-1+sparc64 (2021-03-07)
[395369.003918] futex_wake_op: futex tries to shift op by -518; fix this program
[505723.531540] sunvdc: vdc_tx_trigger() failure, err=-11
[505723.531613] blk_update_request: I/O error, dev vdiskc, sector
1833122064 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
[505723.547017] sunvdc: vdc_tx_trigger() failure, err=-11
[505723.547068] blk_update_request: I/O error, dev vdiskc, sector
1833239920 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
[505723.562462] sunvdc: vdc_tx_trigger() failure, err=-11
[505723.562513] blk_update_request: I/O error, dev vdiskc, sector
1833284832 op 0x1:(WRITE) flags 0x1000 phys_seg 1 prio class 0
[505723.567732] XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x6d433110 len 8 error 5
[564991.789167] futex_atomic_op_inuser: 2 callbacks suppressed
[564991.789176] futex_wake_op: futex tries to shift op by -849; fix this program

[    0.000061] Linux version 5.10.0-4-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian
5.10.19-1+sparc64 (2021-03-07)
[606538.865338] sunvdc: vdc_tx_trigger() failure, err=-11
[606538.865412] blk_update_request: I/O error, dev vdiskc, sector
1760253248 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
[606538.869736] XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x68eb4d40 len 32 error 5
[728702.059059] sunvdc: vdc_tx_trigger() failure, err=-11
[728702.059131] blk_update_request: I/O error, dev vdiskc, sector
1137021984 op 0x1:(WRITE) flags 0x4800 phys_seg 17 prio class 0
[728702.215048] dm-0: writeback error on inode 1157860231, offset 0,
sector 1136988968

[    0.000061] Linux version 5.10.0-4-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian
5.10.19-1+sparc64 (2021-03-07)
[1605065.277545] futex_wake_op: futex tries to shift op by -518; fix
this program
[1605090.203207] sunvdc: vdc_tx_trigger() failure, err=-11
[1605090.203295] blk_update_request: I/O error, dev vdiskc, sector
1087405072 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0
[1605090.219790] sunvdc: vdc_tx_trigger() failure, err=-11
[1605090.219871] blk_update_request: I/O error, dev vdiskc, sector
1087405328 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0
[1605090.235954] sunvdc: vdc_tx_trigger() failure, err=-11
[1605090.236016] blk_update_request: I/O error, dev vdiskc, sector
1074363797 op 0x1:(WRITE) flags 0x9800 phys_seg 4 prio class 0
[1605090.236059] XFS (dm-0): log I/O error -5
[1605090.236086] XFS (dm-0): xfs_do_force_shutdown(0x2) called from
line 1196 of file fs/xfs/xfs_log.c. Return address = 0000000076933e51
[1605090.236097] XFS (dm-0): Log I/O Error Detected. Shutting down filesystem
[1605090.236106] XFS (dm-0): Please unmount the filesystem and rectify
the problem(s)
[1605090.609638] dm-0: writeback error on inode 547115109, offset 0,
sector 547205784
[1605090.609657] dm-0: writeback error on inode 35621093, offset 0,
sector 35703240
[1605090.609707] dm-0: writeback error on inode 1085975733, offset 0,
sector 1085987480
[1605090.628900] dm-0: writeback error on inode 575365206, offset 0,
sector 575365272
[1605090.628952] dm-0: writeback error on inode 547115099, offset 0,
sector 547144576
[1605090.628977] dm-0: writeback error on inode 35621096, offset 0,
sector 35621672
[1605090.629000] dm-0: writeback error on inode 547115112, offset 0,
sector 547144600
[1605090.629038] dm-0: writeback error on inode 1085975738, offset 0,
sector 1085975808


[567060.440416] futex_wake_op: futex tries to shift op by -518; fix this program
[606538.865338] sunvdc: vdc_tx_trigger() failure, err=-11
[606538.865412] blk_update_request: I/O error, dev vdiskc, sector
1760253248 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
[606538.869736] XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x68eb4d40 len 32 error 5
[728702.059059] sunvdc: vdc_tx_trigger() failure, err=-11
[728702.059131] blk_update_request: I/O error, dev vdiskc, sector
1137021984 op 0x1:(WRITE) flags 0x4800 phys_seg 17 prio class 0
[728702.215048] dm-0: writeback error on inode 1157860231, offset 0,
sector 1136988968

[    0.000061] Linux version 5.10.0-4-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian
5.10.19-1+sparc64 (2021-03-07)
[52078.670379] process 'fixes_3_2/fpcsrc/packages/fpmake' started with
executable stack
[84700.481814] sunvdc: vdc_tx_trigger() failure, err=-11
[84700.481875] blk_update_request: I/O error, dev vdiskc, sector
1726087232 op 0x1:(WRITE) flags 0x800 phys_seg 9 prio class 0
[84700.674742] dm-0: writeback error on inode 1727211665, offset 0,
sector 1726052552



from console-log:

[3086382.281890] futex_wake_op: futex tries to shift op by -518; fix
this program
[3141976.195280] sunvdc: vdc_tx_trigger() failure, err=-11
[3141976.195444] blk_update_request: I/O error, dev vdiskc, sector
63150848 op 0x1:(WRITE) flags 0x4800 phys_seg 17 prio class 0
[3141976.377290] dm-0: writeback error on inode 32556699, offset 0,
sector 63117832
[3151124.220995] futex_atomic_op_inuser: 14 callbacks suppressed
[3151124.221001] futex_wake_op: futex tries to shift op by -849; fix
this program
...
[3151605.377426] futex_wake_op: futex tries to shift op by -518; fix
this program
[3253624.278367] sunvdc: vdc_tx_trigger() failure, err=-11
[3253624.278443] blk_update_request: I/O error, dev vdiskc, sector
159273120 op 0x1:(WRITE) flags 0x4800 phys_seg 17 prio class 0
[3253624.371989] dm-0: writeback error on inode 2148294407, offset 0,
sector 159239256
[3253918.135035] sunvdc: vdc_tx_trigger() failure, err=-11
[3253918.135105] blk_update_request: I/O error, dev vdiskc, sector
157618896 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio class 0
[3253918.226523] dm-0: writeback error on inode 155142134, offset 0,
sector 157584576
[3254125.590955] sunvdc: vdc_tx_trigger() failure, err=-11
[3254125.591023] blk_update_request: I/O error, dev vdiskc, sector
657284672 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio class 0
[3254125.606031] XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x272d5640 len 32 error 5


May 07 03:26:45 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 03:26:45 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 159273120 op 0x1:(WRITE) flags 0x4800 phys_seg 17 prio
class 0
May 07 03:31:39 gcc202 kernel: dm-0: writeback error on inode
2148294407, offset 0, sector 159239256
May 07 03:31:39 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 03:31:39 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 157618896 op 0x1:(WRITE) flags 0x4800 phys_seg 16 prio
class 0
May 07 03:35:06 gcc202 kernel: dm-0: writeback error on inode
155142134, offset 0, sector 157584576
May 07 03:35:06 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 03:35:06 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 657284672 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio
class 0
May 07 03:35:06 gcc202 kernel: XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x272d5640 len 32 error 5


[    0.000060] Linux version 5.10.0-6-sparc64-smp
(debian-kernel at lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1
20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian
5.10.28-1 (2021-04-09)
May 07 11:21:40 gcc202 kernel: xfs filesystem being mounted at /home
supports timestamps until 2038 (0x7fffffff)
May 07 21:50:54 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 21:50:54 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 1764819232 op 0x1:(WRITE) flags 0x1000 phys_seg 2 prio
class 0
May 07 21:50:54 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 21:50:54 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 1764819328 op 0x1:(WRITE) flags 0x1000 phys_seg 8 prio
class 0
May 07 21:50:54 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 21:50:54 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 1764821440 op 0x1:(WRITE) flags 0x1000 phys_seg 4 prio
class 0
May 07 21:50:54 gcc202 kernel: sunvdc: vdc_tx_trigger() failure, err=-11
May 07 21:50:54 gcc202 kernel: blk_update_request: I/O error, dev
vdiskc, sector 1764824576 op 0x1:(WRITE) flags 0x1000 phys_seg 2 prio
class 0
May 07 21:50:54 gcc202 kernel: XFS (dm-0): metadata I/O error in
"xfs_buf_ioend+0x2cc/0x640 [xfs]" at daddr 0x6930f920 len 32 error 5



from another LDOM :
[    0.000070] Linux version 5.12.0-rc3 (mator at ttip) (gcc (Debian
10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2)
#199 SMP Mon Mar 15 13:04:24 MSK 2021
[ 6008.263499] run fstests generic/323 at 2021-03-20 19:58:13
[ 6008.874269] XFS (vdiskb1): Mounting V5 Filesystem
[ 6008.886915] XFS (vdiskb1): Ending clean mount
[ 6008.888990] xfs filesystem being mounted at /1/testvol supports
timestamps until 2038 (0x7fffffff)
[ 6112.836409] sunvdc: vdc_tx_trigger() failure, err=-11
[ 6112.836548] blk_update_request: I/O error, dev vdiskb, sector
1097992 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0

and I was unable to reproduce to reproduce it with simply run
"xfstests-dev$./check generic/323"

and there's quite occasional almost same blk_update errors on another
ldom (buildd - debian sid build)


More information about the Strace-devel mailing list