[Gluster-users] libgfapi failover problem on replica bricks
Paul Penev
ppquant at gmail.com
Mon Apr 21 16:28:38 UTC 2014
Here is the session log from the testing. Unfortunately there is
little in the logs.
1. KVM running on server 15; bricks are on servers 14 and 15. Killing
glusterfsd on server 14:
1.1. killall -KILL glusterfsd
It has no chance to log anything, so the logs below are from the
gluster server restart.
The KVM machine has process PID 213620.
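For reference, each round of the test boils down to roughly the following (a sketch of the procedure described above; the `service glusterd restart` step is my assumption for how the brick was brought back, since the brick log shows it being started again):

```shell
# On the brick server: hard-kill the brick process with SIGKILL so it
# has no chance to log anything or close the TCP connection cleanly,
# simulating an abrupt brick failure rather than a graceful shutdown.
killall -KILL glusterfsd

# Later, restart the gluster management daemon, which respawns the
# brick process (seen in the brick log as "Started running
# /usr/sbin/glusterfsd ...").
service glusterd restart
```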
[2014-04-21 16:01:13.434638] I [glusterfsd.c:1910:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
3.4.3 (/usr/sbin/glusterfsd -s s14 --volfile-id
gtest.s14.var-gtest-brick -p
/var/lib/glusterd/vols/gtest/run/s14-var-gtest-brick.pid -S
/var/run/358a7d1bdc05c7a60f4379b9e54fac65.socket --brick-name
/var/gtest/brick -l /var/log/glusterfs/bricks/var-gtest-brick.log
--xlator-option
*-posix.glusterd-uuid=84eb547a-6086-4795-a543-e9dad6fe78d1
--brick-port 49153 --xlator-option gtest-server.listen-port=49153)
[2014-04-21 16:01:13.436112] I [socket.c:3480:socket_init]
0-socket.glusterfsd: SSL support is NOT enabled
[2014-04-21 16:01:13.436144] I [socket.c:3495:socket_init]
0-socket.glusterfsd: using system polling thread
[2014-04-21 16:01:13.436245] I [socket.c:3480:socket_init]
0-glusterfs: SSL support is NOT enabled
[2014-04-21 16:01:13.436256] I [socket.c:3495:socket_init]
0-glusterfs: using system polling thread
[2014-04-21 16:01:16.982974] I [graph.c:239:gf_add_cmdline_options]
0-gtest-server: adding option 'listen-port' for volume 'gtest-server'
with value '49153'
[2014-04-21 16:01:16.982992] I [graph.c:239:gf_add_cmdline_options]
0-gtest-posix: adding option 'glusterd-uuid' for volume 'gtest-posix'
with value '84eb547a-6086-4795-a543-e9dad6fe78d1'
[2014-04-21 16:01:16.984083] W [options.c:848:xl_opt_validate]
0-gtest-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2014-04-21 16:01:16.984113] I [socket.c:3480:socket_init]
0-tcp.gtest-server: SSL support is NOT enabled
[2014-04-21 16:01:16.984121] I [socket.c:3495:socket_init]
0-tcp.gtest-server: using system polling thread
Given volfile:
+------------------------------------------------------------------------------+
1: volume gtest-posix
2: type storage/posix
3: option volume-id 083fea96-47e7-4904-a3d3-500e226786d2
4: option directory /var/gtest/brick
5: end-volume
6:
7: volume gtest-access-control
8: type features/access-control
9: subvolumes gtest-posix
10: end-volume
11:
12: volume gtest-locks
13: type features/locks
14: subvolumes gtest-access-control
15: end-volume
16:
17: volume gtest-io-threads
18: type performance/io-threads
19: subvolumes gtest-locks
20: end-volume
21:
22: volume gtest-index
23: type features/index
24: option index-base /var/gtest/brick/.glusterfs/indices
25: subvolumes gtest-io-threads
26: end-volume
27:
28: volume gtest-marker
29: type features/marker
30: option quota off
31: option xtime off
32: option timestamp-file /var/lib/glusterd/vols/gtest/marker.tstamp
33: option volume-uuid 083fea96-47e7-4904-a3d3-500e226786d2
34: subvolumes gtest-index
35: end-volume
36:
37: volume /var/gtest/brick
38: type debug/io-stats
39: option count-fop-hits off
40: option latency-measurement off
41: option log-level INFO
42: subvolumes gtest-marker
43: end-volume
44:
45: volume gtest-server
46: type protocol/server
47: option statedump-path /tmp
48: option auth.addr./var/gtest/brick.allow *
49: option
auth.login.734ec716-f516-4b95-93b0-2714373f25c4.password
700527a5-a1d1-4c14-97ac-0c640c9467b1
50: option auth.login./var/gtest/brick.allow
734ec716-f516-4b95-93b0-2714373f25c4
51: option transport-type tcp
52: subvolumes /var/gtest/brick
53: end-volume
+------------------------------------------------------------------------------+
[2014-04-21 16:01:17.142466] I
[server-handshake.c:567:server_setvolume] 0-gtest-server: accepted
client from s15-213620-2014/04/21-09:53:17:688030-gtest-client-0-0
(version: 3.4.3)
This is all. I now wait for heal to finish.
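Waiting for the heal can be done with the standard 3.4 CLI (`gtest` is the volume name from the volfile above); an empty pending list on both bricks means replication has caught up:

```shell
# List entries still pending self-heal on each brick; wait until
# "Number of entries: 0" is reported for both bricks.
gluster volume heal gtest info

# Optionally, show entries that were healed recently.
gluster volume heal gtest info healed
```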
2. Heal is finished. I kill the glusterfsd on server 15:
This kills the KVM machine. Nothing is in the logs. Dmesg from the KVM is:
Write-error on swap-device (252:0:96627104)
Write-error on swap-device (252:0:96627112)
Write-error on swap-device (252:0:96627120)
Write-error on swap-device (252:0:96627128)
Write-error on swap-device (252:0:96627136)
Write-error on swap-device (252:0:96627144)
Write-error on swap-device (252:0:96627152)
Write-error on swap-device (252:0:96627160)
Write-error on swap-device (252:0:96627168)
Write-error on swap-device (252:0:96627176)
Write-error on swap-device (252:0:96627184)
Write-error on swap-device (252:0:96627192)
Write-error on swap-device (252:0:96627200)
Write-error on swap-device (252:0:96627208)
Write-error on swap-device (252:0:96627216)
Write-error on swap-device (252:0:96627224)
Write-error on swap-device (252:0:96627232)
Write-error on swap-device (252:0:96627240)
Write-error on swap-device (252:0:96627248)
Write-error on swap-device (252:0:96627256)
Write-error on swap-device (252:0:96627264)
Write-error on swap-device (252:0:96627272)
Write-error on swap-device (252:0:96627280)
Write-error on swap-device (252:0:96627288)
Write-error on swap-device (252:0:96627296)
Write-error on swap-device (252:0:96627304)
Write-error on swap-device (252:0:96627312)
Write-error on swap-device (252:0:96627320)
Write-error on swap-device (252:0:96627328)
Write-error on swap-device (252:0:96627336)
Write-error on swap-device (252:0:96627344)
------------[ cut here ]------------
WARNING: at block/blk-core.c:398 blk_start_queue+0x88/0x90() (Not tainted)
Hardware name: Bochs
Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw
pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle
ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS
iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT
ip_tables vhost_net tun macvtap macvlan vzevent ib_iser rdma_cm
ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache
lockd sunrpc ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer
i2c_piix4 snd i2c_core soundcore virtio_balloon parport_pc parport
ext4 jbd2 mbcache sg virtio_net virtio_blk ata_generic pata_acpi
virtio_pci virtio_ring virtio ata_piix [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-28-pve #1
Call Trace:
<IRQ> [<ffffffff81075fe8>] ? warn_slowpath_common+0x88/0xc0
[<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8126dae8>] ? blk_start_queue+0x88/0x90
[<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
[<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
[<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
[<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
[<ffffffff81010219>] ? handle_irq+0x49/0xa0
[<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
[<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
[<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
[<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
[<ffffffff815407b5>] ? rest_init+0x85/0x90
[<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
[<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
---[ end trace d257e46e0ec99bba ]---
Tainting kernel with flag 0x9
Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-28-pve #1
Call Trace:
<IRQ> [<ffffffff81075e45>] ? add_taint+0x35/0x70
[<ffffffff81075ff5>] ? warn_slowpath_common+0x95/0xc0
[<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8126dae8>] ? blk_start_queue+0x88/0x90
[<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
[<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
[<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
[<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
[<ffffffff81010219>] ? handle_irq+0x49/0xa0
[<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
[<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
[<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
[<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
[<ffffffff815407b5>] ? rest_init+0x85/0x90
[<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
[<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
------------[ cut here ]------------
WARNING: at block/blk-core.c:245 blk_remove_plug+0x98/0xa0() (Tainted:
G W --------------- )
Hardware name: Bochs
Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw
pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle
ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS
iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT
ip_tables vhost_net tun macvtap macvlan vzevent ib_iser rdma_cm
ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache
lockd sunrpc ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer
i2c_piix4 snd i2c_core soundcore virtio_balloon parport_pc parport
ext4 jbd2 mbcache sg virtio_net virtio_blk ata_generic pata_acpi
virtio_pci virtio_ring virtio ata_piix [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper veid: 0 Tainted: G W ---------------
2.6.32-28-pve #1
Call Trace:
<IRQ> [<ffffffff81075fe8>] ? warn_slowpath_common+0x88/0xc0
[<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8126c218>] ? blk_remove_plug+0x98/0xa0
[<ffffffff8126c236>] ? __blk_run_queue+0x16/0x40
[<ffffffff8126daa5>] ? blk_start_queue+0x45/0x90
[<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
[<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
[<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
[<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
[<ffffffff81010219>] ? handle_irq+0x49/0xa0
[<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
[<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
[<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
[<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
[<ffffffff815407b5>] ? rest_init+0x85/0x90
[<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
[<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
---[ end trace d257e46e0ec99bbb ]---
Tainting kernel with flag 0x9
Pid: 0, comm: swapper veid: 0 Tainted: G W ---------------
2.6.32-28-pve #1
Call Trace:
<IRQ> [<ffffffff81075e45>] ? add_taint+0x35/0x70
[<ffffffff81075ff5>] ? warn_slowpath_common+0x95/0xc0
[<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
[<ffffffff8126c218>] ? blk_remove_plug+0x98/0xa0
[<ffffffff8126c236>] ? __blk_run_queue+0x16/0x40
[<ffffffff8126daa5>] ? blk_start_queue+0x45/0x90
[<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
[<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
[<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
[<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
[<ffffffff81010219>] ? handle_irq+0x49/0xa0
[<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
[<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
<EOI> [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
[<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
[<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
[<ffffffff815407b5>] ? rest_init+0x85/0x90
[<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
[<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
[<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
Read-error on swap-device (252:0:96493000)
Read-error on swap-device (252:0:96493008)
Read-error on swap-device (252:0:96493016)
Read-error on swap-device (252:0:96493024)
Read-error on swap-device (252:0:96493032)
Read-error on swap-device (252:0:96493040)
Read-error on swap-device (252:0:96493048)
Read-error on swap-device (252:0:96493056)
Read-error on swap-device (252:0:96493448)
Read-error on swap-device (252:0:96493456)
Read-error on swap-device (252:0:96493464)
Read-error on swap-device (252:0:96493472)
Read-error on swap-device (252:0:96493480)
Read-error on swap-device (252:0:96493488)
Read-error on swap-device (252:0:96493496)
Read-error on swap-device (252:0:96493504)
Read-error on swap-device (252:0:96492424)
Read-error on swap-device (252:0:96492432)
Read-error on swap-device (252:0:96492440)
Read-error on swap-device (252:0:96492448)
Read-error on swap-device (252:0:96492456)
Read-error on swap-device (252:0:96492464)
Read-error on swap-device (252:0:96492472)
Read-error on swap-device (252:0:96492480)
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 129
Aborting journal on device vda1-8.
EXT4-fs (vda1): Remounting filesystem read-only
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 129
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 195
Read-error on swap-device (252:0:96626632)
Read-error on swap-device (252:0:96626640)
Read-error on swap-device (252:0:96626648)
Read-error on swap-device (252:0:96626656)
Read-error on swap-device (252:0:96626664)
Read-error on swap-device (252:0:96626672)
Read-error on swap-device (252:0:96626680)
Read-error on swap-device (252:0:96626688)
This is all. Not much, I'm afraid.
Paul
2014-04-21 18:21 GMT+02:00 Paul Penev <ppquant at gmail.com>:
> Joe,
> It will take some time to redo the logs (I'm doing them now).
> While waiting for the heal to complete I did some research on the
> version history of qemu.
>
> There's a patch made two months ago that intrigues me:
>
> https://github.com/qemu/qemu/commit/adccfbcd6020e928db93b2b4faf0dbd05ffbe016
>
> Also the changelog
> https://github.com/qemu/qemu/commits/master/block/gluster.c seems to
> show some recent work on gluster.c.
>
> Stay tuned for the logs from the bricks (but I don't expect much there).