[Gluster-users] libgfapi failover problem on replica bricks

Paul Penev ppquant at gmail.com
Mon Apr 21 16:28:38 UTC 2014


Here is the session log from the testing. Unfortunately there is
little in the logs.

1. KVM running on server 15; bricks are on servers 14 and 15. Killing
glusterfsd on server 14:
1.1. killall -KILL glusterfsd

The killed glusterfsd had no chance to log anything, so the logs below
are from the gluster server restart.
The KVM machine has process PID 213620.
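For clarity, the failure injection in this step can be sketched as a couple of shell helpers. This is a hypothetical harness, not what I actually ran: it assumes passwordless SSH to s14/s15 and the volume name gtest from the volfile below; `run` is a thin wrapper so the commands can be dry-run.

```shell
# Sketch of the brick-failure test driven from a control host (assumption:
# passwordless SSH to the brick servers, volume name "gtest").
run() { "$@"; }

kill_brick() {
    # Hard-kill every brick daemon on the given server, as in step 1.1.
    run ssh "$1" killall -KILL glusterfsd
}

restart_brick() {
    # One way to bring a killed brick back: force-start the volume,
    # which respawns any missing brick processes.
    run ssh "$1" gluster volume start gtest force
}
```

Redefining `run() { echo "$@"; }` turns the whole sequence into a dry run before touching the cluster.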


[2014-04-21 16:01:13.434638] I [glusterfsd.c:1910:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
3.4.3 (/usr/sbin/glusterfsd -s s14 --volfile-id
gtest.s14.var-gtest-brick -p
/var/lib/glusterd/vols/gtest/run/s14-var-gtest-brick.pid -S
/var/run/358a7d1bdc05c7a60f4379b9e54fac65.socket --brick-name
/var/gtest/brick -l /var/log/glusterfs/bricks/var-gtest-brick.log
--xlator-option
*-posix.glusterd-uuid=84eb547a-6086-4795-a543-e9dad6fe78d1
--brick-port 49153 --xlator-option gtest-server.listen-port=49153)
[2014-04-21 16:01:13.436112] I [socket.c:3480:socket_init]
0-socket.glusterfsd: SSL support is NOT enabled
[2014-04-21 16:01:13.436144] I [socket.c:3495:socket_init]
0-socket.glusterfsd: using system polling thread
[2014-04-21 16:01:13.436245] I [socket.c:3480:socket_init]
0-glusterfs: SSL support is NOT enabled
[2014-04-21 16:01:13.436256] I [socket.c:3495:socket_init]
0-glusterfs: using system polling thread
[2014-04-21 16:01:16.982974] I [graph.c:239:gf_add_cmdline_options]
0-gtest-server: adding option 'listen-port' for volume 'gtest-server'
with value '49153'
[2014-04-21 16:01:16.982992] I [graph.c:239:gf_add_cmdline_options]
0-gtest-posix: adding option 'glusterd-uuid' for volume 'gtest-posix'
with value '84eb547a-6086-4795-a543-e9dad6fe78d1'
[2014-04-21 16:01:16.984083] W [options.c:848:xl_opt_validate]
0-gtest-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2014-04-21 16:01:16.984113] I [socket.c:3480:socket_init]
0-tcp.gtest-server: SSL support is NOT enabled
[2014-04-21 16:01:16.984121] I [socket.c:3495:socket_init]
0-tcp.gtest-server: using system polling thread
Given volfile:
+------------------------------------------------------------------------------+
  1: volume gtest-posix
  2:     type storage/posix
  3:     option volume-id 083fea96-47e7-4904-a3d3-500e226786d2
  4:     option directory /var/gtest/brick
  5: end-volume
  6:
  7: volume gtest-access-control
  8:     type features/access-control
  9:     subvolumes gtest-posix
 10: end-volume
 11:
 12: volume gtest-locks
 13:     type features/locks
 14:     subvolumes gtest-access-control
 15: end-volume
 16:
 17: volume gtest-io-threads
 18:     type performance/io-threads
 19:     subvolumes gtest-locks
 20: end-volume
 21:
 22: volume gtest-index
 23:     type features/index
 24:     option index-base /var/gtest/brick/.glusterfs/indices
 25:     subvolumes gtest-io-threads
 26: end-volume
 27:
 28: volume gtest-marker
 29:     type features/marker
 30:     option quota off
 31:     option xtime off
 32:     option timestamp-file /var/lib/glusterd/vols/gtest/marker.tstamp
 33:     option volume-uuid 083fea96-47e7-4904-a3d3-500e226786d2
 34:     subvolumes gtest-index
 35: end-volume
 36:
 37: volume /var/gtest/brick
 38:     type debug/io-stats
 39:     option count-fop-hits off
 40:     option latency-measurement off
 41:     option log-level INFO
 42:     subvolumes gtest-marker
 43: end-volume
 44:
 45: volume gtest-server
 46:     type protocol/server
 47:     option statedump-path /tmp
 48:     option auth.addr./var/gtest/brick.allow *
 49:     option
auth.login.734ec716-f516-4b95-93b0-2714373f25c4.password
700527a5-a1d1-4c14-97ac-0c640c9467b1
 50:     option auth.login./var/gtest/brick.allow
734ec716-f516-4b95-93b0-2714373f25c4
 51:     option transport-type tcp
 52:     subvolumes /var/gtest/brick
 53: end-volume

+------------------------------------------------------------------------------+
[2014-04-21 16:01:17.142466] I
[server-handshake.c:567:server_setvolume] 0-gtest-server: accepted
client from s15-213620-2014/04/21-09:53:17:688030-gtest-client-0-0
(version: 3.4.3)

This is all. I now wait for heal to finish.
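For reference, this is roughly how heal progress can be watched with the standard CLI; a minimal sketch only, since the exact "heal info" output format varies a little between gluster releases (volume name gtest as in the volfile above):

```shell
# Minimal sketch: report whether self-heal still has pending entries,
# by parsing "gluster volume heal <vol> info" output. Treat this as an
# illustration, not a robust monitor.
heal_pending() {
    gluster volume heal "$1" info \
        | awk '/Number of entries:/ { if ($NF != 0) pending = 1 }
               END { exit !pending }'  # exit 0 (true) if anything pending
}
```

Usage: `while heal_pending gtest; do sleep 10; done` blocks until every brick reports "Number of entries: 0".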

2. Heal is finished. I kill glusterfsd on server 15:

This kills the KVM machine. Nothing is in the gluster logs. Dmesg from
inside the KVM guest shows:

Write-error on swap-device (252:0:96627104)
Write-error on swap-device (252:0:96627112)
Write-error on swap-device (252:0:96627120)
Write-error on swap-device (252:0:96627128)
Write-error on swap-device (252:0:96627136)
Write-error on swap-device (252:0:96627144)
Write-error on swap-device (252:0:96627152)
Write-error on swap-device (252:0:96627160)
Write-error on swap-device (252:0:96627168)
Write-error on swap-device (252:0:96627176)
Write-error on swap-device (252:0:96627184)
Write-error on swap-device (252:0:96627192)
Write-error on swap-device (252:0:96627200)
Write-error on swap-device (252:0:96627208)
Write-error on swap-device (252:0:96627216)
Write-error on swap-device (252:0:96627224)
Write-error on swap-device (252:0:96627232)
Write-error on swap-device (252:0:96627240)
Write-error on swap-device (252:0:96627248)
Write-error on swap-device (252:0:96627256)
Write-error on swap-device (252:0:96627264)
Write-error on swap-device (252:0:96627272)
Write-error on swap-device (252:0:96627280)
Write-error on swap-device (252:0:96627288)
Write-error on swap-device (252:0:96627296)
Write-error on swap-device (252:0:96627304)
Write-error on swap-device (252:0:96627312)
Write-error on swap-device (252:0:96627320)
Write-error on swap-device (252:0:96627328)
Write-error on swap-device (252:0:96627336)
Write-error on swap-device (252:0:96627344)
------------[ cut here ]------------
WARNING: at block/blk-core.c:398 blk_start_queue+0x88/0x90() (Not tainted)
Hardware name: Bochs
Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw
pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle
ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS
iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT
ip_tables vhost_net tun macvtap macvlan vzevent ib_iser rdma_cm
ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache
lockd sunrpc ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer
i2c_piix4 snd i2c_core soundcore virtio_balloon parport_pc parport
ext4 jbd2 mbcache sg virtio_net virtio_blk ata_generic pata_acpi
virtio_pci virtio_ring virtio ata_piix [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-28-pve #1
Call Trace:
 <IRQ>  [<ffffffff81075fe8>] ? warn_slowpath_common+0x88/0xc0
 [<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff8126dae8>] ? blk_start_queue+0x88/0x90
 [<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
 [<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
 [<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
 [<ffffffff81010219>] ? handle_irq+0x49/0xa0
 [<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
 [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
 [<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
 [<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
 [<ffffffff815407b5>] ? rest_init+0x85/0x90
 [<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
 [<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
---[ end trace d257e46e0ec99bba ]---
Tainting kernel with flag 0x9
Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-28-pve #1
Call Trace:
 <IRQ>  [<ffffffff81075e45>] ? add_taint+0x35/0x70
 [<ffffffff81075ff5>] ? warn_slowpath_common+0x95/0xc0
 [<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff8126dae8>] ? blk_start_queue+0x88/0x90
 [<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
 [<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
 [<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
 [<ffffffff81010219>] ? handle_irq+0x49/0xa0
 [<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
 [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
 [<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
 [<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
 [<ffffffff815407b5>] ? rest_init+0x85/0x90
 [<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
 [<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
------------[ cut here ]------------
WARNING: at block/blk-core.c:245 blk_remove_plug+0x98/0xa0() (Tainted:
G        W  ---------------   )
Hardware name: Bochs
Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw
pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
vzcpt nf_conntrack vzdquota vzmon vzdev ip6t_REJECT ip6table_mangle
ip6table_filter ip6_tables xt_length xt_hl xt_tcpmss xt_TCPMSS
iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT
ip_tables vhost_net tun macvtap macvlan vzevent ib_iser rdma_cm
ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp
libiscsi scsi_transport_iscsi nfsd nfs nfs_acl auth_rpcgss fscache
lockd sunrpc ipv6 fuse snd_pcsp snd_pcm snd_page_alloc snd_timer
i2c_piix4 snd i2c_core soundcore virtio_balloon parport_pc parport
ext4 jbd2 mbcache sg virtio_net virtio_blk ata_generic pata_acpi
virtio_pci virtio_ring virtio ata_piix [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper veid: 0 Tainted: G        W  ---------------
2.6.32-28-pve #1
Call Trace:
 <IRQ>  [<ffffffff81075fe8>] ? warn_slowpath_common+0x88/0xc0
 [<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff8126c218>] ? blk_remove_plug+0x98/0xa0
 [<ffffffff8126c236>] ? __blk_run_queue+0x16/0x40
 [<ffffffff8126daa5>] ? blk_start_queue+0x45/0x90
 [<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
 [<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
 [<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
 [<ffffffff81010219>] ? handle_irq+0x49/0xa0
 [<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
 [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
 [<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
 [<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
 [<ffffffff815407b5>] ? rest_init+0x85/0x90
 [<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
 [<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
---[ end trace d257e46e0ec99bbb ]---
Tainting kernel with flag 0x9
Pid: 0, comm: swapper veid: 0 Tainted: G        W  ---------------
2.6.32-28-pve #1
Call Trace:
 <IRQ>  [<ffffffff81075e45>] ? add_taint+0x35/0x70
 [<ffffffff81075ff5>] ? warn_slowpath_common+0x95/0xc0
 [<ffffffff8107603a>] ? warn_slowpath_null+0x1a/0x20
 [<ffffffff8126c218>] ? blk_remove_plug+0x98/0xa0
 [<ffffffff8126c236>] ? __blk_run_queue+0x16/0x40
 [<ffffffff8126daa5>] ? blk_start_queue+0x45/0x90
 [<ffffffffa00252c8>] ? blk_done+0xe8/0x110 [virtio_blk]
 [<ffffffffa00112ac>] ? vring_interrupt+0x3c/0xd0 [virtio_ring]
 [<ffffffff810f9b80>] ? handle_IRQ_event+0x60/0x170
 [<ffffffff810fc52e>] ? handle_edge_irq+0xde/0x180
 [<ffffffff81010219>] ? handle_irq+0x49/0xa0
 [<ffffffff81564ddc>] ? do_IRQ+0x6c/0xf0
 [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff8103fb0b>] ? native_safe_halt+0xb/0x10
 [<ffffffff81016e2d>] ? default_idle+0x4d/0xb0
 [<ffffffff8100a013>] ? cpu_idle+0xb3/0x110
 [<ffffffff815407b5>] ? rest_init+0x85/0x90
 [<ffffffff81c34f82>] ? start_kernel+0x412/0x41e
 [<ffffffff81c3433a>] ? x86_64_start_reservations+0x125/0x129
 [<ffffffff81c34453>] ? x86_64_start_kernel+0x115/0x124
Read-error on swap-device (252:0:96493000)
Read-error on swap-device (252:0:96493008)
Read-error on swap-device (252:0:96493016)
Read-error on swap-device (252:0:96493024)
Read-error on swap-device (252:0:96493032)
Read-error on swap-device (252:0:96493040)
Read-error on swap-device (252:0:96493048)
Read-error on swap-device (252:0:96493056)
Read-error on swap-device (252:0:96493448)
Read-error on swap-device (252:0:96493456)
Read-error on swap-device (252:0:96493464)
Read-error on swap-device (252:0:96493472)
Read-error on swap-device (252:0:96493480)
Read-error on swap-device (252:0:96493488)
Read-error on swap-device (252:0:96493496)
Read-error on swap-device (252:0:96493504)
Read-error on swap-device (252:0:96492424)
Read-error on swap-device (252:0:96492432)
Read-error on swap-device (252:0:96492440)
Read-error on swap-device (252:0:96492448)
Read-error on swap-device (252:0:96492456)
Read-error on swap-device (252:0:96492464)
Read-error on swap-device (252:0:96492472)
Read-error on swap-device (252:0:96492480)
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 129
Aborting journal on device vda1-8.
EXT4-fs (vda1): Remounting filesystem read-only
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 129
EXT4-fs error (device vda1): ext4_discard_preallocations: Error
loading buddy information for 195
Read-error on swap-device (252:0:96626632)
Read-error on swap-device (252:0:96626640)
Read-error on swap-device (252:0:96626648)
Read-error on swap-device (252:0:96626656)
Read-error on swap-device (252:0:96626664)
Read-error on swap-device (252:0:96626672)
Read-error on swap-device (252:0:96626680)
Read-error on swap-device (252:0:96626688)

This is all. Not much, I'm afraid.

Paul

2014-04-21 18:21 GMT+02:00 Paul Penev <ppquant at gmail.com>:
> Joe,
> it will take some time to redo the logs (I'm doing them now).
> While waiting for the heal to complete I did some research on the
> version history of qemu.
>
> There's a patch made two months ago that intrigues me:
>
> https://github.com/qemu/qemu/commit/adccfbcd6020e928db93b2b4faf0dbd05ffbe016
>
> Also, the changelog at
> https://github.com/qemu/qemu/commits/master/block/gluster.c seems to
> reflect some recent work on the gluster driver.
>
> Stay tuned for the logs from the bricks (but I don't expect much there).


