[Gluster-devel] The return of the all-null pending matrix

Emmanuel Dreyfus manu at netbsd.org
Tue Jul 23 00:19:34 UTC 2013


Vijay Bellur <vbellur at redhat.com> wrote:

> I have not been able to re-create the problem in my setup. I think it 
> would be a good idea to track this bug and address it. For now, can we 
> not use the volume set mechanism to disable eager-locking?

Our exchange went off-list after this message. I am reposting here 
the last 100k lines of the log, captured in debug mode:
http://ftp.espci.fr/shadow/manu/log

Relevant part:

[2013-07-22 15:36:22.923866] D [afr-lk-common.c:447:transaction_lk_op] 0-gfs34-replicate-0: lk op is for a transaction
[2013-07-22 15:36:22.924484] D [client-rpc-fops.c:2789:client_fdctx_destroy] 0-gfs34-client-0: sending release on fd
[2013-07-22 15:36:22.924560] D [client-rpc-fops.c:2789:client_fdctx_destroy] 0-gfs34-client-1: sending release on fd
[2013-07-22 15:36:22.943156] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943202] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943236] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:22.943271] D [afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: Possible split-brain
[2013-07-22 15:36:22.943305] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:22.943336] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gfs34-replicate-1: Source selected as 1 for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943374] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: Building lookup response from 1
[2013-07-22 15:36:22.943409] D [afr-common.c:1265:afr_detect_self_heal_by_iatt] 0-gfs34-replicate-1: size differs for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po 
[2013-07-22 15:36:22.943444] D [afr-common.c:1291:afr_detect_self_heal_by_split_brain_status] 0-gfs34-replicate-1: split brain detected during lookup of /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po.
[2013-07-22 15:36:22.943478] D [afr-common.c:1426:afr_launch_self_heal] 0-gfs34-replicate-1: background  data self-heal triggered. path: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po, reason: lookup detected pending operations
[2013-07-22 15:36:23.272807] D [afr-self-heal-metadata.c:486:afr_sh_metadata_post_nonblocking_inodelk_cbk] 0-gfs34-replicate-1: Non Blocking metadata inodelks done for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po. Proceeding to FOP
[2013-07-22 15:36:23.272868] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.272900] D [afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-2
[2013-07-22 15:36:23.272986] D [afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-3
[2013-07-22 15:36:23.273596] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.273752] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.273792] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273829] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: 2
[2013-07-22 15:36:23.273895] D [afr-lk-common.c:452:transaction_lk_op] 0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.276705] D [afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gfs34-replicate-1: proceeding to data check on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.278390] D [afr-self-heal-data.c:1158:afr_sh_data_post_nonblocking_inodelk_cbk] 0-gfs34-replicate-1: Non Blocking data inodelks done for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po by 5c3e47ba. Proceeding to self-heal
[2013-07-22 15:36:23.278520] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.278540] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.280422] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.281824] D [mem-pool.c:422:mem_get]  0-mem-pool: Mem pool is full. Callocing mem
[2013-07-22 15:36:23.282746] D [afr-self-heal-data.c:686:afr_sh_data_fxattrop_fstat_done] 0-gfs34-replicate-1: Pending matrix for: 5c3e47ba
[2013-07-22 15:36:23.282798] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282831] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.282897] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gfs34-replicate-1: Unable to self-heal contents of '/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 0 ] [ 0 0 ] ]
[2013-07-22 15:36:23.282931] D [afr-self-heal-data.c:336:afr_sh_data_fail] 0-gfs34-replicate-1: finishing failed data selfheal of /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.282962] D [afr-lk-common.c:452:transaction_lk_op] 0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.283575] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gfs34-replicate-1: background  data self-heal failed on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283636] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283669] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283700] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.283730] D [afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po: Possible split-brain
[2013-07-22 15:36:23.283763] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:23.283794] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gfs34-replicate-1: Source selected as 1 for /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283828] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1: Building lookup response from 1
[2013-07-22 15:36:23.284755] W [afr-open.c:213:afr_open] 0-gfs34-replicate-1: failed to open as split brain seen, returning EIO
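
In the excerpt above, afr_mark_sources reports both "Number of sources: 2"
and "Number of sources: -1" right after printing an all-zero pending matrix.
To find every such occurrence in the full log, something like the following
Python sketch may help. It is not part of GlusterFS: the regular expressions
are only inferred from the log lines quoted in this message, and the names
mark_sources_decisions, all_null and SAMPLE are made up for illustration.

```python
import re
from collections import defaultdict

# Entry layout assumed from the excerpt:
# [timestamp] LEVEL [file.c:line:function] xlator: message
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<lineno>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)
ROW = re.compile(r"^pending_matrix: \[(?P<cells>[^\]]*)\]")
SOURCES = re.compile(r"^Number of sources: (?P<n>-?\d+)")


def mark_sources_decisions(lines):
    """Pair each 'Number of sources' entry with the matrix rows printed
    just before it for the same xlator.

    Returns a list of (timestamp, xlator, matrix, n_sources) tuples.
    """
    rows = defaultdict(list)  # xlator -> matrix rows seen so far
    decisions = []
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m is None:
            continue  # wrapped continuation line or unrelated text
        xlator, msg = m["xlator"], m["msg"]
        row = ROW.match(msg)
        if row:
            rows[xlator].append([int(c) for c in row["cells"].split()])
            continue
        src = SOURCES.match(msg)
        if src:
            matrix = rows.pop(xlator, [])
            decisions.append((m["ts"], xlator, matrix, int(src["n"])))
    return decisions


def all_null(matrix):
    """True if the matrix has at least one row and every cell is zero."""
    return bool(matrix) and all(c == 0 for r in matrix for c in r)


# Six entries copied from the excerpt above.
SAMPLE = [
    "[2013-07-22 15:36:23.273792] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]",
    "[2013-07-22 15:36:23.273829] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]",
    "[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: 2",
    "[2013-07-22 15:36:23.282798] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]",
    "[2013-07-22 15:36:23.282831] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1: pending_matrix: [ 0 0 ]",
    "[2013-07-22 15:36:23.282862] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gfs34-replicate-1: Number of sources: -1",
]

if __name__ == "__main__":
    for ts, xlator, matrix, n in mark_sources_decisions(SAMPLE):
        if all_null(matrix):
            print(ts, xlator, "all-null matrix, sources =", n)
```

To scan the whole log, pass an open file object for a local copy of it
instead of SAMPLE; entries that do not match the assumed layout are skipped.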



-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu at netbsd.org



