[Gluster-users] One file incessant self heal

Pranith Kumar Karampuri pkarampu at redhat.com
Thu Jul 12 10:29:04 UTC 2012


It seems that both bricks went down while an operation was in progress. Looking at the volume info, I see that both bricks of each replicate subvolume are on the same host, 172.30.1.125; in fact, all of the bricks are on the same host. This defeats the purpose of replication.
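
As an aside, a layout where each replica pair actually spans two hosts would look like the sketch below. This is only an illustration; "server1" and "server2" are placeholder host names, not machines from this thread:

    # each consecutive pair of bricks forms one replica set, so every
    # replica pair spans two hosts and survives the loss of either host
    gluster volume create gvol1 replica 2 transport tcp \
        server1:/export/data00 server2:/export/data00 \
        server1:/export/data01 server2:/export/data01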

Pranith
----- Original Message -----
From: "Homer Li" <01jay.ly at gmail.com>
To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
Cc: "gluster-users" <gluster-users at gluster.org>
Sent: Thursday, July 12, 2012 2:13:44 PM
Subject: [Gluster-users] One file incessant self heal

Hi Pranith;
    Thanks for your reply.
    I checked the md5sums too; they are different.
    Here is my output:

# getfattr -d -m . -e hex /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
getfattr: Removing leading '/' from absolute path names
# file: export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
trusted.afr.gvol1-client-2=0x000003630000000000000000
trusted.afr.gvol1-client-3=0x000000010000000000000000
trusted.gfid=0xd9b0c35033ba4090ab08f91f30dd661f
trusted.glusterfs.quota.4111d3b4-7e06-483f-aae8-fbefe9e55843.contri=0x00000001c0d15000

# getfattr -d -m . -e hex /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
getfattr: Removing leading '/' from absolute path names
# file: export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
trusted.afr.gvol1-client-2=0x000000010000000000000000
trusted.afr.gvol1-client-3=0x000000010000000000000000
trusted.gfid=0xd9b0c35033ba4090ab08f91f30dd661f
trusted.glusterfs.quota.4111d3b4-7e06-483f-aae8-fbefe9e55843.contri=0x00000001c0d15000
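
For reference, each trusted.afr.* value packs three big-endian 32-bit counters: pending data, metadata and entry operations recorded against the named client. A quick way to decode a value by hand in plain bash (the value below is trusted.afr.gvol1-client-2 from data10, pasted in without the 0x prefix):

    v=000003630000000000000000
    echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
    # prints: data=867 metadata=0 entry=0

So data10 records 867 pending data operations for gvol1-client-2 and 1 for gvol1-client-3, while data11 records 1 for each. With pending counters on both sides that never get cleared, every lookup re-triggers the heal, which is consistent with the "no active sinks" messages in the log.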

#stat /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
  File: `/export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f'
  Size: 7530086400	Blocks: 14706856   IO Block: 4096   regular file
Device: 6900h/26880d	Inode: 81395728    Links: 2
Access: (0644/-rw-r--r--)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2012-06-21 09:58:59.242136421 +0800
Modify: 2012-07-10 13:42:04.381141510 +0800
Change: 2012-07-12 16:23:15.884163991 +0800

#stat /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
  File: `/export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f'
  Size: 7530086400	Blocks: 14706856   IO Block: 4096   regular file
Device: 6910h/26896d	Inode: 17956874    Links: 2
Access: (0644/-rw-r--r--)  Uid: (  107/ UNKNOWN)   Gid: (  107/ UNKNOWN)
Access: 2012-06-21 09:58:59.242136421 +0800
Modify: 2012-07-10 13:42:04.381141510 +0800
Change: 2012-07-12 16:23:15.885163872 +0800

# md5sum /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
7fba61af476bf379c50f7429c89449ee  /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f

# md5sum /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
9b08b7145c171afff863c4ae5884fa01  /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
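
Nothing in this thread settles which copy is correct, but a common manual recovery on 3.3 for a stuck heal like this is to decide which copy to keep (from the application's point of view), remove the other copy from its brick together with its .glusterfs hard link (Links: 2 in the stat output means the two paths point at the same inode), and then let self-heal recreate it. Purely as an illustration, assuming the data10 copy were the one to keep:

    # illustration only -- run on the brick server, after taking a backup
    rm /export/data11/fs126/Graphite-monitor_vdb.qcow2
    rm /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
    # then trigger another heal
    gluster volume heal gvol1 full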



2012/7/12 Pranith Kumar Karampuri <pkarampu at redhat.com>:
> Homer,
>     Could you give the output of
> getfattr -d -m . -e hex /export/data10/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
> getfattr -d -m . -e hex /export/data11/.glusterfs/d9/b0/d9b0c350-33ba-4090-ab08-f91f30dd661f
> and also 'stat' of these files.
>
> Pranith.
> ----- Original Message -----
> From: "Homer Li" <01jay.ly at gmail.com>
> To: "gluster-users" <gluster-users at gluster.org>
> Sent: Thursday, July 12, 2012 7:54:24 AM
> Subject: [Gluster-users] One file incessant self heal
>
> Hello;
>    I see a self-heal being triggered in the logs every 10 minutes.
>    It is always the same single file, gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f.
>    The heal-failed and split-brain listings do not show anything.
>    Is there a problem with this file?
>
>
> GlusterFS config:
>
> OS: 2.6.32-220.17.1.el6.x86_64  Scientific Linux release 6.2 (Carbon)
> # rpm -qa | grep glusterfs
> glusterfs-3.3.0-2.el6.x86_64
> glusterfs-devel-3.3.0-2.el6.x86_64
> glusterfs-fuse-3.3.0-2.el6.x86_64
> glusterfs-geo-replication-3.3.0-2.el6.x86_64
> glusterfs-rdma-3.3.0-2.el6.x86_64
> glusterfs-server-3.3.0-2.el6.x86_64
> glusterfs-debuginfo-3.3.0-2.el6.x86_64
>
> gluster> volume info
>
> Volume Name: gvol1
> Type: Distributed-Replicate
> Volume ID: a7d8ffdf-7296-404b-aeab-824ee853ec59
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: 172.30.1.125:/export/data00
> Brick2: 172.30.1.125:/export/data01
> Brick3: 172.30.1.125:/export/data10
> Brick4: 172.30.1.125:/export/data11
> Options Reconfigured:
> features.limit-usage: /source:500GB
> features.quota: on
> performance.cache-refresh-timeout: 30
> performance.io-thread-count: 32
> nfs.disable: off
> cluster.min-free-disk: 5%
> performance.cache-size: 128MB
>
>
> gluster volume heal gvol1 info
> Heal operation on volume gvol1 has been successful
>
> Brick 172.30.1.125:/export/data00
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data01
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data10
> Number of entries: 1
> /fs126/Graphite-monitor_vdb.qcow2
>
> Brick 172.30.1.125:/export/data11
> Number of entries: 1
> /fs126/Graphite-monitor_vdb.qcow2
>
> # gluster volume heal gvol1 info heal-failed
> Heal operation on volume gvol1 has been successful
>
> Brick 172.30.1.125:/export/data00
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data01
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data10
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data11
> Number of entries: 0
>
> gluster volume heal gvol1 info split-brain
> Heal operation on volume gvol1 has been successful
>
> Brick 172.30.1.125:/export/data00
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data01
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data10
> Number of entries: 0
>
> Brick 172.30.1.125:/export/data11
> Number of entries: 0
>
>
>
> Log detail:
> [2012-07-12 09:13:11.666417] I
> [afr-self-heald.c:282:_remove_stale_index] 0-gvol1-replicate-0:
> Removing stale index for e6087bf7-ae55-441b-8f88-a7b17475caea on
> gvol1-client-0
> [2012-07-12 09:13:11.666998] W
> [client3_1-fops.c:592:client3_1_unlink_cbk] 0-gvol1-client-0: remote
> operation failed: No such file or directory
> [2012-07-12 09:13:11.667132] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:13:11.667141] E
> [afr-self-heald.c:287:_remove_stale_index] 0-gvol1-replicate-0:
> e6087bf7-ae55-441b-8f88-a7b17475caea: Failed to remove index on
> gvol1-client-0 - No such file or directory
> [2012-07-12 09:13:11.667589] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:13:11.668581] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:13:11.669322] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:13:11.669775] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:13:11.670408] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:23:11.770994] I
> [afr-self-heald.c:282:_remove_stale_index] 0-gvol1-replicate-0:
> Removing stale index for e6087bf7-ae55-441b-8f88-a7b17475caea on
> gvol1-client-0
> [2012-07-12 09:23:11.771358] W
> [client3_1-fops.c:592:client3_1_unlink_cbk] 0-gvol1-client-0: remote
> operation failed: No such file or directory
> [2012-07-12 09:23:11.771416] E
> [afr-self-heald.c:287:_remove_stale_index] 0-gvol1-replicate-0:
> e6087bf7-ae55-441b-8f88-a7b17475caea: Failed to remove index on
> gvol1-client-0 - No such file or directory
> [2012-07-12 09:23:11.771898] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:23:11.772059] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:23:11.773074] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:23:11.773686] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:23:11.774094] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:23:11.774652] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:33:11.874474] I
> [afr-self-heald.c:282:_remove_stale_index] 0-gvol1-replicate-0:
> Removing stale index for e6087bf7-ae55-441b-8f88-a7b17475caea on
> gvol1-client-0
> [2012-07-12 09:33:11.874919] W
> [client3_1-fops.c:592:client3_1_unlink_cbk] 0-gvol1-client-0: remote
> operation failed: No such file or directory
> [2012-07-12 09:33:11.874978] E
> [afr-self-heald.c:287:_remove_stale_index] 0-gvol1-replicate-0:
> d9b0c350-33ba-4090-ab08-f91f30dd661f: Failed to remove index on
> gvol1-client-0 - No such file or directory
> [2012-07-12 09:33:11.875505] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:33:11.875676] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:33:11.876613] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:33:11.877244] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:33:11.877646] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:33:11.878191] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:43:11.971727] I
> [afr-self-heald.c:282:_remove_stale_index] 0-gvol1-replicate-0:
> Removing stale index for e6087bf7-ae55-441b-8f88-a7b17475caea on
> gvol1-client-0
> [2012-07-12 09:43:11.972004] W
> [client3_1-fops.c:592:client3_1_unlink_cbk] 0-gvol1-client-0: remote
> operation failed: No such file or directory
> [2012-07-12 09:43:11.972066] E
> [afr-self-heald.c:287:_remove_stale_index] 0-gvol1-replicate-0:
> e6087bf7-ae55-441b-8f88-a7b17475caea: Failed to remove index on
> gvol1-client-0 - No such file or directory
> [2012-07-12 09:43:11.972635] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:43:11.972924] I
> [afr-common.c:1340:afr_launch_self_heal] 0-gvol1-replicate-1:
> background  data self-heal triggered. path:
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>, reason: lookup detected
> pending operations
> [2012-07-12 09:43:11.973885] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:43:11.974451] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:43:11.974965] I
> [afr-self-heal-data.c:712:afr_sh_data_fix] 0-gvol1-replicate-1: no
> active sinks for performing self-heal on file
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
> [2012-07-12 09:43:11.975772] I
> [afr-self-heal-common.c:2159:afr_self_heal_completion_cbk]
> 0-gvol1-replicate-1: background  data self-heal completed on
> <gfid:d9b0c350-33ba-4090-ab08-f91f30dd661f>
>
> --
> Best Regards
> Homer Li
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users



-- 
Best Regards
Homer Li


