[Gluster-users] Self-heal doesn't appear to be happening
Joe Julian
joe at julianfamily.org
Sun Mar 15 19:39:06 UTC 2015
On 03/15/2015 11:16 AM, Jonathan Heese wrote:
>
> Hello all,
>
>
> I have a 2 node 2 brick replicate gluster volume that I'm having
> trouble making fault tolerant (a seemingly basic feature!) under
> CentOS 6.6 using EPEL packages.
>
>
> Both nodes are as close to identical hardware and software as
> possible, and I'm running the following packages:
>
> glusterfs-rdma-3.6.2-1.el6.x86_64
> glusterfs-fuse-3.6.2-1.el6.x86_64
> glusterfs-libs-3.6.2-1.el6.x86_64
> glusterfs-cli-3.6.2-1.el6.x86_64
> glusterfs-api-3.6.2-1.el6.x86_64
> glusterfs-server-3.6.2-1.el6.x86_64
> glusterfs-3.6.2-1.el6.x86_64
>
3.6.2 is not considered production stable. Based on your expressed
concern, you should probably be running 3.5.3.
>
>
> They both have dual-port Mellanox 20Gbps InfiniBand cards with a
> straight (i.e. "crossover") cable and opensm to facilitate the RDMA
> transport between them.
>
>
> Here are some data dumps to set the stage (and yes, the output of
> these commands looks the same on both nodes):
>
>
> [root at duchess ~]# gluster volume info
>
> Volume Name: gluster_disk
> Type: Replicate
> Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: rdma
> Bricks:
> Brick1: duke-ib:/bricks/brick1
> Brick2: duchess-ib:/bricks/brick1
>
>
> [root at duchess ~]# gluster volume status
> Status of volume: gluster_disk
> Gluster process Port Online Pid
> ------------------------------------------------------------------------------
> Brick duke-ib:/bricks/brick1 49153 Y 9594
> Brick duchess-ib:/bricks/brick1 49153 Y 9583
> NFS Server on localhost 2049 Y 9590
> Self-heal Daemon on localhost N/A Y 9597
> NFS Server on 10.10.10.1 2049 Y 9607
> Self-heal Daemon on 10.10.10.1 N/A Y 9614
>
> Task Status of Volume gluster_disk
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
>
> [root at duchess ~]# gluster peer status
> Number of Peers: 1
>
> Hostname: 10.10.10.1
> Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
> State: Peer in Cluster (Connected)
>
>
> So before putting any real data on these guys (the data will
> eventually be a handful of large image files backing an iSCSI target
> via tgtd for ESXi datastores), I wanted to simulate the failure of one
> of the nodes. So I stopped glusterfsd and glusterd on duchess, waited
> about 5 minutes, then started them back up again, tail'ing
> /var/log/glusterfs/* and /var/log/messages. I'm not sure exactly what
> I'm looking for, but the logs quieted down after just a minute or so
> of restarting the daemons. I didn't see much indicating that
> self-healing was going on.
>
>
> Every now and then (and seemingly more often than not), when I run
> "gluster volume heal gluster_disk info", I get no output from the
> command, and the following dumps into my /var/log/messages:
>
>
> Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at
> 7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in
> libmthca-rdmav2.so[7ff54f365000+7000]
>
This is a segfault in the Mellanox driver (libmthca). Please report it to
the driver developers.
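If the core dump can be preserved (abrt is discarding yours in the log
lines below because it considers the package unsigned, so you may need to
relax that check in abrt's configuration), a backtrace would make the
report far more useful. A rough sketch, using the problem directory path
from your abrt output and the conventional 'coredump' filename inside it:

  # symbols help a lot; install glusterfs-debuginfo if it's available to you
  gdb /usr/sbin/glfsheal /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359/coredump
  (gdb) bt full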
>
> Mar 15 13:59:17 duchess abrtd: Directory
> 'ccpp-2015-03-15-13:59:16-10359' creation detected
> Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359
> (/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359
> (225595392 bytes)
> Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed
> with proper key
> Mar 15 13:59:25 duchess abrtd: 'post-create' on
> '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
> Mar 15 13:59:25 duchess abrtd: Deleting problem directory
> '/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'
>
> Other times, when I'm lucky, I get messages from the "heal info"
> command indicating that datastore1.img (the file that I intentionally
> changed while duchess was offline) is in need of healing:
>
>
> [root at duke ~]# gluster volume heal gluster_disk info
> Brick duke.jonheese.local:/bricks/brick1/
> /datastore1.img - Possibly undergoing heal
>
> Number of entries: 1
>
> Brick duchess.jonheese.local:/bricks/brick1/
> /datastore1.img - Possibly undergoing heal
>
> Number of entries: 1
>
>
> But watching df on the bricks and tailing glustershd.log doesn't seem
> to indicate that anything is actually happening -- and df indicates
> that the brick on duke *is* different in file size from the brick on
> duchess. It's been over an hour now, and I'm not confident that the
> self-heal functionality is even working at all... Nor do I know how to
> do anything about it!
>
File sizes are not necessarily any indication. If the changes you made
were nulls, the write may be stored sparsely, so the allocated sizes can
differ even when the contents match. du --apparent-size on the file itself
is a slightly better indicator than df. Comparing checksums would be even better.
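For example, run something like this on each node against the brick copy
(the path is taken from your volume info; checksumming a large image file
will take a while, but it settles the question):

  # apparent size vs. blocks actually allocated on disk
  du --apparent-size -h /bricks/brick1/datastore1.img
  du -h /bricks/brick1/datastore1.img

  # checksum on both bricks and compare the results by hand
  md5sum /bricks/brick1/datastore1.img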
The extended attributes on the file itself, on the bricks, can tell you
the heal state. Look at "getfattr -m . -d -e hex $file". The trusted.afr
attributes, if non-zero, show pending changes destined for the other server.
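For example, against the copy on each brick (the attribute names and
values below are only illustrative; the client index numbers depend on
your volume):

  getfattr -m . -d -e hex /bricks/brick1/datastore1.img
  # illustrative output:
  #   trusted.afr.gluster_disk-client-0=0x000000000000000000000000
  #   trusted.afr.gluster_disk-client-1=0x000004d70000000000000000
  # each value packs three 4-byte counters (data/metadata/entry pending
  # operations); all zeros on both bricks means nothing is pending.

If the pending counters are stuck non-zero, "gluster volume heal
gluster_disk" asks the self-heal daemon to process them, and "gluster
volume heal gluster_disk full" forces a full crawl.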
>
>
> Also, I find it a little bit troubling that I'm using the aliases (in
> /etc/hosts on both servers) duke-ib and duchess-ib for the gluster
> node configuration, but the "heal info" command refers to my nodes
> with their internal FQDNs, which resolve to their 1Gbps interface
> IPs... That doesn't mean that they're trying to communicate over those
> interfaces (the volume is configured with "transport rdma", as you can
> see above), does it?
>
I'd call that a bug. It should report the hostnames as they're listed in
the volume info.
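If you want to reassure yourself that the heal traffic is actually
crossing the IB link rather than the 1GbE interfaces, one rough check (the
device names here are assumptions for your Mellanox HCA and NIC;
substitute your own) is to watch the HCA counters while a heal is running:

  # RDMA traffic bypasses the IP stack, so look at the HCA's own counters
  watch -d cat /sys/class/infiniband/mthca0/ports/1/counters/port_rcv_data
  # and confirm nothing heavy is flowing over the 1GbE side (sar is in sysstat)
  sar -n DEV 1 | grep eth0

If the IB counters climb during a write or heal while the 1GbE interface
stays quiet, the data path is where it should be and only the reporting is
wrong.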
>
>
> Can anyone throw out any ideas on how I can:
>
> 1. Determine whether this is intentional behavior (or a bug?),
>
> 2. Determine whether my data has been properly resync'd across the
> bricks, and
>
> 3. Make it work correctly if not.
>
>
> Thanks in advance!
>
>
> Regards,
>
> Jon Heese
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users