[Gluster-users] Self-heal doesn't appear to be happening

Joe Julian joe at julianfamily.org
Mon Mar 16 01:03:26 UTC 2015


Yes, please file that bug report. 

On March 15, 2015 5:32:20 PM PDT, Jonathan Heese <jheese at inetu.net> wrote:
>Joe,
>
>
>First, allow me to apologize for top-posting -- webmail client doesn't
>leave me much choice, unfortunately.
>
>
>Second, I'd like to thank you profusely for replying so quickly to my
>question. I am accustomed to long wait times on most OSS mailing lists.
>:)
>
>
>So... I wish that I had known that the procedure recommended at
>http://blog.gluster.org/2014/08/debunking-the-glusterfs-rdma-is-unstable-myth/
>(which is what I followed to get started on this little adventure)
>would leave me with potentially un-production-stable packages.... :/
>
>
>I went ahead and yanked out all of the 3.6.2 stuff and reinstalled with
>3.5.3, and it's like night and day. I can stop the glusterd on duchess,
>write out a 2GB file on the volume mountpoint on duke, and when I start
>glusterd back up on duchess, I can't even type "ls" fast enough before
>the new file is on the brick locally.
>
>
>Also, running "gluster volume heal $vol info" no longer results in the
>segfault and always gives me useful output (even if it's just to say
>that everything is fine...).
>
>
>For now, I think this has the potential of curing all of my issues
>here. I will keep testing, and I'll post back here if I need any
>further assistance.
>
>
>Oh, by the way, I still get inaccurate node names from "gluster volume
>heal $vol info" with 3.5.3:
>
>[root@duchess ~]# gluster volume heal gluster_disk info
>Brick duke.jonheese.local:/bricks/brick1/
>Number of entries: 0
>
>Brick duchess.jonheese.local:/bricks/brick1/
>Number of entries: 0
>
>
>Notice that the nodes are named "duke-ib" and "duchess-ib" in the
>'volume info' output:
>[root@duchess ~]# gluster volume info
>
>Volume Name: gluster_disk
>Type: Replicate
>Volume ID: 7158b824-455f-46f0-9da3-9b4d6c1fc484
>Status: Started
>Number of Bricks: 1 x 2 = 2
>Transport-type: rdma
>Bricks:
>Brick1: duke-ib:/bricks/brick1
>Brick2: duchess-ib:/bricks/brick1
>
>
>Should I raise a bug for this?
>
>
>Thanks again!
>
>
>Regards,
>
>Jon Heese
>
>
>________________________________
>From: gluster-users-bounces at gluster.org
><gluster-users-bounces at gluster.org> on behalf of Joe Julian
><joe at julianfamily.org>
>Sent: Sunday, March 15, 2015 3:39 PM
>To: gluster-users at gluster.org
>Subject: Re: [Gluster-users] Self-heal doesn't appear to be happening
>
>On 03/15/2015 11:16 AM, Jonathan Heese wrote:
>
>Hello all,
>
>
>I have a two-node, two-brick replicated Gluster volume that I'm having
>trouble making fault-tolerant (a seemingly basic feature!) under CentOS
>6.6 using EPEL packages.
>
>
>Both nodes are as close to identical hardware and software as possible,
>and I'm running the following packages:
>
>glusterfs-rdma-3.6.2-1.el6.x86_64
>glusterfs-fuse-3.6.2-1.el6.x86_64
>glusterfs-libs-3.6.2-1.el6.x86_64
>glusterfs-cli-3.6.2-1.el6.x86_64
>glusterfs-api-3.6.2-1.el6.x86_64
>glusterfs-server-3.6.2-1.el6.x86_64
>glusterfs-3.6.2-1.el6.x86_64
>
>3.6.2 is not considered production stable. Based on your expressed
>concern, you should probably be running 3.5.3.
>
>
>They both have dual-port Mellanox 20Gbps InfiniBand cards with a
>straight (i.e. "crossover") cable and opensm to facilitate the RDMA
>transport between them.
>
>
>Here are some data dumps to set the stage (and yes, the output of these
>commands looks the same on both nodes):
>
>
>[root@duchess ~]# gluster volume info
>
>Volume Name: gluster_disk
>Type: Replicate
>Volume ID: b1279e22-8589-407b-8671-3760f42e93e4
>Status: Started
>Number of Bricks: 1 x 2 = 2
>Transport-type: rdma
>Bricks:
>Brick1: duke-ib:/bricks/brick1
>Brick2: duchess-ib:/bricks/brick1
>
>
>[root@duchess ~]# gluster volume status
>Status of volume: gluster_disk
>Gluster process                                         Port    Online  Pid
>------------------------------------------------------------------------------
>Brick duke-ib:/bricks/brick1                            49153   Y       9594
>Brick duchess-ib:/bricks/brick1                         49153   Y       9583
>NFS Server on localhost                                 2049    Y       9590
>Self-heal Daemon on localhost                           N/A     Y       9597
>NFS Server on 10.10.10.1                                2049    Y       9607
>Self-heal Daemon on 10.10.10.1                          N/A     Y       9614
>
>Task Status of Volume gluster_disk
>------------------------------------------------------------------------------
>There are no active volume tasks
>
>
>[root@duchess ~]# gluster peer status
>Number of Peers: 1
>
>Hostname: 10.10.10.1
>Uuid: aca56ec5-94bb-4bb0-8a9e-b3d134bbfe7b
>State: Peer in Cluster (Connected)
>
>
>So before putting any real data on these guys (the data will eventually
>be a handful of large image files backing an iSCSI target via tgtd for
>ESXi datastores), I wanted to simulate the failure of one of the nodes.
>So I stopped glusterfsd and glusterd on duchess, waited about 5
>minutes, then started them back up again, tail'ing /var/log/glusterfs/*
>and /var/log/messages. I'm not sure exactly what I'm looking for, but
>the logs quieted down after just a minute or so of restarting the
>daemons. I didn't see much indicating that self-healing was going on.
>
>
>Every now and then (and seemingly more often than not), when I run
>"gluster volume heal gluster_disk info", I get no output from the
>command, and the following dumps into my /var/log/messages:
>
>
>Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at
>7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in
>libmthca-rdmav2.so[7ff54f365000+7000]
>
>This is a segfault in the Mellanox driver. Please report it to the
>driver developers.
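
When reporting a crash like this, it can help to state where inside the library the fault occurred. The kernel line gives the instruction pointer (ip) and the library's load address (the number before the "+"), so the offset is just their difference. A minimal sketch, assuming the log line has been joined back onto one line; the helper name parse_segfault is made up for illustration:

```python
import re

# Kernel segfault line copied from the log above, joined onto one line.
LOG_LINE = (
    "Mar 15 13:59:16 duchess kernel: glfsheal[10365]: segfault at "
    "7ff56068d020 ip 00007ff54f366d80 sp 00007ff54e22adf8 error 6 in "
    "libmthca-rdmav2.so[7ff54f365000+7000]"
)

SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return process, pid, library and in-library offset, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "process": match.group("proc"),
        "pid": int(match.group("pid")),
        "library": match.group("lib"),
        # Offset of the faulting instruction from the library's load address.
        "offset": ip - base,
    }


info = parse_segfault(LOG_LINE)
print(f"{info['process']} crashed in {info['library']} "
      f"at offset {info['offset']:#x}")
```

For the line above this works out to 0x7ff54f366d80 - 0x7ff54f365000 = 0x1d80, which is the kind of detail the driver developers can map back to a function if they have matching debug symbols.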
>
>Mar 15 13:59:17 duchess abrtd: Directory
>'ccpp-2015-03-15-13:59:16-10359' creation detected
>Mar 15 13:59:17 duchess abrt[10368]: Saved core dump of pid 10359
>(/usr/sbin/glfsheal) to /var/spool/abrt/ccpp-2015-03-15-13:59:16-10359
>(225595392 bytes)
>Mar 15 13:59:25 duchess abrtd: Package 'glusterfs-server' isn't signed
>with proper key
>Mar 15 13:59:25 duchess abrtd: 'post-create' on
>'/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359' exited with 1
>Mar 15 13:59:25 duchess abrtd: Deleting problem directory
>'/var/spool/abrt/ccpp-2015-03-15-13:59:16-10359'
>
>
>Other times, when I'm lucky, I get messages from the "heal info"
>command indicating that datastore1.img (the file that I intentionally
>changed while duchess was offline) is in need of healing:
>
>
>[root@duke ~]# gluster volume heal gluster_disk info
>Brick duke.jonheese.local:/bricks/brick1/
>/datastore1.img - Possibly undergoing heal
>
>Number of entries: 1
>
>Brick duchess.jonheese.local:/bricks/brick1/
>/datastore1.img - Possibly undergoing heal
>
>Number of entries: 1
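
If you want to watch this output over time rather than eyeballing it, the text is simple enough to parse. The following is only a sketch based on the output format shown above: it assumes entries are reported as paths starting with "/" (other entry forms are ignored), and parse_heal_info is an illustrative name, not part of any Gluster tooling.

```python
def parse_heal_info(text):
    """Return {brick: [pending paths]} from 'heal <vol> info' text."""
    bricks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            current = line[len("Brick "):]
            bricks[current] = []
        elif current is not None and line.startswith("/"):
            # Entry lines look like "/path - Possibly undergoing heal".
            bricks[current].append(line.split(" - ", 1)[0])
    return bricks


# Sample copied from the 'heal info' output quoted above.
SAMPLE = """\
Brick duke.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1

Brick duchess.jonheese.local:/bricks/brick1/
/datastore1.img - Possibly undergoing heal

Number of entries: 1
"""

pending = parse_heal_info(SAMPLE)
for brick, entries in pending.items():
    print(f"{brick}: {len(entries)} pending -> {entries}")
```

Feeding it the captured output of the command at intervals would show whether the list of pending entries ever shrinks.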
>
>
>But watching df on the bricks and tailing glustershd.log doesn't seem
>to indicate that anything is actually happening -- and df indicates
>that the brick on duke *is* different in size from the brick on
>duchess. It's been over an hour now, and I'm not confident that the
>self-heal functionality is even working at all... Nor do I know how to
>do anything about it!
>
>File sizes are not necessarily any indication. If the changes you made
>were nulls, the file may be sparse. du --apparent-size is a slightly
>better indicator. Comparing hashes would be even better.
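
To illustrate why disk usage alone can mislead: two files with identical content can occupy different amounts of disk if one of them is sparse. The sketch below builds a dense and a sparse file of zeros in a temporary directory (both stand-ins, not real bricks) and compares content hash, apparent size and allocated size. It assumes a POSIX system where st_blocks is available; whether the sparse file really uses fewer blocks depends on the filesystem.

```python
import hashlib
import os
import tempfile

SIZE = 4 * 1024 * 1024  # 4 MiB of zeros, small enough for a quick demo


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sizes(path):
    """Return (apparent size, allocated bytes); st_blocks is in 512-byte units."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    dense = os.path.join(tmp, "dense.img")
    sparse = os.path.join(tmp, "sparse.img")
    with open(dense, "wb") as fh:
        fh.write(b"\0" * SIZE)   # zeros actually written out
    with open(sparse, "wb") as fh:
        fh.truncate(SIZE)        # same length, but possibly just a hole
    same_content = sha256_of(dense) == sha256_of(sparse)
    dense_sizes = sizes(dense)
    sparse_sizes = sizes(sparse)

print("same content:", same_content)
print("dense  (apparent, allocated):", dense_sizes)
print("sparse (apparent, allocated):", sparse_sizes)
```

On a real replica pair you would run the hash on each server against its own brick copy, ideally while nothing is writing to the file, and compare the two digests by hand.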
>
>The extended attributes on the file itself, on the bricks, can tell you
>the heal state. Look at "getfattr -m . -d -e hex $file". The
>trusted.afr attributes, if non-zero, show pending changes destined for
>the other server.
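
As a rough guide to reading those attributes: to my understanding, each trusted.afr value is 12 bytes holding three big-endian 32-bit counters for pending data, metadata and entry operations, in that order. The sketch below decodes getfattr hex output on that assumption. The sample text is invented for illustration (including the gluster_disk-client-N attribute names and the counter values); it is not output from the systems in this thread.

```python
import struct


def decode_afr(hex_value):
    """Decode one trusted.afr.* hex value into its three pending counters."""
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    # Assumed layout: data, metadata, entry; each a big-endian uint32.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def pending_from_getfattr(output):
    """Map each trusted.afr.* attribute in getfattr output to its counters."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("trusted.afr.") and "=" in line:
            name, value = line.split("=", 1)
            result[name] = decode_afr(value)
    return result


# Hypothetical output of: getfattr -m . -d -e hex /bricks/brick1/datastore1.img
SAMPLE = """\
# file: bricks/brick1/datastore1.img
trusted.afr.gluster_disk-client-0=0x000000000000000000000000
trusted.afr.gluster_disk-client-1=0x0000000a0000000000000000
"""

counters = pending_from_getfattr(SAMPLE)
needs_heal = {name: c for name, c in counters.items() if any(c.values())}
for name, c in needs_heal.items():
    print(f"{name}: pending {c}")
```

In this made-up sample, the second attribute shows ten pending data operations, which would mean this brick is holding changes that the other replica still needs.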
>
>
>Also, I find it a little bit troubling that I'm using the aliases (in
>/etc/hosts on both servers) duke-ib and duchess-ib for the gluster node
>configuration, but the "heal info" command refers to my nodes with
>their internal FQDNs, which resolve to their 1Gbps interface IPs...
>That doesn't mean that they're trying to communicate over those
>interfaces (the volume is configured with "transport rdma", as you can
>see above), does it?
>
>I'd call that a bug. It should report the hostnames as they're listed
>in the volume info.
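
For the bug report, it may be handy to show the mismatch mechanically. This sketch compares the brick hosts listed by 'volume info' with those printed by 'heal info', using excerpts copied from the outputs in this thread. It assumes both commands list bricks in the same order; the function names are illustrative only.

```python
import re

# Excerpts copied from the command outputs quoted in this thread.
VOLUME_INFO = """\
Volume Name: gluster_disk
Type: Replicate
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: duke-ib:/bricks/brick1
Brick2: duchess-ib:/bricks/brick1
"""

HEAL_INFO = """\
Brick duke.jonheese.local:/bricks/brick1/
Number of entries: 0

Brick duchess.jonheese.local:/bricks/brick1/
Number of entries: 0
"""


def bricks_from_volume_info(text):
    """Bricks from lines like 'Brick1: duke-ib:/bricks/brick1'."""
    return [m.group(1).rstrip("/")
            for m in re.finditer(r"^Brick\d+:\s*(\S+)$", text, re.M)]


def bricks_from_heal_info(text):
    """Bricks from lines like 'Brick duke.jonheese.local:/bricks/brick1/'."""
    return [m.group(1).rstrip("/")
            for m in re.finditer(r"^Brick\s+(\S+)$", text, re.M)]


def host_mismatches(volume_info, heal_info):
    """Pair bricks positionally; report hosts that differ for the same path."""
    mismatches = []
    pairs = zip(bricks_from_volume_info(volume_info),
                bricks_from_heal_info(heal_info))
    for configured, reported in pairs:
        c_host, c_path = configured.split(":", 1)
        r_host, r_path = reported.split(":", 1)
        if c_path == r_path and c_host != r_host:
            mismatches.append((c_host, r_host, c_path))
    return mismatches


for configured, reported, path in host_mismatches(VOLUME_INFO, HEAL_INFO):
    print(f"{path}: configured as {configured}, reported as {reported}")
```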
>
>
>Can anyone throw out any ideas on how I can:
>
>1. Determine whether this is intentional behavior (or a bug?),
>
>2. Determine whether my data has been properly resync'd across the
>bricks, and
>
>3. Make it work correctly if not.
>
>
>Thanks in advance!
>
>
>Regards,
>
>Jon Heese
>
>
>
>_______________________________________________
>Gluster-users mailing list
>Gluster-users at gluster.org<mailto:Gluster-users at gluster.org>
>http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.