<p dir="ltr">-Atin<br>

On Aug 10, 2015 9:37 PM, &quot;Kingsley&quot; &lt;<a href="mailto:gluster@gluster.dogwind.com">gluster@gluster.dogwind.com</a>&gt; wrote:<br>

&gt;<br>

&gt; On Mon, 2015-08-10 at 21:34 +0530, Atin Mukherjee wrote:<br>

&gt; &gt; -Atin<br>

&gt; &gt; On Aug 10, 2015 7:19 PM, &quot;Kingsley&quot; &lt;<a href="mailto:gluster@gluster.dogwind.com">gluster@gluster.dogwind.com</a>&gt; wrote:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Further to this, the volume doesn&#39;t seem overly healthy. Any idea how I<br>

&gt; &gt; &gt; can get it back into a working state?<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Trying to access one particular directory on the clients just hangs. If<br>

&gt; &gt; &gt; I query heal info, that directory appears in the output as possibly<br>

&gt; &gt; &gt; undergoing heal (actual directory name changed as it&#39;s private info):<br>

&gt; &gt; Can you execute strace and see which call is stuck? That would help us<br>

&gt; &gt; to get to the exact component which we would need to look at.<br>

&gt;<br>

&gt; Hi,<br>

&gt;<br>

&gt; I&#39;ve never used strace before. Could you give me the command line to<br>

&gt; type?<br>

Just type strace followed by the command that hangs.<br>
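
As a rough sketch, assuming the volume is FUSE-mounted at /mnt/callrec on the client (a hypothetical path, as is the trace file name; the directory is the one from your heal info output), a small wrapper like this would record the system calls made by the hanging command:<br>

```shell
# Hypothetical client mount point; the directory comes from the heal info output.
MOUNT=${MOUNT:-/mnt/callrec}
DIR=${DIR:-recordings/834723/14391}

# trace_cmd OUTFILE COMMAND [ARGS...]
#   -f   follow child processes
#   -tt  prefix each line with a microsecond timestamp
#   -T   show the time spent inside each system call
#   -o   write the trace to OUTFILE instead of stderr
trace_cmd() {
    out=$1
    shift
    strace -f -tt -T -o "$out" "$@"
}

# Usage, run on the client (interrupt with Ctrl-C once it hangs):
#   trace_cmd /tmp/ls.strace ls -l "$MOUNT/$DIR"
#   tail -n 5 /tmp/ls.strace
```

The last line of the trace file should be the call that never returned.<br>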

&gt;<br>

&gt; Then ... do I need to run something on one of the bricks while strace is<br>

&gt; running?<br>

&gt;<br>

&gt; Cheers,<br>

&gt; Kingsley.<br>

&gt;<br>

&gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; [root@gluster1b-1 ~]# gluster volume heal callrec info<br>

&gt; &gt; &gt; Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &lt;gfid:164f888f-2049-49e6-ad26-c758ee091863&gt;<br>

&gt; &gt; &gt; /recordings/834723/14391 - Possibly undergoing heal<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; &lt;gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f&gt;<br>

&gt; &gt; &gt; &lt;gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e&gt;<br>

&gt; &gt; &gt; &lt;gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c&gt;<br>

&gt; &gt; &gt; &lt;gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb&gt;<br>

&gt; &gt; &gt; &lt;gfid:650efeca-b45c-413b-acc3-f0a5853ccebd&gt;<br>

&gt; &gt; &gt; Number of entries: 7<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; Number of entries: 0<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &lt;gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f&gt;<br>

&gt; &gt; &gt; &lt;gfid:164f888f-2049-49e6-ad26-c758ee091863&gt;<br>

&gt; &gt; &gt; &lt;gfid:650efeca-b45c-413b-acc3-f0a5853ccebd&gt;<br>

&gt; &gt; &gt; &lt;gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e&gt;<br>

&gt; &gt; &gt; /recordings/834723/14391 - Possibly undergoing heal<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; &lt;gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c&gt;<br>

&gt; &gt; &gt; &lt;gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb&gt;<br>

&gt; &gt; &gt; Number of entries: 7<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; Number of entries: 0<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; If I query each brick directly for the number of files/directories<br>

&gt; &gt; &gt; within that, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the<br>

&gt; &gt; &gt; other two, using this command:<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; # find /data/brick/callrec/recordings/834723/14391 -print | wc -l<br>
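
To see which entries account for the difference between 1731 and 1737, one option (a sketch only; the list file names under /tmp are made up for illustration) is to save a sorted listing on each brick and compare two of them:<br>

```shell
# Directory taken from the find command above; run this on each brick host.
BRICK_DIR=${BRICK_DIR:-/data/brick/callrec/recordings/834723/14391}

# list_entries DIR
# Print every path below DIR, relative to DIR and sorted in the C locale,
# so that listings taken on different bricks are directly comparable.
list_entries() {
    (cd "$1" && find . -print | LC_ALL=C sort)
}

# Usage (file names are illustrative):
#   list_entries "$BRICK_DIR" > "/tmp/$(hostname -s).list"
# Copy the lists to one host, then print entries present only in the second:
#   LC_ALL=C comm -13 /tmp/gluster1a-1.list /tmp/gluster1b-1.list
```

The entries printed by comm would be the ones present on gluster1b-1 but missing from gluster1a-1.<br>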

&gt; &gt; &gt;<br>

&gt; &gt; &gt; Cheers,<br>

&gt; &gt; &gt; Kingsley.<br>

&gt; &gt; &gt;<br>

&gt; &gt; &gt; On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:<br>

&gt; &gt; &gt; &gt; Sorry for the blind panic - restarting the volume seems to have fixed<br>

&gt; &gt; &gt; &gt; it.<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; But then my next question - why is this necessary? Surely it undermines<br>

&gt; &gt; &gt; &gt; the whole point of a high availability system?<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; Cheers,<br>

&gt; &gt; &gt; &gt; Kingsley.<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:<br>

&gt; &gt; &gt; &gt; &gt; Hi,<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Over the weekend I did a yum update on each of the bricks in turn, but<br>

&gt; &gt; &gt; &gt; &gt; now when clients (using fuse mounts) try to access the volume, it hangs.<br>

&gt; &gt; &gt; &gt; &gt; Gluster itself wasn&#39;t updated (we&#39;ve disabled that repo so that we keep<br>

&gt; &gt; &gt; &gt; &gt; to 3.6.3 for now).<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; This was what I did:<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt;       * on first brick, &quot;yum update&quot;<br>

&gt; &gt; &gt; &gt; &gt;       * reboot brick<br>

&gt; &gt; &gt; &gt; &gt;       * watch &quot;gluster volume status&quot; on another brick and wait for it<br>

&gt; &gt; &gt; &gt; &gt;         to say all 4 bricks are online before proceeding to update the<br>

&gt; &gt; &gt; &gt; &gt;         next brick<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; I was expecting the clients might pause 30 seconds while they notice a<br>

&gt; &gt; &gt; &gt; &gt; brick is offline, but then recover.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; I&#39;ve tried re-mounting clients, but that hasn&#39;t helped.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; I can&#39;t see much data in any of the log files.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; I&#39;ve tried &quot;gluster volume heal callrec&quot; but it doesn&#39;t seem to have<br>

&gt; &gt; &gt; &gt; &gt; helped.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; What shall I do next?<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; I&#39;ve pasted some stuff below in case any of it helps.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Cheers,<br>

&gt; &gt; &gt; &gt; &gt; Kingsley.<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]# gluster volume info callrec<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Volume Name: callrec<br>

&gt; &gt; &gt; &gt; &gt; Type: Replicate<br>

&gt; &gt; &gt; &gt; &gt; Volume ID: a39830b7-eddb-4061-b381-39411274131a<br>

&gt; &gt; &gt; &gt; &gt; Status: Started<br>

&gt; &gt; &gt; &gt; &gt; Number of Bricks: 1 x 4 = 4<br>

&gt; &gt; &gt; &gt; &gt; Transport-type: tcp<br>

&gt; &gt; &gt; &gt; &gt; Bricks:<br>

&gt; &gt; &gt; &gt; &gt; Brick1: gluster1a-1:/data/brick/callrec<br>

&gt; &gt; &gt; &gt; &gt; Brick2: gluster1b-1:/data/brick/callrec<br>

&gt; &gt; &gt; &gt; &gt; Brick3: gluster2a-1:/data/brick/callrec<br>

&gt; &gt; &gt; &gt; &gt; Brick4: gluster2b-1:/data/brick/callrec<br>

&gt; &gt; &gt; &gt; &gt; Options Reconfigured:<br>

&gt; &gt; &gt; &gt; &gt; performance.flush-behind: off<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]#<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]# gluster volume status callrec<br>

&gt; &gt; &gt; &gt; &gt; Status of volume: callrec<br>

&gt; &gt; &gt; &gt; &gt; Gluster process                                         Port    Online  Pid<br>

&gt; &gt; &gt; &gt; &gt; ------------------------------------------------------------------------------<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster1a-1:/data/brick/callrec                   49153   Y       6803<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster1b-1:/data/brick/callrec                   49153   Y       2614<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster2a-1:/data/brick/callrec                   49153   Y       2645<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster2b-1:/data/brick/callrec                   49153   Y       4325<br>

&gt; &gt; &gt; &gt; &gt; NFS Server on localhost                                 2049    Y       2769<br>

&gt; &gt; &gt; &gt; &gt; Self-heal Daemon on localhost                           N/A     Y       2789<br>

&gt; &gt; &gt; &gt; &gt; NFS Server on gluster2a-1                               2049    Y       2857<br>

&gt; &gt; &gt; &gt; &gt; Self-heal Daemon on gluster2a-1                         N/A     Y       2814<br>

&gt; &gt; &gt; &gt; &gt; NFS Server on 88.151.41.100                             2049    Y       6833<br>

&gt; &gt; &gt; &gt; &gt; Self-heal Daemon on 88.151.41.100                       N/A     Y       6824<br>

&gt; &gt; &gt; &gt; &gt; NFS Server on gluster2b-1                               2049    Y       4428<br>

&gt; &gt; &gt; &gt; &gt; Self-heal Daemon on gluster2b-1                         N/A     Y       4387<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Task Status of Volume callrec<br>

&gt; &gt; &gt; &gt; &gt; ------------------------------------------------------------------------------<br>

&gt; &gt; &gt; &gt; &gt; There are no active volume tasks<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]#<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]# gluster volume heal callrec info<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &gt; &gt; /to_process - Possibly undergoing heal<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Number of entries: 1<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &gt; &gt; Number of entries: 0<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &gt; &gt; /to_process - Possibly undergoing heal<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Number of entries: 1<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/<br>

&gt; &gt; &gt; &gt; &gt; Number of entries: 0<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; [root@gluster1b-1 ~]#<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt; &gt; _______________________________________________<br>

&gt; &gt; &gt; &gt; &gt; Gluster-users mailing list<br>

&gt; &gt; &gt; &gt; &gt; <a href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a><br>

&gt; &gt; &gt; &gt; &gt; <a href="http://www.gluster.org/mailman/listinfo/gluster-users">http://www.gluster.org/mailman/listinfo/gluster-users</a><br>

&gt; &gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt; &gt;<br>

&gt; &gt; &gt;<br>

&gt; &gt;<br>

&gt; &gt;<br>

&gt;<br>

</p>