[Gluster-users] Recovering out of sync nodes from input/output error

Alex Florescu alex.florescu at tripsolutions.co.uk
Thu Apr 12 15:51:48 UTC 2012


On Thu, Apr 12, 2012 at 3:49 PM, Jeff Darcy wrote:

>
> (1) To a first approximation, it should be safe to "merge" directory
> contents despite there being a split-brain problem, by healing any file
> that exists on only one brick from there to its peer(s).


 I am not sure if I got this right, but if I did, this should be the two-way
scenario described at the end of this message.
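
To find the files that exist on only one side, something like the following could help. It is only a sketch: only_on_first is a made-up helper name, it compares just the top level of two directories, and it assumes no file name contains a newline. In practice the second listing would have to come from the peer brick (over ssh, for example).

```shell
# only_on_first DIR_A DIR_B   (hypothetical helper, not part of Gluster)
# Print the names that exist directly in DIR_A but not in DIR_B.
only_on_first() {
    a_list=$(mktemp) || return 1
    b_list=$(mktemp) || { rm -f "$a_list"; return 1; }
    # comm needs both inputs sorted in the same collation order
    ls -A "$1" | LC_ALL=C sort > "$a_list"
    ls -A "$2" | LC_ALL=C sort > "$b_list"
    # -23 suppresses names unique to DIR_B and names common to both
    LC_ALL=C comm -23 "$a_list" "$b_list"
    rm -f "$a_list" "$b_list"
}
```

Running it once in each direction gives the two sets of files that would need to be copied across.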

> (3) The reason you continue to get I/O errors is probably that the xattrs
> on the *parent directory* still indicate pending operations on both sides.
> You can verify this with the following command on each brick:
>
>        getfattr -d -e hex -n trusted.glusterfs.dht /a
>

Unfortunately:
getfattr: /a: Input/output error
And when I run it on any working instance, it reports
"trusted.glusterfs.dht: No such attribute".

> If the result is non-zero (most likely in the last four-byte integer
> indicating a directory-entry operation) then that confirms our theory.  It
> should be safe for the self-heal code to clear these counts if (and only
> if) the directories are checked and found identical.  In fact, I think we
> already do this.  Thus, manual copying of files followed by self-heal on
> the parent directory should make the errors go away.  I encourage you to
> try that while I go look at the code.
>
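
For reading such a value once getfattr does return one, here is a rough sketch. It assumes the value is 24 hex digits holding three big-endian 32-bit counters in the order data, metadata, entry, which is how the replicate translator's trusted.afr.* pending attributes are usually described; I have not verified that layout against the code. decode_pending is a made-up name and the sample value below is invented.

```shell
# decode_pending HEXVALUE   (hypothetical helper, layout is an assumption)
# Split a 12-byte hex xattr value into three 32-bit counters.
decode_pending() {
    hex=${1#0x}                      # drop the 0x prefix printed by "-e hex"
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    data=$(printf '%s' "$hex" | cut -c1-8)
    meta=$(printf '%s' "$hex" | cut -c9-16)
    entry=$(printf '%s' "$hex" | cut -c17-24)
    # shell arithmetic understands 0x-prefixed hex constants
    echo "data=$((0x$data)) metadata=$((0x$meta)) entry=$((0x$entry))"
}
```

For example, decode_pending 0x000000000000000000000002 prints "data=0 metadata=0 entry=2", i.e. only the last (directory-entry) counter is non-zero.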

 OK, I thought of two ways to manually copy the files and make Gluster think
the directories are identical.
 ----BTW, I found that if I disrupt connectivity between the nodes again,
I am able to perform operations on the mountpoint (/a) ----

1st way - node1 (10.0.2.14)
scp /local/howareyou 10.0.2.15:/local
scp 10.0.2.15:/local/hello /local
ls /a
ls: cannot access /a: Input/output error
iptables -A INPUT -s 10.0.2.15 -j DROP - so I can access mountpoint
ls -lh /a
????????????? ? ?      ?       ?            ? hello
-rw-r--r-- 1 root root 0 Apr   6 01:48 howareyou

2nd way - node1 (10.0.2.14) (from scratch)
iptables -A INPUT -p tcp -s 10.0.2.15 -j DROP - so I can access mountpoint
-allow ssh-
scp 10.0.2.15:/a/hello /a
scp /a/howareyou 10.0.2.14:/a
- now they are in sync -
iptables -F INPUT
ls /a - works briefly but after a while:
ls: cannot access /a: Input/output error

According to the documentation, a self-heal is triggered by running
find <gluster-mount> -noleaf -print0 | xargs --null stat (where
<gluster-mount> is /a), but again, /a cannot be accessed.
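
Since the crawl is pointless while the mountpoint itself returns an I/O error, it could be wrapped in a check like the one below. This is just a sketch around the documented command, assuming GNU find and xargs; trigger_self_heal is a made-up name.

```shell
# trigger_self_heal MOUNTPOINT   (hypothetical wrapper)
# Stat every entry under MOUNTPOINT, but only if MOUNTPOINT itself is readable.
trigger_self_heal() {
    mount_dir=$1
    if ! stat "$mount_dir" > /dev/null 2>&1; then
        echo "cannot stat $mount_dir; skipping crawl" >&2
        return 1
    fi
    # same pipeline as in the documentation, with the stat output discarded
    find "$mount_dir" -noleaf -print0 | xargs --null stat > /dev/null
}
```

In my case, trigger_self_heal /a would stop at the first check for as long as /a keeps returning Input/output error.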