[Gluster-users] Problems since 3.12.7: invisible files, strange rebalance size, setxattr failed during rebalance and broken unix rights

Tue Apr 24 08:26:48 UTC 2018

Hi,

thank you for your quick answer.

On Monday, 23.04.2018, at 21:51 +0530, Nithya Balachandran wrote:
> On 23 April 2018 at 18:52, Frank Ruehlemann <ruehlemann at itsc.uni-luebeck.de>
> wrote:
> 
> > Hi,
> >
> > after 2 years running GlusterFS without bigger problems we're facing
> > some strange errors lately.
> >
> > After updating to 3.12.7 some users reported at least 4 broken
> > directories with some invisible files. The files exist on the bricks and
> > don't start with a dot, but aren't visible in "ls". Clients can still
> > interact with them by using the explicit path.
> > More information: https://bugzilla.redhat.com/show_bug.cgi?id=1564071
> 
> 
> I will continue the analysis for this issue in the bug.

This would be very helpful. We saw your request for additional
information and will provide it as soon as possible.

> > And since this update, gluster has reported a rebalance size of >16900 PB
> > (petabytes!) for one of our 2 servers when running „gluster volume
> > rebalance $myvolume status“. The time looks right, but the size of
> > transferred files is absurd. That rebalance was run with 3.12.6 in March
> > 2018. Its log file listed no errors and a realistic size at the end.
> >
> 
> This has been seen a few times and is because an incorrect value is stored
> in the node_state.info file. However, I don't know what causes this
> incorrect value to be stored. It is harmless and can be ignored.

Ok. :)

> > We started a new rebalance today during a downtime of our corresponding
> > compute cluster, since these errors started to spread and this might
> > help. The output of „gluster volume rebalance $myvolume status“ doesn't
> > list any errors so far and the numbers look like realistic values.
> > But we're seeing some strange error reports (every few minutes) in
> > journald:
> > „[2018-04-23 12:31:24.942377] E [MSGID: 113001]
> > [posix.c:5983:_posix_handle_xattr_keyvalue_pair] 0-$myvolume-posix:
> > setxattr failed
> > on /srv/glusterfs/bricks/DATA112/data/.glusterfs/e6/a8/
> > e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2 while doing xattrop:
> > key=trusted.glusterfs.quota.1ce02d3b-b7ae-4485-903c-2991de5350b6.contri.1
> > [No such file or directory]“
> > The rebalance log file lists no errors.
> >
> > Has anybody seen similar error messages during a rebalance?
> >
> 
> Are any directories being deleted/renamed during the rebalance? If yes,
> this could be a valid message.

No. Before we started the rebalance, we locked out all users and took
down all clients that mount the volume, to ensure that no client
interacts with it.
The messages have continued over the last few hours on all bricks of
this volume, occurring up to several times per minute, with sporadic
phases without any.
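
For reference, the path in the log message is a GFID handle below the
brick's .glusterfs directory, laid out as
.glusterfs/<first two hex characters>/<next two>/<gfid>. A minimal Python
sketch that rebuilds this path and checks whether the handle is present on
the brick; the function name is hypothetical, and the brick path and GFID
are simply copied from the log line quoted above:

```python
import os


def gfid_handle_path(brick_root, gfid):
    """Build the .glusterfs handle path for a GFID on a brick.

    Hypothetical helper: it only mirrors the layout visible in the log
    message (.glusterfs/<aa>/<bb>/<full gfid>).
    """
    gfid = gfid.lower()
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


if __name__ == "__main__":
    # Values taken from the journald message; adjust for other bricks.
    brick = "/srv/glusterfs/bricks/DATA112/data"
    gfid = "e6a8ce50-fda5-4bad-8d4d-acd25dafcaa2"
    path = gfid_handle_path(brick, gfid)
    # lexists() also reports symlinks whose target is gone.
    print(path, "exists" if os.path.lexists(path) else "is missing")
```

Run on the brick server itself, this would show whether the handle named
in a "No such file or directory" message really is absent.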

> > And we see some files duplicated. There are two copies on different
> > bricks (we're running a distributed volume).
> > One copy looks like this:
> > $ ls -lah
> > -rwxr--r--  2 $user $group  293 May 11  2017 config
> >
> > The other one looks rather strange:
> > $ ls -lah
> > ---------T  2 root    $group    0 May 11  2017 config
> >
> > Has anybody seen similar broken files?
> >
> 
> This is fine as long as you only see a single file from the mount point.
> The 'T' files are internal gluster files (called linkto files) and should
> be invisible from the mount point.
> 
> 
> Regards,
> Nithya

This is good to know. Yes, in all cases we have seen so far, only a
single file was visible from the mount point.
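
As a rough illustration of the linkto files described above: on a brick
they show up as zero-byte regular files whose only mode bit is the sticky
bit ("---------T"), and they carry the trusted.glusterfs.dht.linkto
extended attribute. A minimal Python sketch of that check, with
hypothetical function names; reading the xattr assumes Linux and root
access on the brick:

```python
import os
import stat

# Extended attribute that DHT sets on linkto files.
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_linkto(mode, size):
    """True for a zero-byte regular file with mode ---------T."""
    return (
        stat.S_ISREG(mode)
        and stat.S_IMODE(mode) == stat.S_ISVTX  # sticky bit only
        and size == 0
    )


def linkto_target(path):
    """Return the linkto xattr value, or None if it cannot be read."""
    try:
        value = os.getxattr(path, LINKTO_XATTR)
    except OSError:
        return None
    return value.rstrip(b"\0").decode()


def check(path):
    """Print whether a file on a brick looks like a linkto file."""
    st = os.lstat(path)
    if looks_like_linkto(st.st_mode, st.st_size):
        print(path, "looks like a linkto file ->", linkto_target(path))
    else:
        print(path, "looks like a regular data file")
```

Calling check() on both copies of "config" on their respective bricks
should flag only the "---------T" one, if the explanation above applies.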

Thanks for your message. It helped a lot.

-- 
Frank Rühlemann
   IT-Systemtechnik

UNIVERSITÄT ZU LÜBECK
    IT-Service-Center

    Ratzeburger Allee 160
    23562 Lübeck
    Tel +49 451 3101 2034
    Fax +49 451 3101 2004
    ruehlemann at itsc.uni-luebeck.de
    www.itsc.uni-luebeck.de