[Gluster-users] Files not healing & missing their extended attributes - Help!

Gambit15 dougti+gluster at gmail.com
Sun Jul 1 20:15:01 UTC 2018


Hi Ashish,

The output is below. It's a replica 2+1 (arbiter) volume. The arbiter is
offline for maintenance at the moment, however quorum is met & no files are
reported as in split-brain (it hosts VMs, so files aren't accessed
concurrently).

======================
[root@v0 glusterfs]# gluster volume info engine

Volume Name: engine
Type: Replicate
Volume ID: 279737d3-3e5a-4ee9-8d4a-97edcca42427
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: s0:/gluster/engine/brick
Brick2: s1:/gluster/engine/brick
Brick3: s2:/gluster/engine/arbiter (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
performance.low-prio-threads: 32

======================

[root@v0 glusterfs]# gluster volume heal engine info
Brick s0:/gluster/engine/brick
/__DIRECT_IO_TEST__
/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
/98495dbc-a29c-4893-b6a0-0aa70860d0c9
<LIST TRUNCATED FOR BREVITY>
Status: Connected
Number of entries: 34

Brick s1:/gluster/engine/brick
<SAME AS ABOVE - TRUNCATED FOR BREVITY>
Status: Connected
Number of entries: 34

Brick s2:/gluster/engine/arbiter
Status: Transport endpoint is not connected
Number of entries: -

======================
=== PEER V0 ===

[root@v0 glusterfs]# getfattr -m . -d -e hex
/gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x0000000000000000000024e8
trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
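
If I'm reading the AFR changelog format right, these trusted.afr.* values
are 12 bytes: three big-endian 32-bit counters of pending data, metadata &
entry operations, in that order. A quick decode of the value above (hex
string pasted in by hand):

val=0000000000000000000024e8
echo "data=$((16#${val:0:8})) metadata=$((16#${val:8:8})) entry=$((16#${val:16:8}))"
# -> data=0 metadata=0 entry=9448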

[root@v0 glusterfs]# getfattr -m . -d -e hex
/gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/*
getfattr: Removing leading '/' from absolute path names
# file:
gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.lockspace
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000

# file:
gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent/hosted-engine.metadata
security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f743a733000


=== PEER V1 ===

[root@v1 glusterfs]# getfattr -m . -d -e hex
/gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
getfattr: Removing leading '/' from absolute path names
# file: gluster/engine/brick/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.engine-client-2=0x0000000000000000000024ec
trusted.gfid=0xdb9afb92d2bc49ed8e34dcd437ba7be2
trusted.glusterfs.dht=0x000000010000000000000000ffffffff
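
Both data bricks only blame engine-client-2 (entry counts 0x24e8 & 0x24ec).
As I understand it, the client indices follow the brick order in the volume
definition, so client-2 is Brick3 - the offline arbiter. To double-check
that mapping from the client volfile (assuming the default volfile location;
the filename can vary between versions):

grep -E 'volume engine-client-|remote-host|remote-subvolume' /var/lib/glusterd/vols/engine/trusted-engine.tcp-fuse.vol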

======================

cmd_history.log-20180701:

[2018-07-01 03:11:38.461175]  : volume heal engine full : SUCCESS
[2018-07-01 03:11:51.151891]  : volume heal data full : SUCCESS

glustershd.log-20180701:
<LOGS FROM 06/01 TRUNCATED>
[2018-07-01 07:15:04.779122] I [MSGID: 100011]
[glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from
server...

glustershd.log:
[2018-07-01 07:15:04.779693] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk]
0-glusterfs: No change in volfile, continuing

That's the *only* message in glustershd.log today.
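
In case the self-heal daemon is simply wedged, I understand a force start
respawns any missing shd/brick processes without disturbing the running
bricks, & an index heal can then be kicked off to see if it starts logging
again:

gluster volume start engine force
gluster volume heal engine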

======================

[root@v0 glusterfs]# gluster volume status engine
Status of volume: engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick s0:/gluster/engine/brick              49154     0          Y       2816
Brick s1:/gluster/engine/brick              49154     0          Y       3995
Self-heal Daemon on localhost               N/A       N/A        Y       2919
Self-heal Daemon on s1                      N/A       N/A        Y       4013

Task Status of Volume engine
------------------------------------------------------------------------------
There are no active volume tasks
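
Note the arbiter brick doesn't appear in the status output at all,
consistent with s2 being down for maintenance. Once it's back, something
like this should confirm it has rejoined before I expect the heal queue to
drain:

gluster peer status
gluster volume status engine s2:/gluster/engine/arbiter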

======================

Okay, so actually only the directory ha_agent itself is listed for healing
(not its contents), & that directory does have its attributes set.
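
In the meantime, I understand that forcing lookups from the client side
should make AFR re-examine the entries & queue any needed heals - a rough
sketch, assuming the volume is FUSE-mounted at /mnt/engine (hypothetical
path):

find /mnt/engine/98495dbc-a29c-4893-b6a0-0aa70860d0c9/ha_agent -exec stat {} + >/dev/null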

Many thanks for the reply!


On 1 July 2018 at 15:34, Ashish Pandey <aspandey at redhat.com> wrote:

> You have not mentioned the volume type and configuration, and this issue
> will require a lot of other information to fix:
>
> 1 - What is the volume type and configuration?
> 2 - Provide the gluster v <volname> info output
> 3 - The heal info output
> 4 - The getxattr output of one of the files which needs healing, from all
> the bricks
> 5 - What led to the files needing healing?
> 6 - The gluster v <volname> status output
> 7 - The glustershd.log output just after you run a full or index heal
>
> ----
> Ashish
>
> ------------------------------
> *From: *"Gambit15" <dougti+gluster at gmail.com>
> *To: *"gluster-users" <gluster-users at gluster.org>
> *Sent: *Sunday, July 1, 2018 11:50:16 PM
> *Subject: *[Gluster-users] Files not healing & missing their
> extended        attributes - Help!
>
>
> Hi Guys,
>  I had to restart our datacenter yesterday, but since doing so, a number
> of the files on my gluster share have been stuck marked as needing healing.
> After no signs of progress, I manually kicked off a full heal last night,
> but 24hrs later, nothing's happened.
>
> The gluster logs all look normal, and there are no messages about failed
> connections or heal processes kicking off.
>
> I checked the listed files' extended attributes on their bricks today, and
> they only show the selinux attribute. There are none of the trusted.*
> attributes I'd expect.
> The healthy files on the bricks do have their extended attributes, though.
>
> I'm guessing that the files have somehow lost their attributes, and
> gluster is no longer able to work out what to do with them. It hasn't
> logged any errors, warnings, or anything else out of the ordinary though,
> so I've no idea what the problem is or how to resolve it.
>
> I've got 16 hours to get this sorted before the start of work, Monday.
> Help!
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>