<div dir="ltr"><div>This might be too late, but a simple solution that has always worked for such cases is rebuilding .glusterfs.</div><div><br></div><div>Remove it and query the attributes of all files again; that will recreate .glusterfs on all bricks.<br></div><div><br></div><div>Something like what is mentioned here:<br></div><div></div><div><a href="https://lists.gluster.org/pipermail/gluster-users/2018-January/033352.html">https://lists.gluster.org/pipermail/gluster-users/2018-January/033352.html</a></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 3, 2018 at 4:27 PM, Gambit15 <span dir="ltr"><<a href="mailto:dougti+gluster@gmail.com" target="_blank">dougti+gluster@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On 1 July 2018 at 22:37, Ashish Pandey <span dir="ltr"><<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div><br></div><div>The only problem at the moment is that the arbiter brick is offline. You should concentrate only on completing the maintenance of the arbiter brick ASAP.<br></div><div>Bring this brick UP, start a FULL heal or an index heal, and the volume will be in a healthy state.<br></div></div></div></blockquote><div><br></div></span><div>Doesn't the arbiter only resolve split-brain situations? 
None of the files that have been marked for healing are marked as in split-brain.<br><br></div><div>The arbiter has now been brought back up, however the problem continues.<br><br></div><div>I've found the following information in the client log:<br><br>[2018-07-03 19:09:29.245089] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_<wbr>selfheal_name_gfid_mismatch_<wbr>check] 0-engine-replicate-0: GFID mismatch for <gfid:db9afb92-d2bc-49ed-8e34-<wbr>dcd437ba7be2>/hosted-engine.<wbr>metadata 5e95ba8c-2f12-49bf-be2d-<wbr>b4baf210d366 on engine-client-1 and b9cd7613-3b96-415d-a549-<wbr>1dc788a4f94d on engine-client-0<br>[2018-07-03 19:09:29.245585] W [fuse-bridge.c:471:fuse_entry_<wbr>cbk] 0-glusterfs-fuse: 10430040: LOOKUP() /98495dbc-a29c-4893-b6a0-<wbr>0aa70860d0c9/ha_agent/hosted-<wbr>engine.metadata => -1 (Input/output error)<br>[2018-07-03 19:09:30.619000] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_<wbr>selfheal_name_gfid_mismatch_<wbr>check] 0-engine-replicate-0: GFID mismatch for <gfid:db9afb92-d2bc-49ed-8e34-<wbr>dcd437ba7be2>/hosted-engine.<wbr>lockspace 8e86902a-c31c-4990-b0c5-<wbr>0318807edb8f on engine-client-1 and e5899a4c-dc5d-487e-84b0-<wbr>9bbc73133c25 on engine-client-0<br>[2018-07-03 19:09:30.619360] W [fuse-bridge.c:471:fuse_entry_<wbr>cbk] 0-glusterfs-fuse: 10430656: LOOKUP() /98495dbc-a29c-4893-b6a0-<wbr>0aa70860d0c9/ha_agent/hosted-<wbr>engine.lockspace => -1 (Input/output error)<br></div><br></div><div class="gmail_quote">As you can see from the logs I posted previously, neither of those two files, on either of the two servers, have any of gluster's extended attributes set.<br><br>The arbiter doesn't have any record of the files in question, as they were created after it went offline.<br><br></div><div class="gmail_quote">How do I fix this? 
Is it possible to locate the correct gfids somewhere & redefine them on the files manually?<br><br></div><div class="gmail_quote">Cheers,<br></div><div class="gmail_quote"> Doug<br></div><div><div class="h5"><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><hr id="m_-7752785096565085217gmail-m_-7061863707512342433zwchr"><div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From: </b>"Gambit15" <<a href="mailto:dougti%2Bgluster@gmail.com" target="_blank">dougti+gluster@gmail.com</a>><br><b>To: </b>"Ashish Pandey" <<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>><br><b>Cc: </b>"gluster-users" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br><b>Sent: </b>Monday, July 2, 2018 1:45:01 AM<br><b>Subject: </b>Re: [Gluster-users] Files not healing & missing their extended attributes - Help!<div><div class="m_-7752785096565085217gmail-h5"><br><div><br></div><div dir="ltr"><div>Hi Ashish,<br><div><br></div></div><div>The output is below. It's a rep 2+1 volume. 
The arbiter is offline for maintenance at the moment, however quorum is met & no files are reported as in split-brain (it hosts VMs, so files aren't accessed concurrently).<br></div><div><br>======================<br>[root@v0 glusterfs]# gluster volume info engine<br><div><br></div>Volume Name: engine<br>Type: Replicate<br>Volume ID: 279737d3-3e5a-4ee9-8d4a-97edcc<wbr>a42427<br>Status: Started<br>Snapshot Count: 0<br>Number of Bricks: 1 x (2 + 1) = 3<br>Transport-type: tcp<br>Bricks:<br>Brick1: s0:/gluster/engine/brick<br>Brick2: s1:/gluster/engine/brick<br>Brick3: s2:/gluster/engine/arbiter (arbiter)<br>Options Reconfigured:<br>nfs.disable: on<br>performance.readdir-ahead: on<br>transport.address-family: inet<br>performance.quick-read: off<br>performance.read-ahead: off<br>performance.io-cache: off<br>performance.stat-prefetch: off<br>cluster.eager-lock: enable<br>network.remote-dio: enable<br>cluster.quorum-type: auto<br>cluster.server-quorum-type: server<br>storage.owner-uid: 36<br>storage.owner-gid: 36<br>performance.low-prio-threads: 32<br><div><br></div>====================== <br><div><br></div>[root@v0 glusterfs]# gluster volume heal engine info<br>Brick s0:/gluster/engine/brick<br>/__DIRECT_IO_TEST__<br>/98495dbc-a29c-4893-b6a0-0aa70<wbr>860d0c9/ha_agent<br></div><div><div><div>/98495dbc-a29c-4893-b6a0-0aa70<wbr>860d0c9<br> <LIST TRUNCATED FOR BREVITY> <br>Status: Connected<br>Number of entries: 34<br><div><br></div>Brick s1:/gluster/engine/brick<br> <SAME AS ABOVE - TRUNCATED FOR BREVITY> <br>Status: Connected<br>Number of entries: 34<br><div><br></div>Brick s2:/gluster/engine/arbiter<br>Status: Transport endpoint is not connected<br>Number of entries: -<br><div><br></div>======================<br>=== PEER V0 ===<br><div><br></div>[root@v0 glusterfs]# getfattr -m . 
-d -e hex /gluster/engine/brick/98495dbc<wbr>-a29c-4893-b6a0-0aa70860d0c9/<wbr>ha_agent<br>getfattr: Removing leading '/' from absolute path names<br># file: gluster/engine/brick/98495dbc-<wbr>a29c-4893-b6a0-0aa70860d0c9/ha<wbr>_agent<br>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a756e6<wbr>c6162656c65645f743a733000<br>trusted.afr.dirty=0x0000000000<wbr>00000000000000<br>trusted.afr.engine-client-2=0x<wbr>0000000000000000000024e8<br>trusted.gfid=0xdb9afb92d2bc49e<wbr>d8e34dcd437ba7be2<br>trusted.glusterfs.dht=0x000000<wbr>010000000000000000ffffffff<br><div><br></div>[root@v0 glusterfs]# getfattr -m . -d -e hex /gluster/engine/brick/98495dbc<wbr>-a29c-4893-b6a0-0aa70860d0c9/<wbr>ha_agent/*<br>getfattr: Removing leading '/' from absolute path names<br># file: gluster/engine/brick/98495dbc-<wbr>a29c-4893-b6a0-0aa70860d0c9/ha<wbr>_agent/hosted-engine.lockspace<br>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a66757<wbr>36566735f743a733000<br><div><br></div># file: gluster/engine/brick/98495dbc-<wbr>a29c-4893-b6a0-0aa70860d0c9/ha<wbr>_agent/hosted-engine.metadata<br>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a66757<wbr>36566735f743a733000 <br><div><br></div></div><div>=== PEER V1 ===<br></div><div><br>[root@v1 glusterfs]# getfattr -m . 
-d -e hex /gluster/engine/brick/98495dbc<wbr>-a29c-4893-b6a0-0aa70860d0c9/<wbr>ha_agent<br>getfattr: Removing leading '/' from absolute path names<br># file: gluster/engine/brick/98495dbc-<wbr>a29c-4893-b6a0-0aa70860d0c9/ha<wbr>_agent<br>security.selinux=0x73797374656<wbr>d5f753a6f626a6563745f723a756e6<wbr>c6162656c65645f743a733000<br>trusted.afr.dirty=0x0000000000<wbr>00000000000000<br>trusted.afr.engine-client-2=0x<wbr>0000000000000000000024ec<br>trusted.gfid=0xdb9afb92d2bc49e<wbr>d8e34dcd437ba7be2<br>trusted.glusterfs.dht=0x000000<wbr>010000000000000000ffffffff<br><div><br></div>======================<br><div><br></div>cmd_history.log-20180701:<br><div><br></div>[2018-07-01 03:11:38.461175] : volume heal engine full : SUCCESS<br>[2018-07-01 03:11:51.151891] : volume heal data full : SUCCESS<br><div><br></div>glustershd.log-20180701:<br></div><div><LOGS FROM 06/01 TRUNCATED><br></div><div>[2018-07-01 07:15:04.779122] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate<wbr>] 0-glusterfsd: Fetching the volume file from server... 
<br><div><br></div>glustershd.log:<br>[2018-07-01 07:15:04.779693] I [glusterfsd-mgmt.c:1596:mgmt_g<wbr>etspec_cbk] 0-glusterfs: No change in volfile, continuing<br></div><div><br></div><div>That's the *only* message in glustershd.log today.<br></div><div><br> ====================== <br><div><br></div>[root@v0 glusterfs]# gluster volume status engine<br>Status of volume: engine<br>Gluster process <wbr> TCP Port RDMA Port Online Pid<br>------------------------------<wbr>------------------------------<wbr>------------------<br>Brick s0:/gluster/engine/brick <wbr> 49154 0 Y 2816<br>Brick s1:/gluster/engine/brick <wbr> 49154 0 Y 3995<br>Self-heal Daemon on localhost N/A N/A Y 2919<br>Self-heal Daemon on s1 N/A N/A Y 4013<br><div><br></div>Task Status of Volume engine<br>------------------------------<wbr>------------------------------<wbr>------------------<br>There are no active volume tasks<br><div><br></div>====================== <br><div><br></div></div><div>Okay, so actually only the directory ha_agent is listed for healing (not its contents), & that does have attributes set.<br><div><br></div></div><div>Many thanks for the reply!<br><div><br></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On 1 July 2018 at 15:34, Ashish Pandey <span dir="ltr"><<a href="mailto:aspandey@redhat.com" target="_blank">aspandey@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div><div>You have not mentioned the volume type and configuration, and fixing this issue would require a lot of other information.<br></div><div><br></div><div>1 - What is the volume type and configuration?<br></div><div>2 - Provide the output of gluster v <volname> info<br></div><div>3 - Heal info output<br></div><div>4 - getxattr of one of the files that needs healing, from all the bricks.<br></div><div>5 - What led to the healing of the file?<br></div><div>6 - 
gluster v <volname> status<br></div><div>7 - glustershd.log output just after you run a full heal or index heal<br></div><div><br></div><div>----<br></div><div>Ashish<br></div><div><br></div><hr id="m_-7752785096565085217gmail-m_-7061863707512342433m_-4349002051472701379zwchr"><div style="color:rgb(0,0,0);font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From: </b>"Gambit15" <<a href="mailto:dougti%2Bgluster@gmail.com" target="_blank">dougti+gluster@gmail.com</a>><br><b>To: </b>"gluster-users" <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br><b>Sent: </b>Sunday, July 1, 2018 11:50:16 PM<br><b>Subject: </b>[Gluster-users] Files not healing & missing their extended attributes - Help!<div><div class="m_-7752785096565085217gmail-m_-7061863707512342433h5"><br><div><br></div><div dir="ltr"><div><div><div><div><div><div>Hi Guys,<br></div> I had to restart our datacenter yesterday, but since doing so a number of the files on my gluster share have been stuck, marked as healing. After no signs of progress, I manually set off a full heal last night, but after 24hrs, nothing's happened.<br></div><br>The gluster logs all look normal, and there're no messages about failed connections or heal processes kicking off.<br><div><br></div></div>I checked the listed files' extended attributes on their bricks today, and they only show the selinux attribute. There's none of the trusted.* attributes I'd expect.<br></div>The healthy files on the bricks do have their extended attributes though.<br><div><br></div></div>I'm guessing that perhaps the files somehow lost their attributes, and gluster is no longer able to work out what to do with them? It's not logged any errors, warnings, or anything else out of the normal though, so I've no idea what the problem is or how to resolve it.<br><div><br></div></div>I've got 16 hours to get this sorted before the start of work, Monday. 
Help!<br></div><br></div></div>______________________________<wbr>_________________<br>Gluster-users mailing list<br><a href="mailto:Gluster-users@gluster.org" target="_blank">Gluster-users@gluster.org</a><br><a href="http://lists.gluster.org/mailman/listinfo/gluster-users" target="_blank">http://lists.gluster.org/mailm<wbr>an/listinfo/gluster-users</a><br></div><div><br></div></div></div></blockquote></div><br></div><br></div></div></div><div><br></div></div></div></blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>
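<div>For anyone reading the getfattr output quoted above, here is a minimal Python sketch for decoding the hex values. It rests on assumptions that are not stated in this thread: that a trusted.afr.* value is three big-endian 32-bit counters (data, metadata, entry), and that a gfid is stored on the brick under .glusterfs/ using the first two pairs of hex digits as subdirectories. The helper names are made up for illustration; nothing here touches a brick.</div>
<pre>
```python
import uuid


def strip_0x(hex_value):
    """Drop the 0x prefix that getfattr -e hex puts in front of a value."""
    return hex_value[2:] if hex_value.startswith("0x") else hex_value


def gfid_from_xattr(hex_value):
    """Turn a hex trusted.gfid value into the dashed UUID form used in the logs."""
    return str(uuid.UUID(hex=strip_0x(hex_value)))


def glusterfs_path(brick, gfid):
    """Assumed on-brick location of the gfid entry: .glusterfs/aa/bb/full-gfid."""
    return "/".join([brick, ".glusterfs", gfid[0:2], gfid[2:4], gfid])


def decode_afr(hex_value):
    """Split a trusted.afr.* value into data, metadata and entry counters.

    Assumes the value is exactly 12 bytes: three big-endian 32-bit integers.
    """
    raw = bytes.fromhex(strip_0x(hex_value))
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got {}".format(len(raw)))
    names = ("data", "metadata", "entry")
    return {n: int.from_bytes(raw[i * 4:i * 4 + 4], "big") for i, n in enumerate(names)}


if __name__ == "__main__":
    # Values copied from the getfattr output for the ha_agent directory.
    gfid = gfid_from_xattr("0xdb9afb92d2bc49ed8e34dcd437ba7be2")
    print(gfid)
    print(glusterfs_path("/gluster/engine/brick", gfid))
    print("v0:", decode_afr("0x0000000000000000000024e8"))
    print("v1:", decode_afr("0x0000000000000000000024ec"))
```
</pre>
<div>The decoded gfid of ha_agent is the same one that appears as the parent in the "GFID mismatch" client log lines. Under the layout assumed here, only the entry counter of trusted.afr.engine-client-2 is non-zero on both peers, which would mean pending entry operations against the third brick of the volume.</div>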