[Gluster-users] Quick way to fix stale gfids?
Diego Zuccato
diego.zuccato at unibo.it
Mon Feb 13 11:21:33 UTC 2023
My volume is replica 3 arbiter 1, maybe that makes a difference?
Brick processes tend to die quite often (I have to restart glusterd at
least once a day because "gluster v info | grep ' N '" reports at least
one missing brick; sometimes even if all bricks are reported up I have
to kill all glusterfs[d] processes and restart glusterd).
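Roughly, that daily check is something like this minimal sketch (I'm using
"gluster volume status" here since that's the command with the Online Y/N
column, and assuming glusterd is managed by systemd):

# restart glusterd if any brick shows 'N' in the Online column
if gluster volume status cluster_data | grep -q ' N '; then
    systemctl restart glusterd
fi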
The 3 servers have 192GB RAM (that should be way more than enough!), 30
data bricks and 15 arbiters (the arbiters share a single SSD).
And I noticed that some "stale file handle" errors are not reported by heal info.
root at str957-cluster:/# ls -l
/scratch/extra/m******/PNG/PNGQuijote/ModGrav/fNL40/
ls: cannot access
'/scratch/extra/m******/PNG/PNGQuijote/ModGrav/fNL40/output_21': Stale
file handle
total 40
d????????? ? ? ? ? ? output_21
...
but "gluster v heal cluster_data info |grep output_21" returns nothing. :(
It seems the other stale handles either got corrected by subsequent 'stat's
or turned into I/O errors.
Diego.
On 12/02/2023 21:34, Strahil Nikolov wrote:
> The second error indicates conflicts between the nodes. The only way that
> could happen on replica 3 is a gfid conflict (the file/dir was renamed or
> recreated).
>
> Are you sure that all bricks are online? Usually 'Transport endpoint is
> not connected' indicates a brick down situation.
>
> First start with all stale file handles:
> check the md5sum on all bricks. If it differs somewhere, delete the gfid and
> move the file away from the brick, then check it in FUSE. If it's fine,
> touch it and the FUSE client will "heal" it.
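> For a single affected file, that could look roughly like this sketch
> (brick path, FUSE mountpoint and gfid are placeholders):
>
> GFID=57e428c7-6bed-4eb3-b9bd-02ca4c46657a   # gfid reported by heal info
> BRICK=/PATH/TO/BRICK
> F=$(find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" ! -path '*/.glusterfs/*')
> md5sum "$F"                                 # compare this across all bricks holding a copy
> # on the brick whose checksum differs:
> rm "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"   # delete the gfid hardlink
> mv "$F" /root/quarantine/                              # move the bad copy off the brick
> stat "/MY/FUSE/MOUNTPOINT/${F#$BRICK/}"                # check through FUSE; if it's fine,
> touch "/MY/FUSE/MOUNTPOINT/${F#$BRICK/}"               # touch it so the client "heals" it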
>
> Best Regards,
> Strahil Nikolov
>
>
>
> On Tue, Feb 7, 2023 at 16:33, Diego Zuccato
> <diego.zuccato at unibo.it> wrote:
> The contents do not match exactly, but the only difference is the
> "option shared-brick-count" line that sometimes is 0 and sometimes 1.
>
> The command you gave could be useful for the files that still need
> healing with the source still present, but the files related to the
> stale gfids have been deleted, so "find -samefile" won't find anything.
>
> For the other files reported by heal info, I saved the output to
> 'healinfo', then:
> for T in $(grep '^/' healinfo | sort | uniq); do stat /mnt/scratch$T > /dev/null; done
>
> but I still see a lot of 'Transport endpoint is not connected' and
> 'Stale file handle' errors :( And many 'No such file or directory'...
>
> I don't understand the first two errors, since /mnt/scratch has been
> freshly mounted after enabling client healing, and gluster v info does
> not highlight unconnected/down bricks.
>
> Diego
>
> On 06/02/2023 22:46, Strahil Nikolov wrote:
> > I'm not sure if the md5sum has to match, but at least the content
> > should.
> > In modern versions of GlusterFS the client-side healing is disabled,
> > but it's worth trying.
> > You will need to enable cluster.metadata-self-heal,
> > cluster.data-self-heal and cluster.entry-self-heal and then create a
> > small one-liner that identifies the names of the files/dirs from the
> > volume heal info, so you can stat them through the FUSE mount.
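> > Enabling them is just three "gluster volume set" calls, for example:
> >
> > gluster volume set <VOL> cluster.metadata-self-heal on
> > gluster volume set <VOL> cluster.data-self-heal on
> > gluster volume set <VOL> cluster.entry-self-heal on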
> >
> > Something like this:
> >
> >
> > for i in $(gluster volume heal <VOL> info | awk -F '<gfid:|>' '/gfid:/ {print $2}'); do
> >     find /PATH/TO/BRICK/ -samefile /PATH/TO/BRICK/.glusterfs/${i:0:2}/${i:2:2}/$i |
> >         awk '!/.glusterfs/ {gsub("/PATH/TO/BRICK", "stat /MY/FUSE/MOUNTPOINT", $0); print $0}'
> > done
> >
> > Then just copy-paste the output and you will trigger the client-side
> > heal only on the affected gfids.
> >
> > Best Regards,
> > Strahil Nikolov
> > On Monday, 6 February 2023 at 10:19:02 GMT+2, Diego Zuccato
> > <diego.zuccato at unibo.it> wrote:
> >
> >
> > Oops... Re-including the list that got excluded in my previous
> > answer :(
> >
> > I generated md5sums of all files in vols/ on clustor02 and compared them
> > to the other nodes (clustor00 and clustor01).
> > There are differences in the volfiles (shouldn't shared-brick-count
> > always be 1, since every data brick is on its own fs? The quorum bricks,
> > OTOH, share a single partition on SSD and should always be 15, but in
> > both cases it's sometimes 0).
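> > A rough sketch of comparing that value across the nodes (hostnames as
> > above, volume name assumed to be cluster_data):
> >
> > for H in clustor00 clustor01 clustor02; do
> >     echo "== $H =="
> >     ssh "$H" "grep -r 'option shared-brick-count' /var/lib/glusterd/vols/cluster_data/ | sort | uniq -c"
> > done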
> >
> > I nearly got a stroke when I saw the diff output for the 'info' files,
> > but once I sorted 'em their contents matched. Phew!
> >
> > Diego
> >
> > On 03/02/2023 19:01, Strahil Nikolov wrote:
> > > This one doesn't look good:
> > >
> > >
> > > [2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
> > > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48:
> > > remote-subvolume not set in volfile []
> > >
> > >
> > > Can you compare all vol files in /var/lib/glusterd/vols/ between the
> > > nodes? I have the suspicion that there is a vol file mismatch (maybe
> > > /var/lib/glusterd/vols/<VOLUME_NAME>/*-shd.vol).
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Fri, Feb 3, 2023 at 12:20, Diego Zuccato
> > > <diego.zuccato at unibo.it> wrote:
> > > Can't see anything relevant in glfsheal log, just messages related to
> > > the crash of one of the nodes (the one that had the mobo replaced... I
> > > fear some on-disk structures could have been silently damaged by RAM
> > > errors and that makes gluster processes crash, or it's just an issue
> > > with enabling brick-multiplex).
> > > -8<--
> > > [2023-02-03 07:45:46.896924 +0000] E [MSGID: 114079]
> > > [client-handshake.c:1253:client_query_portmap] 0-cluster_data-client-48:
> > > remote-subvolume not set in volfile []
> > > [2023-02-03 07:45:46.897282 +0000] E [rpc-clnt.c:331:saved_frames_unwind]
> > > (--> /lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x195)[0x7fce0c867b95]
> > > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x72fc)[0x7fce0c0ca2fc]
> > > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x109)[0x7fce0c0d2419]
> > > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(+0x10308)[0x7fce0c0d3308]
> > > (--> /lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x26)[0x7fce0c0ce7e6]
> > > ))))) 0-cluster_data-client-48: forced unwinding frame type(GF-DUMP)
> > > op(NULL(2)) called at 2023-02-03 07:45:46.891054 +0000 (xid=0x13)
> > > -8<--
> > >
> > > Well, actually I *KNOW* the files outside .glusterfs have been deleted
> > > (by me :) ). That's why I call those gfids 'stale'.
> > > Affected entries under .glusterfs usually have link count = 1, so there
> > > is nothing 'find' can find.
> > > Since I already recovered those files (before deleting them from the
> > > bricks), can the .glusterfs entries be deleted too, or should I check
> > > something else?
> > > Maybe I should create a script that finds all files/dirs (not symlinks,
> > > IIUC) in .glusterfs on all bricks/arbiters and moves 'em to a temp dir?
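> > > Something like this sketch, maybe (brick path is a placeholder; it only
> > > lists the candidates, moving them to a temp dir would be a second pass):
> > >
> > > BRICK=/PATH/TO/BRICK
> > > # regular files in the gfid store (.glusterfs/XX/YY/<gfid>) whose only
> > > # remaining name is the gfid hardlink itself (link count = 1)
> > > find "$BRICK/.glusterfs" -mindepth 3 -maxdepth 3 -type f -links 1 \
> > >      -name '????????-????-????-????-????????????' -print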
> > >
> > > Diego
> > >
> > > On 02/02/2023 23:35, Strahil Nikolov wrote:
> > > > Any issues reported in /var/log/glusterfs/glfsheal-*.log ?
> > > >
> > > > The easiest way to identify the affected entries is to run:
> > > > find /FULL/PATH/TO/BRICK/ -samefile /FULL/PATH/TO/BRICK/.glusterfs/57/e4/57e428c7-6bed-4eb3-b9bd-02ca4c46657a
> > > >
> > > >
> > > > Best Regards,
> > > > Strahil Nikolov
> > > >
> > > >
> > > > On Tuesday, 31 January 2023 at 11:58:24 GMT+2, Diego Zuccato
> > > > <diego.zuccato at unibo.it> wrote:
> > > >
> > > >
> > > > Hello all.
> > > >
> > > > I've had one of the 3 nodes serving a "replica 3 arbiter 1" volume
> > > > down for some days (apparently RAM issues, but actually a failing
> > > > mobo).
> > > > The other nodes have had some issues (RAM exhaustion, an old problem
> > > > already ticketed but still with no solution) and some brick processes
> > > > coredumped. Restarting the processes allowed the cluster to continue
> > > > working. Mostly.
> > > >
> > > > After the third server got fixed I started a heal, but files didn't
> > > > get healed and the count (by
> > > > "ls -l /srv/bricks/*/d/.glusterfs/indices/xattrop/ | grep ^- | wc -l")
> > > > did not decrease over 2 days. So, to recover, I copied files from the
> > > > bricks to temp storage (keeping both copies of conflicting files with
> > > > different contents), removed the files on bricks and arbiters, and
> > > > finally copied them back from temp storage to the volume.
> > > >
> > > > Now the files are accessible but I still see lots of entries like
> > > > <gfid:57e428c7-6bed-4eb3-b9bd-02ca4c46657a>
> > > >
> > > > IIUC that's due to a mismatch between .glusterfs/ contents and normal
> > > > hierarchy. Is there some tool to speed up the cleanup?
> > > >
> > > > Tks.
> > > >
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786