[Bugs] [Bug 1707866] Thousands of duplicate files in glusterfs mountpoint directory listing

bugzilla at redhat.com bugzilla at redhat.com
Mon May 13 04:48:55 UTC 2019


https://bugzilla.redhat.com/show_bug.cgi?id=1707866

Nithya Balachandran <nbalacha at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |nbalacha at redhat.com
              Flags|                            |needinfo?(sergemp at mail.ru)



--- Comment #1 from Nithya Balachandran <nbalacha at redhat.com> ---
(In reply to Sergey from comment #0)
> I have something impossible: the same filenames are listed multiple times:

Based on the information provided for zabbix.pm, the file is listed twice
because two separate copies of it exist on different replica pairs (bricks).
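
A rough way to count how many names are affected is to list the data files on
every brick and count repeats. A sketch, assuming passwordless ssh and the
hostnames/brick paths from the volume info quoted below; linkto files are
excluded by their distinctive mode 1000 (---------T):

  # for h in gfserver{1..6}; do \
      ssh $h 'find /srv/BRICK -maxdepth 1 -type f ! -perm 1000 -printf "%f\n"'; \
    done | sort | uniq -c | awk '$1 > 2 {print $2}'

In a healthy 3 x 2 volume every data file lives on exactly one replica pair,
so any name counted more than twice here is duplicated the way zabbix.pm is.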


> 
>   # ls -la /mnt/VOLNAME/
>   ...
>   -rwxrwxr-x   1 root   root   3486 Jan 28  2016 check_connections.pl
>   -rwxr-xr-x   1 root   root    153 Dec  7  2014 sigtest.sh
>   -rwxr-xr-x   1 root   root    153 Dec  7  2014 sigtest.sh
>   -rwxr-xr-x   1 root   root   3466 Jan  5  2015 zabbix.pm
>   -rwxr-xr-x   1 root   root   3466 Jan  5  2015 zabbix.pm
> 
> There are about 38981 duplicate files like that.
> 
> The volume itself is a 3 x 2-replica:
> 
>   # gluster volume info VOLNAME
>   Volume Name: VOLNAME
>   Type: Distributed-Replicate
>   Volume ID: 41f9096f-0d5f-4ea9-b369-89294cf1be99
>   Status: Started
>   Snapshot Count: 0
>   Number of Bricks: 3 x 2 = 6
>   Transport-type: tcp
>   Bricks:
>   Brick1: gfserver1:/srv/BRICK
>   Brick2: gfserver2:/srv/BRICK
>   Brick3: gfserver3:/srv/BRICK
>   Brick4: gfserver4:/srv/BRICK
>   Brick5: gfserver5:/srv/BRICK
>   Brick6: gfserver6:/srv/BRICK
>   Options Reconfigured:
>   transport.address-family: inet
>   nfs.disable: on
>   cluster.self-heal-daemon: enable
>   config.transport: tcp
> 
> The "duplicated" file on individual bricks:
> 
>   [gfserver1]# ls -la /srv/BRICK/zabbix.pm
>   ---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm
> 
>   [gfserver2]# ls -la /srv/BRICK/zabbix.pm
>   ---------T 2 root root 0 Apr 23  2018 /srv/BRICK/zabbix.pm
> 

These two are DHT linkto files; they point to the data files on gfserver3
and gfserver4.
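
The hex value of trusted.glusterfs.dht.linkto (visible in the getfattr output
below) is just the NUL-terminated name of the DHT subvolume that holds the
data, so it can be decoded directly, e.g.:

  # echo 6678666565642d7265706c69636174652d3100 | xxd -r -p; echo

which prints the subvolume name, here of the form VOLNAME-replicate-1, i.e.
the second replica pair (gfserver3/gfserver4; subvolumes are numbered from 0).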

>   [gfserver3]# ls -la /srv/BRICK/zabbix.pm
>   -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
> 
>   [gfserver4]# ls -la /srv/BRICK/zabbix.pm
>   -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
> 



>   [gfserver5]# ls -la /srv/BRICK/zabbix.pm
>   -rwxr-xr-x 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
> 
>   [gfserver6]# ls -la /srv/BRICK/zabbix.pm
>   -rwxr-xr-x. 2 root root 3466 Jan  5  2015 /srv/BRICK/zabbix.pm
> 

These are the problematic files. I do not know why or how they ended up on
these bricks as well.
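
Note the link count of 2 in the ls output above: every file on a brick has a
hard link under the brick's .glusterfs directory named after its gfid, so the
stray copies can also be located that way. For the gfid shown in the
attributes below, and assuming the standard backend layout:

  [gfserver5]# ls -i /srv/BRICK/zabbix.pm \
        /srv/BRICK/.glusterfs/42/2a/422a7ccf-0182-42b5-8e16-2a65266326c3

Both paths should report the same inode number.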


> Attributes:
> 
>   [gfserver1]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   trusted.afr.VOLNAME-client-1=0x000000000000000000000000
>   trusted.afr.VOLNAME-client-4=0x000000000000000000000000
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>   trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
> 
>   [gfserver2]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>   trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
>   trusted.glusterfs.dht.linkto=0x6678666565642d7265706c69636174652d3100
> 
>   [gfserver3]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   trusted.afr.VOLNAME-client-2=0x000000000000000000000000
>   trusted.afr.VOLNAME-client-3=0x000000000000000000000000
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> 
>   [gfserver4]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
>   trusted.gfid2path.3b27d24cad4dceef=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f7a61626269782e706d
> 
>   [gfserver5]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   trusted.bit-rot.version=0x03000000000000005c4f813c000bc71b
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
> 
>   [gfserver6]# getfattr -m . -d -e hex /srv/BRICK/zabbix.pm
>   # file: srv/BRICK/zabbix.pm
>   security.selinux=0x73797374656d5f753a6f626a6563745f723a7661725f743a733000
>   trusted.bit-rot.version=0x02000000000000005add0ffc000eb66a
>   trusted.gfid=0x422a7ccf018242b58e162a65266326c3
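
For reference, trusted.gfid2path.<hash> encodes <parent-gfid>/<basename> as a
plain string; the value above decodes to
00000000-0000-0000-0000-000000000001/zabbix.pm (gfid ...0001 is the volume
root). It can be read back directly, e.g.:

  [gfserver2]# getfattr --only-values -n trusted.gfid2path.3b27d24cad4dceef \
        /srv/BRICK/zabbix.pm; echo
  00000000-0000-0000-0000-000000000001/zabbix.pm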
> 
> Not sure why exactly it happened... Maybe because some nodes were suddenly
> upgraded from CentOS 6's GlusterFS ~3.7 to CentOS 7's 4.1, and some files
> ended up on nodes they're not supposed to be on.
> 
> Currently all the nodes are online:
> 
>   # gluster pool list
>   UUID                                  Hostname        State
>   aac9e1a5-018f-4d27-9d77-804f0f1b2f13  gfserver5       Connected
>   98b22070-b579-4a91-86e3-482cfcc9c8cf  gfserver3       Connected
>   7a9841a1-c63c-49f2-8d6d-a90ae2ff4e04  gfserver4       Connected
>   955f5551-8b42-476c-9eaa-feab35b71041  gfserver6       Connected
>   7343d655-3527-4bcf-9d13-55386ccb5f9c  gfserver1       Connected
>   f9c79a56-830d-4056-b437-a669a1942626  gfserver2       Connected
>   45a72ab3-b91e-4076-9cf2-687669647217  localhost       Connected
> 
> and have glusterfs-3.12.14-1.el6.x86_64 (CentOS 6) and
> glusterfs-4.1.7-1.el7.x86_64 (CentOS 7) installed.
> 
> 
> Expected result
> ---------------
> 
> This looks like a layout issue, so:
> 
>   gluster volume rebalance VOLNAME fix-layout start
> 
> should fix it, right?
> 

No, fix-layout only recalculates the directory layouts, and this is not a
layout problem. It is a problem of duplicate copies of the files existing on
the bricks.
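
fix-layout only rewrites the trusted.glusterfs.dht range xattr stored on each
directory; it never creates, moves or removes files. For illustration, the
xattr it updates can be inspected with:

  [gfserver1]# getfattr -n trusted.glusterfs.dht -e hex /srv/BRICK

so the duplicate zabbix.pm entries would survive any number of fix-layout
runs.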


> 
> Actual result
> -------------
> 
> I tried:
>   gluster volume rebalance VOLNAME fix-layout start
>   gluster volume rebalance VOLNAME start
>   gluster volume rebalance VOLNAME start force
>   gluster volume heal VOLNAME full
> Those took 5 to 40 minutes to complete, but the duplicates are still there.



Can you send the rebalance logs for this volume from all the nodes?
How many clients do you have accessing the volume?
Are the duplicate files seen only in the root of the volume or in
subdirectories as well?
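
The rebalance log is normally /var/log/glusterfs/VOLNAME-rebalance.log on
each server; something along these lines, run on every node, should collect
them:

  # tar czf rebalance-logs-$(hostname).tar.gz \
        /var/log/glusterfs/*rebalance*.log*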

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

