[Gluster-users] Missing/Duplicate files on Gluster 3.6.5 distributed-replicate volume

Jon Sime jsime at omniti.com
Wed Aug 26 18:46:54 UTC 2015


We have a v3.6.5 two node cluster with a distributed-replicate volume (2x2
bricks, everything formatted with ext4 on CentOS 6.6) which regularly omits
some files from directory listings on the client side, and also regularly
duplicates the listing of some other files.

Summary of the issue and steps we've tried so far:

- There is only one client system connected to this volume.

- That client populates files in this volume by copying them from a local
  filesystem into the gluster mount point, via 'cp' within a single process (it
  is a single-threaded Python script that invokes call() to run cp via a
  subprocess shell), so we believe we have ruled out any possibility of
  concurrency or race-condition problems, as there is only one source of writes
  and the files are copied sequentially.

- The two Gluster servers provide 7 volumes in total, but only one of the
  volumes has been observed with this behavior.

- There are no errors or warnings in the Gluster logs, on client or server.

- We have tried clearing all the extended attributes on all the bricks, but
  that did not resolve the problem.

- We have deleted everything on the brick filesystems (including .glusterfs/),
  but copying the files over again (via the gluster mount point on the client)
  results in the same missing & duplicate issue.

- We ran a rebalance/fix-layout on the volume, but that did not resolve the
  problem.

- Interestingly, the set of files which are missing from the directory listings
  is the same each time we delete everything and try again with an empty
  directory; the set of files which are duplicated in the listing output is
  also the same each time.

- When all of the files have been copied over to the gluster volume, running an
  'ls' from the client shows most, but not all, of the files. Examining the
  bricks directly shows that all of the files are present (and properly
  distributed and replicated). If an 'rm *' is then done from the client, all
  of the files which were visible are deleted, but the files which had not been
  visible on the client are now shown by 'ls', and some of them are shown twice
  in the output. Examining the bricks directly again shows that all of the
  files in the client's ls output are present, with no improper duplicates
  (only the correctly-replicated copies that should be present). Running
  another 'rm *' correctly deletes all of the files, both from the client's
  view and from the underlying bricks.

As requested in IRC, the following is the getfattr output for a file which was
missing from the initial directory listing on the client, as well as the
getfattr output for its parent directory (I've included the same directory
from all four bricks, though in this distributed-replicate layout, the file
was only (properly) located on the bricks in each gluster host's
/export/zones1).

As for an example of a file which appeared fine from the beginning, I'll need
to follow up with that in a bit, once I can get the client I'm doing this for
to repeat the test, pausing after the initial copy and before deleting the set
of visible files.

FWIW, these files were copied to an empty volume after a rebalance operation
had been run.

(Host gluster-001)
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim\ ELA\ PT\ Beetles\ \(IAB\)_2015-08-11.tar.gz.gpg
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim ELA PT Beetles (IAB)_2015-08-11.tar.gz.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0x8823094f0ea14f049bbc4f98895f7192

-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x0000000100000000000000007fffd0ea

-bash-4.1# getfattr -d -e hex -m . /export/zones2/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones2/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-2=0x000000000000000000000000
trusted.afr.zones-client-3=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x00000001000000007fffd0ebffffffff

(Host gluster-002)
-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim\ ELA\ PT\ Beetles\ \(IAB\)_2015-08-11.tar.gz.gpg
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1/G03_Interim ELA PT Beetles (IAB)_2015-08-11.tar.gz.gpg
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0x8823094f0ea14f049bbc4f98895f7192

-bash-4.1# getfattr -d -e hex -m . /export/zones1/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones1/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-0=0x000000000000000000000000
trusted.afr.zones-client-1=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x0000000100000000000000007fffd0ea

-bash-4.1# getfattr -d -e hex -m . /export/zones2/brick/landing/arrivals/xx/xx_user1
getfattr: Removing leading '/' from absolute path names
# file: export/zones2/brick/landing/arrivals/xx/xx_user1
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.zones-client-2=0x000000000000000000000000
trusted.afr.zones-client-3=0x000000000000000000000000
trusted.gfid=0xdc7b9acea4084541a830935e48f4a2a1
trusted.glusterfs.dht=0x00000001000000007fffd0ebffffffff
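
For what it's worth, the two trusted.glusterfs.dht values above can be decoded
into the bricks' hash ranges. A small sketch, assuming the last eight bytes of
the xattr hold the big-endian start and end of the brick's 32-bit range (and
the leading eight bytes hold layout metadata such as count/type):

```python
# Hedged sketch: decode the trusted.glusterfs.dht xattr values shown
# above into (start, end) hash ranges. Assumes the last eight bytes are
# the big-endian range boundaries; the leading bytes are skipped.
import struct

def dht_range(hexval):
    if hexval.startswith("0x"):
        hexval = hexval[2:]
    raw = bytes.fromhex(hexval)
    start, end = struct.unpack(">II", raw[-8:])
    return start, end

zones1 = dht_range("0x0000000100000000000000007fffd0ea")
zones2 = dht_range("0x00000001000000007fffd0ebffffffff")
# A healthy distribute layout is contiguous and covers 0x0..0xffffffff:
assert zones1[1] + 1 == zones2[0]
assert zones1[0] == 0x00000000 and zones2[1] == 0xFFFFFFFF
```

Decoded this way, the ranges for the two replica pairs do appear contiguous
and cover the full hash space, so the layout itself doesn't look like the
obvious culprit.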

Volume configuration server-side:
-bash-4.1# mount | grep zones
/dev/mapper/vg.zones1-lv.zones1 on /export/zones1 type ext4 (rw,noatime)
/dev/mapper/vg.zones2-lv.zones2 on /export/zones2 type ext4 (rw,noatime)
-bash-4.1# gluster volume info zones
Volume Name: zones
Type: Distributed-Replicate
Volume ID: 53ff45b1-8dc7-47ef-8a26-3245414e4990
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.1.1.122:/export/zones1/brick
Brick2: 10.1.1.121:/export/zones1/brick
Brick3: 10.1.1.122:/export/zones2/brick
Brick4: 10.1.1.121:/export/zones2/brick
Options Reconfigured:
client.ssl: off
server.ssl: off
performance.cache-size: 256MB
auth.ssl-allow: *
-bash-4.1# gluster volume status zones
Status of volume: zones
Gluster process                                      Port    Online  Pid
---------------------------------------------------------------------------
Brick 10.1.1.122:/export/zones1/brick                49165   Y       25189
Brick 10.1.1.121:/export/zones1/brick                49164   Y       697
Brick 10.1.1.122:/export/zones2/brick                49166   Y       25194
Brick 10.1.1.121:/export/zones2/brick                49161   Y       703
NFS Server on localhost                              2049    Y       25213
Self-heal Daemon on localhost                        N/A     Y       25222
NFS Server on 10.1.1.121                             2049    Y       719
Self-heal Daemon on 10.1.1.121                       N/A     Y       736

Task Status of Volume zones
---------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 75f0b7ae-ed26-417b-a285-9ad81e40073c
Status               : completed

Mountpoint on client side:
-bash-4.1# mount | grep zones
10.1.1.122:/zones on /opt/edware/zones type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)