[Gluster-users] [Errno 5] Input/output error: '/abc/def/ghi.gz'
David Squire
dave at sawtoothsoftware.com
Tue May 16 00:15:31 UTC 2017

I am running:  glusterfs 3.5.9 built on Mar 28 2016 07:10:17
 
Other volume info:
 
Type: Distributed-Replicate
Number of Bricks: 8 x 3 = 24
Transport-type: tcp
Options Reconfigured:
performance.cache-refresh-timeout: 30
performance.cache-size: 768MB
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51
 
When I try to manipulate a file (def/ghi.gz) on the mounted glusterfs folder
(abc), I get an Errno 5 input/output error.  Most files work, but many fail
with this same error.
 
I visited each brick in my volume to see what the extended file attributes
are for this file.
 
On my_volume-replicate-0 there is an empty file with that filename.  When I
run "ls -al" it looks like this:
---------T    2 root       root            0 Mar  1 14:56 ghi.gz
 
On the first two bricks (bricks 0 and 1) of my_volume-replicate-0 when I run
"getfattr -d -m. -e hex ghi.gz" I get the following results:
# file: ghi.gz
trusted.afr.my_volume-client-0=0x000000000000000000000000
trusted.afr.my_volume-client-1=0x000000000000000000000000
trusted.afr.my_volume-client-2=0x000000020000000200000000
trusted.gfid=0xabb0369b05844390add6ea72ce7e107a
trusted.glusterfs.dht.linkto=0x686f7374696e672d7265706c69636174652d3400
 
The linkto value looks like this when I use text encoding instead of hex
encoding:
trusted.glusterfs.dht.linkto="my_volume-replicate-4"
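For reference, the hex form printed by "getfattr -e hex" can be turned back
into text without re-running getfattr.  The following is only a minimal
sketch: the function names are made up for illustration, and it assumes the
value is a NUL-terminated string (which is how the hex values above end).

```python
def decode_hex_xattr(value: str) -> str:
    """Decode a 'getfattr -e hex' value (e.g. '0x6162...00') into text."""
    # Drop the optional 0x prefix that getfattr prints.
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    # Assumption: the string is NUL-terminated, so strip trailing NUL bytes.
    return raw.rstrip(b"\x00").decode("utf-8")


def encode_hex_xattr(text: str) -> str:
    """Inverse helper: text -> hex with a trailing NUL, as getfattr shows it."""
    return "0x" + (text.encode("utf-8") + b"\x00").hex()


if __name__ == "__main__":
    # Round trip using the subvolume name quoted in this post.
    print(decode_hex_xattr(encode_hex_xattr("my_volume-replicate-4")))
```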
 
The third brick (brick 2) of my_volume-replicate-0 has these extended
attributes:
# file: ghi.gz
trusted.gfid=0xc5c99fe21c3f4582b48e6f69ff76e33b
trusted.glusterfs.dht.linkto=0x686f7374696e672d7265706c69636174652d3400
 
So the third brick has a DIFFERENT trusted.gfid.
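To check many files for this kind of mismatch, the gfid values collected
from each brick can be grouped and compared.  A rough sketch follows; the
dictionary below is typed in by hand from the getfattr output above, and
group_by_gfid is a hypothetical helper, not a Gluster tool.

```python
from collections import defaultdict


def group_by_gfid(gfids):
    """Map each distinct gfid to the list of bricks that report it."""
    groups = defaultdict(list)
    for brick, gfid in gfids.items():
        groups[gfid.lower()].append(brick)
    return dict(groups)


# trusted.gfid values for ghi.gz on the three bricks of my_volume-replicate-0.
observed = {
    "brick 0": "0xabb0369b05844390add6ea72ce7e107a",
    "brick 1": "0xabb0369b05844390add6ea72ce7e107a",
    "brick 2": "0xc5c99fe21c3f4582b48e6f69ff76e33b",
}

groups = group_by_gfid(observed)

if __name__ == "__main__":
    if len(groups) > 1:
        print("gfid mismatch:")
    for gfid, bricks in groups.items():
        print(" ", gfid, "on", ", ".join(bricks))
```

More than one group for the same path means the replicas disagree about
which file this is.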
 
The first two bricks have
trusted.afr.my_volume-client-2=0x000000020000000200000000.  Does that mean
that the first two bricks think that the third brick (brick 2) has
differences?
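For reference, a sketch of how such a value can be split up.  It assumes the
usual AFR changelog layout of three big-endian 32-bit counters (pending data,
metadata, and entry operations, in that order); parse_afr_changelog is a
hypothetical helper name.

```python
import struct


def parse_afr_changelog(value: str) -> dict:
    """Split a trusted.afr.* hex value into its three pending counters."""
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    # Assumed layout: data, metadata, entry; each an unsigned 32-bit
    # integer in network (big-endian) byte order.
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


if __name__ == "__main__":
    print(parse_afr_changelog("0x000000020000000200000000"))
```

Under that assumption, the value held by bricks 0 and 1 for
my_volume-client-2 decodes to data=2, metadata=2, entry=0, i.e. those two
bricks record pending data and metadata operations against brick 2, while an
all-zero value means nothing is pending.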
 
All three bricks are linking to my_volume-replicate-4.
 
All three bricks (bricks 12, 13, and 14) of my_volume-replicate-4 have
the actual file with these extended attributes:
# file: ghi.gz
trusted.afr.my_volume-client-12=0x000000000000000000000000
trusted.afr.my_volume-client-13=0x000000000000000000000000
trusted.afr.my_volume-client-14=0x000000000000000000000000
trusted.gfid=0xabb0369b05844390add6ea72ce7e107a
 
So, the trusted.gfid on my_volume-replicate-4 matches bricks 0 and 1 of
my_volume-replicate-0.  All three bricks of my_volume-replicate-4 also have
0x000000000000000000000000 for all three trusted.afr.my_volume-client-##
attributes.  I assume this means that the file is the same on all three
bricks of my_volume-replicate-4.
 
No other bricks in the system have the ghi.gz file on them.
 
When I go to .glusterfs/indices/xattrop of bricks 0 and 1 there is a file
there named abb0369b-0584-4390-add6-ea72ce7e107a.  This means that this file
id is in need of healing, correct?  There is NOT a file named
abb0369b-0584-4390-add6-ea72ce7e107a on brick 2.
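The name in indices/xattrop is the trusted.gfid value written as a UUID.  A
small sketch of that conversion is below.  The helper names are made up, and
the .glusterfs/<first two hex chars>/<next two>/<uuid> path reflects my
understanding of the brick backend layout, so treat it as an assumption.

```python
import uuid


def gfid_to_uuid(gfid_hex: str) -> str:
    """Convert a trusted.gfid hex value to the hyphenated UUID form."""
    hex_digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    return str(uuid.UUID(hex=hex_digits))


def gfid_backend_path(gfid_hex: str) -> str:
    """Assumed location of the gfid entry, relative to the brick root."""
    u = gfid_to_uuid(gfid_hex)
    return ".glusterfs/%s/%s/%s" % (u[0:2], u[2:4], u)


if __name__ == "__main__":
    gfid = "0xabb0369b05844390add6ea72ce7e107a"
    print(gfid_to_uuid(gfid))
    print(gfid_backend_path(gfid))
```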
 
When I run "gluster volume heal my_volume info heal-failed" it lists
<gfid:abb0369b-0584-4390-add6-ea72ce7e107a> four times.  I have tried to do
a full heal and a rebalance of the system, but it does not fix this problem.
 
How do I fix this problem?  Is there an easy way that I can fix all of the
files with the problem in bulk?
 
Thank you very much for any insights or help you may have!!
 
Dave
 