[Gluster-users] Issue with Pro active self healing for Erasure coding

Mohamed Pakkeer mdfakkeer at gmail.com
Tue May 26 08:15:38 UTC 2015


Hi Glusterfs Experts,

We are testing the glusterfs 3.7.0 tarball on our 10-node glusterfs cluster.
Each node has 36 drives; please find the volume info below.

Volume Name: vaulttest5
Type: Distributed-Disperse
Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
Status: Started
Number of Bricks: 36 x (8 + 2) = 360
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.7:/media/disk1
Brick8: 10.1.2.8:/media/disk1
Brick9: 10.1.2.9:/media/disk1
Brick10: 10.1.2.10:/media/disk1
Brick11: 10.1.2.1:/media/disk2
Brick12: 10.1.2.2:/media/disk2
Brick13: 10.1.2.3:/media/disk2
Brick14: 10.1.2.4:/media/disk2
Brick15: 10.1.2.5:/media/disk2
Brick16: 10.1.2.6:/media/disk2
Brick17: 10.1.2.7:/media/disk2
Brick18: 10.1.2.8:/media/disk2
Brick19: 10.1.2.9:/media/disk2
Brick20: 10.1.2.10:/media/disk2
...
Brick351: 10.1.2.1:/media/disk36
Brick352: 10.1.2.2:/media/disk36
Brick353: 10.1.2.3:/media/disk36
Brick354: 10.1.2.4:/media/disk36
Brick355: 10.1.2.5:/media/disk36
Brick356: 10.1.2.6:/media/disk36
Brick357: 10.1.2.7:/media/disk36
Brick358: 10.1.2.8:/media/disk36
Brick359: 10.1.2.9:/media/disk36
Brick360: 10.1.2.10:/media/disk36
Options Reconfigured:
performance.readdir-ahead: on
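
For reference, a volume with this 8 + 2 layout would be created along these
lines (a sketch; the full 360-brick list is elided, in the same disk-major
order shown above):

  gluster volume create vaulttest5 disperse 10 redundancy 2 \
      10.1.2.1:/media/disk1 10.1.2.2:/media/disk1 ... 10.1.2.10:/media/disk36
  gluster volume start vaulttest5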

We ran some performance tests and simulated proactive self-healing for
erasure coding. The disperse volume was created across nodes.

*Description of problem*

I disconnected the network on two nodes and wrote some video files;
glusterfs wrote them to the remaining 8 nodes perfectly. I downloaded one
of the uploaded files and it came back intact. Then I re-enabled the
network on the two nodes, and proactive self-healing appeared to work,
rebuilding the previously unavailable chunks of data on the re-enabled
nodes from the other 8 nodes. But when I then tried to download the same
file, it failed with an Input/output error. I think there is an issue in
proactive self-healing.
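
In case the exact steps matter, we simulated the failure and watched the
heal roughly as follows (a sketch; the interface name eth0 is just an
example, not necessarily what we used):

  # on each of the two test nodes: take the link down, write files from a
  # client, then bring the link back up
  ip link set eth0 down
  ip link set eth0 up

  # from any node: list files that still have pending heals
  gluster volume heal vaulttest5 info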

We also ran the simulation with a single-node network failure and hit the
same I/O error while downloading the file.


*Error while downloading the file*

root@master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
sending incremental file list
file13_AN
  3,342,355,597 100%    4.87MB/s    0:10:54 (xfr#1, to-chk=0/1)
rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
WARNING: file13_AN failed verification -- update discarded (will try again).

root@master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3

cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error
cp: failed to extend ‘./1/file13_AN-3’: Input/output error
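
In case it helps with debugging, the erasure-coding metadata of the file's
fragments can be compared across the bricks (a sketch; trusted.ec.* are the
xattrs the 3.7 disperse translator keeps, and the brick path assumes the
file sits at the brick root, mirroring its place on the mount):

  # run on each of the 10 nodes against the brick holding a fragment;
  # trusted.ec.version and trusted.ec.size should agree across fragments
  getfattr -d -m . -e hex /media/disk*/file13_AN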


We can't tell whether the issue lies in glusterfs 3.7.0 itself or in our
glusterfs configuration.

Any help would be greatly appreciated.

-- 
Cheers
Backer