[Gluster-users] Issue with Pro active self healing for Erasure coding
Xavier Hernandez
xhernandez at datalab.es
Wed May 27 10:01:26 UTC 2015
On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
> Hi Xavier,
>
> Thanks for your reply. When can we expect the 3.7.1 release?
AFAIK a beta of 3.7.1 will be released very soon.
>
> cheers
> Backer
>
> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez
> <xhernandez at datalab.es> wrote:
>
> Hi,
>
> some Input/Output error issues have been identified and fixed. These
> fixes will be available on 3.7.1.
>
> Xavi
>
>
> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>
> Hi Glusterfs Experts,
>
> We are testing glusterfs 3.7.0 tarball on our 10 Node glusterfs
> cluster.
> Each node has 36 drives; please find the volume info below
>
> Volume Name: vaulttest5
> Type: Distributed-Disperse
> Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
> Status: Started
> Number of Bricks: 36 x (8 + 2) = 360
> Transport-type: tcp
> Bricks:
> Brick1: 10.1.2.1:/media/disk1
> Brick2: 10.1.2.2:/media/disk1
> Brick3: 10.1.2.3:/media/disk1
> Brick4: 10.1.2.4:/media/disk1
> Brick5: 10.1.2.5:/media/disk1
> Brick6: 10.1.2.6:/media/disk1
> Brick7: 10.1.2.7:/media/disk1
> Brick8: 10.1.2.8:/media/disk1
> Brick9: 10.1.2.9:/media/disk1
> Brick10: 10.1.2.10:/media/disk1
> Brick11: 10.1.2.1:/media/disk2
> Brick12: 10.1.2.2:/media/disk2
> Brick13: 10.1.2.3:/media/disk2
> Brick14: 10.1.2.4:/media/disk2
> Brick15: 10.1.2.5:/media/disk2
> Brick16: 10.1.2.6:/media/disk2
> Brick17: 10.1.2.7:/media/disk2
> Brick18: 10.1.2.8:/media/disk2
> Brick19: 10.1.2.9:/media/disk2
> Brick20: 10.1.2.10:/media/disk2
> ...
> ....
> Brick351: 10.1.2.1:/media/disk36
> Brick352: 10.1.2.2:/media/disk36
> Brick353: 10.1.2.3:/media/disk36
> Brick354: 10.1.2.4:/media/disk36
> Brick355: 10.1.2.5:/media/disk36
> Brick356: 10.1.2.6:/media/disk36
> Brick357: 10.1.2.7:/media/disk36
> Brick358: 10.1.2.8:/media/disk36
> Brick359: 10.1.2.9:/media/disk36
> Brick360: 10.1.2.10:/media/disk36
> Options Reconfigured:
> performance.readdir-ahead: on
>
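> (As a point of reference, a 36 x (8 + 2) distributed-disperse layout like
> the one above could be created with something along the following lines.
> This is only a sketch of an equivalent command, not necessarily the exact
> one we ran; it assumes the brick paths listed above and a bash shell for
> the expansion.)
>
>     # 10 bricks per disperse set (8 data + 2 redundancy), one disk per node,
>     # repeated for disk1..disk36 -> 36 sets x 10 bricks = 360 bricks
>     gluster volume create vaulttest5 disperse 10 redundancy 2 \
>         $(for d in $(seq 1 36); do echo 10.1.2.{1..10}:/media/disk$d; done)
>     gluster volume start vaulttest5
>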
> We did some performance testing and simulated proactive self
> healing for erasure coding. The disperse volume has been created
> across nodes.
>
> Description of problem
>
> I disconnected the network of two nodes and tried to write some
> video files, and glusterfs wrote the video files to the remaining
> 8 nodes perfectly. I tried to download the uploaded file and it
> was downloaded perfectly. Then I re-enabled the network of the two
> nodes; the proactive self-healing mechanism kicked in and wrote the
> missing chunks of data to the recently re-enabled nodes from the
> other 8 nodes. But when I then tried to download the same file, it
> showed an Input/Output error and I couldn't download it. I think
> there is an issue in proactive self healing.
>
> We also tried the simulation with a single-node network failure and
> faced the same I/O error while downloading the file.
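>
> (Before reading the files back, the state of self-heal can be checked
> with the standard CLI; the commands below are the usual ones, shown
> here only as a sketch rather than the exact sequence we ran.)
>
>     # entries still pending heal, listed per brick; ideally every brick
>     # should report "Number of entries: 0" before the file is read again
>     gluster volume heal vaulttest5 info
>     # confirms the bricks and the self-heal daemon are online
>     gluster volume status vaulttest5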
>
>
> Error while downloading file
>
> root at master02:/home/admin# rsync -r --progress
> /mnt/gluster/file13_AN
> ./1/file13_AN-2
>
> sending incremental file list
>
> file13_AN
>
> 3,342,355,597 100% 4.87MB/s 0:10:54 (xfr#1, to-chk=0/1)
>
> rsync: read errors mapping "/mnt/gluster/file13_AN":
> Input/output error (5)
>
> WARNING: file13_AN failed verification -- update discarded (will
> try again).
>
> root at master02:/home/admin# cp /mnt/gluster/file13_AN
> ./1/file13_AN-3
>
> cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error
>
> cp: failed to extend ‘./1/file13_AN-3’: Input/output error
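>
> (When a disperse volume returns an Input/output error like this, one
> way to dig deeper is to compare the erasure-coding extended attributes
> of the file's fragments on the individual bricks. This is only a
> diagnostic sketch; it assumes the file sits at the same relative path
> on every brick it was placed on.)
>
>     # run as root on each storage node; fragments whose trusted.ec.version
>     # or trusted.ec.size differ from the others are the ones that did not heal
>     getfattr -m . -d -e hex /media/disk*/file13_AN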
>
>
> We can't conclude whether the issue is in glusterfs 3.7.0 itself or
> in our glusterfs configuration.
>
> Any help would be greatly appreciated
>
> --
> Cheers
> Backer
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users