[Gluster-users] Possible error not being returned

Ankireddypalle Reddy areddy at commvault.com
Mon May 23 09:22:16 UTC 2016


Xavier,
                We are using a disperse volume to store data backed up by Commvault Simpana software. We are performing stress testing and have noticed that the issue occurs consistently when the 10G link is completely saturated. Reads succeed but return incorrect data: CRC checks on the returned data fail. The brick daemons are up and operational, so to us this mostly looks like a communication issue. We saw a lot of these errors when 1G NICs were used; the error frequency dropped drastically after moving the gluster traffic to 10G.

Volume Name: SDSStoragePool
Type: Distributed-Disperse
Volume ID: c5ebb780-669f-4c31-9970-e12dae1f473c
Status: Started
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: cvltpbba1sds:/ws/disk1/ws_brick
Brick2: cvltpbba3sds:/ws/disk1/ws_brick
Brick3: cvltpbba4sds:/ws/disk1/ws_brick
Brick4: cvltpbba1sds:/ws/disk2/ws_brick
Brick5: cvltpbba3sds:/ws/disk2/ws_brick
Brick6: cvltpbba4sds:/ws/disk2/ws_brick
Brick7: cvltpbba1sds:/ws/disk3/ws_brick
Brick8: cvltpbba3sds:/ws/disk3/ws_brick
Brick9: cvltpbba4sds:/ws/disk3/ws_brick
Brick10: cvltpbba1sds:/ws/disk4/ws_brick
Brick11: cvltpbba3sds:/ws/disk4/ws_brick
Brick12: cvltpbba4sds:/ws/disk4/ws_brick
Brick13: cvltpbba1sds:/ws/disk5/ws_brick
Brick14: cvltpbba3sds:/ws/disk5/ws_brick
Brick15: cvltpbba4sds:/ws/disk5/ws_brick
Brick16: cvltpbba1sds:/ws/disk6/ws_brick
Brick17: cvltpbba3sds:/ws/disk6/ws_brick
Brick18: cvltpbba4sds:/ws/disk6/ws_brick
Brick19: cvltpbba1sds:/ws/disk7/ws_brick
Brick20: cvltpbba3sds:/ws/disk7/ws_brick
Brick21: cvltpbba4sds:/ws/disk7/ws_brick
Brick22: cvltpbba1sds:/ws/disk8/ws_brick
Brick23: cvltpbba3sds:/ws/disk8/ws_brick
Brick24: cvltpbba4sds:/ws/disk8/ws_brick
Options Reconfigured:
performance.readdir-ahead: on
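For reference on the geometry above: each 2 + 1 disperse subvolume stores 2 data fragments plus 1 redundancy fragment, so any 2 of its 3 bricks suffice to serve a read. A quick sketch of that arithmetic (variable names are ours, not GlusterFS's):

```python
# Distributed-Disperse 8 x (2 + 1): 8 subvolumes of 2 data + 1 redundancy.
DATA, REDUNDANCY = 2, 1
BRICKS_PER_SUBVOL = DATA + REDUNDANCY            # 3
SUBVOLUMES = 8
TOTAL_BRICKS = SUBVOLUMES * BRICKS_PER_SUBVOL    # 24, as reported above

def read_possible(healthy_bricks):
    """A read needs at least DATA healthy bricks in the subvolume; with
    only one brick reachable you get 'have 1, need 2' and EIO."""
    return healthy_bricks >= DATA
```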

Thanks and Regards,
Ram    

-----Original Message-----
From: Xavier Hernandez [mailto:xhernandez at datalab.es] 
Sent: Monday, May 23, 2016 3:22 AM
To: Ankireddypalle Reddy; gluster-users at gluster.org
Subject: Re: [Gluster-users] Possible error not being returned

It's possible that the operation that failed is an internal one issued by disperse itself or by another translator; in that case the error is only logged and not reported to the application.

The read issued by the application will only fail if something fails while processing that read itself. If everything goes well, the read succeeds and should return healthy data.
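In other words, an internal operation can log an error while the application-visible read still returns success. A toy illustration of that separation (this is not GlusterFS source code; all names here are invented):

```python
errors_logged = []

def internal_op(succeeded):
    """E.g. a self-heal or metadata update issued by a translator;
    its failure is only logged, not returned to the caller."""
    if not succeeded:
        errors_logged.append("E ... Operation failed on some subvolumes")

def application_read(read_fop_succeeded):
    """The application sees EIO only if the read fop itself fails."""
    if not read_fop_succeeded:
        raise OSError(5, "Input/output error")
    return b"data"
```

So a logged "Operation failed on some subvolumes" and a successful application read are not contradictory by themselves.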

What configuration are you using? (gluster volume info) What are you doing exactly? (workload) Why is one brick down/damaged? Are you doing tests? How are you doing them?

Best regards,

Xavi

On 20/05/16 16:54, Ankireddypalle Reddy wrote:
> Hi,
>
>         Did anyone get a chance to check this? We are intermittently 
> receiving corrupted data in read operations because of this.
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:*gluster-users-bounces at gluster.org
> [mailto:gluster-users-bounces at gluster.org] *On Behalf Of 
> *Ankireddypalle Reddy
> *Sent:* Thursday, May 19, 2016 3:59 PM
> *To:* gluster-users at gluster.org
> *Subject:* [Gluster-users] Possible error not being returned
>
>
>
> Hi,
>
>        A disperse volume was configured on servers with limited 
> network bandwidth. Some of the read operations failed with the error:
>
>
>
> [2016-05-16 18:38:36.035559] E [MSGID: 122034] 
> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-2:
> Insufficient available childs for this request (have 1, need 2)
>
> [2016-05-16 18:38:36.035713] W [fuse-bridge.c:2213:fuse_readv_cbk]
> 0-glusterfs-fuse: 155121179: READ => -1 (Input/output error)
>
>
>
> For some read operations, only the following error was logged, but the 
> I/O did not fail.
>
> [2016-05-16 18:42:45.401570] E [MSGID: 122034] 
> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-3:
> Insufficient available childs for this request (have 1, need 2)
>
> [2016-05-16 18:42:45.402054] W [MSGID: 122053] 
> [ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-3: 
> Operation failed on some subvolumes (up=7, mask=6, remaining=0, 
> good=6, bad=1)
>
>
>
> We receive corrupted data from the read when this error is logged, 
> even though the read call itself does not return an error.
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
>
>
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply 
> email and delete the message. Thank you."
> **********************************************************************
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>




