[Gluster-users] Possible error not being returned

Xavier Hernandez xhernandez at datalab.es
Mon May 23 12:11:15 UTC 2016


In this case you should have more warnings/errors in your log files
besides the ones related to EC. Can you post them?
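
A minimal sketch of one way to pull those entries out of a log, assuming every entry is a single line shaped like the excerpts quoted later in this thread (timestamp, one-letter level, optional MSGID, then [file:line:function]). The regular expression and the name non_ec_problems are illustrative only; they are not part of gluster.

```python
import re

# Assumed line layout, inferred from the excerpts in this thread:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s*(?P<msg>.*)$"
)


def non_ec_problems(lines):
    """Yield (timestamp, level, source file, message) for W/E/C entries
    that were not logged from an ec-*.c source file."""
    for raw in lines:
        m = LOG_RE.match(raw.strip())
        if m is None:
            continue  # not a log entry in the assumed layout
        if m["level"] not in ("W", "E", "C"):
            continue  # keep warnings, errors and criticals only
        if m["src"].startswith("ec-"):
            continue  # skip the EC messages we already know about
        yield m["ts"], m["level"], m["src"], m["msg"]
```

Passing an open log file to non_ec_problems() iterates over its lines and yields only the non-EC warnings and errors, which are the ones worth posting here.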

EC tries to keep track of good and bad bricks. However, if multiple
internal operations are failing (especially if the failures are related to
communications), some special case may be hit that leaves it unable to
determine that a brick is bad when it should.
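
To see which bricks EC considered good or bad for a given operation, the ec_check_status message quoted further down in this thread can be decoded. The sketch below assumes the up/mask/remaining/good/bad values are hexadecimal bitmasks with one bit per brick of the disperse subvolume (bit 0 being its first brick); that reading is an assumption here, and the helper names are illustrative only.

```python
import re

# Field layout taken from the ec_check_status line quoted in this thread.
STATUS_RE = re.compile(
    r"up=(?P<up>[0-9A-Fa-f]+), mask=(?P<mask>[0-9A-Fa-f]+), "
    r"remaining=(?P<remaining>[0-9A-Fa-f]+), "
    r"good=(?P<good>[0-9A-Fa-f]+), bad=(?P<bad>[0-9A-Fa-f]+)"
)


def bits(value):
    """Return the indices of the bits set in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def decode_status(message):
    """Map each field to the list of brick indices it selects.

    Assumption: every field is a hexadecimal bitmask of bricks.
    Returns None if the message does not contain the expected fields.
    """
    m = STATUS_RE.search(message)
    if m is None:
        return None
    return {name: bits(int(text, 16)) for name, text in m.groupdict().items()}
```

Under that assumption, the message with up=7, mask=6, good=6, bad=1 would mean all three bricks were up, the operation was sent to the second and third, and the first was marked bad.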

If that's the case, we need to know how this happens to try to solve it.

Xavi

On 23/05/16 11:22, Ankireddypalle Reddy wrote:
> Xavier,
>                 We are using a disperse volume to store data backed up by Commvault Simpana software. We are performing stress testing and have noticed that the issue happens when the 10G link is kept completely saturated. The read succeeds but returns incorrect data: CRC checks fail on the returned data. The brick daemons are up and operational. To us it looks mostly like a communication issue. We saw a lot of these issues when 1G NICs were used. The error frequency went down drastically after moving the gluster traffic to 10G.
>
> Volume Name: SDSStoragePool
> Type: Distributed-Disperse
> Volume ID: c5ebb780-669f-4c31-9970-e12dae1f473c
> Status: Started
> Number of Bricks: 8 x (2 + 1) = 24
> Transport-type: tcp
> Bricks:
> Brick1: cvltpbba1sds:/ws/disk1/ws_brick
> Brick2: cvltpbba3sds:/ws/disk1/ws_brick
> Brick3: cvltpbba4sds:/ws/disk1/ws_brick
> Brick4: cvltpbba1sds:/ws/disk2/ws_brick
> Brick5: cvltpbba3sds:/ws/disk2/ws_brick
> Brick6: cvltpbba4sds:/ws/disk2/ws_brick
> Brick7: cvltpbba1sds:/ws/disk3/ws_brick
> Brick8: cvltpbba3sds:/ws/disk3/ws_brick
> Brick9: cvltpbba4sds:/ws/disk3/ws_brick
> Brick10: cvltpbba1sds:/ws/disk4/ws_brick
> Brick11: cvltpbba3sds:/ws/disk4/ws_brick
> Brick12: cvltpbba4sds:/ws/disk4/ws_brick
> Brick13: cvltpbba1sds:/ws/disk5/ws_brick
> Brick14: cvltpbba3sds:/ws/disk5/ws_brick
> Brick15: cvltpbba4sds:/ws/disk5/ws_brick
> Brick16: cvltpbba1sds:/ws/disk6/ws_brick
> Brick17: cvltpbba3sds:/ws/disk6/ws_brick
> Brick18: cvltpbba4sds:/ws/disk6/ws_brick
> Brick19: cvltpbba1sds:/ws/disk7/ws_brick
> Brick20: cvltpbba3sds:/ws/disk7/ws_brick
> Brick21: cvltpbba4sds:/ws/disk7/ws_brick
> Brick22: cvltpbba1sds:/ws/disk8/ws_brick
> Brick23: cvltpbba3sds:/ws/disk8/ws_brick
> Brick24: cvltpbba4sds:/ws/disk8/ws_brick
> Options Reconfigured:
> performance.readdir-ahead: on
>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Monday, May 23, 2016 3:22 AM
> To: Ankireddypalle Reddy; gluster-users at gluster.org
> Subject: Re: [Gluster-users] Possible error not being returned
>
> It's possible that the operation that failed is an internal one issued by disperse itself or by another translator, so the error is not reported to the application.
>
> The read issued by the application will only fail if something fails while processing the read itself. If everything goes well, the read will succeed and should return healthy data.
>
> What configuration are you using? (gluster volume info) What exactly are you doing? (workload) Why is one brick down/damaged? Are you running tests? How are you running them?
>
> Best regards,
>
> Xavi
>
> On 20/05/16 16:54, Ankireddypalle Reddy wrote:
>> Hi,
>>
>>         Did anyone get a chance to check this? We are intermittently
>> receiving corrupted data in read operations because of this.
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>>
>>
>>
>> *From:*gluster-users-bounces at gluster.org
>> [mailto:gluster-users-bounces at gluster.org] *On Behalf Of
>> *Ankireddypalle Reddy
>> *Sent:* Thursday, May 19, 2016 3:59 PM
>> *To:* gluster-users at gluster.org
>> *Subject:* [Gluster-users] Possible error not being returned
>>
>>
>>
>> Hi,
>>
>>        A disperse volume was configured on servers with limited
>> network bandwidth. Some of the read operations failed with the error:
>>
>>
>>
>> [2016-05-16 18:38:36.035559] E [MSGID: 122034]
>> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-2:
>> Insufficient available childs for this request (have 1, need 2)
>>
>> [2016-05-16 18:38:36.035713] W [fuse-bridge.c:2213:fuse_readv_cbk]
>> 0-glusterfs-fuse: 155121179: READ => -1 (Input/output error)
>>
>>
>>
>> For some read operations just the following error was logged but the
>> I/O did not fail.
>>
>> [2016-05-16 18:42:45.401570] E [MSGID: 122034]
>> [ec-common.c:461:ec_child_select] 0-SDSStoragePool-disperse-3:
>> Insufficient available childs for this request (have 1, need 2)
>>
>> [2016-05-16 18:42:45.402054] W [MSGID: 122053]
>> [ec-common.c:116:ec_check_status] 0-SDSStoragePool-disperse-3:
>> Operation failed on some subvolumes (up=7, mask=6, remaining=0,
>> good=6, bad=1)
>>
>>
>>
>> We are receiving corrupted data from the read operation when the error
>> is logged but the read call does not return any error.
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> ***************************Legal Disclaimer***************************
>>
>> "This communication may contain confidential and privileged material
>> for the
>>
>> sole use of the intended recipient. Any unauthorized review, use or
>> distribution
>>
>> by others is strictly prohibited. If you have received the message by
>> mistake,
>>
>> please advise the sender by reply email and delete the message. Thank you."
>>
>> **********************************************************************
>>
>>
>>
>>
>>
>>
>
>
>
>

