[Gluster-devel] Lot of EIO errors in disperse volume

Xavier Hernandez xhernandez at datalab.es
Tue Jan 10 14:09:40 UTC 2017


Hi Ram,

On 10/01/17 14:42, Ankireddypalle Reddy wrote:
> Attachments (2):
>
> 1. ec.txt (11.50 KB)
> <https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/ee2d1536c2dc4dff94afb12132b4f8f6/action/download>
>
> 2. ws-glus.log (3.48 MB)
> <https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/cff3e0506e754b9a939db02da1cbbd58/action/download>
>
> Xavi,
>           We are encountering errors for different kinds of FOPS.
>           The open failed for the following file:
>
>           cvd_2017_01_10_02_28_26.log:98182 1f9fe 01/10 00:57:10 8414465
> [MEDIAFS    ] 20117519-52075477 SingleInstancer_FS::StartDataFile2:
> Failed to create the data file
> [/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_51342720/SFILE_CONTAINER_062],
> error=0xECCC0005:{CQiFile::Open(92)} +
> {CQiUTFOSAPI::open(96)/ErrNo.5.(Input/output error)-Open failed,
> File=/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_51342720/SFILE_CONTAINER_062,
> OperationFlag=0xC1, PermissionMode=0x1FF}
>
>           I've attached the extended attributes for the directories
>           /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/ and
>           /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854974/CHUNK_51342720
>           from all the bricks.
>
>          The attributes look fine to me. I've also attached some log
> excerpts to illustrate the problem.

I need the extended attributes of the file itself, not the parent 
directories.
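
As a rough illustration of the comparison involved (a minimal sketch, not
part of gluster): the Python below takes the text printed by
'getfattr -m. -e hex -d' for the same file on each brick and lists every
attribute whose value is not identical everywhere. The function names,
brick labels and hex values are made up for the example; the only thing
assumed about getfattr is its usual dump layout, a '# file:' comment line
followed by name=value lines.

```python
def parse_getfattr(text):
    """Parse one `getfattr -m. -e hex -d` dump into {attribute: hex value}."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." comment line.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


def mismatched_xattrs(dumps):
    """Return {attribute: {brick: value}} for attributes that differ.

    `dumps` maps a brick label to the getfattr output captured on that
    brick. An attribute missing on a brick is reported with value None.
    """
    parsed = {brick: parse_getfattr(text) for brick, text in dumps.items()}
    names = set()
    for attrs in parsed.values():
        names.update(attrs)
    result = {}
    for name in sorted(names):
        values = {brick: attrs.get(name) for brick, attrs in parsed.items()}
        if len(set(values.values())) > 1:
            result[name] = values
    return result


# Hypothetical dumps: paths, labels and hex values are invented for illustration.
example_dumps = {
    "brick1": "# file: brick/path/to/file\n"
              "trusted.ec.size=0x0000000000001000\n"
              "trusted.ec.version=0x00000000000000050000000000000005\n",
    "brick2": "# file: brick/path/to/file\n"
              "trusted.ec.size=0x0000000000001000\n"
              "trusted.ec.version=0x00000000000000050000000000000005\n",
    "brick3": "# file: brick/path/to/file\n"
              "trusted.ec.size=0x0000000000001000\n"
              "trusted.ec.version=0x00000000000000040000000000000005\n",
}

for name, values in mismatched_xattrs(example_dumps).items():
    print(name)
    for brick, value in sorted(values.items()):
        print("   ", brick, value)
```

An attribute listed by this sketch is only a candidate: some attributes
may legitimately differ between bricks, so the output still has to be
read by hand.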

Xavi

>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Tuesday, January 10, 2017 7:53 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
> gluster-users at gluster.org
> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>
> Hi Ram,
>
> the error is caused by an extended attribute that does not match on all
> 3 bricks of the disperse set. The most likely one is trusted.ec.version,
> but it could be another attribute.
>
> At first sight, I don't see any change from 3.7.8 that could have caused
> this. I'll check again.
>
> What kind of operations are you doing? This can help me narrow the search.
>
> Xavi
>
> On 10/01/17 13:43, Ankireddypalle Reddy wrote:
>> Xavi,
>>           Thanks. If you could explain what to look for in the
>> extended attributes, I will check and let you know if I find
>> anything suspicious. We also noticed that some of these operations
>> succeed if retried. Do you know of any communication-related errors
>> that are being reported or triaged?
>>
>> Thanks and Regards,
>> Ram
>>
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Tuesday, January 10, 2017 7:23 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>> gluster-users at gluster.org
>> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>>
>> Hi Ram,
>>
>> On 10/01/17 13:14, Ankireddypalle Reddy wrote:
>>> Attachment (1):
>>>
>>> 1. ecxattrs.txt (5.92 KB)
>>> <https://imap.commvault.com/webconsole/api/contentstore/publicshare/346714/file/1272e68278744f15bf1a54f2b31b559d/action/download>
>>>
>>> Xavi,
>>>              Please find attached the extended attributes for a
>>> directory from all the bricks. Free space check failed for this with
>>> error number EIO.
>>
>> What do you mean? What operation did you run to check the free
>> space on that directory?
>>
>> If it's a recursive check, I need the extended attributes from the
>> exact file that triggers the EIO. The attached attributes seem
>> consistent and that directory shouldn't cause any problem. Does an 'ls'
>> on that directory fail, or does it show the contents?
>>
>> Xavi
>>
>>>
>>> Thanks and Regards,
>>> Ram
>>>
>>> -----Original Message-----
>>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>>> Sent: Tuesday, January 10, 2017 6:45 AM
>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>>> gluster-users at gluster.org
>>> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>>>
>>> Hi Ram,
>>>
>>> can you execute the following command on all bricks, on a file that is
>>> giving EIO?
>>>
>>> getfattr -m. -e hex -d <path to file in brick>
>>>
>>> Xavi
>>>
>>> On 10/01/17 12:41, Ankireddypalle Reddy wrote:
>>>> Xavi,
>>>>             We have been running 3.7.8 on these servers. We upgraded
>>>> to 3.7.18 yesterday. We upgraded all the servers at the same time. The
>>>> volume was brought down during the upgrade.
>>>>
>>>> Thanks and Regards,
>>>> Ram
>>>>
>>>> -----Original Message-----
>>>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>>>> Sent: Tuesday, January 10, 2017 6:35 AM
>>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>>>> gluster-users at gluster.org
>>>> Subject: Re: [Gluster-devel] Lot of EIO errors in disperse volume
>>>>
>>>> Hi Ram,
>>>>
>>>> how did you upgrade gluster? From which version?
>>>>
>>>> Did you upgrade one server at a time and wait until self-heal
>>>> finished before upgrading the next server?
>>>>
>>>> Xavi
>>>>
>>>> On 10/01/17 11:39, Ankireddypalle Reddy wrote:
>>>>> Hi,
>>>>>
>>>>>       We upgraded to GlusterFS 3.7.18 yesterday. We see a lot of
>>>>> failures in our applications. Most of the errors are EIO. The
>>>>> following log lines are commonly seen in the logs:
>>>>>
>>>>>
>>>>>
>>>>> The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check]
>>>>> 0-StoragePool-disperse-4: Mismatching xdata in answers of 'LOOKUP'"
>>>>> repeated 2 times between [2017-01-10 02:46:25.069809] and
>>>>> [2017-01-10 02:46:25.069835]
>>>>>
>>>>> [2017-01-10 02:46:25.069852] W [MSGID: 122056]
>>>>> [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-5:
>>>>> Mismatching xdata in answers of 'LOOKUP'
>>>>>
>>>>> The message "W [MSGID: 122056] [ec-combine.c:873:ec_combine_check]
>>>>> 0-StoragePool-disperse-5: Mismatching xdata in answers of 'LOOKUP'"
>>>>> repeated 2 times between [2017-01-10 02:46:25.069852] and
>>>>> [2017-01-10 02:46:25.069873]
>>>>>
>>>>> [2017-01-10 02:46:25.069910] W [MSGID: 122056]
>>>>> [ec-combine.c:873:ec_combine_check] 0-StoragePool-disperse-6:
>>>>> Mismatching xdata in answers of 'LOOKUP'
>>>>>
>>>>> ...
>>>>>
>>>>> [2017-01-10 02:46:26.520774] I [MSGID: 109036]
>>>>> [dht-common.c:9076:dht_log_new_layout_for_dir_selfheal]
>>>>> 0-StoragePool-dht: Setting layout of
>>>>> /Folder_07.11.2016_23.02/CV_MAGNETIC/V_8854213/CHUNK_51334585 with
>>>>> [Subvol_name: StoragePool-disperse-0, Err: -1 , Start: 3221225466 , Stop: 3758096376 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-1, Err: -1 , Start: 3758096377 , Stop: 4294967295 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-2, Err: -1 , Start: 0 , Stop: 536870910 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-3, Err: -1 , Start: 536870911 , Stop: 1073741821 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-4, Err: -1 , Start: 1073741822 , Stop: 1610612732 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-5, Err: -1 , Start: 1610612733 , Stop: 2147483643 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-6, Err: -1 , Start: 2147483644 , Stop: 2684354554 , Hash: 1 ],
>>>>> [Subvol_name: StoragePool-disperse-7, Err: -1 , Start: 2684354555 , Stop: 3221225465 , Hash: 1 ],
>>>>>
>>>>> [2017-01-10 02:46:26.522841] N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-3:
>>>>> Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>>>
>>>>> The message "N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop]
>>>>> 0-StoragePool-disperse-3: Mismatching dictionary in answers of
>>>>> 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
>>>>> 02:46:26.522841] and [2017-01-10 02:46:26.522894]
>>>>>
>>>>> [2017-01-10 02:46:26.522898] W [MSGID: 122040]
>>>>> [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-3:
>>>>> Failed to get size and version [Input/output error]
>>>>>
>>>>> [2017-01-10 02:46:26.523115] N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-6:
>>>>> Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>>>
>>>>> The message "N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop]
>>>>> 0-StoragePool-disperse-6: Mismatching dictionary in answers of
>>>>> 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
>>>>> 02:46:26.523115] and [2017-01-10 02:46:26.523143]
>>>>>
>>>>> [2017-01-10 02:46:26.523147] W [MSGID: 122040]
>>>>> [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-6:
>>>>> Failed to get size and version [Input/output error]
>>>>>
>>>>> [2017-01-10 02:46:26.523302] N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop] 0-StoragePool-disperse-2:
>>>>> Mismatching dictionary in answers of 'GF_FOP_XATTROP'
>>>>>
>>>>> The message "N [MSGID: 122031]
>>>>> [ec-generic.c:1130:ec_combine_xattrop]
>>>>> 0-StoragePool-disperse-2: Mismatching dictionary in answers of
>>>>> 'GF_FOP_XATTROP'" repeated 2 times between [2017-01-10
>>>>> 02:46:26.523302] and [2017-01-10 02:46:26.523324]
>>>>>
>>>>> [2017-01-10 02:46:26.523328] W [MSGID: 122040]
>>>>> [ec-common.c:919:ec_prepare_update_cbk] 0-StoragePool-disperse-2:
>>>>> Failed to get size and version [Input/output error]
>>>>>
>>>>>
>>>>>
>>>>> [root@glusterfs3 Log_Files]# gluster --version
>>>>> glusterfs 3.7.18 built on Dec  8 2016 06:34:26
>>>>>
>>>>> [root@glusterfs3 Log_Files]# gluster volume info
>>>>>
>>>>> Volume Name: StoragePool
>>>>> Type: Distributed-Disperse
>>>>> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>>>>> Status: Started
>>>>> Number of Bricks: 8 x (2 + 1) = 24
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>>>>> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>>>>> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>>>>> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>>>>> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>>>>> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>>>>> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>>>>> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>>>>> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>>>>> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>>>>> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>>>>> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>>>>> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>>>>> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>>>>> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>>>>> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>>>>> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>>>>> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>>>>> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>>>>> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>>>>> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>>>>> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>>>>> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>>>>> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>>>>> Options Reconfigured:
>>>>> performance.readdir-ahead: on
>>>>> diagnostics.client-log-level: INFO
>>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Ram
>>>>>
>>>>> ***************************Legal Disclaimer***************************
>>>>> "This communication may contain confidential and privileged
>>>>> material for the sole use of the intended recipient. Any
>>>>> unauthorized review, use or distribution by others is strictly
>>>>> prohibited. If you have received the message by mistake, please
>>>>> advise the sender by reply email and delete the message. Thank you."
>>>>> **********************************************************************


