[Gluster-devel] [Gluster-users] Disperse mkdir fails

Xavier Hernandez xhernandez at datalab.es
Wed Mar 15 07:21:29 UTC 2017


Hi Ram,

On 14/03/17 16:48, Ankireddypalle Reddy wrote:
> Xavi,
>            Thanks for checking this. We have an external metadata server which keeps track of every file that gets written to the volume and can validate the file contents. We will use this capability to verify the data. Once the data is verified, will the following sequence of steps be sufficient to restore the volume?
>
> 1) Rebalance the volume.

This will probably not succeed if you are already having problems creating 
directories. First you should make sure that the volume is healthy.
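
For example, a rough sketch of what I would check first (using the
volume name "StoragePool" that appears in your logs; adjust as needed):

    # all bricks should be online and connected
    gluster volume status StoragePool
    # number of entries still pending heal on each brick
    gluster volume heal StoragePool info

Only when this looks clean would I run "gluster volume rebalance
StoragePool start" and monitor it with "gluster volume rebalance
StoragePool status".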

> 2) After rebalance is complete, stop ingesting more data to the volume.
> 3) Let the pending heals complete.

These two steps would be useful before the rebalance. After letting 
self-heal repair everything it can, the remaining damaged entries 
should be healed by hand. The exact procedure depends on the nature of 
the problems.
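
As a sketch, something like this can be used to trigger a full
self-heal and watch the pending entries decrease:

    gluster volume heal StoragePool full
    # repeat until the number of entries stops going down
    gluster volume heal StoragePool info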

> 4) Stop the volume
> 5) For any heals that fail because of mismatching version/dirty extended attributes on the directories,  set this to a matching value on all the nodes.

It depends. Making the version/dirty attributes match doesn't solve 
the underlying problem that caused them to get out of sync. For example, 
for a directory entry you should also make sure that all subdirectories 
exist on all bricks and have the same attributes. If any directory 
is missing, you need to create it along with its attributes and contents 
recursively.

For each disperse set, you also need to make sure that all directories 
and files match (for files, only the attributes need to match, not the 
file contents).
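
One possible way to compare them (just a sketch; replace <BRICK_PATH>
with the real brick root of each disk, which I don't know) is to dump
the entries of the damaged directory on every brick of one disperse set
and diff the results:

    # run on each server of the disperse set
    ls -a <BRICK_PATH>/Folder_07.11.2016_23.02/CV_MAGNETIC | sort \
        > /tmp/entries.$(hostname)
    # then copy the files to one node and compare them, for example:
    diff /tmp/entries.glusterfs1 /tmp/entries.glusterfs2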

If self-heal is unable to fix a problem, it's most probably more complex 
than simply fixing version/dirty, so be cautious.
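
To see what self-heal is struggling with, the disperse metadata of the
directory can be inspected on each brick (again a sketch, with
<BRICK_PATH> as a placeholder for the real brick root):

    getfattr -d -m . -e hex <BRICK_PATH>/Folder_07.11.2016_23.02/CV_MAGNETIC
    # only as a last resort, once entries and attributes really match on
    # all bricks, the version could be aligned manually with something like:
    # setfattr -n trusted.ec.version -v 0x<good value> <path on brick>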

Xavi

>
> Thanks and Regards,
> Ram
>
> -----Original Message-----
> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
> Sent: Tuesday, March 14, 2017 5:28 AM
> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org); gluster-users at gluster.org
> Subject: Re: [Gluster-users] Disperse mkdir fails
>
> Hi Ram,
>
> On 13/03/17 15:02, Ankireddypalle Reddy wrote:
>> Xavi,
>>                The CV_MAGNETIC directory on a single brick has 155683 entries. There are 60 bricks altogether in the volume. I could provide the output if you still need that.
>
> The problem is that not all bricks have the same number of entries:
>
> glusterfs1:disk1 155674
> glusterfs2:disk1 155675
> glusterfs3:disk1 155718
>
> glusterfs1:disk2 155688
> glusterfs2:disk2 155687
> glusterfs3:disk2 155730
>
> glusterfs1:disk3 155675
> glusterfs2:disk3 155674
> glusterfs3:disk3 155717
>
> glusterfs1:disk4 155684
> glusterfs2:disk4 155683
> glusterfs3:disk4 155726
>
> glusterfs1:disk5 155698
> glusterfs2:disk5 155695
> glusterfs3:disk5 155738
>
> glusterfs1:disk6 155668
> glusterfs2:disk6 155667
> glusterfs3:disk6 155710
>
> glusterfs1:disk7 155687
> glusterfs2:disk7 155689
> glusterfs3:disk7 155732
>
> glusterfs1:disk8 155673
> glusterfs2:disk8 155675
> glusterfs3:disk8 155718
>
> glusterfs4:disk1 149097
> glusterfs5:disk1 149097
> glusterfs6:disk1 149098
>
> glusterfs4:disk2 149097
> glusterfs5:disk2 149097
> glusterfs6:disk2 149098
>
> glusterfs4:disk3 149097
> glusterfs5:disk3 149097
> glusterfs6:disk3 149098
>
> glusterfs4:disk4 149097
> glusterfs5:disk4 149097
> glusterfs6:disk4 149098
>
> glusterfs4:disk5 149097
> glusterfs5:disk5 149097
> glusterfs6:disk5 149098
>
> glusterfs4:disk6 149097
> glusterfs5:disk6 149097
> glusterfs6:disk6 149098
>
> glusterfs4:disk7 149097
> glusterfs5:disk7 149097
> glusterfs6:disk7 149098
>
> glusterfs4:disk8 149097
> glusterfs5:disk8 149097
> glusterfs6:disk8 149098
>
> A small difference could be explained by concurrent operations while retrieving this data, but some bricks are way out of sync.
>
> trusted.ec.dirty and trusted.ec.version also show many discrepancies:
>
> glusterfs1:disk1 trusted.ec.dirty=0x0000000000000ba40000000000000000
> glusterfs2:disk1 trusted.ec.dirty=0x0000000000000bb80000000000000000
> glusterfs3:disk1 trusted.ec.dirty=0x00000000000000160000000000000000
> glusterfs1:disk1 trusted.ec.version=0x0000000000084db40000000000084e11
> glusterfs2:disk1 trusted.ec.version=0x0000000000084e070000000000084e0c
> glusterfs3:disk1 trusted.ec.version=0x000000000008426a0000000000084e11
>
> glusterfs1:disk2 trusted.ec.dirty=0x0000000000000ba50000000000000000
> glusterfs2:disk2 trusted.ec.dirty=0x0000000000000bb60000000000000000
> glusterfs3:disk2 trusted.ec.dirty=0x00000000000000170000000000000000
> glusterfs1:disk2 trusted.ec.version=0x000000000005ccb7000000000005cd0a
> glusterfs2:disk2 trusted.ec.version=0x000000000005cd00000000000005cd05
> glusterfs3:disk2 trusted.ec.version=0x000000000005c166000000000005cd0a
>
> glusterfs1:disk3 trusted.ec.dirty=0x0000000000000ba50000000000000000
> glusterfs2:disk3 trusted.ec.dirty=0x0000000000000bb50000000000000000
> glusterfs3:disk3 trusted.ec.dirty=0x00000000000000160000000000000000
> glusterfs1:disk3 trusted.ec.version=0x000000000005d0cb000000000005d123
> glusterfs2:disk3 trusted.ec.version=0x000000000005d119000000000005d11e
> glusterfs3:disk3 trusted.ec.version=0x000000000005c57f000000000005d123
>
> glusterfs1:disk4 trusted.ec.dirty=0x0000000000000ba00000000000000000
> glusterfs2:disk4 trusted.ec.dirty=0x0000000000000bb10000000000000000
> glusterfs3:disk4 trusted.ec.dirty=0x00000000000000130000000000000000
> glusterfs1:disk4 trusted.ec.version=0x0000000000084e2e0000000000084e78
> glusterfs2:disk4 trusted.ec.version=0x0000000000084e6e0000000000084e73
> glusterfs3:disk4 trusted.ec.version=0x00000000000842d50000000000084e78
>
> glusterfs1:disk5 trusted.ec.dirty=0x0000000000000b9a0000000000000000
> glusterfs2:disk5 trusted.ec.dirty=0x0000000000002e270000000000000000
> glusterfs3:disk5 trusted.ec.dirty=0x00000000000022950000000000000000
> glusterfs1:disk5 trusted.ec.version=0x000000000005aa1f000000000005cd18
> glusterfs2:disk5 trusted.ec.version=0x000000000005cd0d000000000005cd13
> glusterfs3:disk5 trusted.ec.version=0x000000000005c180000000000005cd18
>
> glusterfs1:disk6 trusted.ec.dirty=0x0000000000000ba20000000000000000
> glusterfs2:disk6 trusted.ec.dirty=0x0000000000000bad0000000000000000
> glusterfs3:disk6 trusted.ec.dirty=0x000000000000000f0000000000000000
> glusterfs1:disk6 trusted.ec.version=0x000000000005ccba000000000005cce7
> glusterfs2:disk6 trusted.ec.version=0x000000000005ccde000000000005cce2
> glusterfs3:disk6 trusted.ec.version=0x000000000005c145000000000005cce7
>
>
> glusterfs1:disk7 trusted.ec.dirty=0x0000000000000ba50000000000000000
> glusterfs2:disk7 trusted.ec.dirty=0x0000000000000bab0000000000000000
> glusterfs3:disk7 trusted.ec.dirty=0x000000000000000a0000000000000000
> glusterfs1:disk7 trusted.ec.version=0x000000000005cd03000000000005cd0d
> glusterfs2:disk7 trusted.ec.version=0x000000000005cd04000000000005cd08
> glusterfs3:disk7 trusted.ec.version=0x000000000005c138000000000005cd0d
>
>
> glusterfs1:disk8 trusted.ec.dirty=0x0000000000000bbb0000000000000000
> glusterfs2:disk8 trusted.ec.dirty=0x0000000000000bc00000000000000000
> glusterfs3:disk8 trusted.ec.dirty=0x00000000000000090000000000000000
> glusterfs1:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdcd
> glusterfs2:disk8 trusted.ec.version=0x000000000005cdc4000000000005cdc8
> glusterfs3:disk8 trusted.ec.version=0x000000000005c158000000000005cdcd
>
> glusterfs4:disk1 trusted.ec.version=0x000000000005901d0000000000059021
> glusterfs5:disk1 trusted.ec.version=0x000000000005901d0000000000059021
> glusterfs6:disk1 trusted.ec.version=0x000000000005901e0000000000059022
>
> glusterfs4:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk2 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk2 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk3 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk3 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk4 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk4 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk5 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk5 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk6 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk6 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk7 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk7 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> glusterfs4:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs5:disk8 trusted.ec.version=0x000000000002d2d7000000000002d2d9
> glusterfs6:disk8 trusted.ec.version=0x000000000002d2d8000000000002d2da
>
> Newer bricks seem to be healthy, but old bricks have a lot of differences.
>
> I also see that trusted.glusterfs.dht is not set for the newer bricks, and the full range of hashes is assigned to the old bricks (at least for the CV_MAGNETIC directory). This probably means that a rebalance has not been executed on the volume after adding the new bricks (or it failed).
>
> This will require much more investigation and knowledge about how you do things, from how many clients, etc.
>
> Xavi
>
>>
>> Thanks and Regards,
>> Ram
>>
>> -----Original Message-----
>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>> Sent: Monday, March 13, 2017 9:56 AM
>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>> gluster-users at gluster.org
>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>
>> Hi Ram,
>>
>> On 13/03/17 14:13, Ankireddypalle Reddy wrote:
>>> Attachment: data.txt (17.63 KB)
>>>
>>> Xavier,
>>>                Please find attached the required info from all the
>>> six nodes of the cluster.
>>
>> I asked for the contents of the CV_MAGNETIC directory because this is the damaged directory, not the parent. But anyway we can see that the number of hard links of the directory differs between bricks, which means that the number of subdirectories is different on each brick. A small difference could be explained by the current activity of the volume while the data was being captured, but the differences are too big.
>>
>>>  We need to find out:
>>>                1) How this problem can be avoided.
>>>                2) How we can fix the current state of the cluster.
>>>
>>> Thanks and Regards,
>>> Ram
>>> -----Original Message-----
>>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>>> Sent: Friday, March 10, 2017 3:34 AM
>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>>> gluster-users at gluster.org
>>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>>
>>> Hi Ram,
>>>
>>> On 09/03/17 20:15, Ankireddypalle Reddy wrote:
>>>> Xavi,
>>>>             Thanks for checking this.
>>>>             1) mkdir returns errnum 5. EIO.
>>>>             2)  The specified directory is the parent directory
>>>> under
>>> which all the data in the gluster volume will be stored. Currently
>>> around 160TB of 262TB is consumed.
>>>
>>> I only need the first-level entries of that directory, not the entire
>>> tree of entries. This should be in the order of thousands, right?
>>>
>>> We need to make sure that all bricks have the same entries in this
>>> directory. Otherwise we would need to check other things.
>>>
>>>>             3)  It is extremely difficult to list the exact sequence
>>> of FOPS that would have been issued to the directory. The storage is
>>> heavily used and a lot of subdirectories are present inside this directory.
>>>>
>>>>            Are you looking for the extended attributes for this
>>> directory from all the bricks inside the volume? There are about 60 bricks.
>>>
>>> If possible, yes.
>>>
>>> However, if there are a lot of modifications on that directory while
>>> you are getting the xattrs, it's possible that you get values that look
>>> inconsistent even though they are not.
>>>
>>> If possible, you should get that information while pausing all activity
>>> on that directory.
>>>
>>> Xavi
>>>
>>>>
>>>> Thanks and Regards,
>>>> Ram
>>>>
>>>> -----Original Message-----
>>>> From: Xavier Hernandez [mailto:xhernandez at datalab.es]
>>>> Sent: Thursday, March 09, 2017 11:15 AM
>>>> To: Ankireddypalle Reddy; Gluster Devel (gluster-devel at gluster.org);
>>>> gluster-users at gluster.org
>>>> Subject: Re: [Gluster-users] Disperse mkdir fails
>>>>
>>>> Hi Ram,
>>>>
>>>> On 09/03/17 16:52, Ankireddypalle Reddy wrote:
>>>>> Attachment: info.txt (3.35 KB)
>>>>>
>>>>> Hi,
>>>>>
>>>>>         I have a disperse gluster volume  with 6 servers. 262TB of
>>>>> usable capacity.  Gluster version is 3.7.19.
>>>>>
>>>>>         glusterfs1, glusterfs2 and glusterfs3 nodes were initially
>>>>> used for creating the volume. Nodes glusterfs4, glusterfs5 and
>>>>> glusterfs6 were later added to the volume.
>>>>>
>>>>>
>>>>>
>>>>>         Directory creation failed on a directory called
>>>>> /ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC.
>>>>>
>>>>>         # file: ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC
>>>>>
>>>>>         glusterfs.gfid.string="e8e51015-616f-4f04-b9d2-92f46eb5cfc7"
>>>>>
>>>>>
>>>>>
>>>>>         The gluster mount log contains a lot of the following errors:
>>>>>
>>>>>         [2017-03-09 15:32:36.773937] W [MSGID: 122056]
>>>>> [ec-combine.c:875:ec_combine_check] 0-StoragePool-disperse-7:
>>>>> Mismatching xdata in answers of 'LOOKUP' for
>>>>> e8e51015-616f-4f04-b9d2-92f46eb5cfc7
>>>>>
>>>>>
>>>>>
>>>>>         The directory seems to be out of sync between nodes glusterfs1,
>>>>> glusterfs2 and glusterfs3. Each has a different version.
>>>>>
>>>>>
>>>>>
>>>>>          trusted.ec.version=0x00000000000839f00000000000083a4d
>>>>>
>>>>>          trusted.ec.version=0x0000000000082ea40000000000083a4b
>>>>>
>>>>>          trusted.ec.version=0x0000000000083a760000000000083a7b
>>>>>
>>>>>
>>>>>
>>>>>          Self-heal does not seem to be healing this directory.
>>>>>
>>>>
>>>> This is very similar to what happened the other time. Once more than
>>>> one brick is damaged, self-heal cannot do anything to heal it in a 2+1
>>>> configuration.
>>>>
>>>> What error does the mkdir request return?
>>>>
>>>> Does the directory you are trying to create already exist on some brick ?
>>>>
>>>> Can you show all the remaining extended attributes of the directory ?
>>>>
>>>> It would also be useful to have the directory contents on each brick
>>> (an 'ls -l'). In this case, include the name of the directory you are
>>> trying to create.
>>>>
>>>> Can you explain the detailed sequence of operations done on that
>>>> directory since the last time you successfully created a new subdirectory,
>>>> including any metadata changes?
>>>>
>>>> Xavi
>>>>
>>>>>
>>>>>
>>>>> Thanks and Regards,
>>>>>
>>>>> Ram
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>



More information about the Gluster-devel mailing list