[Gluster-devel] [Gluster-users] Fwd: dht_is_subvol_filled messages on client

Xavier Hernandez xhernandez at datalab.es
Thu May 5 12:16:25 UTC 2016


On 05/05/16 13:59, Kaushal M wrote:
> On Thu, May 5, 2016 at 4:37 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>> On 05/05/16 11:31, Kaushal M wrote:
>>>
>>> On Thu, May 5, 2016 at 2:36 PM, David Gossage
>>> <dgossage at carouselchecks.com> wrote:
>>>>
>>>> On Thu, May 5, 2016 at 3:28 AM, Serkan Çoban <cobanserkan at gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> You can find the output below link:
>>>>> https://www.dropbox.com/s/wzrh5yp494ogksc/status_detail.txt?dl=0
>>>>>
>>>>> Thanks,
>>>>> Serkan
>>>>
>>>> Maybe not the issue, but playing "one of these things is not like the
>>>> other": at a quick glance I notice that of all the bricks only one seems to be different
>>>>
>>>> Brick                : Brick 1.1.1.235:/bricks/20
>>>> TCP Port             : 49170
>>>> RDMA Port            : 0
>>>> Online               : Y
>>>> Pid                  : 26736
>>>> File System          : ext4
>>>> Device               : /dev/mapper/vol0-vol_root
>>>> Mount Options        : rw,relatime,data=ordered
>>>> Inode Size           : 256
>>>> Disk Space Free      : 86.1GB
>>>> Total Disk Space     : 96.0GB
>>>> Inode Count          : 6406144
>>>> Free Inodes          : 6381374
>>>>
>>>> Every other brick seems to be 7TB and xfs except this one.
>>>
>>>
>>> Looks like the brick fs isn't mounted, and the root-fs is being used
>>> instead. But that still leaves enough inodes free.
>>>
>>> What I suspect is that one of the cluster translators is mixing up
>>> stats when aggregating from multiple bricks.
>>> From the log snippet you gave in the first mail, it seems like the
>>> disperse translator is possibly involved.
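
A quick way to confirm this suspicion on the node in question (the brick path is
the one from the status output above; if the brick filesystem were unmounted,
both commands would point at the root LV instead of a dedicated xfs device):

    findmnt -T /bricks/20   # should show a dedicated xfs device, not /dev/mapper/vol0-vol_root
    df -iT /bricks/20       # a 7TB xfs brick should report far more inodes than the ~6.4M of the ext4 root fs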
>>
>>
>> Currently ec takes the number of potential files in the subvolume (f_files)
>> as the maximum across all its subvolumes, but it takes the available count
>> (f_ffree) as the minimum across all its subvolumes.
>>
>> This causes the maximum to be ~781,000,000 while the free count is
>> ~6,300,000. That gives ~0.8% available, i.e. almost 100% full.
>>
>> Given the circumstances I think it's the correct thing to do.
>
> Thanks for giving the reasoning Xavi.
>
> But why is the number of potential files the maximum?
> IIUC, a file (or parts of it) will be written to all subvolumes in the
> disperse set.
> So wouldn't the smallest subvolume limit the number of files that
> could possibly be created?

I'm not very sure why this decision was taken. In theory ec only 
supports identical subvolumes because of the way it works. This means 
that all bricks should report the same maximum.

When this doesn't happen, I suppose that the motivation was that this 
number should report the theoretical maximum number of files that the 
volume can contain.
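
To illustrate the effect of that max/min aggregation with the numbers from this
volume, here is a rough manual check (the host range and brick path are
placeholders, not taken from the actual setup; this is not the ec code itself):

    # Collect total (f_files) and free (f_ffree) inode counts from every brick
    # of disperse-56, then combine them the way described above:
    # maximum of the totals, minimum of the free counts.
    for h in 1.1.1.{221..240}; do          # placeholder host range
        ssh "$h" stat -f -c '%c %d' /bricks/20
    done | awk '{ if ($1 > max) max = $1; if (!min || $2 < min) min = $2 }
                END { printf "f_files(max)=%d  f_ffree(min)=%d  free=%.2f%%\n",
                             max, min, 100 * min / max }'

With f_files around 781,000,000 and f_ffree around 6,300,000 this prints roughly
0.8% free, which is what makes dht consider the subvolume full once the free
percentage drops below the cluster.min-free-inodes threshold.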

>
> ~kaushal
>
>>
>> Xavi
>>
>>
>>>
>>> BTW, how large is the volume you have? Those are a lot of bricks!
>>>
>>> ~kaushal
>>>
>>>
>>>>
>>>>>
>>>>> On Thu, May 5, 2016 at 9:33 AM, Xavier Hernandez <xhernandez at datalab.es>
>>>>> wrote:
>>>>>>
>>>>>> Can you post the result of 'gluster volume status v0 detail' ?
>>>>>>
>>>>>>
>>>>>> On 05/05/16 06:49, Serkan Çoban wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi, can anyone suggest something for this issue? df and du show no issue
>>>>>>> for the bricks, yet one subvolume is not being used by gluster.
>>>>>>>
>>>>>>> On Wed, May 4, 2016 at 4:40 PM, Serkan Çoban <cobanserkan at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I changed cluster.min-free-inodes to "0" and remounted the volume on
>>>>>>>> the clients. The inode-full messages are no longer coming to syslog,
>>>>>>>> but I see the disperse-56 subvolume is still not being used.
>>>>>>>> Is there anything I can do to resolve this issue? Maybe I can destroy
>>>>>>>> and recreate the volume, but I am not sure it will fix this issue...
>>>>>>>> Maybe the disperse size 16+4 is too big; should I change it to 8+2?
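
For reference, the change described above amounts to something like this (volume
name v0 as in the log message; the server name and mount point are placeholders):

    gluster volume set v0 cluster.min-free-inodes 0
    # then on each client:
    umount /mnt/v0
    mount -t glusterfs server1:/v0 /mnt/v0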
>>>>>>>>
>>>>>>>> On Tue, May 3, 2016 at 2:36 PM, Serkan Çoban <cobanserkan at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also checked the df output; all 20 bricks are the same, like below:
>>>>>>>>> /dev/sdu1 7.3T 34M 7.3T 1% /bricks/20
>>>>>>>>>
>>>>>>>>> On Tue, May 3, 2016 at 1:40 PM, Raghavendra G
>>>>>>>>> <raghavendra at gluster.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, May 2, 2016 at 11:41 AM, Serkan Çoban
>>>>>>>>>> <cobanserkan at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> 1. What is the output of du -hs <back-end-export>? Please get this
>>>>>>>>>>>> information for each of the bricks that are part of the disperse set.
>>>>>>>>>>
>>>>>>>>>> Sorry, I needed the df output of the filesystem containing the brick,
>>>>>>>>>> not du. Sorry about that.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> There are 20 bricks in disperse-56 and the du -hs output is like:
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 1.8M /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>> 80K /bricks/20
>>>>>>>>>>>
>>>>>>>>>>> I see that gluster is not writing to this disperse set. All other
>>>>>>>>>>> disperse sets are filled with 13GB but this one is empty. I see the
>>>>>>>>>>> directory structure created but no files in the directories.
>>>>>>>>>>> How can I fix the issue? I will try to rebalance, but I don't think
>>>>>>>>>>> it will write to this disperse set...
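
The rebalance attempt mentioned here would be along these lines (volume name
taken from the earlier log message):

    gluster volume rebalance v0 start
    gluster volume rebalance v0 status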
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Apr 30, 2016 at 9:22 AM, Raghavendra G
>>>>>>>>>>> <raghavendra at gluster.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Apr 29, 2016 at 12:32 AM, Serkan Çoban
>>>>>>>>>>>> <cobanserkan at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, I cannot get an answer on the users list, so I am asking the
>>>>>>>>>>>>> devel list.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am getting [dht-diskusage.c:277:dht_is_subvol_filled]
>>>>>>>>>>>>> 0-v0-dht:
>>>>>>>>>>>>> inodes on subvolume 'v0-disperse-56' are at (100.00 %), consider
>>>>>>>>>>>>> adding more bricks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> message in the client logs. My cluster is empty; there are only a
>>>>>>>>>>>>> couple of GB of files for testing. Why does this message appear in
>>>>>>>>>>>>> syslog?
>>>>>>>>>>>>
>>>>>>>>>>>> dht uses disk usage information from the backend export.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. What is the output of du -hs <back-end-export>? Please get this
>>>>>>>>>>>> information for each of the bricks that are part of the disperse set.
>>>>>>>>>>>> 2. Once you get du information from each brick, the value seen by
>>>>>>>>>>>> dht will be based on how cluster/disperse aggregates du info
>>>>>>>>>>>> (basically the statfs fop).
>>>>>>>>>>>>
>>>>>>>>>>>> The reason for the 100% disk usage may be:
>>>>>>>>>>>> In case 1, the backend fs might be shared by data other than the brick.
>>>>>>>>>>>> In case 2, some issue with the aggregation.
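
One way to tell the two cases apart is to check which filesystem actually backs
each brick path (the host range and brick path below are placeholders): if df
reports the root LV for some brick, it is case 1 (the backend fs is shared or the
brick fs is not mounted); if every brick shows its own 7TB xfs filesystem, case 2
(aggregation) is more likely.

    for h in 1.1.1.{221..240}; do     # placeholder host range
        ssh "$h" df -hT /bricks/20
    done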
>>>>>>>>>>>>
>>>>>>>>>>>>> Is it safe to ignore it?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> dht will try not to place data files on the subvol in question
>>>>>>>>>>>>> (v0-disperse-56). Hence the lookup cost will be two hops for files
>>>>>>>>>>>>> hashing to disperse-56 (note that other fops like read/write/open
>>>>>>>>>>>>> still have the cost of a single hop and don't suffer from this
>>>>>>>>>>>>> penalty). Other than that there is no significant harm unless
>>>>>>>>>>>>> disperse-56 is really running out of space.
>>>>>>>>>>>>
>>>>>>>>>>>> regards,
>>>>>>>>>>>> Raghavendra
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Raghavendra G
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Raghavendra G