[Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Jun 8 07:06:58 UTC 2016
On Wed, Jun 8, 2016 at 12:33 PM, Oleksandr Natalenko <
oleksandr at natalenko.name> wrote:
> Yup, I can do that, but please note that RSS does not change. Will the
> statedump show VIRT values?
>
> Also, I'm looking at the numbers now and see that on each reconnect VIRT
> grows by ~24M (once every ~10-15 minutes). Perhaps that gives you some
> idea of what is going wrong.
>
That's interesting; I have never seen something like this happen. I would
still like to see whether there are any clues in the statedump when this
happens. It may well confirm what you said, that nothing new is being
allocated, but I would like to verify.
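
In case it helps with collecting them, a minimal capture loop could look like
this (just a sketch: it uses the pidfile path from your ps output and relies
on glusterfs writing a statedump to /var/run/gluster when it receives SIGUSR1,
which is where the glusterdump.<pid>.dump.* files come from):

===
#!/bin/bash
# Take a statedump of glustershd every 10 minutes, 5 samples in total,
# and record VIRT (VmSize) at the same moment for correlation.
PIDFILE=/var/lib/glusterd/glustershd/run/glustershd.pid
PID=$(cat "$PIDFILE")

for i in 1 2 3 4 5; do
    kill -USR1 "$PID"                      # glusterfs dumps state on SIGUSR1
    grep VmSize "/proc/$PID/status"        # VIRT in kB at dump time
    sleep 600
done
# Dumps appear as /var/run/gluster/glusterdump.<pid>.dump.<timestamp>
===
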
> On 08.06.2016 09:50, Pranith Kumar Karampuri wrote:
>
> Oleksandr,
>> Could you take a statedump of the shd process once every 5-10 minutes and
>> send maybe 5 samples of them when it starts to increase? This will
>> help us find which datatypes are being allocated a lot and lead us to
>> possible theories for the increase.
>>
>> On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko
>> <oleksandr at natalenko.name> wrote:
>>
>>> Also, I've checked the shd log files and found out that for some reason
>>> shd constantly reconnects to the bricks: [1]
>>>
>>> Please note that the fix [2] suggested by Pranith does not help; the VIRT
>>> value still grows:
>>>
>>> ===
>>> root 1010 0.0 9.6 7415248 374688 ? Ssl Jun07 0:14
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>>> --xlator-option
>>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> I do not know why it is reconnecting, but I suspect the leak happens on
>>> that reconnect.
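>>>
>>> One way to correlate the two over time (a rough sketch; it assumes the
>>> disconnect messages in glustershd.log contain the word "disconnected";
>>> the exact log text may differ):
>>>
>>> ===
>>> # Log VIRT of glustershd together with the number of disconnect
>>> # messages seen so far, once a minute.
>>> PID=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
>>> LOG=/var/log/glusterfs/glustershd.log
>>> while sleep 60; do
>>>     virt=$(awk '/^VmSize/ {print $2}' "/proc/$PID/status")
>>>     drops=$(grep -c disconnected "$LOG")
>>>     echo "$(date +%s) VmSize=${virt}kB disconnects=${drops}"
>>> done
>>> ===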
>>>
>>> CCing Pranith.
>>>
>>> [1] http://termbin.com/brob
>>> [2] http://review.gluster.org/#/c/14053/
>>>
>>> On 06.06.2016 12:21, Kaushal M wrote:
>>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>>> what I'm saying below doesn't apply.
>>>
>>> We saw problems when encrypted transports were used, because the RPC
>>> layer was not reaping threads (doing pthread_join) when a connection
>>> ended. This led to similar observations of huge VIRT and relatively
>>> small RSS.
>>>
>>> I'm not sure how multi-threaded shd works, but it could be leaking
>>> threads in a similar way.
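>>>
>>> A quick way to check that theory from the outside (a sketch; the ~8 MiB
>>> figure assumes the default glibc pthread stack size, which can differ):
>>>
>>> ===
>>> PID=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)
>>>
>>> # Live threads of the shd process:
>>> ls "/proc/$PID/task" | wc -l
>>>
>>> # Anonymous mappings of roughly thread-stack size (~8 MiB); stacks of
>>> # exited-but-never-joined threads stay mapped and inflate VIRT, not RSS.
>>> pmap "$PID" | grep -cE '(8192|8188)K'
>>> ===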
>>>
>>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
>>> <oleksandr at natalenko.name> wrote:
>>> Hello.
>>>
>>> We use v3.7.11 in a replica 2 setup between 2 nodes, plus 1 dummy node
>>> for keeping volume metadata.
>>>
>>> Now we observe huge VSZ (VIRT) usage by glustershd on dummy node:
>>>
>>> ===
>>> root 15109 0.0 13.7 76552820 535272 ? Ssl May26 2:11
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>>> --xlator-option
>>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>>> ===
>>>
>>> That is ~73G. RSS seems to be OK (~522M). Here is a statedump of the
>>> glustershd process: [1]
>>>
>>> Also, here is the sum of the sizes presented in the statedump:
>>>
>>> ===
>>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 | \
>>>     awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>>> 353276406
>>> ===
>>>
>>> That is ~337 MiB.
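>>>
>>> To see which memory types account for that, the same dump can be broken
>>> down per section (a rough sketch, assuming the usual statedump layout of
>>> [section] headers followed by size= lines):
>>>
>>> ===
>>> # Sum size= per statedump section and show the 20 biggest consumers.
>>> awk -F '=' '/^\[/ {sec=$0}
>>>             /^size=/ {sum[sec]+=$2}
>>>             END {for (s in sum) printf "%12d %s\n", sum[s], s}' \
>>>     /var/run/gluster/glusterdump.15109.dump.1465200139 | sort -rn | head -20
>>> ===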
>>>
>>> Also, here are VIRT values from 2 replica nodes:
>>>
>>> ===
>>> root 24659 0.0 0.3 5645836 451796 ? Ssl May24 3:28
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
>>> --xlator-option
>>> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>>> root 18312 0.0 0.3 6137500 477472 ? Ssl May19 6:37
>>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>>> /var/log/glusterfs/glustershd.log -S
>>> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
>>> --xlator-option
>>> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>>> ===
>>>
>>> Those are 5 to 6G, which is much less than the dummy node has, but still
>>> looks too big to us.
>>>
>>> Should we care about the huge VIRT value on the dummy node? Also, how
>>> would one debug that?
>>>
>>> Regards,
>>> Oleksandr.
>>>
>>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
>>>
>>
>> --
>>
>> Pranith
>>
>
--
Pranith