[Gluster-devel] Huge VSZ (VIRT) usage by glustershd on dummy node

Oleksandr Natalenko oleksandr at natalenko.name
Wed Jun 8 07:03:12 UTC 2016


Yup, I can do that, but please note that RSS does not change. Will 
statedump show VIRT values?

Also, I'm looking at the numbers now, and see that on each reconnect
VIRT grows by ~24M (once per ~10–15 mins). That may give you some idea
of what is going wrong.
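
A minimal sketch of how that growth could be logged between statedumps,
assuming a Linux /proc filesystem. The function names parse_status and
watch_mem are hypothetical, chosen only for illustration. The Threads
field is included because a thread leak of the kind Kaushal describes
would show up as a steadily rising thread count next to the rising VIRT.

```shell
#!/bin/sh
# Hypothetical helper: print "VmSize_kB VmRSS_kB Threads" taken from a
# /proc/<pid>/status-style file given as the first argument.
parse_status() {
    awk '/^VmSize:/  {vsz = $2}
         /^VmRSS:/   {rss = $2}
         /^Threads:/ {thr = $2}
         END {print vsz, rss, thr}' "$1"
}

# Hypothetical helper: print one UTC-timestamped sample per interval
# for as long as the process exists.
# Usage: watch_mem PID [INTERVAL_SECONDS]   (interval defaults to 60)
watch_mem() {
    pid=$1
    interval=${2:-60}
    while [ -r "/proc/$pid/status" ]; do
        echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(parse_status "/proc/$pid/status")"
        sleep "$interval"
    done
}
```

For example, running
watch_mem "$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)" 300
would print one line every 5 minutes, which lines up with the statedump
interval requested below.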

08.06.2016 09:50, Pranith Kumar Karampuri wrote:
> Oleksandr,
> Could you take a statedump of the shd process every 5-10 minutes and
> send maybe 5 samples once it starts to increase? This will help us
> find which data types are being allocated heavily and may suggest
> possible theories for the increase.
> 
> On Wed, Jun 8, 2016 at 12:03 PM, Oleksandr Natalenko
> <oleksandr at natalenko.name> wrote:
> 
>> Also, I've checked shd log files, and found out that for some reason
>> shd constantly reconnects to bricks: [1]
>> 
>> Please note that the fix [2] suggested by Pranith does not help; the
>> VIRT value still grows:
>> 
>> ===
>> root      1010  0.0  9.6 7415248 374688 ?      Ssl  Jun07   0:14
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>> --xlator-option
>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>> ===
>> 
>> I do not know why it is reconnecting, but I suspect the leak
>> happens on that reconnect.
>> 
>> CCing Pranith.
>> 
>> [1] http://termbin.com/brob
>> [2] http://review.gluster.org/#/c/14053/
>> 
>> 06.06.2016 12:21, Kaushal M wrote:
>> Has multi-threaded SHD been merged into 3.7.* by any chance? If not,
>> what I'm saying below doesn't apply.
>> 
>> We saw problems when encrypted transports were used, because the RPC
>> layer was not reaping threads (doing pthread_join) when a connection
>> ended. This led to similar observations of huge VIRT and relatively
>> small RSS.
>> 
>> I'm not sure how multi-threaded shd works, but it could be leaking
>> threads in a similar way.
>> 
>> On Mon, Jun 6, 2016 at 1:54 PM, Oleksandr Natalenko
>> <oleksandr at natalenko.name> wrote:
>> Hello.
>> 
>> We use v3.7.11, a replica 2 setup between 2 nodes + 1 dummy node for
>> keeping volume metadata.
>> 
>> Now we observe huge VSZ (VIRT) usage by glustershd on the dummy node:
>> 
>> ===
>> root     15109  0.0 13.7 76552820 535272 ?     Ssl  May26   2:11
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/7848e17764dd4ba80f4623aecb91b07a.socket
>> --xlator-option
>> *replicate*.node-uuid=80bc95e1-2027-4a96-bb66-d9c8ade624d7
>> ===
>> 
>> That is ~73G. RSS seems to be OK (~522M). Here is the statedump of
>> the glustershd process: [1]
>> 
>> Also, here is the sum of the sizes reported in the statedump:
>> 
>> ===
>> # cat /var/run/gluster/glusterdump.15109.dump.1465200139 |
>>     awk -F '=' 'BEGIN {sum=0} /^size=/ {sum+=$2} END {print sum}'
>> 353276406
>> ===
>> 
>> That is ~337 MiB.
>> 
>> Also, here are VIRT values from 2 replica nodes:
>> 
>> ===
>> root     24659  0.0  0.3 5645836 451796 ?      Ssl  May24   3:28
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/44ec3f29003eccedf894865107d5db90.socket
>> --xlator-option
>> *replicate*.node-uuid=a19afcc2-e26c-43ce-bca6-d27dc1713e87
>> root     18312  0.0  0.3 6137500 477472 ?      Ssl  May19   6:37
>> /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p
>> /var/lib/glusterd/glustershd/run/glustershd.pid -l
>> /var/log/glusterfs/glustershd.log -S
>> /var/run/gluster/1670a3abbd1eea968126eb6f5be20322.socket
>> --xlator-option
>> *replicate*.node-uuid=52dca21b-c81c-48b5-9de2-1ed37987fbc2
>> ===
>> 
>> Those are 5 to 6G, which is much less than the dummy node has, but
>> still looks too big to us.
>> 
>> Should we care about the huge VIRT value on the dummy node? Also, how
>> would one debug that?
>> 
>> Regards,
>> Oleksandr.
>> 
>> [1] https://gist.github.com/d2cfa25251136512580220fcdb8a6ce6
> 
> --
> 
> Pranith

