[Gluster-users] Gluster high RPC calls and reply

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Jul 8 05:37:59 UTC 2014


On 07/08/2014 09:12 AM, Gurdeep Singh (Guru) wrote:
> Sorry I am in Australia Time Zone so had to call it a night.
>
> I checked the PIDs and none of them were valid.
>
> The UID 48 is Apache.
> apache:x:48:48:Apache:/var/www/html:/bin/bash
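>
> For reference, that mapping can be checked with:
>
>     getent passwd 48
>
> (getent also covers users coming from LDAP/NIS, not just
> /etc/passwd, so it is the more general check.)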
>
> A bit of detail on what I am trying to achieve with gluster: we 
> have 2 web servers, each with an image folder holding images 
> uploaded by our customers. We are doing master-master replication 
> on mysql, which is working fine. As we have a LB configured, a 
> customer can land on either server. If they add a product, the 
> database syncs the information; it's only the image folder they 
> upload images to that needs to be synced between the servers, which 
> is why we looked into glusterfs. Gluster works fine, but the one 
> thing that caught my eye was that bandwidth utilisation bumped up a 
> little even when no file changes were made on either server. I 
> understand keep-alives, but it looked odd that we are seeing lookup 
> calls for random files on the wire.
> At first I thought it was just Gluster using the 200KB/s 
> throughput, but last night I found out that it's a combination of 
> Gluster, mysql and the LB. I switched off the LB and mysql and 
> found that Gluster alone is using around 30-40KB/s. At the moment 
> we don't have many files, but this raises scalability questions: 
> what will happen when we reach 1000s of files in that directory?
>
> I don't know how Apache is causing this. I have CC'd my dev team on 
> this as well.
Thanks Gurdeep. At least for now it seems to be traffic from Apache. 
We probably need to find the root cause of whatever crawls, if any, 
apache is doing on the gluster mount.
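
Something along these lines (just a sketch; the mount path comes from 
the ps output earlier in this thread, and the apache user/process 
names are assumptions about your setup) could show which files apache 
touches on the gluster mount:

  # pick one apache worker process
  pid=$(pgrep -u apache httpd | head -n 1)
  # watch the file-related syscalls it issues against the mount
  strace -f -p "$pid" -e trace=open,stat,lstat,getdents 2>&1 \
      | grep '/var/www/html/image'

If apache (or the application running under it) is scanning that 
directory, the stat/getdents calls should show up here and would 
explain the LOOKUPs on the wire.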

Pranith
>
> Thanks for all your help.
>
> Regards,
> Gurdeep.
>
> On 8 Jul 2014, at 12:45 am, Pranith Kumar Karampuri 
> <pkarampu at redhat.com> wrote:
>
>> hi Gurdeep,
>>       Niels extracted the pids of the applications causing the 
>> traffic here:
>> http://paste.fedoraproject.org/116010/74385214
>> Please check if any of those pids make sense. Apparently all those 
>> applications are from UID 48. You may want to check who that user 
>> is as well.
>>
>> Pranith
>> On 07/07/2014 07:52 PM, Gurdeep Singh (Guru) wrote:
>>> Hello Pranith,
>>>
>>> That capture was taken in the morning. There is no pid 14927.
>>>
>>> Please see attached capture that I just took from the server.
>>>
>>> [guru at srv2 ~]$ ps -v 14927
>>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
>>>   PID TTY      STAT   TIME  MAJFL TRS   DRS   RSS %MEM COMMAND
>>>
>>>
>>>
>>>
>>> Thanks,
>>> Gurdeep.
>>>
>>>
>>> On 8 Jul 2014, at 12:12 am, Pranith Kumar Karampuri 
>>> <pkarampu at redhat.com> wrote:
>>>
>>>>
>>>> On 07/07/2014 07:39 PM, Gurdeep Singh (Guru) wrote:
>>>>> Hello Pranith,
>>>>>
>>>>> Process 18629 is not sending any traffic across the servers.
>>>>>
>>>>> The process that are constantly sending packet across is 1055 & 18611.
>>>> In that case it is the application that is sending the traffic. 
>>>> Niels just looked into the pcap file and he also found the 
>>>> process with pid 14927 to be the one sending the traffic. Could 
>>>> you check which process that is?
>>>>
>>>> Pranith.
>>>>>
>>>>> Also, what is the interval between RPC lookups on a single 
>>>>> file, if any? Can we somehow control the lookup frequency?
>>>>>
>>>>> Thanks,
>>>>> Gurdeep.
>>>>>
>>>>>
>>>>>
>>>>> On 7 Jul 2014, at 11:59 pm, Pranith Kumar Karampuri 
>>>>> <pkarampu at redhat.com> wrote:
>>>>>
>>>>>>
>>>>>> On 07/07/2014 07:22 PM, Gurdeep Singh (Guru) wrote:
>>>>>>> [guru at srv2 ~]$ ps -v 1055
>>>>>>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
>>>>>>>   PID TTY      STAT   TIME  MAJFL TRS    DRS   RSS %MEM COMMAND
>>>>>>>  1055 ?        Ssl   86:01     31   0 319148 33092  3.2 /usr/sbin/glusterfs --volfile-server=srv2 --volfile-id=/gv0 /var/www/html/image/
>>>>>>> [guru at srv2 ~]$
>>>>>>>
>>>>>>>
>>>>>> Gurdeep,
>>>>>>        I don't see anything odd here :-(. The mount is looking 
>>>>>> up files and the brick is serving them. Why don't you keep a 
>>>>>> watch on process '18629' and similar processes in the cluster? 
>>>>>> Do a 'ps aux | grep glustershd' to get the pids; there will be 
>>>>>> one such process per machine in the cluster. Check how much it 
>>>>>> consumes. That is the only process which performs operations 
>>>>>> without any corresponding activity on the mounts.
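>>>>>>
>>>>>> Something like this (just a sketch) would let you keep an eye 
>>>>>> on it; the [g] in the grep pattern keeps grep from matching 
>>>>>> itself:
>>>>>>
>>>>>>  # ps aux | grep '[g]lustershd'
>>>>>>  # watch -n 10 'ps -o pid,pcpu,pmem,rss,etime -p <PID>'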
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 7 Jul 2014, at 11:49 pm, Pranith Kumar Karampuri 
>>>>>>> <pkarampu at redhat.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On 07/07/2014 07:03 PM, Gurdeep Singh (Guru) wrote:
>>>>>>>>> Hello Niels,
>>>>>>>>>
>>>>>>>>> I ran nethogs on the interface to see which processes might 
>>>>>>>>> be using the bandwidth:
>>>>>>>>>
>>>>>>>>> NetHogs version 0.8.0
>>>>>>>>>
>>>>>>>>> PID   USER PROGRAM               DEV   SENT    RECEIVED
>>>>>>>>> 18611 root /usr/sbin/glusterfsd  tun0  16.307  17.547 KB/sec
>>>>>>>>> 1055  root /usr/sbin/glusterfs   tun0  17.249  16.259 KB/sec
>>>>>>>>> 13439 guru sshd: guru at pts/0   tun0  0.966    0.051 KB/sec
>>>>>>>>> 18625 root /usr/sbin/glusterfs   tun0  0.000    0.000 KB/sec
>>>>>>>>> 18629 root /usr/sbin/glusterfs   tun0  0.000    0.000 KB/sec
>>>>>>>>> 9636  root /usr/sbin/glusterd    tun0  0.000    0.000 KB/sec
>>>>>>>>> ?     root unknown TCP                 0.000    0.000 KB/sec
>>>>>>>>>
>>>>>>>>> TOTAL                                  34.523  33.856 KB/sec
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Which process corresponds to '1055'?
>>>>>>>>
>>>>>>>> Pranith
>>>>>>>>> It's the glusterfs and glusterfsd processes.
>>>>>>>>>
>>>>>>>>> Looking at the capture file, I can see that lookups are 
>>>>>>>>> being made on random files.
>>>>>>>>>
>>>>>>>>> For PID information, please see this:
>>>>>>>>>
>>>>>>>>> [guru at srv2 ~]$ sudo netstat -tpn | grep 49152
>>>>>>>>> tcp       0      0 127.0.0.1:49152  127.0.0.1:1012   ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 127.0.0.1:49152  127.0.0.1:1016   ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 127.0.0.1:1016   127.0.0.1:49152  ESTABLISHED 18625/glusterfs
>>>>>>>>> tcp       0      0 10.8.0.6:1021    10.8.0.1:49152   ESTABLISHED 1055/glusterfs
>>>>>>>>> tcp       0      0 10.8.0.6:49152   10.8.0.1:1017    ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 10.8.0.6:1020    10.8.0.1:49152   ESTABLISHED 18629/glusterfs
>>>>>>>>> tcp       0      0 127.0.0.1:1023   127.0.0.1:49152  ESTABLISHED 18629/glusterfs
>>>>>>>>> tcp       0      0 10.8.0.6:49152   10.8.0.1:1022    ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 10.8.0.6:49152   10.8.0.1:1021    ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 127.0.0.1:49152  127.0.0.1:1023   ESTABLISHED 18611/glusterfsd
>>>>>>>>> tcp       0      0 127.0.0.1:1012   127.0.0.1:49152  ESTABLISHED 1055/glusterfs
>>>>>>>>> tcp       0      0 10.8.0.6:1019    10.8.0.1:49152   ESTABLISHED 18625/glusterfs
>>>>>>>>> [guru at srv2 ~]$ ps -v 18611
>>>>>>>>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
>>>>>>>>>   PID TTY      STAT   TIME  MAJFL   TRS    DRS   RSS %MEM COMMAND
>>>>>>>>> 18611 ?        Ssl   14:12      0     0 650068 20404  2.0 /usr/sbin/glusterfsd -s srv2 --volfile-id gv0.srv2.root-gluster-vol0 -p /var/lib/glusterd/vols/gv0
>>>>>>>>> [guru at srv2 ~]$ ps -v 18629
>>>>>>>>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
>>>>>>>>>   PID TTY      STAT   TIME  MAJFL   TRS    DRS   RSS %MEM COMMAND
>>>>>>>>> 18629 ?        Ssl    0:04      0     0 333296 17380  1.7 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/823fa3197e2d1841be8881500723b063.socket --xlator-option *replicate*.node-uuid=84af83c9-0a29-
>>>>>>>>> [guru at srv2 ~]$ ps -v 18625
>>>>>>>>> Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
>>>>>>>>>   PID TTY      STAT   TIME  MAJFL   TRS    DRS   RSS %MEM COMMAND
>>>>>>>>> 18625 ?        Ssl    0:03      0     0 239528 41040  4.0 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/5ad5b036fd636cc5dddffa73593e4089.socket
>>>>>>>>> [guru at srv2 ~]$ sudo nethogs tun0
>>>>>>>>> Waiting for first packet to arrive (see sourceforge.net bug 1019381)
>>>>>>>>> [guru at srv2 ~]$ rpm -qa | grep gluster
>>>>>>>>> glusterfs-3.5.1-1.el6.x86_64
>>>>>>>>> glusterfs-cli-3.5.1-1.el6.x86_64
>>>>>>>>> glusterfs-libs-3.5.1-1.el6.x86_64
>>>>>>>>> glusterfs-fuse-3.5.1-1.el6.x86_64
>>>>>>>>> glusterfs-server-3.5.1-1.el6.x86_64
>>>>>>>>> glusterfs-api-3.5.1-1.el6.x86_64
>>>>>>>>> [guru at srv2 ~]$
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I don’t see anything odd here. Please suggest.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Gurdeep.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7 Jul 2014, at 9:06 pm, Niels de Vos 
>>>>>>>>> <ndevos at redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Sun, Jul 06, 2014 at 11:28:51PM +1000, Gurdeep Singh 
>>>>>>>>>> (Guru) wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have set up gluster as a replicate volume and it's working fine.
>>>>>>>>>>>
>>>>>>>>>>> I am seeing constant chatter between the hosts: lookup 
>>>>>>>>>>> calls and lookup replies. I am trying to understand why 
>>>>>>>>>>> this traffic is being generated constantly. Please look at 
>>>>>>>>>>> the attached image. This traffic uses a constant ~200KB/s 
>>>>>>>>>>> of bandwidth and is exhausting the allocated monthly 
>>>>>>>>>>> bandwidth on our 2 VPS.
>>>>>>>>>>
>>>>>>>>>> You can use Wireshark to identify which process makes the 
>>>>>>>>>> LOOKUP calls.
>>>>>>>>>> For this, do the following:
>>>>>>>>>>
>>>>>>>>>> 1. select a LOOKUP Call
>>>>>>>>>> 2. enable the 'packet details' pane (found in the main menu, 
>>>>>>>>>> 'view')
>>>>>>>>>> 3. expand the 'Transmission Control Protocol' tree
>>>>>>>>>> 4. check the 'Source port' of the LOOKUP Call
>>>>>>>>>>
>>>>>>>>>> Together with the 'Source' and the 'Source port' you can go 
>>>>>>>>>> to the
>>>>>>>>>> server that matches the 'Source' address. A command like this 
>>>>>>>>>> would give
>>>>>>>>>> you the PID of the process in the right column:
>>>>>>>>>>
>>>>>>>>>>  # netstat -tpn | grep $SOURCE_PORT
>>>>>>>>>>
>>>>>>>>>> And with 'ps -v $PID' you can check which process is 
>>>>>>>>>> responsible for the
>>>>>>>>>> LOOKUP. This process can be a fuse-mount, self-heal-daemon or 
>>>>>>>>>> any other
>>>>>>>>>> glusterfs-client. Depending on the type of client, you may 
>>>>>>>>>> be able to tune the workload or other options a little.
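>>>>>>>>>>
>>>>>>>>>> For example, for a fuse-mount one knob you could try (the 
>>>>>>>>>> values here are purely illustrative, not a recommendation) 
>>>>>>>>>> is raising the FUSE caching timeouts at mount time, so the 
>>>>>>>>>> kernel re-issues LOOKUPs on cached entries less often:
>>>>>>>>>>
>>>>>>>>>>  # mount -t glusterfs -o attribute-timeout=60,entry-timeout=60 \
>>>>>>>>>>        srv2:/gv0 /var/www/html/image/
>>>>>>>>>>
>>>>>>>>>> The trade-off is that changes made on the other server can 
>>>>>>>>>> go unnoticed for up to that many seconds.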
>>>>>>>>>>
>>>>>>>>>> In Wireshark you can also check which filename is LOOKUP'd: 
>>>>>>>>>> just expand the 'GlusterFS' part in the 'packet details' 
>>>>>>>>>> and check the 'Basename'. Maybe this filename (without the 
>>>>>>>>>> directory structure) gives you an idea of which activity is 
>>>>>>>>>> causing the LOOKUPs.
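>>>>>>>>>>
>>>>>>>>>> If you prefer the command line, something along these lines 
>>>>>>>>>> might work to rank the looked-up basenames in a capture 
>>>>>>>>>> (the glusterfs field name is an assumption on my side; 
>>>>>>>>>> verify it with 'tshark -G fields | grep -i glusterfs'):
>>>>>>>>>>
>>>>>>>>>>  # tshark -r capture.pcap -Y glusterfs -T fields \
>>>>>>>>>>        -e tcp.srcport -e glusterfs.bname | sort | uniq -c | sort -rn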
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>> Niels
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The configuration I have for Gluster is:
>>>>>>>>>>>
>>>>>>>>>>> [guru at srv1 ~]$ sudo gluster volume info
>>>>>>>>>>> [sudo] password for guru:
>>>>>>>>>>>
>>>>>>>>>>> Volume Name: gv0
>>>>>>>>>>> Type: Replicate
>>>>>>>>>>> Volume ID: dc8dc3f2-f5bd-4047-9101-acad04695442
>>>>>>>>>>> Status: Started
>>>>>>>>>>> Number of Bricks: 1 x 2 = 2
>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>> Bricks:
>>>>>>>>>>> Brick1: srv1:/root/gluster-vol0
>>>>>>>>>>> Brick2: srv2:/root/gluster-vol0
>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>> cluster.lookup-unhashed: on
>>>>>>>>>>> performance.cache-refresh-timeout: 60
>>>>>>>>>>> performance.cache-size: 1GB
>>>>>>>>>>> storage.health-check-interval: 30
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please suggest how to fine-tune the RPC calls/replies.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Gluster-users mailing list
>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
