[Gluster-users] glusterfs under high load failing?

Mon Oct 13 16:49:04 UTC 2014

On 10/13/2014 10:03 PM, Roman wrote:
> hmm,
> seems like another strange issue? Seen this before. Had to restart the 
> volume to get my empty space back.
> root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
> total 943718400
> -rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
> root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
> root@glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
> Filesystem  Size  Used Avail Use% Mounted on
> rootfs  282G  1.1G  266G   1% /
> udev 10M     0   10M   0% /dev
> tmpfs 1.4G  228K  1.4G   1% /run
> /dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba  282G  1.1G  266G   1% /
> tmpfs 5.0M     0  5.0M   0% /run/lock
> tmpfs 5.2G     0  5.2G   0% /run/shm
> stor1:HA-WIN-TT-1T 1008G  901G   57G  95% /srv/nfs/HA-WIN-TT-1T
>
> no file, but size is still 901G.
> Both servers show the same.
> Do I really have to restart the volume to fix that?
IMO this can happen if there is an fd leak. open-fd is the only variable 
that can change with volume restart. How do you re-create the bug?
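
One way to check for such a leak on the brick servers is to look for file descriptors that still point at a deleted file. The following is only a minimal sketch, assuming Linux servers with /proc available; `deleted_fds` is a hypothetical helper name, not part of gluster, and the commented loop assumes the brick processes are named glusterfsd.

```shell
#!/bin/sh
# Print the open file descriptors of one process whose target file has
# already been unlinked. Space held by such files is not released until
# the descriptor is closed (or the process is restarted).
deleted_fds() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails for entries we cannot read; skip those.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Example (run as root on each storage server), assuming the brick
# processes are named glusterfsd:
# for pid in $(pgrep glusterfsd); do deleted_fds "$pid"; done
```

If the removed `disk` file shows up in that output on stor1 or stor2, the space is being held by an open fd rather than by the file itself. `gluster volume status <volname> fd` should give a related view from the gluster side.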

Pranith
>
> 2014-10-13 19:30 GMT+03:00 Roman <romeo.r at gmail.com 
> <mailto:romeo.r at gmail.com>>:
>
>     Sure.
>     I'll let it run overnight.
>
>     2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri
>     <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>
>         hi Roman,
>              Do you think we can run this test again? This time, could
>         you run 'gluster volume profile <volname> start', repeat the
>         same test, and then provide the output of 'gluster volume
>         profile <volname> info' along with the logs?
>
>         Pranith
>
>         On 10/13/2014 09:45 PM, Roman wrote:
>>         Sure !
>>
>>         root@stor1:~# gluster volume info
>>
>>         Volume Name: HA-2TB-TT-Proxmox-cluster
>>         Type: Replicate
>>         Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
>>         Status: Started
>>         Number of Bricks: 1 x 2 = 2
>>         Transport-type: tcp
>>         Bricks:
>>         Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>         Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>         Options Reconfigured:
>>         nfs.disable: 0
>>         network.ping-timeout: 10
>>
>>         Volume Name: HA-WIN-TT-1T
>>         Type: Replicate
>>         Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
>>         Status: Started
>>         Number of Bricks: 1 x 2 = 2
>>         Transport-type: tcp
>>         Bricks:
>>         Brick1: stor1:/exports/NFS-WIN/1T
>>         Brick2: stor2:/exports/NFS-WIN/1T
>>         Options Reconfigured:
>>         nfs.disable: 1
>>         network.ping-timeout: 10
>>
>>
>>
>>         2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri
>>         <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>
>>             Could you give your 'gluster volume info' output?
>>
>>             Pranith
>>
>>             On 10/13/2014 09:36 PM, Roman wrote:
>>>             Hi,
>>>
>>>             I've got this kind of setup (servers run replica)
>>>
>>>
>>>             @ 10G backend
>>>             gluster storage1
>>>             gluster storage2
>>>             gluster client1
>>>
>>>             @1g backend
>>>             other gluster clients
>>>
>>>             Servers got HW RAID5 with SAS disks.
>>>
>>>             So today I decided to create a 900GB file for an iSCSI
>>>             target, located on a separate glusterfs volume, using dd
>>>             (just a dummy file filled with zeros, bs=1G count 900).
>>>             First of all, the process took a long time; the write
>>>             speed was 130 MB/sec (the client port was 2 Gbps, the
>>>             server ports were running at 1 Gbps).
>>>             Then it reported something like "endpoint is not
>>>             connected" and all of my VMs on the other volume started
>>>             to give me IO errors.
>>>             Servers load was around 4,6 (total 12 cores)
>>>
>>>             Maybe it was due to the timeout of 2 secs, so I've made
>>>             it a bit higher, 10 sec.
>>>
>>>             Also, while dd was creating the image, the VMs very
>>>             often reported that their disks were slow, for example:
>>>
>>>             WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).
>>>
>>>             Is 130 MB/sec the maximum bandwidth for all of the
>>>             volumes in total? Then why would we need 10G backends?
>>>
>>>             HW RAID local speed is 300 MB/sec, so it should not be
>>>             an issue. Any ideas or advice?
>>>
>>>
>>>             Maybe someone has an optimized sysctl.conf for a 10G
>>>             backend?
>>>
>>>             Mine is pretty simple, the kind that can be found by
>>>             googling.
>>>
>>>
>>>             Just to mention: those VMs were connected using a
>>>             separate 1 Gbps interface, which means they should not
>>>             be affected by the client with the 10G backend.
>>>
>>>
>>>             The logs are pretty useless; they just say this during
>>>             the outage:
>>>
>>>
>>>             [2014-10-13 12:09:18.392910] W
>>>             [client-handshake.c:276:client_ping_cbk]
>>>             0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
>>>             expired
>>>
>>>             [2014-10-13 12:10:08.389708] C
>>>             [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>             0-HA-2TB-TT-Proxmox-cluster-client-0: server
>>>             10.250.0.1:49159 has not responded in the last 2
>>>             seconds, disconnecting.
>>>
>>>             [2014-10-13 12:10:08.390312] W
>>>             [client-handshake.c:276:client_ping_cbk]
>>>             0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have
>>>             expired
>>>
>>>             So I decided to set the timeout a bit higher.
>>>
>>>             So it seems to me that GlusterFS is not usable under
>>>             high load? 130 MB/s is not so much that it should cause
>>>             timeouts or make the system so slow that the VMs
>>>             suffer.
>>>
>>>             Of course, after the disconnection the healing process
>>>             started, but as the VMs had lost their connection to
>>>             both servers, it was pretty useless; they could not run
>>>             anymore. And BTW, when you load the server with such a
>>>             huge job (dd of 900GB), the healing process goes very
>>>             slowly :)
>>>
>>>
>>>
>>>             -- 
>>>             Best regards,
>>>             Roman.
>>>
>>>
>>>             _______________________________________________
>>>             Gluster-users mailing list
>>>             Gluster-users at gluster.org  <mailto:Gluster-users at gluster.org>
>>>             http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>         -- 
>>         Best regards,
>>         Roman.
>
>
>
>
>     -- 
>     Best regards,
>     Roman.
>
>
>
>
> -- 
> Best regards,
> Roman.
