[Gluster-users] glusterfs under high load failing?

Roman romeo.r at gmail.com
Mon Oct 13 16:53:46 UTC 2014


Oh sorry, and then I deleted it, of course :)

2014-10-13 19:53 GMT+03:00 Roman <romeo.r at gmail.com>:

> Still the same way: just created a large empty file:
>
> dd if=/dev/zero of=disk bs=1G count=900 iflag=fullblock
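A side note on the reproduction step: if the goal is only to pre-allocate backing storage for an iSCSI target, zero-filling through dd is the slow path. A sparse file of the same apparent size is created almost instantly (a sketch; it assumes the backing filesystem and the iSCSI target software tolerate sparse files):

```shell
# Allocate a 900 GB file without writing 900 GB of zeros over the wire.
# truncate creates a sparse file: full apparent size, ~0 blocks used.
truncate -s 900G disk
ls -lh disk                  # shows the apparent size (900G)
du -h disk                   # shows the actual blocks allocated (~0)
```

The dd variant is still the right tool when the zeros must physically exist on disk, but for a quick reproduction of the bug the sparse version saves hours.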
>
> 2014-10-13 19:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>>
>> On 10/13/2014 10:03 PM, Roman wrote:
>>
>> Hmm,
>> seems like another strange issue? I've seen this before and had to restart
>> the volume to get my free space back.
>>  root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# ls -l
>> total 943718400
>> -rw-r--r-- 1 root root 966367641600 Oct 13 16:55 disk
>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# rm disk
>> root at glstor-cli:/srv/nfs/HA-WIN-TT-1T# df -h
>> Filesystem                                              Size  Used Avail
>> Use% Mounted on
>> rootfs                                                  282G  1.1G  266G
>>   1% /
>> udev                                                     10M     0   10M
>>   0% /dev
>> tmpfs                                                   1.4G  228K  1.4G
>>   1% /run
>> /dev/disk/by-uuid/c62ee3c0-c0e5-44af-b0cd-7cb3fbcc0fba  282G  1.1G  266G
>>   1% /
>> tmpfs                                                   5.0M     0  5.0M
>>   0% /run/lock
>> tmpfs                                                   5.2G     0  5.2G
>>   0% /run/shm
>> stor1:HA-WIN-TT-1T                                     1008G  901G   57G
>>  95% /srv/nfs/HA-WIN-TT-1T
>>
>>  No file, but 901G is still shown as used.
>> Both servers show the same.
>> Do I really have to restart the volume to fix that?
>>
>> IMO this can happen if there is an fd leak. open-fd is the only variable
>> that can change with volume restart. How do you re-create the bug?
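The df-stays-full symptom is consistent with an fd leak: on POSIX filesystems, an unlinked file's blocks are freed only when the last open descriptor on it is closed. A minimal local demonstration of that mechanism (no Gluster involved; checking the brick side with 'gluster volume status <volname> fd' would be the way to confirm it on the servers):

```shell
# Show why space can survive rm: blocks are released only when the
# last fd on the unlinked file is closed.
tmpfile=$(mktemp)
exec 3>"$tmpfile"            # hold an open fd on the file
rm "$tmpfile"                # unlink it; the blocks are NOT freed yet
readlink /proc/$$/fd/3       # the target path now ends in "(deleted)"
exec 3>&-                    # closing the fd finally frees the space
```

A brick process holding such a leaked fd would keep df at 901G exactly as described, and restarting the volume closes every fd, which matches the observed "fix".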
>>
>> Pranith
>>
>>
>> 2014-10-13 19:30 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>>> Sure.
>>> I'll let it run overnight.
>>>
>>> 2014-10-13 19:19 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>
>>> :
>>>
>>>>  Hi Roman,
>>>>      Do you think we can run this test again? This time, could you
>>>> enable profiling with 'gluster volume profile <volname> start' and do the
>>>> same test? Then provide the output of 'gluster volume profile <volname>
>>>> info' and the logs after the test.
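The requested run could look like the sketch below (volume name taken from this thread; profile start/info/stop are standard gluster CLI subcommands):

```shell
# Start collecting per-brick latency and fop statistics.
gluster volume profile HA-WIN-TT-1T start

# ... reproduce the load here, e.g. the 900 GB dd run ...

# Capture the statistics gathered during the test, then stop profiling.
gluster volume profile HA-WIN-TT-1T info > profile-during-dd.txt
gluster volume profile HA-WIN-TT-1T stop
```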
>>>>
>>>> Pranith
>>>>
>>>> On 10/13/2014 09:45 PM, Roman wrote:
>>>>
>>>> Sure !
>>>>
>>>>  root at stor1:~# gluster volume info
>>>>
>>>>  Volume Name: HA-2TB-TT-Proxmox-cluster
>>>> Type: Replicate
>>>> Volume ID: 66e38bde-c5fa-4ce2-be6e-6b2adeaa16c2
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: stor1:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>>> Brick2: stor2:/exports/HA-2TB-TT-Proxmox-cluster/2TB
>>>> Options Reconfigured:
>>>> nfs.disable: 0
>>>> network.ping-timeout: 10
>>>>
>>>>  Volume Name: HA-WIN-TT-1T
>>>> Type: Replicate
>>>> Volume ID: 2937ac01-4cba-44a8-8ff8-0161b67f8ee4
>>>> Status: Started
>>>> Number of Bricks: 1 x 2 = 2
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: stor1:/exports/NFS-WIN/1T
>>>> Brick2: stor2:/exports/NFS-WIN/1T
>>>> Options Reconfigured:
>>>> nfs.disable: 1
>>>> network.ping-timeout: 10
>>>>
>>>>
>>>>
>>>> 2014-10-13 19:09 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com
>>>> >:
>>>>
>>>>>  Could you give your 'gluster volume info' output?
>>>>>
>>>>> Pranith
>>>>>
>>>>> On 10/13/2014 09:36 PM, Roman wrote:
>>>>>
>>>>>  Hi,
>>>>>
>>>>>  I've got this kind of setup (servers run replica)
>>>>>
>>>>>
>>>>>  @ 10G backend
>>>>> gluster storage1
>>>>> gluster storage2
>>>>> gluster client1
>>>>>
>>>>>  @1g backend
>>>>> other gluster clients
>>>>>
>>>>>  Servers got HW RAID5 with SAS disks.
>>>>>
>>>>>  So today I decided to create a 900 GB file for an iSCSI target that
>>>>> will be located on a separate GlusterFS volume, using dd (just a dummy
>>>>> file filled with zeros, bs=1G count=900).
>>>>> First of all, the process took quite a long time; the writing speed
>>>>> was 130 MB/sec (the client port was 2 Gbps, the server ports were
>>>>> running at 1 Gbps).
>>>>> Then it reported something like "endpoint is not connected" and all of
>>>>> my VMs on the other volume started to give me IO errors.
>>>>> Server load was around 4.6 (12 cores total).
>>>>>
>>>>>  Maybe it was due to the timeout of 2 seconds, so I've made it a bit
>>>>> higher, 10 seconds.
>>>>>
>>>>>  Also, during the dd image creation, the VMs very often reported that
>>>>> their disks were slow, like:
>>>>>
>>>>> WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).
>>>>>
>>>>> Is 130 MB/sec the maximum bandwidth for all of the volumes in total?
>>>>> Is that why we would need 10G backends?
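One plausible explanation for that exact number: on a replica 2 volume mounted natively, the FUSE client itself sends every write to both bricks, so the client's uplink is effectively halved. A back-of-envelope check (assuming client-side replication, the native-mount behaviour):

```shell
# 2 Gbps client link, replica 2 => every byte leaves the client twice.
# Per-replica throughput: 2000 Mbit/s / 2 replicas / 8 bits-per-byte,
# which lands very close to the observed ~130 MB/s.
echo $(( 2000 / 2 / 8 )) MB/s      # prints "125 MB/s"
```

If that arithmetic holds, the bottleneck was the client's 2 Gbps port rather than GlusterFS itself, and a 10G client link would roughly be expected to multiply the ceiling accordingly.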
>>>>>
>>>>> The HW RAID local speed is 300 MB/sec, so it should not be the
>>>>> bottleneck. Any ideas or maybe any advice?
>>>>>
>>>>>
>>>>>  Maybe someone has an optimized sysctl.conf for a 10G backend?
>>>>>
>>>>> Mine is pretty simple, the kind that can be found by googling.
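For reference, a commonly circulated starting point for 10 GbE hosts looks like the fragment below. The values are illustrative defaults from general tuning guides, not something validated on this cluster, so they should be benchmarked before and after:

```
# /etc/sysctl.d/90-10gbe.conf -- illustrative 10 GbE tuning starting point
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
net.core.netdev_max_backlog = 30000
```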
>>>>>
>>>>>
>>>>>  Just to mention: those VMs were connected using a separate 1 Gbps
>>>>> interface, which means they should not be affected by the client with
>>>>> the 10G backend.
>>>>>
>>>>>
>>>>>  The logs are pretty unhelpful; they just say this during the outage:
>>>>>
>>>>>
>>>>>  [2014-10-13 12:09:18.392910] W
>>>>> [client-handshake.c:276:client_ping_cbk]
>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
>>>>>
>>>>> [2014-10-13 12:10:08.389708] C
>>>>> [client-handshake.c:127:rpc_client_ping_timer_expired]
>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 has not
>>>>> responded in the last 2 seconds, disconnecting.
>>>>>
>>>>> [2014-10-13 12:10:08.390312] W
>>>>> [client-handshake.c:276:client_ping_cbk]
>>>>> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
>>>>>  so I decided to set the timeout a bit higher.
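That tweak is a single volume option; network.ping-timeout is already shown as 10 in the volume info above. A sketch of raising it further (the value 30 is illustrative, not a recommendation from this thread):

```shell
# Raise the client-side ping timeout on the affected volume,
# then confirm the option took effect.
gluster volume set HA-2TB-TT-Proxmox-cluster network.ping-timeout 30
gluster volume info HA-2TB-TT-Proxmox-cluster | grep ping-timeout
```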
>>>>>
>>>>>  So it seems to me that under high load GlusterFS is not usable?
>>>>> 130 MB/s is not that much load to be getting timeouts or making the
>>>>> system so slow that the VMs suffer.
>>>>>
>>>>>  Of course, after the disconnection the healing process started, but
>>>>> as the VMs had lost their connection to both servers it was pretty
>>>>> useless; they could not run anymore. And by the way, when you load the
>>>>> server with such a huge job (dd of 900 GB), the healing process goes
>>>>> very slowly :)
>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>> Best regards,
>>>>> Roman.
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Best regards,
>>>> Roman.
>>>>
>>>>
>>>>
>>>
>>>
>>>   --
>>> Best regards,
>>> Roman.
>>>
>>
>>
>>
>>  --
>> Best regards,
>> Roman.
>>
>>
>>
>
>
> --
> Best regards,
> Roman.
>



-- 
Best regards,
Roman.

