[Gluster-users] glusterfs under high load failing?

Mon Oct 13 16:09:05 UTC 2014

Could you give your 'gluster volume info' output?

Pranith
On 10/13/2014 09:36 PM, Roman wrote:
> Hi,
>
> I've got this kind of setup (the servers run a replicated volume):
>
>
> @ 10G backend
> gluster storage1
> gluster storage2
> gluster client1
>
> @ 1G backend
> other gluster clients
>
> The servers have HW RAID5 with SAS disks.
>
> So today I decided to create a 900 GB file for an iSCSI target, to be
> located on a separate GlusterFS volume, using dd (just a dummy file
> filled with zeros, bs=1G count=900).
> First of all, the process took quite a long time; the write speed was
> 130 MB/sec (the client port was 2 Gbps, the server ports were running
> at 1 Gbps).
> Then it reported something like "endpoint is not connected", and all of
> my VMs on the other volume started to give me I/O errors.
> The server load was around 4,6 (12 cores in total).
>
> Maybe it was due to the timeout of 2 sec, so I've made it a bit higher,
> 10 sec.
>
> Also, while the dd image was being created, the VMs very often reported
> that their disks were slow, like:
>
> WARNINGs: Read IO Wait time is -0.02 (outside range [0:1]).
>
> Is 130 MB/sec the maximum bandwidth for all of the volumes in
> total? Then why would we need 10G backends?
>
> Local HW RAID speed is 300 MB/sec, so it should not be an issue. Any
> ideas or advice?
>
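
A rough check of the 130 MB/sec figure, as a sketch only. It assumes a replica count of 2 and the native client, which sends each write to every replica itself; neither is confirmed in this thread, and protocol overhead is ignored. The function names are illustrative.

```python
# Rough write-throughput ceiling for a replicated GlusterFS volume.
# Assumptions (not confirmed in the thread): replica count 2, native client
# that sends every write to each replica, protocol overhead ignored.

def link_capacity_mb_s(gbps):
    """Convert a link speed in Gbit/s to MB/s (decimal: 1 Gbit/s = 125 MB/s)."""
    return gbps * 1000 / 8


def max_client_write_mb_s(client_gbps, server_gbps, replicas):
    """Upper bound on the file-data rate a single client can write."""
    # The client uplink carries one copy of the data per replica.
    client_limit = link_capacity_mb_s(client_gbps) / replicas
    # Each server receives one full copy over its own link.
    server_limit = link_capacity_mb_s(server_gbps)
    return min(client_limit, server_limit)


# Link speeds reported for the dd run: client 2 Gbps, servers 1 Gbps.
ceiling_now = max_client_write_mb_s(client_gbps=2, server_gbps=1, replicas=2)
# Hypothetical case with 10 Gbps on both sides.
ceiling_10g = max_client_write_mb_s(client_gbps=10, server_gbps=10, replicas=2)

print(f"ceiling with 2G client / 1G servers: {ceiling_now:.0f} MB/s")
print(f"ceiling with 10G everywhere:         {ceiling_10g:.0f} MB/s")
```

Under these assumptions the network ceiling is about 125 MB/sec, so the observed 130 MB/sec is roughly what the 1 Gbps server links allow and well below the 300 MB/sec local RAID speed. With 10 Gbps links the network bound would rise to 625 MB/sec, at which point the disks would likely become the limit.
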
>
> Maybe someone has an optimized sysctl.conf for a 10G backend?
>
> Mine is pretty simple, the kind that can be found by googling.
>
>
> Just to mention: those VMs were connected through a separate 1 Gbps
> interface, which means they should not be affected by the client with
> the 10G backend.
>
>
> The logs are pretty useless; they just say this during the outage:
>
>
> [2014-10-13 12:09:18.392910] W 
> [client-handshake.c:276:client_ping_cbk] 
> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
>
> [2014-10-13 12:10:08.389708] C 
> [client-handshake.c:127:rpc_client_ping_timer_expired] 
> 0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159
> has not responded in the last 2 seconds, disconnecting.
>
> [2014-10-13 12:10:08.390312] W 
> [client-handshake.c:276:client_ping_cbk] 
> 0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired
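
For pulling the disconnect events out of a client log, a minimal sketch follows. The regular expression is written against the log lines quoted above, with each entry on a single line as it would appear in the log file. The function name `find_ping_timeouts` is hypothetical, and other GlusterFS versions may format the message differently.

```python
import re

# Pattern for the "ping timer expired" message as quoted in this thread.
# Assumption: one log entry per line, in the format shown in the excerpt.
PING_EXPIRED = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[[^\]]*rpc_client_ping_timer_expired\]\s+"
    r"(?P<xlator>\S+):\s+server\s+(?P<server>\S+)\s+"
    r"has not responded in the last (?P<timeout>\d+) seconds"
)


def find_ping_timeouts(lines):
    """Return (timestamp, server, timeout_seconds) for each expiry event."""
    events = []
    for line in lines:
        m = PING_EXPIRED.match(line)
        if m:
            events.append((m["ts"], m["server"], int(m["timeout"])))
    return events


# The three entries from the excerpt, unwrapped to one line each.
sample = [
    "[2014-10-13 12:09:18.392910] W [client-handshake.c:276:client_ping_cbk] "
    "0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired",
    "[2014-10-13 12:10:08.389708] C "
    "[client-handshake.c:127:rpc_client_ping_timer_expired] "
    "0-HA-2TB-TT-Proxmox-cluster-client-0: server 10.250.0.1:49159 "
    "has not responded in the last 2 seconds, disconnecting.",
    "[2014-10-13 12:10:08.390312] W [client-handshake.c:276:client_ping_cbk] "
    "0-HA-2TB-TT-Proxmox-cluster-client-0: timer must have expired",
]

for ts, server, timeout in find_ping_timeouts(sample):
    print(f"{ts}: {server} dropped after {timeout}s without a ping reply")
```

Counting these events per server and comparing their timestamps with the dd run would show whether the disconnects line up with the heavy write.
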
>
> So I decided to set the timeout a bit higher.
>
> So it seems to me that under high load GlusterFS is not usable? 130
> MB/s is not so much that it should cause timeouts or make the
> system so slow that the VMs suffer.
>
> Of course, after the disconnection the healing process started, but
> since the VMs had lost connection to both servers, it was pretty
> useless; they could not run anymore. And BTW, when you load the server
> with such a huge job (dd of 900 GB), the healing process goes very slowly :)
>
>
>
> -- 
> Best regards,
> Roman.
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
