[Gluster-users] Strange server lock issues with 2.0.7 - updating

Stephan von Krawczynski skraw at ithnet.com
Sat Nov 21 13:27:34 UTC 2009


The problem we experienced was occasional packet loss (nothing dramatic, only
very occasional). You will see that in almost every LAN. If a ping packet is
lost and you have configured a low timeout, a brick is taken offline quite fast
even though there is no real problem. The larger the timeout, the better the
chances that a following ping packet makes it through and resets the wait time.
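
For comparison, this is roughly what the remote1 brick definition from Marek's
client config quoted below would look like with the higher timeout we use; only
the ping-timeout line changes (a sketch, adjust the value to what your network
and your application can tolerate):

-----------------cut here------------------------
volume remote1
type protocol/client
option transport-type tcp/client
option remote-host 192.168.2.184
# with 120 sec. an occasional lost ping only delays detection
# instead of taking the brick offline
option ping-timeout 120
option remote-subvolume locks
end-volume
-----------------cut here------------------------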



On Fri, 20 Nov 2009 14:18:46 +0100
Marek <mb at kis.p.lodz.pl> wrote:

> Why do you suggest such a high ping-timeout value?
> When a brick gets in trouble, the mounted fs on the client side is unusable (I/O is locked)
> and has to wait 120 sec. for the timeout to "release" the fs.
> Client I/O locked for 120 sec. is not acceptable.
> 
> 
> regards,
> 
> Stephan von Krawczynski wrote:
> > Try setting your ping-timeout way higher; since we use 120 we have almost no
> > issues in regular use. Nevertheless we do believe every problem will come back
> > when some brick(s) die...
> > 
> > 
> > On Tue, 10 Nov 2009 14:59:07 +0100
> > Marek Blaszkowski <mb at kis.p.lodz.pl> wrote:
> > 
> >> OK,
> >> here are some more details: on the "bad" servers (with strange lockups) we have
> >> problems opening/moving files. We are unable to open, move or even ls files
> >> (the file utilities just hang)...
> >>
> >>
> >> Marek wrote:
> >>> Hello,
> >>> we're testing a simple configuration of glusterfs 2.0.7 with 4 servers
> >>> and 1 client (2+2 bricks, each pair replicated and combined with
> >>> a distribute translator, configs below).
> >>> During our tests (client side copying/moving a lot of small files on the
> >>> glusterfs-mounted FS) we got strange
> >>> lockups on two of the servers (bricks).
> >>> I was unable to log in (via ssh) to the server; on already started terminal
> >>> sessions I couldn't spawn a "top"
> >>> process (it just hangs), and vmstat exits with a floating point exception.
> >>> Other fileutils commands behave "normally".
> >>> There were no dmesg kernel messages (first guess was a kernel oops or
> >>> other kernel-related problem).
> >>> This server never had any CPU/memory problems under high loads before.
> >>> The problems start when we
> >>> run glusterfsd on this server. We had to hard-reset the malfunctioning server
> >>> (reboot doesn't work).
> >>> After a couple of hours of testing another server disconnected from the client
> >>> (according to the client debug log).
> >>> The scenario was the same:
> >>> 1. unable to log in to the server; the connection was established but sshd on the
> >>> server side hung/timed out after entering the user password
> >>> 2. on previously established terminal sessions I was unable to run the top or
> >>> vmstat utility (vmstat exits with a
> >>> floating point exception). Copying/moving files was OK. Load was 0.00,
> >>> 0.00, 0.00
> >>>
> >>>
> >>> What could be wrong? These servers never had problems before (simple
> >>> terminal/proxy servers). The strange locking looks
> >>> like it is related to kernel VM structures (why do top/vmstat behave oddly??) or
> >>> other kernel-related problems.
> >>>
> >>> Server remote1 details: Linux version 2.6.26-1-686 (Debian 
> >>> 2.6.26-13lenny2) (dannf at debian.org)
> >>> (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri 
> >>> Mar 13 18:08:45 UTC 2009
> >>> running debian 5.0
> >>>
> >>> Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer) 
> >>> (gcc version 4.1.3 20070929
> >>> (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
> >>> running ubuntu
> >>> both run glusterfsd:
> >>>  /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f 
> >>> /usr/local/etc/glusterfs/glusterfs-server.vol
> >>>
> >>>
> >>> Note that both servers run different OS versions and got similar
> >>> lockup problems, never having had problems
> >>> before (without glusterfsd).
> >>>
> >>>
> >>> Server gluster config file (the same on 4 servers):
> >>> -----------------cut here------------------------
> >>> volume brick
> >>> type storage/posix
> >>> option directory /var/gluster
> >>> end-volume
> >>>
> >>> volume locks
> >>> type features/posix-locks
> >>> subvolumes brick
> >>> end-volume
> >>>
> >>> volume server
> >>> type protocol/server
> >>> option transport-type tcp/server
> >>> option auth.ip.locks.allow *
> >>> option auth.ip.brick-ns.allow *
> >>> subvolumes locks
> >>> end-volume
> >>> -----------------cut here-----------------------
> >>>
> >>> client gluster config below (please note remote1 and remote4 had the
> >>> problems mentioned above); the gluster client was
> >>> started with the command:
> >>> glusterfs --log-file=/var/log/gluster-client -f 
> >>> /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
> >>>
> >>>
> >>> -----------------client config-cut here-----------------------
> >>> volume remote1
> >>> type protocol/client
> >>> option transport-type tcp/client
> >>> option remote-host 192.168.2.184
> >>> option ping-timeout 5
> >>> option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote2
> >>> type protocol/client
> >>> option transport-type tcp/client
> >>> option remote-host 192.168.2.195
> >>> option ping-timeout 5
> >>> option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote3
> >>> type protocol/client
> >>> option transport-type tcp/client
> >>> option remote-host 192.168.2.145
> >>> option ping-timeout 5
> >>> option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume remote4
> >>> type protocol/client
> >>> option transport-type tcp/client
> >>> option remote-host 192.168.2.193
> >>> option ping-timeout 5
> >>> option remote-subvolume locks
> >>> end-volume
> >>>
> >>> volume afr1
> >>> type cluster/replicate
> >>> subvolumes remote1 remote3
> >>> end-volume
> >>>
> >>> volume afr2
> >>> type cluster/replicate
> >>> subvolumes remote2 remote4
> >>> end-volume
> >>>
> >>>
> >>> volume bigfs
> >>> type cluster/distribute
> >>> subvolumes afr1 afr2
> >>> end-volume
> >>>
> >>> volume writebehind
> >>> type performance/write-behind
> >>> option flush-behind on
> >>> option cache-size 3MB
> >>> subvolumes bigfs
> >>> end-volume
> >>>
> >>> volume readahead
> >>> type performance/read-ahead
> >>> option page-count 16
> >>> subvolumes writebehind
> >>> end-volume
> >>> -----------------cut here--------------------------------------
> >>>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> >>
> > 
> > 
> 
> 





More information about the Gluster-users mailing list