[Gluster-users] Strange server lock issues with 2.0.7 - update

Marek Blaszkowski mb at kis.p.lodz.pl
Tue Nov 10 13:59:07 UTC 2009


OK,
here are some more details. On the "bad" servers (the ones with the strange
lockups) we have problems opening and moving files. We are unable to open,
move, or even ls files (the file utilities just hang).
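
Since ssh logins to the affected servers hang, one way to watch the bricks from the client side is a plain TCP probe with a timeout. The following is only a rough sketch: the function names probe and probe_bricks are made up for illustration, and port 6996 is an assumption (it is not stated anywhere in this thread; use whatever port your glusterfsd actually listens on). Note that a successful connect only shows that the kernel still accepts connections. As with sshd in the report below, the daemon behind the port may still hang afterwards.

```python
import socket

# Assumption: glusterfsd listens on this TCP port. The thread does not
# state the port, so adjust it to your setup.
ASSUMED_BRICK_PORT = 6996


def probe(host, port, timeout=5.0):
    """Try a TCP connect and classify the result as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within `timeout` seconds.
        return "timeout"
    except OSError:
        # Connection refused, host unreachable, and similar errors.
        return "refused-or-unreachable"


def probe_bricks(hosts, port=ASSUMED_BRICK_PORT, timeout=5.0):
    """Probe every host in `hosts` and return {host: status}."""
    return {host: probe(host, port, timeout) for host in hosts}
```

For example, probe_bricks(["192.168.2.184", "192.168.2.195", "192.168.2.145", "192.168.2.193"]) would check the four remote hosts from the client config quoted below.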


Marek wrote:
> Hello,
> we're testing a simple glusterfs 2.0.7 configuration with 4 servers
> and 1 client (2+2 bricks, each pair replicated, under
> a distribute translator; configs below).
> During our tests (copying/moving a lot of small files from the client on
> the glusterfs-mounted FS) we got strange
> lockups on two of the servers (bricks).
> I was unable to log in (via ssh) to the server. In already-open terminal
> sessions I couldn't spawn a "top"
> process (it just hangs), and vmstat exits with a floating point exception.
> Other fileutils commands behave normally.
> There were no kernel messages in dmesg (my first guess was a kernel oops or
> some other kernel-related problem).
> This server never had any CPU/memory problems under high load before.
> The problems start when we
> run glusterfsd on this server. We had to hard-reset the malfunctioning
> server (reboot doesn't work).
> After a couple of hours of testing, another server disconnected from the
> client (according to the client debug log).
> The scenario was the same:
> 1. I was unable to log in to the server: the connection was established,
> but sshd on the server side hung/timed out after the user password was
> entered.
> 2. In previously established terminal sessions I was unable to run the top
> or vmstat utilities (vmstat exits with a
> floating point exception). Copying/moving files was OK. The load was 0.00,
> 0.00, 0.00.
> 
> 
> What could be wrong? These servers never had problems before (they are
> simple terminal/proxy servers). The strange locking looks
> related to kernel VM structures (why else would top/vmstat behave oddly?)
> or to other kernel-related problems.
> 
> Server remote1 details: Linux version 2.6.26-1-686 (Debian 
> 2.6.26-13lenny2) (dannf at debian.org)
> (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri 
> Mar 13 18:08:45 UTC 2009
> running Debian 5.0
> 
> Server remote2 details: Linux version 2.6.22-14-server (buildd at palmer) 
> (gcc version 4.1.3 20070929
> (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007
> running Ubuntu
> Both servers run glusterfsd:
>  /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f 
> /usr/local/etc/glusterfs/glusterfs-server.vol
> 
> 
> Note that the two servers run different OS versions and got similar
> lockup problems, and neither ever had problems
> before (without glusterfsd).
> 
> 
> Server gluster config file (the same on all 4 servers):
> -----------------cut here------------------------
> volume brick
> type storage/posix
> option directory /var/gluster
> end-volume
> 
> volume locks
> type features/posix-locks
> subvolumes brick
> end-volume
> 
> volume server
> type protocol/server
> option transport-type tcp/server
> option auth.ip.locks.allow *
> option auth.ip.brick-ns.allow *
> subvolumes locks
> end-volume
> -----------------cut here-----------------------
> 
> The client gluster config is below (please note that remote1 and remote4
> have the problems mentioned above). The gluster client was
> started with the command:
> glusterfs --log-file=/var/log/gluster-client -f 
> /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest
> 
> 
> -----------------client config-cut here-----------------------
> volume remote1
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.2.184
> option ping-timeout 5
> option remote-subvolume locks
> end-volume
> 
> volume remote2
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.2.195
> option ping-timeout 5
> option remote-subvolume locks
> end-volume
> 
> volume remote3
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.2.145
> option ping-timeout 5
> option remote-subvolume locks
> end-volume
> 
> volume remote4
> type protocol/client
> option transport-type tcp/client
> option remote-host 192.168.2.193
> option ping-timeout 5
> option remote-subvolume locks
> end-volume
> 
> volume afr1
> type cluster/replicate
> subvolumes remote1 remote3
> end-volume
> 
> volume afr2
> type cluster/replicate
> subvolumes remote2 remote4
> end-volume
> 
> 
> volume bigfs
> type cluster/distribute
> subvolumes afr1 afr2
> end-volume
> 
> volume writebehind
> type performance/write-behind
> option flush-behind on
> option cache-size 3MB
> subvolumes bigfs
> end-volume
> 
> volume readahead
> type performance/read-ahead
> option page-count 16
> subvolumes writebehind
> end-volume
> -----------------cut here--------------------------------------
> 
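
To see at a glance which hosts end up in which replica pair, a client volfile like the one above can be parsed with a few lines of Python. This is a minimal sketch, not the glusterfs parser: parse_volfile and replica_hosts are hypothetical helper names, and only the simple syntax used in this mail (volume, type, option, subvolumes, end-volume) is handled.

```python
from collections import OrderedDict


def parse_volfile(text):
    """Parse 'volume ... end-volume' blocks into an ordered dict.

    Hypothetical helper for illustration; it understands only the
    'type', 'option' and 'subvolumes' keywords.
    """
    volumes = OrderedDict()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Tolerate lines pasted from a quoted e-mail.
            line = line[1:].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            # Skip blanks, comments and "-----cut here-----" markers.
            continue
        words = line.split()
        key = words[0]
        if key == "volume" and len(words) == 2:
            current = {"type": None, "options": {}, "subvolumes": []}
            volumes[words[1]] = current
        elif key == "end-volume":
            current = None
        elif current is None:
            continue
        elif key == "type" and len(words) >= 2:
            current["type"] = words[1]
        elif key == "option" and len(words) >= 3:
            current["options"][words[1]] = " ".join(words[2:])
        elif key == "subvolumes":
            current["subvolumes"] = words[1:]
    return volumes


def replica_hosts(volumes):
    """Map each cluster/replicate volume to its children's remote-host."""
    result = {}
    for name, vol in volumes.items():
        if vol["type"] == "cluster/replicate":
            result[name] = [
                volumes.get(sub, {"options": {}})["options"].get("remote-host")
                for sub in vol["subvolumes"]
            ]
    return result
```

Fed the client config above, replica_hosts(parse_volfile(text)) should give afr1 as 192.168.2.184 and 192.168.2.145, and afr2 as 192.168.2.195 and 192.168.2.193. If remote1 and remote4 are indeed the affected servers, they sit in different replica pairs.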



