[Gluster-users] Performance and redundancy help

Chad ccolumbu at hotmail.com
Fri Feb 26 17:01:30 UTC 2010


Ok, using glusterfs-volgen I rebuilt the config files and got the cluster working again.
Performance has improved from 1.6 to about 24; NFS on the same machines comes in at about 11, and going straight to the disks is 170+.
There still seems to be a HUGE performance loss from using a network file system.
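
(If it helps to reproduce the gap between the client mount and the raw disk, a simple sequential-write pass along these lines is enough to show it; the /mnt/glusterfs mount point and the 1GB size are only placeholders, not my exact test:)

# write 1GB through the glusterfs client mount, forcing data to disk at the end
dd if=/dev/zero of=/mnt/glusterfs/ddtest bs=1M count=1024 conv=fdatasync
# the same write against the raw brick directory on a server, for comparison
dd if=/dev/zero of=/mnt/tcb_data/ddtest bs=1M count=1024 conv=fdatasync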

I must still be doing something wrong, because I have several problems:
1. If I create a file on the client while one of the servers is down, the file is still missing when that server comes back up.
	I thought the servers were supposed to be replicated, so why don't they re-sync? (See the self-heal note right after this list.)
2. If server 1 is down and server 2 is up, it takes about 30 seconds to do a df on the client. That is WAY too long; many of my applications will time out in less than that.
	How do I get the client to behave the same when one of the servers is down? (See the ping-timeout sketch after the client.vol listing below.)
3. On the server, if the network is down and I try to do a df, it hangs my shell (the server has the glusterfs volume mounted as a client). This is typical behavior for NFS too; is there any way to time out instead of hanging?
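
(For problem 1: from the self-heal notes in the docs, replicate only heals a file when it is next looked up through a client mount, so the usual way to force a full re-sync after a server comes back is to stat every file through the mount. A minimal sketch, assuming the volume is mounted at /mnt/glusterfs; substitute the real mount point:)

# walk the client mount so every file is stat()ed, which triggers AFR self-heal
find /mnt/glusterfs -noleaf -print0 | xargs --null stat > /dev/null

Running this right after the failed server rejoins should make the missing files reappear on it.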

Here are my new config files:
----- server.vol -----
volume tcb_posix
   type storage/posix
   option directory /mnt/tcb_data
end-volume

volume tcb_locks
     type features/locks
     subvolumes tcb_posix
end-volume

volume tcb_brick
     type performance/io-threads
     option thread-count 8
     subvolumes tcb_locks
end-volume

volume tcb_server
     type protocol/server
     option transport-type tcp
     option auth.addr.tcb_brick.allow *
     option transport.socket.listen-port 50001
     option transport.socket.nodelay on
     subvolumes tcb_brick
end-volume
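
(Each server starts this file with glusterfsd; /etc/glusterfs is just where I am assuming the file lives:)

# start the brick server from the volfile above, then confirm it is listening on 50001
glusterfsd -f /etc/glusterfs/server.vol
netstat -tlnp | grep 50001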

----------------------------------------
----- client.vol -----
volume tcb_remote_glust1
     type protocol/client
     option transport-type tcp
     option remote-host x.x.x.x
     option transport.socket.nodelay on
     option transport.remote-port 50001
     option remote-subvolume tcb_brick
end-volume

volume tcb_remote_glust2
     type protocol/client
     option transport-type tcp
     option remote-host y.y.y.y
     option transport.socket.nodelay on
     option transport.remote-port 50001
     option remote-subvolume tcb_brick
end-volume

volume tcb_mirror
     type cluster/replicate
     subvolumes tcb_remote_glust1 tcb_remote_glust2
end-volume

volume tcb_writebehind
     type performance/write-behind
     option cache-size 4MB
     subvolumes tcb_mirror
end-volume

volume tcb_readahead
     type performance/read-ahead
     option page-count 4
     subvolumes tcb_writebehind
end-volume

volume tcb_iocache
     type performance/io-cache
     option cache-size `grep 'MemTotal' /proc/meminfo  | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
     option cache-timeout 1
     subvolumes tcb_readahead
end-volume

volume tcb_quickread
     type performance/quick-read
     option cache-timeout 1
     option max-file-size 64kB
     subvolumes tcb_iocache
end-volume

volume tcb_statprefetch
     type performance/stat-prefetch
     subvolumes tcb_quickread
end-volume
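
(The client mounts the file above directly, e.g. "glusterfs -f /etc/glusterfs/client.vol /mnt/glusterfs"; the paths are only the ones I am assuming here. The grep/awk pipeline in tcb_iocache simply works out to roughly 20% of MemTotal, expressed in MB.)

For the 30-second stall in problem 2, my guess is that the client is waiting out its connection timeout to the dead server before failing over. If the protocol/client translator in this release accepts a ping-timeout option (older releases call it transport-timeout; check the translator docs for the installed version), adding it to each client volume should shorten that window. A sketch, with a 10-second value that is only a guess:

volume tcb_remote_glust1
     type protocol/client
     option transport-type tcp
     option remote-host x.x.x.x
     option transport.socket.nodelay on
     option transport.remote-port 50001
     option remote-subvolume tcb_brick
     # assumption: the option name and its availability depend on the glusterfs version installed
     option ping-timeout 10
end-volume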

^C



Chad wrote:
> Ok, I tried to change over to this, but now I just get:
> [2010-02-24 09:30:41] E [authenticate.c:234:gf_authenticate] auth: no authentication module is interested in accepting remote-client 10.0.0.24:1007
> [2010-02-24 09:30:41] E [server-protocol.c:5822:mop_setvolume] tcb_remote: Cannot authenticate client from 10.0.0.24:1007
> 
> I am sure it is something simple, I just don't know what.
> Is this a port problem? The port in the log is 1007, but the server is
> listening on 50001.
> 
> Here are my config files:
> ----- server.vol: -----
> volume tcb_posix-export
>   type storage/posix
>   option directory /mnt/tcb_data
> end-volume
> 
> volume tcb_locks-export
>   type features/locks
>   subvolumes tcb_posix-export
> end-volume
> 
> volume tcb_export
>   type performance/io-threads
>   option thread-count 8
>   subvolumes tcb_locks-export
> end-volume
> 
> volume tcb_remote
>         type protocol/server
>         option transport-type tcp
>         option transport.socket.listen-port 50001
>         option transport.socket.nodelay on
>         subvolumes tcb_export tcb_locks-export
>         option auth.ip.tcb_locks-export.allow 10.0.0.*,10.0.20.*,10.0.30.*,192.168.1.*,192.168.20.*,192.168.30.*,127.0.0.1
>         option auth.ip.tcb_export.allow 10.0.0.*,10.0.20.*,10.0.30.*,192.168.1.*,192.168.20.*,192.168.30.*,127.0.0.1
> end-volume
> 
> 
> ----- client.vol -----
> volume tcb_remote1
>   type protocol/client
>   option transport-type tcp
>   option remote-port 50001
>   option remote-host 10.0.0.24
>   option remote-subvolume tcb_remote
> end-volume
> 
> volume tcb_remote2
>   type protocol/client
>   option transport-type tcp
>   option remote-port 50001
>   option remote-host 10.0.0.25
>   option remote-subvolume tcb_remote
> end-volume
> 
> volume tcb_mirror
>   type cluster/afr
>   subvolumes tcb_remote1 tcb_remote2
> end-volume
> 
> volume tcb_wb
>   type performance/write-behind
>   option cache-size 1MB
>   subvolumes tcb_mirror
> end-volume
> 
> volume tcb_ioc
>   type performance/io-cache
>   option cache-size 32MB
>   subvolumes tcb_wb
> end-volume
> 
> volume tcb_iothreads
>   type performance/io-threads
>   option thread-count 16
>   subvolumes tcb_ioc
> end-volume
> ^C
> 
> 
> 
> Chad wrote:
>> I finally got the servers transported 2000 miles, set-up, wired, and 
>> booted.
>> Here are the vol files.
>> Just to reiterate, the issues are slow performance on read/write, and 
>> clients hanging when 1 server goes down.
>>
>>
>> ### glusterfs.vol ###
>> ############################################
>> # Start tcb_cluster
>> ############################################
>> # the exported volume to mount                    # required!
>> volume tcb_cluster
>>         type protocol/client
>>         option transport-type tcp/client
>>         option remote-host glustcluster
>>         option remote-port 50001
>>         option remote-subvolume tcb_cluster
>> end-volume
>>
>> ############################################
>> # Start cs_cluster
>> ############################################
>> # the exported volume to mount                    # required!
>> volume cs_cluster
>>         type protocol/client
>>         option transport-type tcp/client
>>         option remote-host glustcluster
>>         option remote-port 50002
>>         option remote-subvolume cs_cluster
>> end-volume
>>
>> ############################################
>> # Start pbx_cluster
>> ############################################
>> # the exported volume to mount                    # required!
>> volume pbx_cluster
>>         type protocol/client
>>         option transport-type tcp/client
>>         option remote-host glustcluster
>>         option remote-port 50003
>>         option remote-subvolume pbx_cluster
>> end-volume
>>
>>
>> ---------------------------------------------------
>> ### glusterfsd.vol ###
>> #############################################
>> # Start tcb_data cluster
>> #############################################
>> volume tcb_local
>>         type storage/posix
>>         option directory /mnt/tcb_data
>> end-volume
>>
>> volume tcb_locks
>>         type features/locks
>>         option mandatory-locks on          # enables mandatory locking on all files
>>         subvolumes tcb_local
>> end-volume
>>
>> # dataspace on remote machine, look in /etc/hosts to see that
>> volume tcb_locks_remote
>>         type protocol/client
>>         option transport-type tcp
>>         option remote-port 50001
>>         option remote-host 192.168.1.25
>>         option remote-subvolume tcb_locks
>> end-volume
>>
>> # automatic file replication translator for dataspace
>> volume tcb_cluster_afr
>>         type cluster/replicate
>>         subvolumes tcb_locks tcb_locks_remote
>> end-volume
>>
>> # the actual exported volume
>> volume tcb_cluster
>>         type performance/io-threads
>>         option thread-count 256
>>         option cache-size 128MB
>>         subvolumes tcb_cluster_afr
>> end-volume
>>
>> volume tcb_cluster_server
>>         type protocol/server
>>         option transport-type tcp
>>         option transport.socket.listen-port 50001
>>         option auth.addr.tcb_locks.allow *
>>         option auth.addr.tcb_cluster.allow *
>>         option transport.socket.nodelay on
>>         subvolumes tcb_cluster
>> end-volume
>>
>> #############################################
>> # Start cs_data cluster
>> #############################################
>> volume cs_local
>>         type storage/posix
>>         option directory /mnt/cs_data
>> end-volume
>>
>> volume cs_locks
>>         type features/locks
>>         option mandatory-locks on          # enables mandatory locking on all files
>>         subvolumes cs_local
>> end-volume
>>
>> # dataspace on remote machine, look in /etc/hosts to see that
>> volume cs_locks_remote
>>         type protocol/client
>>         option transport-type tcp
>>         option remote-port 50002
>>         option remote-host 192.168.1.25
>>         option remote-subvolume cs_locks
>> end-volume
>>
>> # automatic file replication translator for dataspace
>> volume cs_cluster_afr
>>         type cluster/replicate
>>         subvolumes cs_locks cs_locks_remote
>> end-volume
>>
>> # the actual exported volume
>> volume cs_cluster
>>         type performance/io-threads
>>         option thread-count 256
>>         option cache-size 128MB
>>         subvolumes cs_cluster_afr
>> end-volume
>>
>> volume cs_cluster_server
>>         type protocol/server
>>         option transport-type tcp
>>         option transport.socket.listen-port 50002
>>         option auth.addr.cs_locks.allow *
>>         option auth.addr.cs_cluster.allow *
>>         option transport.socket.nodelay on
>>         subvolumes cs_cluster
>> end-volume
>>
>> #############################################
>> # Start pbx_data cluster
>> #############################################
>> volume pbx_local
>>         type storage/posix
>>         option directory /mnt/pbx_data
>> end-volume
>>
>> volume pbx_locks
>>         type features/locks
>>         option mandatory-locks on          # enables mandatory locking on all files
>>         subvolumes pbx_local
>> end-volume
>>
>> # dataspace on remote machine, look in /etc/hosts to see that
>> volume pbx_locks_remote
>>         type protocol/client
>>         option transport-type tcp
>>         option remote-port 50003
>>         option remote-host 192.168.1.25
>>         option remote-subvolume pbx_locks
>> end-volume
>>
>> # automatic file replication translator for dataspace
>> volume pbx_cluster_afr
>>         type cluster/replicate
>>         subvolumes pbx_locks pbx_locks_remote
>> end-volume
>>
>> # the actual exported volume
>> volume pbx_cluster
>>         type performance/io-threads
>>         option thread-count 256
>>         option cache-size 128MB
>>         subvolumes pbx_cluster_afr
>> end-volume
>>
>> volume pbx_cluster_server
>>         type protocol/server
>>         option transport-type tcp
>>         option transport.socket.listen-port 50003
>>         option auth.addr.pbx_locks.allow *
>>         option auth.addr.pbx_cluster.allow *
>>         option transport.socket.nodelay on
>>         subvolumes pbx_cluster
>> end-volume
>>
>>
>> -- 
>> ^C
>>
>>
>>
>> Smart Weblications GmbH - Florian Wiessner wrote:
>>> Hi,
>>>
>>> Am 16.02.2010 01:58, schrieb Chad:
>>>> I am new to glusterfs and to this list, so please let me know if I have
>>>> made any mistakes in posting this to the list.
>>>> I am not sure what your standards are.
>>>>
>>>> I came across glusterfs last week, it was super easy to set-up and test
>>>> and is almost exactly what I want/need.
>>>> I set up 2 "glusterfs servers" that serve up a mirrored raid5 disk
>>>> partitioned into three 500GB partitions to 6 clients.
>>>> I am using round robin DNS, but I also tried to use heartbeat and
>>>> ldirectord (see details below).
>>>> Each server has 2 NICs: 1 for the clients, the other has a cross over
>>>> cable connecting the 2 servers. Both NICs are 1000mbps.
>>>>
>>>> There are only 2 issues.
>>>> #1. When one of the servers goes down, the clients hang at least for a
>>>> little while (more testing is needed); I am not sure if the clients can
>>>> recover at all.
>>>> #2. The read/write tests I performed came in at 1.6 when using glusterfs,
>>>> NFS on all the same machines came in at 11, and a direct test on the data
>>>> server came in at 111. How do I improve the performance?
>>>>
>>>
>>> please share your vol-files. i don't understand why you would need 
>>> loadbalancers.
>>>
>>>> ###############################################
>>>> My glusterfs set-up:
>>>> 2 supermicro dual Xeon 3.0 ghz CPUs, 8gb ram, 4 @ 750gb seagate sata
>>>> HDs, 3 in raid5 with 1 hot spare. (data servers)
>>>
>>> why not use raid10? same capacity, better speed..
>>>
>>>> 6 supermicro dual AMD 2.8 ghz CPUs, 4gb ram, 2 @ 250gb seagate sata HDs
>>>> in raid 1. (client machines)
>>>> glusterfs is set up with round robin DNS to handle the load balancing
>>>> of the 2 data servers.
>>>
>>> afaik there is no need to set up dns rr or load balancing for the gluster
>>> servers; glusterfs should take care of that itself. but without your
>>> volfiles i can't give any hints.
>>>
>>>
>>
>>
> 
> 


