[Gluster-users] replication problems

Pavan Vilas Sondur pavan at gluster.com
Thu Oct 1 07:55:53 UTC 2009


Hi Adrian,
Correct me if I've misunderstood: you have 2 servers, and a client replicates to both of them. When the first server goes down, the client stops responding as well. You also mentioned more than one client. Can you clarify how the clients are set up, so that we can try to understand the issue?
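
In the meantime, it may help to confirm from each client machine which bricks are actually reachable when the mount fails. Below is a minimal sketch of a TCP probe, not a GlusterFS tool: the addresses come from your client volfile, port 6996 is the one shown in your debug log, and `probe` and `report` are hypothetical helper names chosen for illustration.

```python
import socket

# Brick addresses from the client volfile; 6996 is the port seen in the debug log.
BRICKS = {
    "brick1": ("172.19.45.102", 6996),
    "brick2": ("172.19.45.103", 6996),
}


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def report(bricks=BRICKS, timeout=3.0):
    """Map each brick name to 'up' or 'down' based on a TCP probe."""
    return {
        name: ("up" if probe(host, port, timeout) else "down")
        for name, (host, port) in bricks.items()
    }


if __name__ == "__main__":
    for name, state in report().items():
        print(f"{name}: {state}")
```

If this reports the second brick as "up" while /mnt/gluster still returns "Transport endpoint is not connected", that would suggest the problem is in the client rather than in the network path to the surviving brick. A successful TCP connect only shows that something is listening on the port; it does not prove the brick is healthy.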

Pavan

On 01/10/09 08:41 +0200, Adrian Moisey wrote:
> Hi
>
> I am currently testing GlusterFS with replication.
> I am running Ubuntu hardy using packages from the PPA on launchpad.net.  
> I am currently using glusterfs 2.0.6.
>
> I have 2 machines, both exporting 1 brick each. This is the config I'm  
> using:
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> volume posix
>  type storage/posix
>  option directory /home/export/
> end-volume
>
> volume locks
>   type features/locks
>   subvolumes posix
> end-volume
>
> volume cache
>   type performance/io-cache
>   subvolumes locks
> end-volume
>
> volume brick
>   type performance/io-threads
>   option thread-count 8
>   subvolumes cache
> end-volume
>
> ### Add network serving capability to above brick.
> volume server
>  type protocol/server
>  option transport-type tcp
>  subvolumes brick
>  option auth.addr.brick.allow * # Allow access to "brick" volume
> end-volume
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> I then have 2 clients (which happen to be the same 2 machines) that  
> connect to both bricks and replicate them using this config:
>
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> ### Add client feature and attach to remote subvolume of server1
> volume brick1
>  type protocol/client
>  option transport-type tcp
>  option remote-host 172.19.45.102      # IP address of the remote brick
>  option remote-subvolume brick        # name of the remote volume
> end-volume
>
> ### Add client feature and attach to remote subvolume of server2
> volume brick2
>  type protocol/client
>  option transport-type tcp
>  option remote-host 172.19.45.103      # IP address of the remote brick
>  option remote-subvolume brick        # name of the remote volume
> end-volume
>
> volume replicate
>  type cluster/replicate
>  subvolumes brick1 brick2
> end-volume
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> If I start the 2 servers up and then mount both clients, everything works
> fine. I have shared storage which is replicated to each host.
>
> If I shut one brick down, the client on that machine also dies and I
> get strange errors:
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
> # cd /mnt/gluster
> bash: cd: /mnt/gluster: Transport endpoint is not connected
> # df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             9.5G  1.1G  7.9G  13% /
> varrun                125M   68K  125M   1% /var/run
> varlock               125M     0  125M   0% /var/lock
> udev                  125M   44K  125M   1% /dev
> devshm                125M     0  125M   0% /dev/shm
> df: `/mnt/gluster': Transport endpoint is not connected
> # mount
> /dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> /sys on /sys type sysfs (rw,noexec,nosuid,nodev)
> varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
> varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
> udev on /dev type tmpfs (rw,mode=0755)
> devshm on /dev/shm type tmpfs (rw)
> devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> securityfs on /sys/kernel/security type securityfs (rw)
> /etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs  
> (rw,allow_other,default_permissions,max_read=131072)
> ----8<----8<----8<----8<----8<----8<----8<----8<----8<----
>
> Here is a copy of debug logs:
> [2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs:  
> loading volume file /etc/glusterfs/glusterfs.vol
> ================================================================================
> Version      : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
> TLA Revision : v2.0.6
> Starting Time: 2009-10-01 08:16:15
> Command line : glusterfs --log-level=DEBUG  
> --volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
> PID          : 17884
> System name  : Linux
> Nodename     : cj-cpt-molb01
> Kernel Release : 2.6.24-24-server
> Hardware Identifier: i686
>
> Given volfile:
> +------------------------------------------------------------------------------+
>   1: ### Add client feature and attach to remote subvolume of server1
>   2: volume brick1
>   3:  type protocol/client
>   4:  option transport-type tcp
>   5:  option remote-host 172.19.45.102      # IP address of the remote  
> brick
>   6:  option remote-subvolume brick        # name of the remote volume
>   7: end-volume
>   8:
>   9: ### Add client feature and attach to remote subvolume of server2
>  10: volume brick2
>  11:  type protocol/client
>  12:  option transport-type tcp
>  13:  option remote-host 172.19.45.103      # IP address of the remote  
> brick
>  14:  option remote-subvolume brick        # name of the remote volume
>  15: end-volume
>  16:
>  17: volume replicate
>  18:  type cluster/replicate
>  19:  subvolumes brick1 brick2
>  20: end-volume
>
> +------------------------------------------------------------------------------+
> [2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in  
> pid 17884
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting  
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting  
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:  
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:  
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting  
> frame-timeout to 30mins
> [2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting  
> ping-timeout to 10
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:  
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [transport.c:141:transport_load] transport:  
> attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got  
> GF_EVENT_PARENT_UP, attempting connect on transport
> [2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully  
> started
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got  
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got  
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]  
> brick1: Connected to 172.19.45.102:6996, attached to remote volume 
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume  
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]  
> brick1: Connected to 172.19.45.102:6996, attached to remote volume 
> 'brick'.
> [2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume  
> 'brick1' came back up; going online.
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got  
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got  
> GF_EVENT_CHILD_UP
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]  
> brick2: Connected to 172.19.45.103:6996, attached to remote volume 
> 'brick'.
> [2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk]  
> brick2: Connected to 172.19.45.103:6996, attached to remote volume 
> 'brick'.
> [2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:  
> connection to 172.19.45.102:6996 failed (Connection refused)
> [2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1:  
> connection to 172.19.45.102:6996 failed (Connection refused)
>
>
>
> Any ideas?
>
>
> -- 
> Adrian Moisey
> Systems Designer | CareerJunction | Better jobs. More often.
> Web: www.careerjunction.co.za | Email: adrian at careerjunction.co.za
> Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
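
To compare what each client saw, the connection events in a client log can be summarised per subvolume. The following is a rough sketch that assumes the raw, unwrapped log line format visible in the quoted log (timestamp, level, source location, subvolume name, message); `connection_events` and `last_state` are hypothetical helpers, and only the three message types that appear in the quoted log are recognised.

```python
import re

# Matches one unwrapped log line, for example:
# [2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] "
    r"(?P<name>[^:]+): (?P<msg>.*)$"
)


def connection_events(lines):
    """Yield (timestamp, subvolume, event) for connect/disconnect messages."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # banner lines, volfile dump, blank lines
        msg = match.group("msg")
        if msg.startswith("Connected to"):
            event = "connected"
        elif msg == "disconnected":
            event = "disconnected"
        elif "failed (Connection refused)" in msg:
            event = "refused"
        else:
            continue  # other messages are ignored in this sketch
        yield match.group("ts"), match.group("name"), event


def last_state(lines):
    """Return the most recent connection event seen for each subvolume."""
    state = {}
    for _ts, name, event in connection_events(lines):
        state[name] = event
    return state
```

Feeding it the client log (for example `last_state(open(path))`, where `path` is wherever your client writes its log) gives one entry per subvolume. A result where brick2 is still "connected" after brick1 is "refused" would mean the client lost only one replica at the point the mount stopped responding.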


