[Gluster-users] replication problems

Adrian Moisey adrian at careerjunction.co.za
Thu Oct 1 06:41:17 UTC 2009


Hi

I am currently testing GlusterFS with replication.
I am running Ubuntu hardy using packages from the PPA on launchpad.net. 
I am currently using glusterfs 2.0.6.

I have 2 machines, both exporting 1 brick each. This is the config I'm 
using:
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
volume posix
  type storage/posix
  option directory /home/export/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume cache
  type performance/io-cache
  subvolumes locks
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes cache
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  subvolumes brick
  option auth.addr.brick.allow * # Allow access to "brick" volume
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----

I then have 2 clients (which happen to be the same 2 machines) that 
connect to both bricks and replicate them using this config:

----8<----8<----8<----8<----8<----8<----8<----8<----8<----
### Add client feature and attach to remote subvolume of server1
volume brick1
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.102      # IP address of the remote brick
  option remote-subvolume brick        # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume of server2
volume brick2
  type protocol/client
  option transport-type tcp
  option remote-host 172.19.45.103      # IP address of the remote brick
  option remote-subvolume brick        # name of the remote volume
end-volume

volume replicate
  type cluster/replicate
  subvolumes brick1 brick2
end-volume
----8<----8<----8<----8<----8<----8<----8<----8<----8<----

If I start the 2 servers up, then mount both clients, everything works 
fine. I have shared storage which is replicated to each host.

If I shut one brick down, the client on that machine also dies and I 
get strange errors:
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
# cd /mnt/gluster
bash: cd: /mnt/gluster: Transport endpoint is not connected
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.5G  1.1G  7.9G  13% /
varrun                125M   68K  125M   1% /var/run
varlock               125M     0  125M   0% /var/lock
udev                  125M   44K  125M   1% /dev
devshm                125M     0  125M   0% /dev/shm
df: `/mnt/gluster': Transport endpoint is not connected
# mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
/sys on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,noexec,nosuid,nodev,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
devshm on /dev/shm type tmpfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
securityfs on /sys/kernel/security type securityfs (rw)
/etc/glusterfs/glusterfs.vol on /mnt/gluster type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
----8<----8<----8<----8<----8<----8<----8<----8<----8<----
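The "Transport endpoint is not connected" message above is the standard text for errno ENOTCONN, so the dead mount can be detected from a script rather than by hand. Below is a minimal sketch, assuming Python 3 is available on the client. The function name mount_state is hypothetical and is not part of GlusterFS.

```python
import errno
import os


def mount_state(path):
    """Classify a path as 'ok', 'disconnected', or 'error: <ERRNO NAME>'.

    'disconnected' means stat() failed with ENOTCONN, which the shell
    reports as "Transport endpoint is not connected".
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return "disconnected"
        # Any other failure (missing path, permissions, ...) is reported by name.
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


if __name__ == "__main__":
    # /mnt/gluster is the mount point used in the output above.
    print(mount_state("/mnt/gluster"))
```

This only reports the state of the mount point. It does not say which brick caused the client to drop.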

Here is a copy of debug logs:
[2009-10-01 08:16:15] D [glusterfsd.c:354:_get_specfp] glusterfs: 
loading volume file /etc/glusterfs/glusterfs.vol
================================================================================
Version      : glusterfs 2.0.6 built on Aug 31 2009 20:14:31
TLA Revision : v2.0.6
Starting Time: 2009-10-01 08:16:15
Command line : glusterfs --log-level=DEBUG 
--volfile=/etc/glusterfs/glusterfs.vol /mnt/gluster/
PID          : 17884
System name  : Linux
Nodename     : cj-cpt-molb01
Kernel Release : 2.6.24-24-server
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
   1: ### Add client feature and attach to remote subvolume of server1
   2: volume brick1
   3:  type protocol/client
   4:  option transport-type tcp
   5:  option remote-host 172.19.45.102      # IP address of the remote 
brick
   6:  option remote-subvolume brick        # name of the remote volume
   7: end-volume
   8:
   9: ### Add client feature and attach to remote subvolume of server2
  10: volume brick2
  11:  type protocol/client
  12:  option transport-type tcp
  13:  option remote-host 172.19.45.103      # IP address of the remote 
brick
  14:  option remote-subvolume brick        # name of the remote volume
  15: end-volume
  16:
  17: volume replicate
  18:  type cluster/replicate
  19:  subvolumes brick1 brick2
  20: end-volume

+------------------------------------------------------------------------------+
[2009-10-01 08:16:15] D [glusterfsd.c:1205:main] glusterfs: running in 
pid 17884
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick1: defaulting 
frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick1: defaulting 
ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: 
attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: 
attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:5952:init] brick2: defaulting 
frame-timeout to 30mins
[2009-10-01 08:16:15] D [client-protocol.c:5963:init] brick2: defaulting 
ping-timeout to 10
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: 
attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [transport.c:141:transport_load] transport: 
attempt to load file /usr/lib/glusterfs/2.0.6/transport/socket.so
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick1: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] D [client-protocol.c:6280:notify] brick2: got 
GF_EVENT_PARENT_UP, attempting connect on transport
[2009-10-01 08:16:15] N [glusterfsd.c:1224:main] glusterfs: Successfully 
started
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got 
GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick1: got 
GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] 
brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 
'brick1' came back up; going online.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] 
brick1: Connected to 172.19.45.102:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [afr.c:2203:notify] replicate: Subvolume 
'brick1' came back up; going online.
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got 
GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] D [client-protocol.c:6294:notify] brick2: got 
GF_EVENT_CHILD_UP
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] 
brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:16:15] N [client-protocol.c:5559:client_setvolume_cbk] 
brick2: Connected to 172.19.45.103:6996, attached to remote volume 'brick'.
[2009-10-01 08:17:24] N [client-protocol.c:6246:notify] brick1: disconnected
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: 
connection to 172.19.45.102:6996 failed (Connection refused)
[2009-10-01 08:17:27] E [socket.c:745:socket_connect_finish] brick1: 
connection to 172.19.45.102:6996 failed (Connection refused)
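The log shows both bricks connecting on port 6996 and then brick1 refusing connections after it was shut down. As a rough cross-check outside GlusterFS, here is a minimal sketch that probes each brick with a plain TCP connect, assuming Python 3 on the client. The addresses and port are taken from the log above. The names BRICKS and brick_reachable are hypothetical, and a successful connect only shows that something is listening, not that the volume is healthy.

```python
import socket

# Addresses from the client volfile; port from the "Connected to" log lines.
BRICKS = {"brick1": "172.19.45.102", "brick2": "172.19.45.103"}
PORT = 6996


def brick_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, host in BRICKS.items():
        state = "up" if brick_reachable(host, PORT) else "DOWN"
        print(f"{name} {host}:{PORT} {state}")
```

With one brick stopped, this should report that brick as DOWN and the other as up. That would confirm the surviving server is still reachable even though the mount has failed.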



Any ideas?


-- 
Adrian Moisey
Systems Designer | CareerJunction | Better jobs. More often.
Web: www.careerjunction.co.za | Email: adrian at careerjunction.co.za
Phone: +27 21 818 8621 | Mobile: +27 82 858 7830 | Fax: +27 21 818 8855


