[Gluster-devel] Whole failure when one glusterfsd brought down.
Dale Dude
dale at oc3networks.com
Wed Mar 21 22:02:02 UTC 2007
Hi Krishna,
Works as expected. Much appreciated!
I thought this was asked earlier, but I can't find the question:
One of my 3 servers dies and is down for a period. During that down
period, other processes recreate files that are still on the downed
server. When I bring the downed server back up and it becomes part of the
filesystem again, how will glusterfs know which copy of a file to use?
Should I ensure that the downed server does not contain files that also
exist on other gluster servers before adding it back into the filesystem?
I understand that the directories should be recreated on the downed server
before I bring it back into glusterfs.
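
In case it helps, this is the kind of thing I had in mind for recreating
just the directory tree on the rejoining brick before starting glusterfsd
on it again. It is only a sketch; "server3" and the export path are
placeholders for the downed machine, and I'm assuming rsync is available
on both sides:

# run from a healthy brick: copy the directory structure only, no files
rsync -a --include='*/' --exclude='*' /home/export/ server3:/home/export/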
Thanks,
Dale
Krishna Srinivas wrote:
> Hi Dale,
>
> This is a recent change. You need to put the following line in unify
> for it to work when one of the nodes goes down:
> "option readdir-force-success on"
>
> We discussed whether the user should know when one of the nodes
> has gone down, and concluded that it is best to leave it as an
> option for the user to configure himself.
>
> Regards
> Krishna
>
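
If I read that right, the option goes on the cluster/unify volume in the
client spec. A trimmed sketch of my "bricks" volume with the one new line
added (the comment wording is mine):

volume bricks
type cluster/unify
subvolumes client1 client2 client3
option scheduler alu
option readdir-force-success on # let readdir succeed even when a child brick is down
# remaining alu options unchanged, as in the spec quoted below
end-volume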
> On 3/22/07, Dale Dude <dale at oc3networks.com> wrote:
>> Using code pulled March 21st @ 1am EST. Kernel 2.6.20-10 on Ubuntu
>> Feisty, 32-bit.
>>
>> I have 3 machines serving with glusterfsd and mounted the cluster from
>> the first server. If I kill one of the glusterfsd processes on any of
>> the machines, the mount point breaks with the 'Transport' error below.
>> glusterfs also produces this error if I unmount and remount while that
>> one glusterfsd server is still down. Is this the expected result, or
>> shouldn't the mount continue to work even though one of the servers has
>> "died"?
>>
>> ls: reading directory local/: Transport endpoint is not connected
>>
>> glusterfs.log produces this:
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
>> tcp/client:non-blocking connect() returned: 111 (Connection refused)
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
>> tcp/client:non-blocking connect() returned: 111 (Connection refused)
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>>
>>
>> ======================================
>> glusterfs-server.vol used by all the servers:
>> (ignore my bad volume naming, was just testing)
>>
>> volume brick
>> type storage/posix # POSIX FS translator
>> option directory /home/export # Export this directory
>> end-volume
>>
>> volume iothreads
>> type performance/io-threads
>> option thread-count 8
>> subvolumes brick
>> end-volume
>>
>> volume server
>> type protocol/server
>> option transport-type tcp/server # For TCP/IP transport
>> subvolumes iothreads
>> option auth.ip.brick.allow * # Allow access to "brick" volume
>> end-volume
>>
>>
>> ======================================
>> glusterfs-client.vol used on server1:
>> (ignore my bad volume naming, was just testing)
>>
>> volume client1
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 1.1.1.1 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume client2
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 2.2.2.2 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume client3
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 3.3.3.3 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume bricks
>> type cluster/unify
>> subvolumes client1 client2 client3
>> option scheduler alu
>> option alu.limits.min-free-disk 60GB # Stop creating files when free-space lt 60GB
>> option alu.limits.max-open-files 10000
>> option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
>> option alu.disk-usage.entry-threshold 2GB # Units in KB, MB and GB are allowed
>> option alu.disk-usage.exit-threshold 60MB # Units in KB, MB and GB are allowed
>> option alu.open-files-usage.entry-threshold 1024
>> option alu.open-files-usage.exit-threshold 32
>> option alu.stat-refresh.interval 10sec
>> end-volume
>>
>> volume writebehind #writebehind improves write performance a lot
>> type performance/write-behind
>> option aggregate-size 131072 # in bytes
>> subvolumes bricks
>> end-volume
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>
>