[Gluster-devel] Whole failure when one glusterfsd brought down.
Dale Dude
dale at oc3networks.com
Wed Mar 21 22:02:02 UTC 2007
Hi Krishna,
Works as expected. Much appreciated!
I thought this was asked earlier, but I can't find the question:
One of my 3 servers dies and is down for a period. During that down
period, other processes recreate files that are still on the downed
server. When I bring the downed server back up and it becomes part of the
filesystem again, how will glusterfs know which copy of a file to use?
Should I ensure that the downed server does not contain files that also
exist on other gluster servers before adding it back into the filesystem?
I understand that the directories should be recreated on the downed server
before I bring it back into glusterfs.
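
In case it helps, this is the kind of thing I had in mind for recreating
just the directory tree on the rejoining brick before starting glusterfsd
on it again. It is only a sketch; "server3" and the export path are
placeholders for the downed machine, and I'm assuming rsync is available
on both sides:

# run from a healthy brick: copy the directory structure only, no files
rsync -a --include='*/' --exclude='*' /home/export/ server3:/home/export/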
Thanks,
Dale
Krishna Srinivas wrote:
> Hi Dale,
>
> This is a recent change. You need to put the following line in unify
> for it to work when one of the nodes goes down:
> "option readdir-force-success on"
>
> We discussed whether the user should know when one of the nodes
> has gone down, and concluded that it is best to leave it as an
> option for the user to configure himself.
>
> Regards
> Krishna
>
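
If I read that right, the option goes on the cluster/unify volume in the
client spec. A trimmed sketch of my "bricks" volume with the one new line
added (the comment wording is mine):

volume bricks
type cluster/unify
subvolumes client1 client2 client3
option scheduler alu
option readdir-force-success on # let readdir succeed even when a child brick is down
# remaining alu options unchanged, as in the spec quoted below
end-volume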
> On 3/22/07, Dale Dude <dale at oc3networks.com> wrote:
>> Using code pulled March 21st @ 1am EST. Kernel 2.6.20-10 on Ubuntu
>> Feisty, 32-bit.
>>
>> I have 3 machines serving with glusterfsd and mounted the cluster from
>> the first server. If I kill one of the glusterfsd processes on any of
>> the machines, the mount point breaks with the 'Transport' error below.
>> glusterfs also produces this error if I unmount and remount while that
>> one glusterfsd server is still down. Is this the expected result, or
>> shouldn't the mount continue to work even though one of the servers has
>> "died"?
>>
>> ls: reading directory local/: Transport endpoint is not connected
>>
>> glusterfs.log produces this:
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
>> tcp/client:non-blocking connect() returned: 111 (Connection refused)
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/tcp-client.c:284/tcp_connect()]
>> tcp/client:non-blocking connect() returned: 111 (Connection refused)
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>> [Mar 21 06:25:17] [ERROR/client-protocol.c:183/client_protocol_xfer()]
>> protocol/client: client_protocol_xfer: :transport_submit failed
>>
>>
>> ======================================
>> glusterfs-server.vol used by all the servers:
>> (ignore my bad volume naming, was just testing)
>>
>> volume brick
>> type storage/posix # POSIX FS translator
>> option directory /home/export # Export this directory
>> end-volume
>>
>> volume iothreads
>> type performance/io-threads
>> option thread-count 8
>> subvolumes brick
>> end-volume
>>
>> volume server
>> type protocol/server
>> option transport-type tcp/server # For TCP/IP transport
>> subvolumes iothreads
>> option auth.ip.brick.allow * # Allow access to "brick" volume
>> end-volume
>>
>>
>> ======================================
>> glusterfs-client.vol used on server1:
>> (ignore my bad volume naming, was just testing)
>>
>> volume client1
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 1.1.1.1 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume client2
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 2.2.2.2 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume client3
>> type protocol/client
>> option transport-type tcp/client # for TCP/IP transport
>> option remote-host 3.3.3.3 # IP address of the remote brick
>> option remote-subvolume brick # name of the remote volume
>> end-volume
>>
>> volume bricks
>> type cluster/unify
>> subvolumes client1 client2 client3
>> option scheduler alu
>> option alu.limits.min-free-disk 60GB # Stop creating files when free-space lt 60GB
>> option alu.limits.max-open-files 10000
>> option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage
>> option alu.disk-usage.entry-threshold 2GB # Units in KB, MB and GB are allowed
>> option alu.disk-usage.exit-threshold 60MB # Units in KB, MB and GB are allowed
>> option alu.open-files-usage.entry-threshold 1024
>> option alu.open-files-usage.exit-threshold 32
>> option alu.stat-refresh.interval 10sec
>> end-volume
>>
>> volume writebehind #writebehind improves write performance a lot
>> type performance/write-behind
>> option aggregate-size 131072 # in bytes
>> subvolumes bricks
>> end-volume
>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at nongnu.org
>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>>
>