[Gluster-users] Why is it not possible to mount a replicated gluster volume with one Gluster server?

Mon Aug 31 17:04:46 UTC 2015

Thank you all for your help.

To explain the setup better, here is the goal I am trying to achieve:

- 3 servers running in a cluster, each with a webserver uploading and
serving files to visitors from a common glusterfs share.
- Server1 and Server2 have gluster-server installed
- One brick replicated between Server1 and Server2 with the goal of
achieving High Availability
- Server1, Server2 and Server3 mount the volume through FUSE.
- Server1 mounts from the Gluster server on Server1, with Server2 as
backup, and vice versa for Server2.
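
To make the last point concrete, here is a rough sketch of the mount
commands I mean. It only prints the commands rather than running them. The
names gs1, gs2, volume1 and /data/nfs are the ones used further down in this
thread, and backupvolfile-server is the option Ravi suggested; the helper
function build_mount_cmd is made up for illustration.

```shell
#!/bin/sh
# Sketch only: prints the mount command instead of running it.
# gs1, gs2, volume1 and /data/nfs are the names used elsewhere in this thread.
build_mount_cmd() {
    primary=$1   # volfile server tried first
    backup=$2    # volfile server tried if the primary cannot be reached
    volume=$3
    target=$4
    printf 'mount -t glusterfs -o backupvolfile-server=%s %s:/%s %s\n' \
        "$backup" "$primary" "$volume" "$target"
}

# On Server1: primary gs1, fallback gs2. On Server2 the two are swapped.
build_mount_cmd gs1 gs2 volume1 /data/nfs
build_mount_cmd gs2 gs1 volume1 /data/nfs
```

As far as I understand it, the backup server only matters for fetching the
volume file at mount time. It does not help if no brick process is running.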

Now consider the following two scenarios:

1. Server2 dies

In this case Server1 acts as the failover and serves the files for
Server1, 2 and 3 until Server2 comes back up again. This works.

2. Server2 dies. Server1 has to reboot.

In this case the service stays down. It is impossible to remount the share
without Server1. This is not acceptable for a high-availability system, and
I believe it is not intended either, but rather a misconfiguration or a bug.
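
Following Atin's question about whether the brick process is running, this
is a small sketch of the check I would run on Server1 after the reboot. The
process names glusterd and glusterfsd are the ones mentioned in this thread;
the helper proc_state is made up, and the force start in the final comment
is an assumption on my side that I have not verified on 3.7.3.

```shell
#!/bin/sh
# Sketch only. Process names are the ones mentioned in this thread:
# glusterd (management daemon) and glusterfsd (brick process).
proc_state() {
    name=$1
    # pgrep -x matches the exact process name; non-zero exit means no match.
    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name: running"
    else
        echo "$name: not running"
    fi
}

proc_state glusterd
proc_state glusterfsd

# If glusterd is up but no glusterfsd is running, the bricks were not started.
# Assumption (not verified on 3.7.3): they can then be started by hand with
#   gluster volume start volume1 force
```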

Thank you again for looking into this.

2015-08-31 14:10 GMT+02:00 Yiping Peng <barius.cn at gmail.com>:

>> One more thing, when I do this on server1, which has been in the pool for
>> a long time:
>> server1:~$ mount server1:/vol1 mountpoint
>> It also fails.
>> The log gave me:
>>
>
> My fault, I used localhost as endpoint.
>
> I re-issued "mount -t glusterfs server01:/speech0 qqq"
> and the log shows a lot of things like:
>
> [2015-08-31 12:08:44.801169] W [socket.c:923:__socket_keepalive] 0-socket:
> failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
> [2015-08-31 12:08:44.801187] E [socket.c:3019:socket_connect]
> 0-speech0-client-43: Failed to set keep-alive: Protocol not available
> [2015-08-31 12:08:44.801305] W [socket.c:642:__socket_rwv]
> 0-speech0-client-43: readv on 10.88.153.25:24007 failed (Connection reset
> by peer)
> [2015-08-31 12:08:44.801404] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (-->
> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b]
> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] )))))
> 0-speech0-client-43: forced unwinding frame type(GF-DUMP) op(DUMP(1))
> called at 2015-08-31 12:08:44.801294 (xid=0x17)
> [2015-08-31 12:08:44.801423] W [MSGID: 114032]
> [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-43:
> received RPC status error [Transport endpoint is not connected]
> [2015-08-31 12:08:44.801440] I [MSGID: 114018]
> [client.c:2042:client_rpc_notify] 0-speech0-client-43: disconnected from
> speech0-client-43. Client process will keep trying to connect to glusterd
> until brick's port is available
> [2015-08-31 12:08:44.804488] W [socket.c:923:__socket_keepalive] 0-socket:
> failed to set TCP_USER_TIMEOUT 0 on socket 57, Protocol not available
> [2015-08-31 12:08:44.804505] E [socket.c:3019:socket_connect]
> 0-speech0-client-51: Failed to set keep-alive: Protocol not available
> [2015-08-31 12:08:44.804775] W [socket.c:642:__socket_rwv]
> 0-speech0-client-51: readv on 10.88.146.19:24007 failed (Connection reset
> by peer)
> [2015-08-31 12:08:44.804878] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1eb)[0x7fcf540db65b] (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fcf53ea71b7] (-->
> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fcf53ea72ce] (-->
> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xab)[0x7fcf53ea739b]
> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7fcf53ea795f] )))))
> 0-speech0-client-51: forced unwinding frame type(GF-DUMP) op(DUMP(1))
> called at 2015-08-31 12:08:44.804693 (xid=0x18)
> [2015-08-31 12:08:44.804898] W [MSGID: 114032]
> [client-handshake.c:1623:client_dump_version_cbk] 0-speech0-client-51:
> received RPC status error [Transport endpoint is not connected]
> [2015-08-31 12:08:44.804917] I [MSGID: 114018]
> [client.c:2042:client_rpc_notify] 0-speech0-client-51: disconnected from
> speech0-client-51. Client process will keep trying to connect to glusterd
> until brick's port is available
>
>
> 2015-08-31 20:06 GMT+08:00 Yiping Peng <barius.cn at gmail.com>:
>
>>
>>> I believe the following events have happened in the cluster resulting
>>> into this situation:
>>> 1. GlusterD & brick process on node 2 was brought down
>>> 2. Node 1 was rebooted.
>>>
>> Strangely enough, glusterfs, glusterd and glusterfsd are running on my
>> server. Is glusterfsd the brick process? Also server01 has not been
>> rebooted during the whole process.
>>
>> glusterfsd has the following arguments:
>> /usr/sbin/glusterfsd -s server01.local.net --volfile-id
>> speech0.server01.local.net.home-glusterfs-speech0-brick0 -p
>> /var/lib/glusterd/vols/speech0/run/server01.local.net-home-glusterfs-speech0-brick0.pid
>> -S /var/run/gluster/6bf40a98deade9dde8b615226bc57567.socket --brick-name
>> /home/glusterfs/speech0/brick0 -l
>> /var/log/glusterfs/bricks/home-glusterfs-speech0-brick0.log --xlator-option
>> *-posix.glusterd-uuid=1c33ff18-2a6a-44cf-9a04-727fc96e92be --brick-port
>> 49159 --xlator-option speech0-server.listen-port=49159
>>
>> One more thing, when I do this on server1, which has been in the pool for
>> a long time:
>> server1:~$ mount server1:/vol1 mountpoint
>> It also fails.
>> The log gave me:
>>
>> [2015-08-31 11:56:57.123307] I [MSGID: 100030] [glusterfsd.c:2301:main]
>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.3
>> (args: /usr/sbin/glusterfs --volfile-server=localhost --volfile-id=/speech0
>> qqq)
>> [2015-08-31 11:56:57.134642] W [socket.c:923:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT 0 on socket 9, Protocol not
>> available
>> [2015-08-31 11:56:57.134688] E [socket.c:3019:socket_connect]
>> 0-glusterfs: Failed to set keep-alive: Protocol not available
>> [2015-08-31 11:56:57.135063] I [MSGID: 101190]
>> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
>> with index 1
>> [2015-08-31 11:56:57.135113] E [socket.c:2332:socket_connect_finish]
>> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection reset by
>> peer)
>> [2015-08-31 11:56:57.135149] E [glusterfsd-mgmt.c:1819:mgmt_rpc_notify]
>> 0-glusterfsd-mgmt: failed to connect with remote-host: localhost (Transport
>> endpoint is not connected)
>> [2015-08-31 11:56:57.135158] I [glusterfsd-mgmt.c:1825:mgmt_rpc_notify]
>> 0-glusterfsd-mgmt: Exhausted all volfile servers
>> [2015-08-31 11:56:57.135333] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1a3) [0x7fb5e1be39a3]
>> -->/usr/sbin/glusterfs() [0x4099c8]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
>> signum (1), shutting down
>> [2015-08-31 11:56:57.135371] I [fuse-bridge.c:5595:fini] 0-fuse:
>> Unmounting '/home/speech/pengyiping/qqq'.
>> [2015-08-31 11:56:57.140640] W [glusterfsd.c:1219:cleanup_and_exit]
>> (-->/lib64/libpthread.so.0() [0x318b207851]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received
>> signum (15), shutting down
>>
>>
>> Any help is much appreciated.
>>
>>
>> 2015-08-31 19:15 GMT+08:00 Atin Mukherjee <amukherj at redhat.com>:
>>
>>> I believe the following events have happened in the cluster resulting
>>> into this situation:
>>> 1. GlusterD & brick process on node 2 was brought down
>>> 2. Node 1 was rebooted.
>>>
>>> In the above case the mount will definitely fail, since the brick process
>>> was not started: in a 2-node setup, glusterd waits for its peers to come
>>> up before it starts the bricks. Could you check whether the brick
>>> process is running or not?
>>>
>>> Thanks,
>>> Atin
>>>
>>> On 08/31/2015 04:17 PM, Yiping Peng wrote:
>>> > I've tried both: assuming server1 is already in pool, server2 is
>>> > undergoing peer-probing
>>> >
>>> > server2:~$ mount server1:/vol1 mountpoint, fail;
>>> > server2:~$ mount server2:/vol1 mountpoint, fail.
>>> >
>>> > Strange enough. I *should* be able to mount server1:/vol1 on server2.
>>> > But this is not the case :(
>>> > Maybe something is broken in the server pool, as I'm seeing
>>> > disconnected nodes?
>>> >
>>> >
>>> > 2015-08-31 18:02 GMT+08:00 Ravishankar N <ravishankar at redhat.com>:
>>> >
>>> >>
>>> >>
>>> >> On 08/31/2015 12:53 PM, Merlin Morgenstern wrote:
>>> >>
>>> >> Trying to mount the brick on the same physical server with the daemon
>>> >> running on this server but not on the other server:
>>> >>
>>> >> @node2:~$ sudo mount -t glusterfs gs2:/volume1 /data/nfs
>>> >> Mount failed. Please check the log file for more details.
>>> >>
>>> >> For mount to succeed the glusterd must be up on the node that you
>>> >> specify as the volfile-server; gs2 in this case. You can use -o
>>> >> backupvolfile-server=gs1 as a fallback.
>>> >> -Ravi
>>> >>
>>> >> _______________________________________________
>>> >> Gluster-users mailing list
>>> >> Gluster-users at gluster.org
>>> >> http://www.gluster.org/mailman/listinfo/gluster-users