[Gluster-users] Initial mount problem - all subvolumes are down
Pranith Kumar Karampuri
pkarampu at redhat.com
Tue Mar 31 17:53:59 UTC 2015
On 03/31/2015 10:47 PM, Rumen Telbizov wrote:
> Pranith and Atin,
>
> Thank you for looking into this and confirming it's a bug. Please log
> the bug yourself since I am not familiar with the project's
> bug-tracking system.
>
> Considering its severity and the fact that it effectively stops the
> cluster from functioning properly after boot, what do you think the
> timeline for fixing this issue would be? Which version do you expect
> the fix to land in?
>
> In the meantime, is there another workaround you might suggest
> besides running a second mount attempt after boot has finished?
Adding glusterd maintainers to the thread: +kaushal, +krishnan
I will let them answer your questions.
Pranith
>
> Thank you again for your help,
> Rumen Telbizov
>
>
>
> On Tue, Mar 31, 2015 at 2:53 AM, Pranith Kumar Karampuri
> <pkarampu at redhat.com> wrote:
>
>
> On 03/31/2015 01:55 PM, Atin Mukherjee wrote:
>
>
> On 03/31/2015 01:03 PM, Pranith Kumar Karampuri wrote:
>
> On 03/31/2015 12:53 PM, Atin Mukherjee wrote:
>
> On 03/31/2015 12:27 PM, Pranith Kumar Karampuri wrote:
>
> Atin,
> Could it be because bricks are started
> with PROC_START_NO_WAIT?
>
> That's the correct analysis, Pranith. The mount was
> attempted before the bricks were started. If we introduce
> a delay of a few seconds between the volume start and the
> mount, the problem will go away.
>
> Atin,
> I think one way to solve this issue is to start
> the bricks with NO_WAIT so that we can handle pmap-signin,
> but wait for the pmap-signins to complete before responding
> to the CLI / completing 'init'?
>
> Logically it should solve the problem. We need to think it
> through some more from the perspective of the existing design.
>
> Rumen,
> Feel free to log a bug. This should be fixed in a later
> release. We can raise the bug and work on it ourselves as well,
> if you prefer it that way.
>
> Pranith
>
>
> ~Atin
>
> Pranith
>
>
> Pranith
> On 03/31/2015 04:41 AM, Rumen Telbizov wrote:
>
> Hello everyone,
>
> I have a problem that I am trying to resolve and am not sure
> which way to go, so I am asking for your advice.
>
> What it comes down to is that upon initial boot of all my
> GlusterFS machines the shared volume doesn't get mounted.
> Nevertheless, the volume is successfully created and started,
> and further attempts to mount it manually succeed. I suspect
> what's happening is that the gluster processes/bricks/etc.
> haven't fully started at the time the /etc/fstab entry is read
> and the initial mount attempt is made. Again, by the time I
> log in and run mount -a, the volume mounts without any issues.
>
> _Details from the logs:_
>
> [2015-03-30 22:29:04.381918] I [MSGID: 100030]
> [glusterfsd.c:2018:main]
> 0-/usr/sbin/glusterfs: Started running
> /usr/sbin/glusterfs version 3.6.2 (args:
> /usr/sbin/glusterfs
> --log-file=/var/log/glusterfs/glusterfs.log
> --attribute-timeout=0
> --entry-timeout=0 --volfile-server=localhost
> --volfile-server=10.12.130.21
> --volfile-server=10.12.130.22
> --volfile-server=10.12.130.23
> --volfile-id=/myvolume /opt/shared)
> [2015-03-30 22:29:04.394913] E
> [socket.c:2267:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007
> failed (Connection refused)
> [2015-03-30 22:29:04.394950] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to
> connect with remote-host: localhost (Transport
> endpoint is not
> connected)
> [2015-03-30 22:29:04.394964] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.21
> [2015-03-30 22:29:08.390687] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.21
> (Transport endpoint is not
> connected)
> [2015-03-30 22:29:08.390720] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.22
> [2015-03-30 22:29:11.392015] E
> [glusterfsd-mgmt.c:1811:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: failed to
> connect with remote-host: 10.12.130.22
> (Transport endpoint is not
> connected)
> [2015-03-30 22:29:11.392050] I
> [glusterfsd-mgmt.c:1838:mgmt_rpc_notify]
> 0-glusterfsd-mgmt: connecting
> to next volfile server 10.12.130.23
> [2015-03-30 22:29:14.406429] I
> [dht-shared.c:337:dht_init_regex]
> 0-brain-dht: using regex rsync-hash-regex =
> ^\.(.+)\.[^.]+$
> [2015-03-30 22:29:14.408964] I
> [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-host-client-2: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409183] I
> [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-host-client-1: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409388] I
> [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-host-client-0: setting
> frame-timeout to 60
> [2015-03-30 22:29:14.409430] I
> [client.c:2280:notify] 0-host-client-0:
> parent translators are ready, attempting
> connect on transport
> [2015-03-30 22:29:14.409658] I
> [client.c:2280:notify] 0-host-client-1:
> parent translators are ready, attempting
> connect on transport
> [2015-03-30 22:29:14.409844] I
> [client.c:2280:notify] 0-host-client-2:
> parent translators are ready, attempting
> connect on transport
> Final graph:
>
> ....
>
> [2015-03-30 22:29:14.411045] I
> [client.c:2215:client_rpc_notify]
> 0-host-client-2: disconnected from
> host-client-2. Client process will
> keep trying to connect to glusterd until
> brick's port is available
> [2015-03-30 22:29:14.411063] E [MSGID: 108006]
> [afr-common.c:3591:afr_notify]
> 0-myvolume-replicate-0: All subvolumes
> are down. Going offline until atleast one of
> them comes back up.
> [2015-03-30 22:29:14.414871] I
> [fuse-bridge.c:5080:fuse_graph_setup]
> 0-fuse: switched to graph 0
> [2015-03-30 22:29:14.415003] I
> [fuse-bridge.c:4009:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol
> versions: glusterfs 7.22
> kernel 7.17
> [2015-03-30 22:29:14.415101] I
> [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415215] I
> [afr-common.c:3722:afr_local_init]
> 0-myvolume-replicate-0: no subvolumes up
> [2015-03-30 22:29:14.415236] W
> [fuse-bridge.c:779:fuse_attr_cbk]
> 0-glusterfs-fuse: 2: LOOKUP() / => -1
> (Transport endpoint is not
> connected)
> [2015-03-30 22:29:14.419007] I
> [fuse-bridge.c:4921:fuse_thread_proc]
> 0-fuse: unmounting /opt/shared
> [2015-03-30 22:29:14.420176] W
> [glusterfsd.c:1194:cleanup_and_exit]
> (--> 0-: received signum (15), shutting down
> [2015-03-30 22:29:14.420192] I
> [fuse-bridge.c:5599:fini] 0-fuse:
> Unmounting '/opt/shared'.
>
>
> _Relevant /etc/fstab entries are:_
>
> /dev/xvdb /opt/local xfs
> defaults,noatime,nodiratime 0 0
>
> localhost:/myvolume /opt/shared glusterfs
> defaults,_netdev,attribute-timeout=0,entry-timeout=0,log-file=/var/log/glusterfs/glusterfs.log,backup-volfile-servers=10.12.130.21:10.12.130.22:10.12.130.23
>
> 0 0
>
>
> _Volume configuration is:_
>
> Volume Name: myvolume
> Type: Replicate
> Volume ID: xxxx
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: host1:/opt/local/brick
> Brick2: host2:/opt/local/brick
> Brick3: host3:/opt/local/brick
> Options Reconfigured:
> storage.health-check-interval: 5
> network.ping-timeout: 5
> nfs.disable: on
> auth.allow: 10.12.130.21,10.12.130.22,10.12.130.23
> cluster.quorum-type: auto
> network.frame-timeout: 60
>
>
> I run Debian 7 with GlusterFS version 3.6.2-2.
>
> While I could put together some rc.local-type script which
> retries mounting the volume for a while until it succeeds or
> times out (a rough sketch of what I mean is below), I was
> wondering whether there's a better way to solve this problem.
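>
> Something along these lines is what I have in mind -- just an
> untested sketch of a retry loop using the standard mount and
> mountpoint utilities, relying on the options already defined in
> the fstab entry above:
>
>     #!/bin/sh
>     # Rough sketch: keep retrying the GlusterFS mount until it
>     # succeeds or we give up, to tolerate a slow glusterd start.
>     MOUNTPOINT=/opt/shared
>     RETRIES=30
>     i=0
>     until mountpoint -q "$MOUNTPOINT"; do
>         i=$((i + 1))
>         if [ "$i" -gt "$RETRIES" ]; then
>             echo "Giving up on $MOUNTPOINT after $RETRIES attempts" >&2
>             exit 1
>         fi
>         # mount picks up the options from /etc/fstab
>         mount "$MOUNTPOINT" 2>/dev/null
>         sleep 2
>     done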
>
> Thank you for your help.
>
> Regards,
> --
> Rumen Telbizov
> Unix Systems Administrator <http://telbizov.com>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Rumen Telbizov
> Unix Systems Administrator <http://telbizov.com>