[Gluster-devel] Gluster Brick Offline after reboot!!
ABHISHEK PALIWAL
abhishpaliwal at gmail.com
Tue Apr 19 07:42:21 UTC 2016
Hi Atin,
Thanks. I have a few more doubts here.
The brick and glusterd are connected over a unix domain socket. Since it is
just a local socket, why does it get disconnected in the logs below?
1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
[glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from
glusterd.
1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
[glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
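For my own debugging, when that disconnect is logged I plan to check whether
the brick process and its local socket still exist at that moment, roughly
like this (pid file and socket paths taken from the brick start log quoted
further down; they may differ on other setups):

  PIDFILE=/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
  # is the brick process still alive?
  kill -0 "$(cat $PIDFILE)" && echo "brick process running" || echo "brick process gone"
  # is the brick's unix domain socket (the -S argument) still present?
  ls -l /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket

Does that sound like the right thing to look at?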
Regards,
Abhishek
On Fri, Apr 15, 2016 at 9:14 AM, Atin Mukherjee <amukherj at redhat.com> wrote:
>
>
> On 04/14/2016 04:07 PM, ABHISHEK PALIWAL wrote:
> >
> >
> > On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> >
> >
> >
> > On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:
> > >
> > >
> > > On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee <amukherj at redhat.com> wrote:
> > >
> > >
> > >
> > > On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:
> > > > Hi Team,
> > > >
> > > > We are using Gluster 3.7.6 and facing a problem in which a brick does
> > > > not come online after the board is restarted.
> > > >
> > > > To understand our setup, please look at the following steps:
> > > > 1. We have two boards, A and B, on which a Gluster volume is running
> > > > in replicated mode with one brick on each board.
> > > > 2. The Gluster mount point is present on Board A and is shared by a
> > > > number of processes.
> > > > 3. Up to this point our volume is in sync and everything is working fine.
> > > > 4. Now we have a test case in which we stop glusterd, reboot Board B,
> > > > and when the board comes up, start glusterd on it again.
> > > > 5. We repeated step 4 multiple times to check the reliability of the
> > > > system.
> > > > 6. After step 4, sometimes the system comes back into a working state
> > > > (i.e. in sync), but sometimes the brick on Board B is listed in
> > > > "gluster volume status" yet does not come online even after waiting
> > > > for more than a minute.
> > > As I mentioned in another email thread, unless the log shows evidence
> > > that there was a reboot, nothing can be concluded. The last log you
> > > shared with us a few days back didn't give any indication that the
> > > brick process wasn't running.
> > >
> > > How can we identify from the brick logs that the brick process is
> > > running?
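> > > Is it enough to grep for the latest "Started running" line, e.g.
> > >
> > >   grep -n "Started running" /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log
> > >
> > > or do we also need to confirm that the pid recorded in the brick pid
> > > file is still alive?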
> > >
> > > > 7. While step 4 is executing, some processes on Board A start
> > > > accessing files from the Gluster mount point at the same time.
> > > >
> > > > As a solution to bring this brick online, we found some existing
> > > > issues on the gluster mailing list suggesting the use of "gluster
> > > > volume start <vol_name> force" to bring the brick from 'offline' to
> > > > 'online'.
> > > >
> > > > If we use the "gluster volume start <vol_name> force" command, it will
> > > > kill the existing volume process and start a new one. What will happen
> > > > if other processes are accessing the same volume at the moment the
> > > > volume process is killed internally by this command? Will it cause any
> > > > failure for these processes?
> > > This is not true; volume start force will start the brick processes
> > > only if they are not running. Running brick processes will not be
> > > interrupted.
> > >
> > > We have tried this and checked the pid of the process before the force
> > > start and after the force start; the pid had changed after the force
> > > start.
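> > > What we mean is a check roughly along these lines (pid file path taken
> > > from the brick start log further below; adjust if it differs):
> > >
> > >   PIDFILE=/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > >   cat $PIDFILE                           # brick pid before
> > >   gluster volume start c_glusterfs force
> > >   cat $PIDFILE                           # brick pid after - changed in our runs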
> > >
> > > Please find the logs at the time of failure attached once again with
> > > log-level=debug.
> > >
> > > If you can find the exact line in the brick log file that shows the
> > > brick process is running, please give me the line number in that file.
> >
> > Here is the sequence in which glusterd and the respective brick process
> > are restarted.
> >
> > 1. glusterd restart trigger - line number 1014 in glusterd.log file:
> >
> > [2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
> > (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG)
> >
> > 2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log
> >
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6
> > (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > --brick-name /opt/lvmdir/c2/brick -l
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
> >
> > 3. The following log indicates that the brick is up and has now started.
> > Refer to line 16123 in glusterd.log:
> >
> > [2016-04-03 10:14:25.336855] D [MSGID: 0]
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> >
> > This clearly indicates that the brick is up and running, as after that I
> > do not see any disconnect event being processed by glusterd for the brick
> > process.
> >
> >
> > Thanks for the detailed reply, but please also clear up some more doubts:
> >
> > 1. At 10:14:25 the brick is available because we had removed the brick
> > and added it again to make it online. The following are the logs from
> > the cmd-history.log file of 000300:
> >
> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> >
> > Also, 10:12:29 was the last reboot time before this failure, so I
> > totally agree with what you said earlier.
> >
> > 2. As you said, glusterd restarted at 10:12:29, so why are we not getting
> > the 'brick start trigger' logs like the ones below between the 10:12:29
> > and 10:14:25 timestamps, which is roughly a two minute interval?
> So here is the culprit:
>
> 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
> [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
> Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from
> glusterd.
> 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
> [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
> brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
>
>
> GlusterD received a disconnect event for this brick process and marked it
> as stopped. This could happen for two reasons: 1. the brick process goes
> down, or 2. a network issue. In this case I believe it is the latter, since
> the brick process was running at that time. I'd request you to check this
> from the N/W side.
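> A quick sanity check from the N/W side would be something along these
> lines, e.g. run from Board A towards Board B (24007 is glusterd's default
> management port; adjust if yours differs):
>
>   gluster peer status
>   ping -c 3 10.32.1.144
>   nc -zv 10.32.1.144 24007    # assumes a netcat that supports -z
>
> i.e. confirm the peer stayed reachable and the management connection was
> not flapping around 10:12:32.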
>
>
> >
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6
> > (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > --brick-name /opt/lvmdir/c2/brick -l
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
> >
> > 3. We were continuously checking the brick status during the above time
> > window using "gluster volume status"; refer to the cmd-history.log file
> > from 000300.
> >
> > In the glusterd.log file we are also getting the logs below:
> >
> > [2016-04-03 10:12:31.771051] D [MSGID: 0]
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> >
> > [2016-04-03 10:12:32.981152] D [MSGID: 0]
> > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> >
> > two times between 10:12:29 and 10:14:25, and as you said these logs
> > "clearly indicate that the brick is up and running", so why is the brick
> > not online in the "gluster volume status" command?
> >
> > [2016-04-03 10:12:33.990487] : volume status : SUCCESS
> > [2016-04-03 10:12:34.007469] : volume status : SUCCESS
> > [2016-04-03 10:12:35.095918] : volume status : SUCCESS
> > [2016-04-03 10:12:35.126369] : volume status : SUCCESS
> > [2016-04-03 10:12:36.224018] : volume status : SUCCESS
> > [2016-04-03 10:12:36.251032] : volume status : SUCCESS
> > [2016-04-03 10:12:37.352377] : volume status : SUCCESS
> > [2016-04-03 10:12:37.374028] : volume status : SUCCESS
> > [2016-04-03 10:12:38.446148] : volume status : SUCCESS
> > [2016-04-03 10:12:38.468860] : volume status : SUCCESS
> > [2016-04-03 10:12:39.534017] : volume status : SUCCESS
> > [2016-04-03 10:12:39.553711] : volume status : SUCCESS
> > [2016-04-03 10:12:40.616610] : volume status : SUCCESS
> > [2016-04-03 10:12:40.636354] : volume status : SUCCESS
> > ......
> > ......
> > ......
> > [2016-04-03 10:14:21.446570] : volume status : SUCCESS
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> >
> > In the above logs we are continuously checking the brick status, but when
> > we still did not find the brick 'online' even after ~2 minutes, we removed
> > it and added it again to make it online.
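> > (The polling is roughly a loop along these lines, grepping the "Online"
> > field of the brick status output; a sketch only, the exact interval may
> > differ:
> >
> >   for i in $(seq 1 24); do
> >       gluster volume status c_glusterfs 10.32.1.144:/opt/lvmdir/c2/brick detail | grep -i online
> >       sleep 5
> >   done
> >
> > and only after that do we fall back to remove-brick/add-brick.)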
> >
> > [2016-04-03 10:14:21.665889] : volume remove-brick c_glusterfs replica
> > 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > [2016-04-03 10:14:21.764270] : peer detach 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:23.060442] : peer probe 10.32.1.144 : SUCCESS
> > [2016-04-03 10:14:25.649525] : volume add-brick c_glusterfs replica 2
> > 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> >
> > That is why in the logs we are getting the 'brick start trigger' logs at
> > timestamp 10:14:25:
> >
> > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.6
> > (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
> > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > --brick-name /opt/lvmdir/c2/brick -l
> > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > --brick-port 49329 --xlator-option c_glusterfs-server.listen-port=49329)
> >
> >
> > Regards,
> > Abhishek
> >
> >
> > Please note that all the logs referred to and pasted here are from 002500.
> >
> > ~Atin
> > >
> > > 002500 - Board B logs (the board whose brick is offline)
> > > 000300 - Board A logs
> > >
> > > >
> > > > *Question: What could be contributing to the brick going offline?*
> > > >
> > > >
> > > > --
> > > >
> > > > Regards
> > > > Abhishek Paliwal
> > > >
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel at gluster.org
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >