[Gluster-devel] [glusterfs-3.6.0beta3-0.11.gitd01b00a] gluster volume status is running even though the Disk is detached

Kiran Patil kiran at fractalio.com
Tue Oct 28 07:40:56 UTC 2014


I applied the patches, compiled, and installed GlusterFS.
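
For reference, the build itself was a plain autotools source build, roughly along these lines (exact configure options may differ on your setup):

# cd glusterfs
# ./autogen.sh
# ./configure
# make && make install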

# glusterfs --version
glusterfs 3.7dev built on Oct 28 2014 12:03:10
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

# git log
commit 990ce16151c3af17e4cdaa94608b737940b60e4d
Author: Lalatendu Mohanty <lmohanty at redhat.com>
Date:   Tue Jul 1 07:52:27 2014 -0400

    Posix: Brick failure detection fix for ext4 filesystem
...
...

I see the following messages.

In /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

The message "I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 39
times between [2014-10-28 05:58:09.209419] and [2014-10-28 06:00:06.226330]
[2014-10-28 06:00:09.226507] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:09.226712] I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd.
[2014-10-28 06:00:12.226881] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:15.227249] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:18.227616] W [socket.c:545:__socket_rwv] 0-management:
readv on /var/run/6154ed2845b7f728a3acdce9d69e08ee.socket failed (Invalid
argument)
[2014-10-28 06:00:21.227976] W [socket.c:545:__socket_rwv] 0-management:
readv on

.....
.....

[2014-10-28 06:19:15.142867] I
[glusterd-handler.c:1280:__glusterd_handle_cli_get_volume] 0-glusterd:
Received get vol req
The message "I [MSGID: 106005]
[glusterd-handler.c:4142:__glusterd_brick_rpc_notify] 0-management: Brick
192.168.1.246:/zp2/brick2 has disconnected from glusterd." repeated 12
times between [2014-10-28 06:18:09.368752] and [2014-10-28 06:18:45.373063]
[2014-10-28 06:23:38.207649] W [glusterfsd.c:1194:cleanup_and_exit] (-->
0-: received signum (15), shutting down


dmesg output:

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7868:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

SPLError: 7869:0:(spl-err.c:67:vcmn_err()) WARNING: Pool 'zp2' has
encountered an uncorrectable I/O failure and has been suspended.

The brick is still reported as online:

# gluster volume status
Status of volume: repvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 192.168.1.246:/zp1/brick1                 49152   Y       4067
Brick 192.168.1.246:/zp2/brick2                 49153   Y       4078
NFS Server on localhost                         2049    Y       4092
Self-heal Daemon on localhost                   N/A     Y       4097

Task Status of Volume repvol
------------------------------------------------------------------------------
There are no active volume tasks

# gluster volume info

Volume Name: repvol
Type: Replicate
Volume ID: ba1e7c6d-1e1c-45cd-8132-5f4fa4d2d22b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.1.246:/zp1/brick1
Brick2: 192.168.1.246:/zp2/brick2
Options Reconfigured:
storage.health-check-interval: 30
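
For completeness, the health-check option above was set with the usual volume set command:

# gluster volume set repvol storage.health-check-interval 30

As a crude manual check (just an illustration, not the actual health-check code path, and the file name below is only an example), I would expect direct access to the brick directory on the suspended pool to hang or return an I/O error, while glusterd keeps reporting the brick as online:

# stat /zp2/brick2
# touch /zp2/brick2/.manual-health-check && sync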

Let me know if you need further information.

Thanks,
Kiran.

On Tue, Oct 28, 2014 at 11:44 AM, Kiran Patil <kiran at fractalio.com> wrote:

> I changed "git fetch git://review.gluster.org/glusterfs" to "git fetch
> http://review.gluster.org/glusterfs" and now it works.
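> For the record, the full working command was along these lines:
>
> # git fetch http://review.gluster.org/glusterfs refs/changes/13/8213/9 && git checkout FETCH_HEAD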
>
> Thanks,
> Kiran.
>
> On Tue, Oct 28, 2014 at 11:13 AM, Kiran Patil <kiran at fractalio.com> wrote:
>
>> Hi Niels,
>>
>> I am getting "fatal: Couldn't find remote ref refs/changes/13/8213/9"
>> error.
>>
>> Steps to reproduce the issue.
>>
>> 1) # git clone git://review.gluster.org/glusterfs
>> Initialized empty Git repository in /root/gluster-3.6/glusterfs/.git/
>> remote: Counting objects: 84921, done.
>> remote: Compressing objects: 100% (48307/48307), done.
>> remote: Total 84921 (delta 57264), reused 63233 (delta 36254)
>> Receiving objects: 100% (84921/84921), 23.23 MiB | 192 KiB/s, done.
>> Resolving deltas: 100% (57264/57264), done.
>>
>> 2) # cd glusterfs
>>     # git branch
>>     * master
>>
>> 3) # git fetch git://review.gluster.org/glusterfs refs/changes/13/8213/9
>> && git checkout FETCH_HEAD
>> fatal: Couldn't find remote ref refs/changes/13/8213/9
>>
>> Note: I also tried the above steps against the git repo
>> https://github.com/gluster/glusterfs and the result is the same as above.
>>
>> Please let me know if I missed any steps.
>>
>> Thanks,
>> Kiran.
>>
>> On Mon, Oct 27, 2014 at 5:53 PM, Niels de Vos <ndevos at redhat.com> wrote:
>>
>>> On Mon, Oct 27, 2014 at 05:19:13PM +0530, Kiran Patil wrote:
>>> > Hi,
>>> >
>>> > I created a replicated volume with two bricks on the same node and
>>> > copied some data to it.
>>> >
>>> > Then I removed the disk hosting one of the bricks of the volume.
>>> >
>>> > storage.health-check-interval is set to 30 seconds.
>>> >
>>> > I can see that the disk is unavailable using the zpool command of ZFS
>>> > on Linux, but gluster volume status still shows the brick process as
>>> > running, even though it should have been shut down by now.
>>> >
>>> > Is this a bug in 3.6, since it is listed as a feature
>>> > (https://github.com/gluster/glusterfs/blob/release-3.6/doc/features/brick-failure-detection.md),
>>> > or am I making a mistake here?
>>>
>>> The initial detection of brick failures did not work for all
>>> filesystems; it may not work for ZFS either. A fix has been posted, but
>>> it has not been merged into the master branch yet. Once the change has
>>> been merged, it can be backported to 3.6 and 3.5.
>>>
>>> You may want to test with the patch applied, and add your "+1 Verified"
>>> to the change in case it makes it functional for you:
>>> - http://review.gluster.org/8213
>>>
>>> Cheers,
>>> Niels
>>>
>>> >
>>> > [root at fractal-c92e gluster-3.6]# gluster volume status
>>> > Status of volume: repvol
>>> > Gluster process                              Port    Online  Pid
>>> > ------------------------------------------------------------------------------
>>> > Brick 192.168.1.246:/zp1/brick1              49154   Y       17671
>>> > Brick 192.168.1.246:/zp2/brick2              49155   Y       17682
>>> > NFS Server on localhost                      2049    Y       17696
>>> > Self-heal Daemon on localhost                N/A     Y       17701
>>> >
>>> > Task Status of Volume repvol
>>> > ------------------------------------------------------------------------------
>>> > There are no active volume tasks
>>> >
>>> >
>>> > [root at fractal-c92e gluster-3.6]# gluster volume info
>>> >
>>> > Volume Name: repvol
>>> > Type: Replicate
>>> > Volume ID: d4f992b1-1393-43b8-9fda-2e2b6e3b5039
>>> > Status: Started
>>> > Number of Bricks: 1 x 2 = 2
>>> > Transport-type: tcp
>>> > Bricks:
>>> > Brick1: 192.168.1.246:/zp1/brick1
>>> > Brick2: 192.168.1.246:/zp2/brick2
>>> > Options Reconfigured:
>>> > storage.health-check-interval: 30
>>> >
>>> > [root at fractal-c92e gluster-3.6]# zpool status zp2
>>> >   pool: zp2
>>> >  state: UNAVAIL
>>> > status: One or more devices are faulted in response to IO failures.
>>> > action: Make sure the affected devices are connected, then run
>>> >         'zpool clear'.
>>> >    see: http://zfsonlinux.org/msg/ZFS-8000-HC
>>> >   scan: none requested
>>> > config:
>>> >
>>> > NAME        STATE     READ WRITE CKSUM
>>> > zp2         UNAVAIL      0     0     0  insufficient replicas
>>> >   sdb       UNAVAIL      0     0     0
>>> >
>>> > errors: 2 data errors, use '-v' for a list
>>> >
>>> >
>>> > Thanks,
>>> > Kiran.
>>>