[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Wed Aug 6 07:20:50 UTC 2014
On 08/06/2014 12:27 PM, Roman wrote:
> Yesterday I've reproduced this situation two times.
> The setup:
> 1. Hardware and network
> a. Disks INTEL SSDSC2BB240G4
> b1. Network cards: X540-AT2
> b2. Netgear 10g switch
> 2. Software setup:
> a. OS: Debian wheezy
> b. Glusterfs: 3.4.4 (latest 3.4.4 from gluster repository)
> c. Proxmox VE with updated glusterfs from the gluster repository
> 3. Software Configuration
> a. create a replicated volume with the options cluster.self-heal-daemon: off;
> nfs.disable: off; network.ping-timeout: 2 (a minimal sketch of the
> commands follows this list)
> b. mount it on Proxmox VE (via the Proxmox GUI; it mounts with these
> opts: stor1:HA-fast-150G-PVE1 on /mnt/pve/FAST-TESt type
> fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
> )
> c. install VM with qcow2 or raw disk image.
> d. disable port / remove network cable from one of storage servers
> e. wait and put cable back
> f. keep waiting for sync (pointless, it won't ever start)
> g. disable another port for second server (or remove cable from
> second server)
> h. profit.
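>
> (For reference, a minimal sketch of the commands behind step 3a, hedged:
> this is not the exact history; the volume name and brick paths are the
> ones that appear later in this thread, and the commands are standard
> gluster CLI.)
>
>     # create the 2-way replicated volume across both storage servers
>     gluster volume create HA-fast-150G-PVE1 replica 2 \
>         stor1:/exports/fast-test/150G stor2:/exports/fast-test/150G
>     # apply the options from step 3a and start the volume
>     gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
>     gluster volume set HA-fast-150G-PVE1 nfs.disable off
>     gluster volume set HA-fast-150G-PVE1 network.ping-timeout 2
>     gluster volume start HA-fast-150G-PVE1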
>
> Maybe I could use 3.5.2 from the Debian sid (testing) repository to test with?
Sure, you can go ahead. I will also write up a document about maintaining
VMs on gluster from the perspective of replication.
Pranith
>
>
> 2014-08-06 9:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
> Roman,
> The file went into split-brain. I think we should do these
> tests with 3.5.2, where monitoring the heals is easier. Let me
> also come up with a document about how to do the testing you are
> trying to do.
>
> Humble/Niels,
> Do we have debs available for 3.5.2? In 3.5.1 there was a
> packaging issue where /usr/bin/glfsheal was not packaged along with
> the deb. I think that should be fixed now as well?
>
> Pranith
>
> On 08/06/2014 11:52 AM, Roman wrote:
>> good morning,
>>
>> root at stor1:~# getfattr -d -m. -e hex
>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> getfattr: Removing leading '/' from absolute path names
>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>> trusted.gfid=0x23c79523075a4158bea38078da570449
>>
>> getfattr: Removing leading '/' from absolute path names
>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>> trusted.gfid=0x23c79523075a4158bea38078da570449
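>>
>> (A note on reading these values; a sketch assuming the usual AFR
>> changelog layout, where each trusted.afr.<vol>-client-N value packs
>> three 32-bit counters: data, metadata, entry, left to right.)
>>
>>     trusted.afr.<vol>-client-N = 0xDDDDDDDDMMMMMMMMEEEEEEEE
>>                                    data    metadata entry
>>
>>     stor1: client-1 = 0x00000132... -> 306 data ops pending against stor2
>>     stor2: client-0 = 0x00000004... -> 4 data ops pending against stor1
>>
>> Each brick blames the other, so neither copy can be chosen as the heal
>> source: split-brain, as Pranith says above.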
>>
>>
>>
>> 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>
>> On 08/06/2014 11:30 AM, Roman wrote:
>>> Also, this time the files are not the same!
>>>
>>> root at stor1:~# md5sum
>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>> 32411360c53116b96a059f17306caeda
>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>
>>> root at stor2:~# md5sum
>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9
>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> What is the getfattr output?
>>
>> Pranith
>>
>>>
>>>
>>> 2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
>>>
>>> Nope, it is not working. But this time it went a bit
>>> differently:
>>>
>>> root at gluster-client:~# dmesg
>>> Segmentation fault
>>>
>>>
>>> I was not even able to start the VM after I had done the tests:
>>>
>>> Could not read qcow2 header: Operation not permitted
>>>
>>> And it seems it never starts to sync the files after the first
>>> disconnect. The VM survives the first disconnect, but not the
>>> second (I waited around 30 minutes). Also, I've
>>> got network.ping-timeout: 2 in the volume settings, but the logs
>>> reacted to the first disconnect in around 30 seconds. The second
>>> was faster, 2 seconds.
>>>
>>> The reaction was also different:
>>>
>>> slower one:
>>> [2014-08-05 13:26:19.558435] W
>>> [socket.c:514:__socket_rwv] 0-glusterfs: readv failed
>>> (Connection timed out)
>>> [2014-08-05 13:26:19.558485] W
>>> [socket.c:1962:__socket_proto_state_machine]
>>> 0-glusterfs: reading from socket failed. Error
>>> (Connection timed out), peer (10.250.0.1:24007)
>>> [2014-08-05 13:26:21.281426] W
>>> [socket.c:514:__socket_rwv]
>>> 0-HA-fast-150G-PVE1-client-0: readv failed (Connection
>>> timed out)
>>> [2014-08-05 13:26:21.281474] W
>>> [socket.c:1962:__socket_proto_state_machine]
>>> 0-HA-fast-150G-PVE1-client-0: reading from socket
>>> failed. Error (Connection timed out), peer
>>> (10.250.0.1:49153)
>>> [2014-08-05 13:26:21.281507] I
>>> [client.c:2098:client_rpc_notify]
>>> 0-HA-fast-150G-PVE1-client-0: disconnected
>>>
>>> the fast one:
>>> [2014-08-05 12:52:44.607389] C
>>> [client-handshake.c:127:rpc_client_ping_timer_expired]
>>> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153
>>> has not responded in the last
>>> 2 seconds, disconnecting.
>>> [2014-08-05 12:52:44.607491] W
>>> [socket.c:514:__socket_rwv]
>>> 0-HA-fast-150G-PVE1-client-1: readv failed (No data
>>> available)
>>> [2014-08-05 12:52:44.607585] E
>>> [rpc-clnt.c:368:saved_frames_unwind]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>> [0x7fcb1b4b0558]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>> [0x7fcb1b4aea63]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27))
>>> called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
>>> [2014-08-05 12:52:44.607604] W
>>> [client-rpc-fops.c:2624:client3_3_lookup_cbk]
>>> 0-HA-fast-150G-PVE1-client-1: remote operation failed:
>>> Transport endpoint is not connected. Path: /
>>> (00000000-0000-0000-0000-000000000001)
>>> [2014-08-05 12:52:44.607736] E
>>> [rpc-clnt.c:368:saved_frames_unwind]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>> [0x7fcb1b4b0558]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>> [0x7fcb1b4aea63]
>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>> unwinding frame type(GlusterFS Handshake) op(PING(3))
>>> called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>> [2014-08-05 12:52:44.607753] W
>>> [client-handshake.c:276:client_ping_cbk]
>>> 0-HA-fast-150G-PVE1-client-1: timer must have expired
>>> [2014-08-05 12:52:44.607776] I
>>> [client.c:2098:client_rpc_notify]
>>> 0-HA-fast-150G-PVE1-client-1: disconnected
>>>
>>>
>>>
>>> I've got SSD disks (just for info).
>>> Should I go and give 3.5.2 a try?
>>>
>>>
>>>
>>> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>> Please reply along with gluster-users :-). Maybe
>>> you are hitting 'reply' instead of 'reply all'?
>>>
>>> Pranith
>>>
>>> On 08/05/2014 03:35 PM, Roman wrote:
>>>> To make sure and start clean, I've created another VM
>>>> with raw format and am going to repeat those steps. So
>>>> now I've got two VMs, one with qcow2 format and the
>>>> other with raw format. I will send another e-mail
>>>> shortly.
>>>>
>>>>
>>>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>
>>>>
>>>> On 08/05/2014 03:07 PM, Roman wrote:
>>>>> really, seems like the same file
>>>>>
>>>>> stor1:
>>>>> a951641c5230472929836f9fcede6b04
>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>
>>>>> stor2:
>>>>> a951641c5230472929836f9fcede6b04
>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>
>>>>>
>>>>> One thing I've seen from the logs: somehow Proxmox VE is
>>>>> connecting to the servers with the wrong version?
>>>>> [2014-08-05 09:23:45.218550] I
>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>> 0-HA-fast-150G-PVE1-client-0: Using Program
>>>>> GlusterFS 3.3, Num (1298437), Version (330)
>>>> It is the rpc (over-the-network data
>>>> structures) version, which has not changed at
>>>> all since 3.3, so that's not a problem. So what is
>>>> the conclusion? Is your test case working now
>>>> or not?
>>>>
>>>> Pranith
>>>>
>>>>> but if I issue:
>>>>> root at pve1:~# glusterfs -V
>>>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>> seems ok.
>>>>>
>>>>> the servers use 3.4.4 meanwhile:
>>>>> [2014-08-05 09:23:45.117875] I
>>>>> [server-handshake.c:567:server_setvolume]
>>>>> 0-HA-fast-150G-PVE1-server: accepted client
>>>>> from
>>>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
>>>>> (version: 3.4.4)
>>>>> [2014-08-05 09:23:49.103035] I
>>>>> [server-handshake.c:567:server_setvolume]
>>>>> 0-HA-fast-150G-PVE1-server: accepted client
>>>>> from
>>>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
>>>>> (version: 3.4.4)
>>>>>
>>>>> if this could be the reason, of course.
>>>>> I did restart the Proxmox VE yesterday (just
>>>>> for information).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>
>>>>>
>>>>> On 08/05/2014 02:33 PM, Roman wrote:
>>>>>> Waited long enough for now, still
>>>>>> different sizes and no logs about healing :(
>>>>>>
>>>>>> stor1
>>>>>> # file:
>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>
>>>>>> root at stor1:~# du -sh
>>>>>> /exports/fast-test/150G/images/127/
>>>>>> 1.2G /exports/fast-test/150G/images/127/
>>>>>>
>>>>>>
>>>>>> stor2
>>>>>> # file:
>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>
>>>>>>
>>>>>> root at stor2:~# du -sh
>>>>>> /exports/fast-test/150G/images/127/
>>>>>> 1.4G /exports/fast-test/150G/images/127/
>>>>> According to the changelogs, the file doesn't need any
>>>>> healing. Could you stop the operations on the VMs and take
>>>>> an md5sum on both these machines?
>>>>>
>>>>> Pranith
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>
>>>>>>
>>>>>> On 08/05/2014 02:06 PM, Roman wrote:
>>>>>>> Well, it seems like it doesn't see the changes
>>>>>>> that were made to the volume? I created two
>>>>>>> files, 200 and 100 MB (from /dev/zero), after I
>>>>>>> disconnected the first brick. Then I
>>>>>>> connected it back and got these logs:
>>>>>>>
>>>>>>> [2014-08-05 08:30:37.830150] I
>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>> 0-glusterfs: No change in volfile,
>>>>>>> continuing
>>>>>>> [2014-08-05 08:30:37.830207] I
>>>>>>> [rpc-clnt.c:1676:rpc_clnt_reconfig]
>>>>>>> 0-HA-fast-150G-PVE1-client-0:
>>>>>>> changing port to 49153 (from 0)
>>>>>>> [2014-08-05 08:30:37.830239] W
>>>>>>> [socket.c:514:__socket_rwv]
>>>>>>> 0-HA-fast-150G-PVE1-client-0: readv
>>>>>>> failed (No data available)
>>>>>>> [2014-08-05 08:30:37.831024] I
>>>>>>> [client-handshake.c:1659:select_server_supported_programs]
>>>>>>> 0-HA-fast-150G-PVE1-client-0: Using
>>>>>>> Program GlusterFS 3.3, Num
>>>>>>> (1298437), Version (330)
>>>>>>> [2014-08-05 08:30:37.831375] I
>>>>>>> [client-handshake.c:1456:client_setvolume_cbk]
>>>>>>> 0-HA-fast-150G-PVE1-client-0:
>>>>>>> Connected to 10.250.0.1:49153, attached
>>>>>>> to remote volume
>>>>>>> '/exports/fast-test/150G'.
>>>>>>> [2014-08-05 08:30:37.831394] I
>>>>>>> [client-handshake.c:1468:client_setvolume_cbk]
>>>>>>> 0-HA-fast-150G-PVE1-client-0: Server
>>>>>>> and Client lk-version numbers are
>>>>>>> not same, reopening the fds
>>>>>>> [2014-08-05 08:30:37.831566] I
>>>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>> 0-HA-fast-150G-PVE1-client-0: Server
>>>>>>> lk version = 1
>>>>>>>
>>>>>>>
>>>>>>> [2014-08-05 08:30:37.830150] I
>>>>>>> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>> 0-glusterfs: No change in volfile,
>>>>>>> continuing
>>>>>>> This line seems weird to me, tbh.
>>>>>>> I do not see any traffic on the switch
>>>>>>> interfaces between the gluster servers,
>>>>>>> which means there is no syncing
>>>>>>> between them.
>>>>>>> I tried ls -l on the files on the
>>>>>>> client and servers to trigger the
>>>>>>> healing, but seemingly with no success.
>>>>>>> Should I wait more?
>>>>>> Yes, it should take around 10-15
>>>>>> minutes. Could you provide 'getfattr
>>>>>> -d -m. -e hex <file-on-brick>' on
>>>>>> both the bricks?
>>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>
>>>>>>>
>>>>>>> On 08/05/2014 01:10 PM, Roman wrote:
>>>>>>>> Ahha! For some reason I was not
>>>>>>>> able to start the VM anymore;
>>>>>>>> Proxmox VE told me that it is
>>>>>>>> not able to read the qcow2
>>>>>>>> header because permission was
>>>>>>>> denied for some reason. So I
>>>>>>>> just deleted that file and
>>>>>>>> created a new VM. And the next
>>>>>>>> message I've got was this:
>>>>>>> Seems like these are the
>>>>>>> messages from when you took down the
>>>>>>> bricks before self-heal completed. Could
>>>>>>> you restart the run, waiting for
>>>>>>> self-heals to complete before
>>>>>>> taking down the next brick?
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
>>>>>>>> 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of
>>>>>>>> '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please
>>>>>>>> delete the file from all but the preferred subvolume. - Pending
>>>>>>>> matrix: [ [ 0 60 ] [ 11 0 ] ]
>>>>>>>> [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
>>>>>>>> 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed
>>>>>>>> on /images/124/vm-124-disk-1.qcow2
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>
>>>>>>>> I just responded to your earlier mail about how the
>>>>>>>> log looks. The log appears in the mount's logfile.
>>>>>>>>
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
>>>>>>>>> Ok, so I've waited long enough, I think. There was no traffic
>>>>>>>>> on the switch ports between the servers. I could not find any
>>>>>>>>> suitable log message about a completed self-heal (waited about
>>>>>>>>> 30 minutes). I plugged out the other server's UTP cable this
>>>>>>>>> time and got into the same situation:
>>>>>>>>> root at gluster-test1:~# cat /var/log/dmesg
>>>>>>>>> -bash: /bin/cat: Input/output error
>>>>>>>>>
>>>>>>>>> brick logs:
>>>>>>>>> [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify]
>>>>>>>>> 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom
>>>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>> [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put]
>>>>>>>>> 0-HA-fast-150G-PVE1-server: Shutting down connection
>>>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>> [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup]
>>>>>>>>> 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
>>>>>>>>> [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy]
>>>>>>>>> 0-HA-fast-150G-PVE1-server: destroyed connection of
>>>>>>>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>
>>>>>>>>> Do you think it is possible for you to do these tests on the
>>>>>>>>> latest version, 3.5.2? 'gluster volume heal <volname> info'
>>>>>>>>> would give you that information in versions > 3.5.1. Otherwise
>>>>>>>>> you will have to check it either from the logs (there will be
>>>>>>>>> a self-heal completed message in the mount logs) or by
>>>>>>>>> observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
>>>>>>>>>
>>>>>>>>> Pranith
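>>>>>>>>>
>>>>>>>>> (A rough sketch of that getfattr check as a wait loop, to be
>>>>>>>>> run on each brick; the image path is the one from earlier in
>>>>>>>>> this thread, and the test assumes a healthy changelog reads
>>>>>>>>> all-zero. Untested:)
>>>>>>>>>
>>>>>>>>>     F=/exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>>>>     # loop while any trusted.afr changelog on this brick is non-zero
>>>>>>>>>     while getfattr -d -m trusted.afr -e hex "$F" 2>/dev/null \
>>>>>>>>>           | grep 'trusted.afr' \
>>>>>>>>>           | grep -qv '=0x000000000000000000000000$'; do
>>>>>>>>>         sleep 10
>>>>>>>>>     done
>>>>>>>>>     echo "no pending changelog entries left on this brick"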
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/05/2014 12:09 PM, Roman wrote:
>>>>>>>>>> Ok, I understand. I will try this shortly. How can I be sure
>>>>>>>>>> that the healing process is done, if I am not able to see
>>>>>>>>>> its status?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>>
>>>>>>>>>> Mounts will do the healing, not the self-heal-daemon. The
>>>>>>>>>> problem, I feel, is that whichever process does the healing
>>>>>>>>>> has the latest information about the good bricks in this
>>>>>>>>>> usecase. Since for the VM usecase the mounts should have the
>>>>>>>>>> latest information, we should let the mounts do the healing.
>>>>>>>>>> If the mount accesses the VM image, either by someone doing
>>>>>>>>>> operations inside the VM or by an explicit stat on the file,
>>>>>>>>>> it should do the healing.
>>>>>>>>>>
>>>>>>>>>> Pranith.
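>>>>>>>>>>
>>>>>>>>>> (So, on 3.4, a simple way to nudge the mount into healing a
>>>>>>>>>> given image is an explicit stat through the client mount
>>>>>>>>>> point; the path below is a guess based on the mount output
>>>>>>>>>> earlier in this thread:)
>>>>>>>>>>
>>>>>>>>>>     stat /mnt/pve/FAST-TESt/images/127/vm-127-disk-1.qcow2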
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>>>>>>>>> Hmmm, you told me to turn it off. Did I understand something
>>>>>>>>>>> wrong? After I issued the command you sent me, I was not
>>>>>>>>>>> able to watch the healing process; it said it won't be
>>>>>>>>>>> healed, because it's turned off.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>>>
>>>>>>>>>>> You didn't mention anything about self-healing. Did you
>>>>>>>>>>> wait until the self-heal was complete?
>>>>>>>>>>>
>>>>>>>>>>> Pranith
>>>>>>>>>>>
>>>>>>>>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>>>>>>>>> Hi!
>>>>>>>>>>>> The result is pretty much the same. I set the switch port
>>>>>>>>>>>> down for the 1st server; it was ok. Then I set it back up
>>>>>>>>>>>> and set the other server's port off, and that triggered an
>>>>>>>>>>>> IO error on two virtual machines: one with a local root FS
>>>>>>>>>>>> but network-mounted storage, and the other with a network
>>>>>>>>>>>> root FS. The 1st gave an error on copying to or from the
>>>>>>>>>>>> mounted network disk; the other just gave me an error even
>>>>>>>>>>>> for reading log files.
>>>>>>>>>>>>
>>>>>>>>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>>>>>>>>>
>>>>>>>>>>>> Then I reset the KVM VM and it told me there is no boot
>>>>>>>>>>>> device. Next I virtually powered it off and then back on,
>>>>>>>>>>>> and it booted.
>>>>>>>>>>>>
>>>>>>>>>>>> By the way, did I have to start/stop the volume?
>>>>>>>>>>>>
>>>>>>>>>>>> >> Could you do the following and test it again?
>>>>>>>>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>>
>>>>>>>>>>>> >> Pranith
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Facing the same problem as mentioned here:
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>>>>>>>>
>>>>>>>>>>>>> My setup is up and running, so I'm ready to help you back
>>>>>>>>>>>>> with feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Setup:
>>>>>>>>>>>>> proxmox server as client
>>>>>>>>>>>>> 2 gluster physical servers
>>>>>>>>>>>>>
>>>>>>>>>>>>> Server side and client side are both running 3.4.4
>>>>>>>>>>>>> glusterfs from the gluster repo atm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The problem is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. created replica bricks.
>>>>>>>>>>>>> 2. mounted in proxmox (tried both proxmox ways: via the
>>>>>>>>>>>>> GUI and via fstab with a backup volume line; see the
>>>>>>>>>>>>> sketch after this list. Btw, while mounting via fstab I'm
>>>>>>>>>>>>> unable to launch a VM without cache, even though
>>>>>>>>>>>>> direct-io-mode is enabled in the fstab line.)
>>>>>>>>>>>>> 3. installed a VM
>>>>>>>>>>>>> 4. brought one volume down - ok
>>>>>>>>>>>>> 5. brought it up, waited until the sync was done.
>>>>>>>>>>>>> 6. brought the other volume down - got IO errors on the
>>>>>>>>>>>>> VM guest and was not able to restore the VM after I reset
>>>>>>>>>>>>> it via the host. It says (no bootable media). After I
>>>>>>>>>>>>> shut it down (forced) and brought it back up, it boots.
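>>>>>>>>>>>>>
>>>>>>>>>>>>> (The fstab variant I mean, sketched from memory;
>>>>>>>>>>>>> backupvolfile-server and direct-io-mode are standard
>>>>>>>>>>>>> mount.glusterfs options, and the volume/mount names are
>>>>>>>>>>>>> the ones used above:)
>>>>>>>>>>>>>
>>>>>>>>>>>>>     # one line in /etc/fstab; stor2 is tried if stor1 is unreachable at mount time
>>>>>>>>>>>>>     stor1:/HA-fast-150G-PVE1 /mnt/pve/FAST-TESt glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0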
>>>>>>>>>>>> Could you do the following and test it again?
>>>>>>>>>>>> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>>
>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>
>>>>>>>>>>>>> Need help. Tried 3.4.3 and 3.4.4. Packages for 3.4.5 and
>>>>>>>>>>>>> 3.5.2 are still missing for Debian (3.5.1 always gives a
>>>>>>>>>>>>> healing error for some reason).
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best
>>>>>>>>>>>>> regards,
>>>>>>>>>>>>> Roman.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Roman.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Roman.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best regards,
>>>>>>>>>> Roman.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Roman.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Roman.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Roman.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Roman.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Roman.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>
>
>
>
> --
> Best regards,
> Roman.