[Gluster-users] libgfapi failover problem on replica bricks

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Aug 6 06:39:57 UTC 2014


Roman,
     The file went into split-brain. I think we should do these tests 
with 3.5.2, where monitoring the heals is easier. Let me also come up 
with a document describing how to do the testing you are trying to do.

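For reference, a minimal sketch of the heal-monitoring commands available in
3.5.x, assuming the volume name HA-fast-150G-PVE1 that appears in the client
logs later in this thread:

    # list files that still need healing
    gluster volume heal HA-fast-150G-PVE1 info

    # list files the volume currently considers to be in split-brain
    gluster volume heal HA-fast-150G-PVE1 info split-brain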
Humble/Niels,
     Do we have debs available for 3.5.2? In 3.5.1 there was a packaging 
issue where /usr/bin/glfsheal was not packaged along with the deb. I 
think that should be fixed by now as well?

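As a quick packaging check (the .deb file name below is only a hypothetical
example), whether a build ships the heal helper can be verified before
installing, or on an installed system:

    # inspect the package contents for the glfsheal binary
    dpkg-deb -c glusterfs-server_3.5.2-1_amd64.deb | grep glfsheal

    # or ask dpkg which installed package owns it
    dpkg -S /usr/bin/glfsheal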
Pranith

On 08/06/2014 11:52 AM, Roman wrote:
> good morning,
>
> root at stor1:~# getfattr -d -m. -e hex 
> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> getfattr: Removing leading '/' from absolute path names
> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
> trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
> trusted.gfid=0x23c79523075a4158bea38078da570449
>
> getfattr: Removing leading '/' from absolute path names
> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
> trusted.gfid=0x23c79523075a4158bea38078da570449
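A note for readers following along, as a rough guide to these values: each
trusted.afr.<volume>-client-N attribute holds three 32-bit pending-operation
counters for the brick that client-N points to, usually read as

    0x 00000132 00000000 00000000
       |        |        |
       data     metadata entry    (changelog counters)

Here one copy records pending data operations against client-1 while the other
copy records pending data operations against client-0, i.e. each brick blames
the other, which is the split-brain condition mentioned above.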
>
>
>
> 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>
>     On 08/06/2014 11:30 AM, Roman wrote:
>>     Also, this time files are not the same!
>>
>>     root at stor1:~# md5sum
>>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>     32411360c53116b96a059f17306caeda
>>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>
>>     root at stor2:~# md5sum
>>     /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>     65b8a6031bcb6f5fb3a11cb1e8b1c9c9
>>      /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>     What is the getfattr output?
>
>     Pranith
>
>>
>>
>>         2014-08-05 16:33 GMT+03:00 Roman <romeo.r at gmail.com>:
>>
>>         Nope, it is not working. But this time it went a bit differently.
>>
>>         root at gluster-client:~# dmesg
>>         Segmentation fault
>>
>>
>>         I was not even able to start the VM after I did the tests:
>>
>>         Could not read qcow2 header: Operation not permitted
>>
>>         And it seems it never starts to sync the files after the first
>>         disconnect. The VM survives the first disconnect, but not the
>>         second (I waited around 30 minutes). Also, I've got
>>         network.ping-timeout: 2 in the volume settings, but the logs
>>         reacted to the first disconnect only after around 30 seconds.
>>         The second was faster, 2 seconds.
>>
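For what it's worth, a small sketch of how that setting can be applied and
checked (volume name again taken from the logs). The different reaction times
in the two excerpts below are plausibly the difference between the RPC ping
timer (2 seconds) and a TCP-level read timeout, but that is only a guess:

    # set a 2-second ping timeout on the volume
    gluster volume set HA-fast-150G-PVE1 network.ping-timeout 2

    # confirm the reconfigured option is active
    gluster volume info HA-fast-150G-PVE1 | grep ping-timeout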
>>         The reaction was also different:
>>
>>         slower one:
>>         [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv]
>>         0-glusterfs: readv failed (Connection timed out)
>>         [2014-08-05 13:26:19.558485] W
>>         [socket.c:1962:__socket_proto_state_machine] 0-glusterfs:
>>         reading from socket failed. Error (Connection timed out),
>>         peer (10.250.0.1:24007)
>>         [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv]
>>         0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
>>         [2014-08-05 13:26:21.281474] W
>>         [socket.c:1962:__socket_proto_state_machine]
>>         0-HA-fast-150G-PVE1-client-0: reading from socket failed.
>>         Error (Connection timed out), peer (10.250.0.1:49153)
>>         [2014-08-05 13:26:21.281507] I
>>         [client.c:2098:client_rpc_notify]
>>         0-HA-fast-150G-PVE1-client-0: disconnected
>>
>>         the fast one:
>>         [2014-08-05 12:52:44.607389] C
>>         [client-handshake.c:127:rpc_client_ping_timer_expired]
>>         0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not
>>         responded in the last 2 seconds, disconnecting.
>>         [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv]
>>         0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
>>         [2014-08-05 12:52:44.607585] E
>>         [rpc-clnt.c:368:saved_frames_unwind]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>         [0x7fcb1b4b0558]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>         [0x7fcb1b4aea63]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>         unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at
>>         2014-08-05 12:52:42.463881 (xid=0x381883x)
>>         [2014-08-05 12:52:44.607604] W
>>         [client-rpc-fops.c:2624:client3_3_lookup_cbk]
>>         0-HA-fast-150G-PVE1-client-1: remote operation failed:
>>         Transport endpoint is not connected. Path: /
>>         (00000000-0000-0000-0000-000000000001)
>>         [2014-08-05 12:52:44.607736] E
>>         [rpc-clnt.c:368:saved_frames_unwind]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8)
>>         [0x7fcb1b4b0558]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3)
>>         [0x7fcb1b4aea63]
>>         (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)
>>         [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced
>>         unwinding frame type(GlusterFS Handshake) op(PING(3)) called
>>         at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>         [2014-08-05 12:52:44.607753] W
>>         [client-handshake.c:276:client_ping_cbk]
>>         0-HA-fast-150G-PVE1-client-1: timer must have expired
>>         [2014-08-05 12:52:44.607776] I
>>         [client.c:2098:client_rpc_notify]
>>         0-HA-fast-150G-PVE1-client-1: disconnected
>>
>>
>>
>>         I've got SSD disks (just for info).
>>         Should I go and give 3.5.2 a try?
>>
>>
>>
>>         2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>             Please reply along with gluster-users :-). Maybe you are
>>             hitting 'reply' instead of 'reply all'?
>>
>>             Pranith
>>
>>             On 08/05/2014 03:35 PM, Roman wrote:
>>>             To make sure and start clean, I've created another VM with
>>>             raw format and am going to repeat those steps. So now I've
>>>             got two VMs, one with qcow2 format and the other with raw
>>>             format. I will send another e-mail shortly.
>>>
>>>
>>>             2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>
>>>                 On 08/05/2014 03:07 PM, Roman wrote:
>>>>                 really, seems like the same file
>>>>
>>>>                 stor1:
>>>>                 a951641c5230472929836f9fcede6b04
>>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>
>>>>                 stor2:
>>>>                 a951641c5230472929836f9fcede6b04
>>>>                  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>
>>>>
>>>>                 One thing I've seen from the logs is that somehow
>>>>                 Proxmox VE is connecting to the servers with the wrong
>>>>                 version?
>>>>                 [2014-08-05 09:23:45.218550] I
>>>>                 [client-handshake.c:1659:select_server_supported_programs]
>>>>                 0-HA-fast-150G-PVE1-client-0: Using Program
>>>>                 GlusterFS 3.3, Num (1298437), Version (330)
>>>                 It is the RPC (on-the-wire data structures) version,
>>>                 which has not changed at all since 3.3, so that's not a
>>>                 problem. So what is the conclusion? Is your test case
>>>                 working now or not?
>>>
>>>                 Pranith
>>>
>>>>                 but if I issue:
>>>>                 root at pve1:~# glusterfs -V
>>>>                 glusterfs 3.4.4 built on Jun 28 2014 03:44:57
>>>>                 seems ok.
>>>>
>>>>                 The servers use 3.4.4 meanwhile:
>>>>                 [2014-08-05 09:23:45.117875] I
>>>>                 [server-handshake.c:567:server_setvolume]
>>>>                 0-HA-fast-150G-PVE1-server: accepted client from
>>>>                 stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
>>>>                 (version: 3.4.4)
>>>>                 [2014-08-05 09:23:49.103035] I
>>>>                 [server-handshake.c:567:server_setvolume]
>>>>                 0-HA-fast-150G-PVE1-server: accepted client from
>>>>                 stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
>>>>                 (version: 3.4.4)
>>>>
>>>>                 if this could be the reason, of course.
>>>>                 I did restart the Proxmox VE yesterday (just for
>>>>                 information).
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>
>>>>
>>>>                     On 08/05/2014 02:33 PM, Roman wrote:
>>>>>                     Waited long enough for now, still different
>>>>>                     sizes and no logs about healing :(
>>>>>
>>>>>                     stor1
>>>>>                     # file:
>>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>
>>>>>                     root at stor1:~# du -sh
>>>>>                     /exports/fast-test/150G/images/127/
>>>>>                     1.2G  /exports/fast-test/150G/images/127/
>>>>>
>>>>>
>>>>>                     stor2
>>>>>                     # file:
>>>>>                     exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>                     trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>                     trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>                     trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>
>>>>>
>>>>>                     root at stor2:~# du -sh
>>>>>                     /exports/fast-test/150G/images/127/
>>>>>                     1.4G  /exports/fast-test/150G/images/127/
>>>>                     According to the changelogs, the file doesn't
>>>>                     need any healing. Could you stop the operations
>>>>                     on the VMs and take md5sum on both these machines?
>>>>
>>>>                     Pranith
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                     2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>
>>>>>
>>>>>                         On 08/05/2014 02:06 PM, Roman wrote:
>>>>>>                         Well, it seems like it doesn't see that
>>>>>>                         changes were made to the volume? I created two
>>>>>>                         files, 200 and 100 MB (from /dev/zero), after
>>>>>>                         I disconnected the first brick. Then I
>>>>>>                         connected it back and got these logs:
>>>>>>
>>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>                         0-glusterfs: No change in volfile, continuing
>>>>>>                         [2014-08-05 08:30:37.830207] I
>>>>>>                         [rpc-clnt.c:1676:rpc_clnt_reconfig]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: changing
>>>>>>                         port to 49153 (from 0)
>>>>>>                         [2014-08-05 08:30:37.830239] W
>>>>>>                         [socket.c:514:__socket_rwv]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: readv
>>>>>>                         failed (No data available)
>>>>>>                         [2014-08-05 08:30:37.831024] I
>>>>>>                         [client-handshake.c:1659:select_server_supported_programs]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: Using
>>>>>>                         Program GlusterFS 3.3, Num (1298437),
>>>>>>                         Version (330)
>>>>>>                         [2014-08-05 08:30:37.831375] I
>>>>>>                         [client-handshake.c:1456:client_setvolume_cbk]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: Connected
>>>>>>                         to 10.250.0.1:49153, attached to
>>>>>>                         remote volume '/exports/fast-test/150G'.
>>>>>>                         [2014-08-05 08:30:37.831394] I
>>>>>>                         [client-handshake.c:1468:client_setvolume_cbk]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: Server and
>>>>>>                         Client lk-version numbers are not same,
>>>>>>                         reopening the fds
>>>>>>                         [2014-08-05 08:30:37.831566] I
>>>>>>                         [client-handshake.c:450:client_set_lk_version_cbk]
>>>>>>                         0-HA-fast-150G-PVE1-client-0: Server lk
>>>>>>                         version = 1
>>>>>>
>>>>>>
>>>>>>                         [2014-08-05 08:30:37.830150] I
>>>>>>                         [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk]
>>>>>>                         0-glusterfs: No change in volfile, continuing
>>>>>>                         This line seems weird to me, to be honest.
>>>>>>                         I do not see any traffic on the switch
>>>>>>                         interfaces between the gluster servers, which
>>>>>>                         means there is no syncing between them. I
>>>>>>                         tried to 'ls -l' the files on the client and
>>>>>>                         the servers to trigger the healing, but it
>>>>>>                         seems there was no success. Should I wait
>>>>>>                         longer?
>>>>>                         Yes, it should take around 10-15 minutes.
>>>>>                         Could you provide 'getfattr -d -m. -e hex
>>>>>                         <file-on-brick>' on both the bricks.
>>>>>
>>>>>                         Pranith
>>>>>
>>>>>>
>>>>>>
>>>>>>                         2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>
>>>>>>
>>>>>>                             On 08/05/2014 01:10 PM, Roman wrote:
>>>>>>>                             Aha! For some reason I was not able to
>>>>>>>                             start the VM anymore; Proxmox VE told me
>>>>>>>                             that it is not able to read the qcow2
>>>>>>>                             header because permission was denied for
>>>>>>>                             some reason. So I just deleted that file
>>>>>>>                             and created a new VM. And the next
>>>>>>>                             message I've got was this:
>>>>>>                             Seems like these are the messages from
>>>>>>                             when you took down the bricks before the
>>>>>>                             self-heal finished. Could you restart the
>>>>>>                             run, waiting for self-heals to complete
>>>>>>                             before taking down the next brick?
>>>>>>
>>>>>>                             Pranith
>>>>>>
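A possible way to wait for that, sketched as a small loop to run on each brick
server (file path taken from earlier in this thread); it polls until every
trusted.afr changelog value on the file is all zeroes, i.e. nothing is pending:

    while getfattr -d -m. -e hex \
            /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 2>/dev/null \
            | grep 'trusted.afr' \
            | grep -qv '=0x000000000000000000000000'; do
        sleep 10    # still healing, check again shortly
    done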
>>>>>>>
>>>>>>>
>>>>>>>                             [2014-08-05 07:31:25.663412] E
>>>>>>>                             [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
>>>>>>>                             0-HA-fast-150G-PVE1-replicate-0:
>>>>>>>                             Unable to self-heal contents of
>>>>>>>                             '/images/124/vm-124-disk-1.qcow2'
>>>>>>>                             (possible split-brain). Please
>>>>>>>                             delete the file from all but the
>>>>>>>                             preferred subvolume.- Pending
>>>>>>>                             matrix:  [ [ 0 60 ] [ 11 0 ] ]
>>>>>>>                             [2014-08-05 07:31:25.663955] E
>>>>>>>                             [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
>>>>>>>                             0-HA-fast-150G-PVE1-replicate-0:
>>>>>>>                             background  data self-heal failed on
>>>>>>>                             /images/124/vm-124-disk-1.qcow2
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                             2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>
>>>>>>>                                 I just responded to your earlier
>>>>>>>                                 mail about what the log looks like.
>>>>>>>                                 The message appears in the mount's
>>>>>>>                                 log file.
>>>>>>>
>>>>>>>                                 Pranith
>>>>>>>
>>>>>>>                                 On 08/05/2014 12:41 PM, Roman wrote:
>>>>>>>>                                 OK, so I've waited long enough, I think. There was no
>>>>>>>>                                 traffic on the switch ports between the servers, and I
>>>>>>>>                                 could not find any suitable log message about a completed
>>>>>>>>                                 self-heal (I waited about 30 minutes). This time I
>>>>>>>>                                 unplugged the other server's UTP cable and ended up in
>>>>>>>>                                 the same situation:
>>>>>>>>                                 root at gluster-test1:~# cat /var/log/dmesg
>>>>>>>>                                 -bash: /bin/cat: Input/output error
>>>>>>>>
>>>>>>>>                                 brick logs:
>>>>>>>>                                 [2014-08-05 07:09:03.005474] I
>>>>>>>>                                 [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server:
>>>>>>>>                                 disconnecting connectionfrom
>>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>                                 [2014-08-05 07:09:03.005530] I
>>>>>>>>                                 [server-helpers.c:729:server_connection_put]
>>>>>>>>                                 0-HA-fast-150G-PVE1-server:
>>>>>>>>                                 Shutting down connection
>>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>                                 [2014-08-05 07:09:03.005560] I
>>>>>>>>                                 [server-helpers.c:463:do_fd_cleanup]
>>>>>>>>                                 0-HA-fast-150G-PVE1-server: fd
>>>>>>>>                                 cleanup on
>>>>>>>>                                 /images/124/vm-124-disk-1.qcow2
>>>>>>>>                                 [2014-08-05 07:09:03.005797] I
>>>>>>>>                                 [server-helpers.c:617:server_connection_destroy]
>>>>>>>>                                 0-HA-fast-150G-PVE1-server:
>>>>>>>>                                 destroyed connection of
>>>>>>>>                                 pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                                 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>
>>>>>>>>                                     Do you think it is possible for you to do these tests
>>>>>>>>                                     on the latest version, 3.5.2? 'gluster volume heal
>>>>>>>>                                     <volname> info' would give you that information in
>>>>>>>>                                     versions > 3.5.1. Otherwise you will have to check it
>>>>>>>>                                     either from the logs (there will be a self-heal
>>>>>>>>                                     completed message in the mount logs) or by observing
>>>>>>>>                                     'getfattr -d -m. -e hex <image-file-on-bricks>'.
>>>>>>>>
>>>>>>>>                                     Pranith
>>>>>>>>
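On the 3.4.4 client used here, the log-side check might look like the
following; the mount log file name is derived from the mount point, so the
glob below is an assumption, and the exact wording of the self-heal messages
differs between versions:

    # on the Proxmox node: look for self-heal messages in the FUSE mount log
    grep -i "self-heal" /var/log/glusterfs/mnt-pve-*.log | tail -n 20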
>>>>>>>>
>>>>>>>>                                     On 08/05/2014 12:09 PM, Roman wrote:
>>>>>>>>>                                     OK, I understand. I will try this shortly.
>>>>>>>>>                                     How can I be sure that the healing process is done,
>>>>>>>>>                                     if I am not able to see its status?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                                     2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>
>>>>>>>>>                                         Mounts will do the healing, not the
>>>>>>>>>                                         self-heal-daemon. The point is that whichever
>>>>>>>>>                                         process does the healing needs the latest
>>>>>>>>>                                         information about which bricks are good in
>>>>>>>>>                                         this usecase. Since for the VM usecase the
>>>>>>>>>                                         mounts have the latest information, we should
>>>>>>>>>                                         let the mounts do the healing. If the mount
>>>>>>>>>                                         accesses the VM image, either by someone doing
>>>>>>>>>                                         operations inside the VM or by an explicit
>>>>>>>>>                                         stat on the file, it should do the healing.
>>>>>>>>>
>>>>>>>>>                                         Pranith.
>>>>>>>>>
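In other words, with the self-heal daemon off, the access that triggers
healing has to go through the client mount, not through the bricks. A
hypothetical example on the Proxmox node, assuming the volume is mounted under
/mnt/pve/HA-fast-150G-PVE1 (the actual mount point may differ):

    # any read through the FUSE mount, even a stat, lets the client notice
    # the pending changelog and start the self-heal
    stat /mnt/pve/HA-fast-150G-PVE1/images/127/vm-127-disk-1.qcow2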
>>>>>>>>>
>>>>>>>>>                                         On 08/05/2014 10:39 AM, Roman wrote:
>>>>>>>>>>                                         Hmmm, you told me to turn it off. Did I
>>>>>>>>>>                                         misunderstand something? After I issued the
>>>>>>>>>>                                         command you sent me, I was not able to watch
>>>>>>>>>>                                         the healing process; it said it won't be
>>>>>>>>>>                                         healed, because it's turned off.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                                         2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>>
>>>>>>>>>>                                             You didn't mention anything about
>>>>>>>>>>                                             self-healing. Did you wait until the
>>>>>>>>>>                                             self-heal is complete?
>>>>>>>>>>
>>>>>>>>>>                                             Pranith
>>>>>>>>>>
>>>>>>>>>>                                             On 08/04/2014 05:49 PM, Roman wrote:
>>>>>>>>>>>                                             Hi!
>>>>>>>>>>>                                             The result is pretty much the same. I set the
>>>>>>>>>>>                                             switch port down for the 1st server; it was OK.
>>>>>>>>>>>                                             Then I set it back up and set the other server's
>>>>>>>>>>>                                             port off, and it triggered an IO error on two
>>>>>>>>>>>                                             virtual machines: one with a local root FS but
>>>>>>>>>>>                                             network-mounted storage, and the other with a
>>>>>>>>>>>                                             network root FS. The 1st gave an error on copying
>>>>>>>>>>>                                             to or from the mounted network disk; the other
>>>>>>>>>>>                                             just gave me an error even for reading log files.
>>>>>>>>>>>
>>>>>>>>>>>                                             cat: /var/log/alternatives.log: Input/output error
>>>>>>>>>>>
>>>>>>>>>>>                                             Then I reset the KVM VM and it told me there is no
>>>>>>>>>>>                                             boot device. Next I virtually powered it off and
>>>>>>>>>>>                                             then back on, and it booted.
>>>>>>>>>>>
>>>>>>>>>>>                                             By the way, did I have to start/stop the volume?
>>>>>>>>>>>
>>>>>>>>>>>                                             >> Could you do the following and test it again?
>>>>>>>>>>>                                             >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>
>>>>>>>>>>>                                             >> Pranith
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                                             2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                                                 On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>>>>>>>                                                 Hello!
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 Facing the same problem as mentioned here:
>>>>>>>>>>>>                                                 http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 My setup is up and running, so I'm ready to help
>>>>>>>>>>>>                                                 you back with feedback.
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 Setup:
>>>>>>>>>>>>                                                 proxmox server as client
>>>>>>>>>>>>                                                 2 gluster physical servers
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 Server side and client side are both running
>>>>>>>>>>>>                                                 glusterfs 3.4.4 from the gluster repo atm.
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 The problem is:
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 1. created replica bricks.
>>>>>>>>>>>>                                                 2. mounted in proxmox (tried both Proxmox ways:
>>>>>>>>>>>>                                                    via GUI and via fstab (with a backup volume
>>>>>>>>>>>>                                                    line); btw, while mounting via fstab I'm unable
>>>>>>>>>>>>                                                    to launch a VM without cache, even though
>>>>>>>>>>>>                                                    direct-io-mode is enabled in the fstab line)
>>>>>>>>>>>>                                                 3. installed a VM
>>>>>>>>>>>>                                                 4. bring one volume down - ok
>>>>>>>>>>>>                                                 5. bringing it up, waiting until the sync is done.
>>>>>>>>>>>>                                                 6. bring the other volume down - getting IO errors
>>>>>>>>>>>>                                                    on the VM guest and not able to restore the VM
>>>>>>>>>>>>                                                    after I reset the VM via the host. It says (no
>>>>>>>>>>>>                                                    bootable media). After I shut it down (forced)
>>>>>>>>>>>>                                                    and bring it back up, it boots.
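For completeness, a rough sketch of the setup being described above, with the
volume name and brick paths inferred from later messages in this thread (treat
every name here as an assumption):

    # two-node replica volume used for the VM images
    gluster volume create HA-fast-150G-PVE1 replica 2 \
        stor1:/exports/fast-test/150G stor2:/exports/fast-test/150G
    gluster volume start HA-fast-150G-PVE1

    # Proxmox side: a GUI storage entry, or an fstab line along the lines of
    # stor1:/HA-fast-150G-PVE1 /mnt/pve/HA-fast-150G-PVE1 glusterfs defaults,_netdev,backupvolfile-server=stor2 0 0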
>>>>>>>>>>>                                                 Could you do the following and test it again?
>>>>>>>>>>>                                                 gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>
>>>>>>>>>>>                                                 Pranith
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 Need help. Tried 3.4.3 and 3.4.4. Still missing
>>>>>>>>>>>>                                                 packages for 3.4.5 for Debian and for 3.5.2
>>>>>>>>>>>>                                                 (3.5.1 always gives a healing error for some
>>>>>>>>>>>>                                                 reason).
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 -- 
>>>>>>>>>>>>                                                 Best regards,
>>>>>>>>>>>>                                                 Roman.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                                                 _______________________________________________
>>>>>>>>>>>>                                                 Gluster-users mailing list
>>>>>>>>>>>>                                                 Gluster-users at gluster.org
>>>>>>>>>>>>                                                 http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> -- 
> Best regards,
> Roman.
