[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Tue Aug 5 08:49:03 UTC 2014
On 08/05/2014 02:06 PM, Roman wrote:
> Well, it seems like it doesn't see that changes were made to the volume?
> I created two files, 200 MB and 100 MB (from /dev/zero), after I
> disconnected the first brick. Then I connected it back and got these logs:
>
> [2014-08-05 08:30:37.830150] I
> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in
> volfile, continuing
> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
> 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv]
> 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
> [2014-08-05 08:30:37.831024] I
> [client-handshake.c:1659:select_server_supported_programs]
> 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num
> (1298437), Version (330)
> [2014-08-05 08:30:37.831375] I
> [client-handshake.c:1456:client_setvolume_cbk]
> 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153,
> attached to remote volume '/exports/fast-test/150G'.
> [2014-08-05 08:30:37.831394] I
> [client-handshake.c:1468:client_setvolume_cbk]
> 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are
> not same, reopening the fds
> [2014-08-05 08:30:37.831566] I
> [client-handshake.c:450:client_set_lk_version_cbk]
> 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>
>
> [2014-08-05 08:30:37.830150] I
> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in
> volfile, continuing
> this line seems weird to me, to be honest.
> I do not see any traffic on the switch interfaces between the gluster
> servers, which means there is no syncing between them.
> I tried to ls -l the files on the client and the servers to trigger the
> healing, but it seems there was no success. Should I wait longer?
Yes, it should take around 10-15 minutes. Could you provide the output of
'getfattr -d -m. -e hex <file-on-brick>' from both bricks?
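For example (just a sketch; the brick path below is pieced together from
your earlier logs, so adjust it to your actual layout), run this on each
server against the brick-side path of the image, not the mount:

    getfattr -d -m. -e hex /exports/fast-test/150G/images/124/vm-124-disk-1.qcow2

The trusted.afr.<volname>-client-* values in that output are the pending
counters; once the self-heal has finished they should be all zeros on both
bricks.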
Pranith
>
>
> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri
> <pkarampu at redhat.com>:
>
>
> On 08/05/2014 01:10 PM, Roman wrote:
>> Aha! For some reason I was not able to start the VM anymore;
>> Proxmox VE told me that it was not able to read the qcow2 header
>> because permission was denied for some reason. So I just deleted
>> that file and created a new VM. The next message I got was
>> this:
> It seems these are the messages from when you took down the bricks
> before self-heal completed. Could you restart the run, waiting for
> self-heals to complete before taking down the next brick?
>
> Pranith
>
>>
>>
>> [2014-08-05 07:31:25.663412] E
>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
>> 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of
>> '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please
>> delete the file from all but the preferred subvolume.- Pending
>> matrix: [ [ 0 60 ] [ 11 0 ] ]
>> [2014-08-05 07:31:25.663955] E
>> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
>> 0-HA-fast-150G-PVE1-replicate-0: background data self-heal
>> failed on /images/124/vm-124-disk-1.qcow2
>>
>>
>>
>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri
>> <pkarampu at redhat.com>:
>>
>> I just responded to your earlier mail about what the log
>> looks like. The log appears in the mount's logfile.
>>
>> Pranith
>>
>> On 08/05/2014 12:41 PM, Roman wrote:
>>> OK, so I've waited long enough, I think. There was no traffic on
>>> the switch ports between the servers. I could not find any
>>> log message about a completed self-heal (I waited about 30
>>> minutes). I pulled out the other server's UTP cable this time
>>> and got into the same situation:
>>> root at gluster-test1:~# cat /var/log/dmesg
>>> -bash: /bin/cat: Input/output error
>>>
>>> brick logs:
>>> [2014-08-05 07:09:03.005474] I
>>> [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server:
>>> disconnecting connectionfrom
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>> [2014-08-05 07:09:03.005530] I
>>> [server-helpers.c:729:server_connection_put]
>>> 0-HA-fast-150G-PVE1-server: Shutting down connection
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>> [2014-08-05 07:09:03.005560] I
>>> [server-helpers.c:463:do_fd_cleanup]
>>> 0-HA-fast-150G-PVE1-server: fd cleanup on
>>> /images/124/vm-124-disk-1.qcow2
>>> [2014-08-05 07:09:03.005797] I
>>> [server-helpers.c:617:server_connection_destroy]
>>> 0-HA-fast-150G-PVE1-server: destroyed connection of
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>
>>>
>>>
>>>
>>>
>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
>>> <pkarampu at redhat.com>:
>>>
>>> Do you think it is possible for you to run these tests on
>>> the latest version, 3.5.2? 'gluster volume heal <volname>
>>> info' would give you that information in versions > 3.5.1.
>>> Otherwise you will have to check it either from the
>>> logs (there will be a 'self-heal completed' message in the
>>> mount logs) or by observing 'getfattr -d -m. -e hex
>>> <image-file-on-bricks>'.
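>>> For example, on 3.5.2 something like the following (volume name taken
>>> from your logs) would show whether any entries are still pending heal:
>>>
>>>     gluster volume heal HA-fast-150G-PVE1 info
>>>
>>> When it reports 'Number of entries: 0' for both bricks, the heal has
>>> completed.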
>>>
>>> Pranith
>>>
>>>
>>> On 08/05/2014 12:09 PM, Roman wrote:
>>>> OK, I understand. I will try this shortly.
>>>> How can I be sure that the healing process is done if I
>>>> am not able to see its status?
>>>>
>>>>
>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
>>>> <pkarampu at redhat.com>:
>>>>
>>>> The mounts will do the healing, not the
>>>> self-heal daemon. The point, I feel, is that
>>>> whichever process does the healing must have the latest
>>>> information about the good bricks in this use case.
>>>> Since for the VM use case the mounts should have the latest
>>>> information, we should let the mounts do the
>>>> healing. If the mount accesses the VM image, either
>>>> through someone doing operations inside the VM or
>>>> through an explicit stat on the file, it should do the healing.
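>>>> For example, something like this from the proxmox node should be
>>>> enough to kick it off (the mount path here is only an illustration;
>>>> use wherever the volume is actually mounted):
>>>>
>>>>     stat /mnt/pve/<storage-id>/images/124/vm-124-disk-1.qcow2
>>>>
>>>> It is the lookup coming from the mount that triggers the self-heal
>>>> on that file.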
>>>>
>>>> Pranith.
>>>>
>>>>
>>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>>> Hmmm, you told me to turn it off. Did I understand
>>>>> something wrong? After I issued the command you
>>>>> sent me, I was not able to watch the healing
>>>>> process; it said it won't be healed because it's
>>>>> turned off.
>>>>>
>>>>>
>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
>>>>> <pkarampu at redhat.com>:
>>>>>
>>>>> You didn't mention anything about
>>>>> self-healing. Did you wait until the self-heal
>>>>> was complete?
>>>>>
>>>>> Pranith
>>>>>
>>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>>> Hi!
>>>>>> The result is pretty much the same. I set the switch port
>>>>>> down for the 1st server, and it was OK. Then I set it
>>>>>> back up and set the other server's port off, and
>>>>>> it triggered an IO error on two virtual
>>>>>> machines: one with a local root FS but
>>>>>> network-mounted storage, and the other with a network root
>>>>>> FS. The 1st gave an error on copying to or from
>>>>>> the mounted network disk; the other gave me
>>>>>> an error even for reading log files.
>>>>>>
>>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>>> Then I reset the KVM VM and it told me there
>>>>>> is no boot device. Next I virtually powered
>>>>>> it off and then back on, and it booted.
>>>>>>
>>>>>> By the way, did I have to start/stop the volume?
>>>>>>
>>>>>> >> Could you do the following and test it again?
>>>>>> >> gluster volume set <volname>
>>>>>> cluster.self-heal-daemon off
>>>>>>
>>>>>> >>Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar
>>>>>> Karampuri <pkarampu at redhat.com>:
>>>>>>
>>>>>>
>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> Facing the same problem as mentioned here:
>>>>>>>
>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>>
>>>>>>> My setup is up and running, so I'm
>>>>>>> ready to help you with feedback.
>>>>>>>
>>>>>>> setup:
>>>>>>> proxmox server as client
>>>>>>> 2 gluster physical servers
>>>>>>>
>>>>>>> Both the server side and the client side are currently
>>>>>>> running GlusterFS 3.4.4 from the Gluster repo.
>>>>>>>
>>>>>>> the problem is:
>>>>>>>
>>>>>>> 1. Created replica bricks.
>>>>>>> 2. Mounted in Proxmox (tried both Proxmox
>>>>>>> ways: via the GUI and via fstab (with a backup
>>>>>>> volume line); by the way, while mounting via
>>>>>>> fstab I'm unable to launch a VM without
>>>>>>> cache, even though direct-io-mode is
>>>>>>> enabled in the fstab line) - see the example
>>>>>>> fstab line after this list.
>>>>>>> 3. Installed a VM.
>>>>>>> 4. Brought one brick down - OK.
>>>>>>> 5. Brought it back up and waited for the sync to finish.
>>>>>>> 6. Brought the other brick down - getting IO
>>>>>>> errors on the VM guest and not able to
>>>>>>> restore the VM after I reset the VM via the
>>>>>>> host. It says "no bootable media". After
>>>>>>> I shut it down (forced) and bring it back
>>>>>>> up, it boots.
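>>>>>>>
>>>>>>> For reference, the fstab line I mean looks roughly like this
>>>>>>> (the second IP and the mount point here are placeholders, not my
>>>>>>> exact config):
>>>>>>>
>>>>>>>     10.250.0.1:/HA-fast-150G-PVE1 /mnt/pve/gluster-fast glusterfs defaults,_netdev,backupvolfile-server=10.250.0.2,direct-io-mode=enable 0 0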
>>>>>> Could you do the following and test it again?
>>>>>> gluster volume set <volname>
>>>>>> cluster.self-heal-daemon off
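>>>>>>
>>>>>> With your volume name that would be, for example:
>>>>>>
>>>>>>     gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
>>>>>>
>>>>>> 'gluster volume info HA-fast-150G-PVE1' should then list the option
>>>>>> under 'Options Reconfigured' as 'cluster.self-heal-daemon: off'.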
>>>>>>
>>>>>> Pranith
>>>>>>>
>>>>>>> Need help. Tried 3.4.3 and 3.4.4.
>>>>>>> Packages for 3.4.5 and 3.5.2 are still missing
>>>>>>> for Debian (3.5.1 always gives a healing
>>>>>>> error for some reason).
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Roman.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Roman.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Roman.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>
>
>
>
> --
> Best regards,
> Roman.