[Gluster-users] libgfapi failover problem on replica bricks

Pranith Kumar Karampuri pkarampu at redhat.com
Tue Aug 5 08:25:22 UTC 2014


On 08/05/2014 01:10 PM, Roman wrote:
> Aha! For some reason I was not able to start the VM anymore; Proxmox 
> VE told me that it could not read the qcow2 header because permission 
> was denied. So I just deleted that file and created a new VM. The next 
> message I got was this:
It seems these are the messages from when you took the bricks down before 
self-heal completed. Could you restart the run, waiting for the self-heals 
to complete before taking down the next brick?
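
For example, a rough way to confirm it before taking the second brick down 
(the log file name below is illustrative; it depends on the mount point):

# on the client, after the first brick is back up, wait until the mount
# log reports the self-heal on the image as completed
grep -i 'self-heal' /var/log/glusterfs/mnt-pve-<storage>.log | tail

On 3.5.1 and later, 'gluster volume heal <volname> info' shows the pending 
heals directly.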

Pranith
>
>
> [2014-08-05 07:31:25.663412] E 
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 
> 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of 
> '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please 
> delete the file from all but the preferred subvolume.- Pending matrix: 
>  [ [ 0 60 ] [ 11 0 ] ]
> [2014-08-05 07:31:25.663955] E 
> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 
> 0-HA-fast-150G-PVE1-replicate-0: background  data self-heal failed on 
> /images/124/vm-124-disk-1.qcow2
>
>
>
> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri 
> <pkarampu at redhat.com>:
>
>     I just responded to your earlier mail about what the log looks like.
>     The log appears in the mount's logfile.
>
>     Pranith
>
>     On 08/05/2014 12:41 PM, Roman wrote:
>>     OK, so I've waited long enough, I think. There was no traffic on the
>>     switch ports between the servers, and I could not find any suitable
>>     log message about a completed self-heal (I waited about 30 minutes).
>>     This time I unplugged the other server's UTP cable and ended up in
>>     the same situation:
>>     root at gluster-test1:~# cat /var/log/dmesg
>>     -bash: /bin/cat: Input/output error
>>
>>     brick logs:
>>     [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify]
>>     0-HA-fast-150G-PVE1-server: disconnecting connectionfrom
>>     pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>     [2014-08-05 07:09:03.005530] I
>>     [server-helpers.c:729:server_connection_put]
>>     0-HA-fast-150G-PVE1-server: Shutting down connection
>>     pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>     [2014-08-05 07:09:03.005560] I
>>     [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server:
>>     fd cleanup on /images/124/vm-124-disk-1.qcow2
>>     [2014-08-05 07:09:03.005797] I
>>     [server-helpers.c:617:server_connection_destroy]
>>     0-HA-fast-150G-PVE1-server: destroyed connection of
>>     pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>
>>
>>
>>
>>
>>     2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
>>     <pkarampu at redhat.com>:
>>
>>         Do you think it would be possible for you to do these tests on
>>         the latest version, 3.5.2? 'gluster volume heal <volname> info'
>>         would give you that information in versions > 3.5.1.
>>         Otherwise you will have to check it either from the logs
>>         (there will be a self-heal completed message in the mount logs)
>>         or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
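>>
>>         For example, on each brick (the brick path below is only
>>         illustrative; use wherever your brick stores the image):
>>
>>         getfattr -d -m. -e hex /path/to/brick/images/124/vm-124-disk-1.qcow2
>>
>>         When the trusted.afr.<volname>-client-* values are all zeros on
>>         both bricks, no self-heal is pending on that file.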
>>
>>         Pranith
>>
>>
>>         On 08/05/2014 12:09 PM, Roman wrote:
>>>         OK, I understand. I will try this shortly.
>>>         How can I be sure that the healing process is done if I am not
>>>         able to see its status?
>>>
>>>
>>>         2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
>>>         <pkarampu at redhat.com>:
>>>
>>>             Mounts will do the healing, not the self-heal-daemon.
>>>             The point is that whichever process does the healing
>>>             must have the latest information about the good bricks
>>>             in this use case. Since for the VM use case the mounts
>>>             should have the latest information, we should let the
>>>             mounts do the healing. If the mount accesses the VM image,
>>>             either through someone doing operations inside the VM or
>>>             through an explicit stat on the file, it should do the
>>>             healing.
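>>>
>>>             For example, an explicit stat from the client (the mount
>>>             path here is assumed; use wherever proxmox mounts this
>>>             storage) would trigger it:
>>>
>>>             stat /mnt/pve/<storage>/images/124/vm-124-disk-1.qcow2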
>>>
>>>             Pranith.
>>>
>>>
>>>             On 08/05/2014 10:39 AM, Roman wrote:
>>>>             Hmmm, you told me to turn it off. Did I misunderstand
>>>>             something? After I issued the command you sent me, I was
>>>>             not able to watch the healing process; it said it won't
>>>>             be healed because it's turned off.
>>>>
>>>>
>>>>             2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
>>>>             <pkarampu at redhat.com>:
>>>>
>>>>                 You didn't mention anything about self-healing. Did
>>>>                 you wait until the self-heal was complete?
>>>>
>>>>                 Pranith
>>>>
>>>>                 On 08/04/2014 05:49 PM, Roman wrote:
>>>>>                 Hi!
>>>>>                 The result is pretty much the same. I set the switch
>>>>>                 port down for the 1st server, and it was OK. Then I
>>>>>                 brought it back up and set the other server's port
>>>>>                 down, and that triggered IO errors on two virtual
>>>>>                 machines: one with a local root FS but
>>>>>                 network-mounted storage, and the other with a
>>>>>                 network root FS. The 1st gave an error on copying to
>>>>>                 or from the mounted network disk; the other gave me
>>>>>                 an error even when just reading log files.
>>>>>
>>>>>                 cat: /var/log/alternatives.log: Input/output error
>>>>>                 Then I reset the KVM VM and it told me there is
>>>>>                 no boot device. Next I virtually powered it off
>>>>>                 and back on, and it booted.
>>>>>
>>>>>                 By the way, did I have to start/stop the volume?
>>>>>
>>>>>                 >> Could you do the following and test it again?
>>>>>                 >> gluster volume set <volname>
>>>>>                 cluster.self-heal-daemon off
>>>>>
>>>>>                 >> Pranith
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri
>>>>>                 <pkarampu at redhat.com>:
>>>>>
>>>>>
>>>>>                     On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>                     Hello!
>>>>>>
>>>>>>                     Facing the same problem as mentioned here:
>>>>>>
>>>>>>                     http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>
>>>>>>                     My setup is up and running, so I'm ready to
>>>>>>                     help you back with feedback.
>>>>>>
>>>>>>                     setup:
>>>>>>                     proxmox server as client
>>>>>>                     2 physical gluster servers
>>>>>>
>>>>>>                     Both the server side and the client side are
>>>>>>                     currently running glusterfs 3.4.4 from the
>>>>>>                     gluster repo.
>>>>>>
>>>>>>                     the problem is:
>>>>>>
>>>>>>                     1. created replica bricks.
>>>>>>                     2. mounted in proxmox (tried both proxmox
>>>>>>                     ways: via the GUI and via fstab with a backup
>>>>>>                     volume line, roughly as sketched after this
>>>>>>                     list; btw, while mounting via fstab I'm unable
>>>>>>                     to launch a VM without cache, even though
>>>>>>                     direct-io-mode is enabled in the fstab line)
>>>>>>                     3. installed a VM
>>>>>>                     4. brought one volume down - ok
>>>>>>                     5. brought it back up and waited for the sync
>>>>>>                     to finish.
>>>>>>                     6. brought the other volume down - got IO
>>>>>>                     errors on the VM guest and was not able to
>>>>>>                     restore the VM after resetting it via the
>>>>>>                     host. It says "no bootable media". After I
>>>>>>                     shut it down (forced) and bring it back up,
>>>>>>                     it boots.
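>>>>>>
>>>>>>                     The fstab mount line is of this general shape
>>>>>>                     (server names and mount point are illustrative):
>>>>>>
>>>>>>                     gluster1:/HA-fast-150G-PVE1 /mnt/pve/gluster glusterfs defaults,_netdev,backupvolfile-server=gluster2,direct-io-mode=enable 0 0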
>>>>>                     Could you do the following and test it again?
>>>>>                     gluster volume set <volname>
>>>>>                     cluster.self-heal-daemon off
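>>>>>
>>>>>                     For example (volume name taken from the logs
>>>>>                     elsewhere in this thread; substitute yours):
>>>>>
>>>>>                     gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
>>>>>
>>>>>                     'gluster volume info HA-fast-150G-PVE1' should
>>>>>                     then list the option under "Options Reconfigured".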
>>>>>
>>>>>                     Pranith
>>>>>>
>>>>>>                     I need help. I tried 3.4.3 and 3.4.4.
>>>>>>                     Packages for 3.4.5 and 3.5.2 are still missing
>>>>>>                     for Debian (3.5.1 always gives a healing error
>>>>>>                     for some reason).
>>>>>>
>>>>>>                     -- 
>>>>>>                     Best regards,
>>>>>>                     Roman.
>>>>>>
>>>>>>
>>>>>>                     _______________________________________________
>>>>>>                     Gluster-users mailing list
>>>>>>                     Gluster-users at gluster.org
>>>>>>                     http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                 -- 
>>>>>                 Best regards,
>>>>>                 Roman.
>>>>
>>>>
>>>>
>>>>
>>>>             -- 
>>>>             Best regards,
>>>>             Roman.
>>>
>>>
>>>
>>>
>>>         -- 
>>>         Best regards,
>>>         Roman.
>>
>>
>>
>>
>>     -- 
>>     Best regards,
>>     Roman.
>
>
>
>
> -- 
> Best regards,
> Roman.
