[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Tue Aug 5 08:25:22 UTC 2014
On 08/05/2014 01:10 PM, Roman wrote:
> Aha! For some reason I was not able to start the VM anymore; Proxmox
> VE told me that it could not read the qcow2 header because permission
> was denied. So I just deleted that file and created a new VM. The
> next message I got was this:
It seems these are the messages from the run where you took down the
bricks before the self-heal completed. Could you restart the run,
waiting for the self-heals to complete before taking down the next brick?
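For reference, a rough way to confirm on 3.4.x that a heal has finished
(the brick path below is only a placeholder) is to check that the AFR
changelog xattrs are back to all zeroes on both bricks:

   # run on each server, against the file on the brick itself (path illustrative)
   getfattr -d -m . -e hex /path/to/brick/images/124/vm-124-disk-1.qcow2
   # once the heal is done, the trusted.afr.<volname>-client-* values should
   # all read zero, e.g. 0x000000000000000000000000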
Pranith
>
>
> [2014-08-05 07:31:25.663412] E
> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
> 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of
> '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please
> delete the file from all but the preferred subvolume.- Pending matrix:
> [ [ 0 60 ] [ 11 0 ] ]
> [2014-08-05 07:31:25.663955] E
> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
> 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on
> /images/124/vm-124-disk-1.qcow2
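>
> For reference, the log's own advice is to delete the file from all but
> the preferred subvolume; as far as I understand it, on 3.4 that means
> roughly the following on the brick whose copy is to be discarded (paths
> are only illustrative):
>
>    # on the server holding the bad copy
>    getfattr -n trusted.gfid -e hex /path/to/brick/images/124/vm-124-disk-1.qcow2
>    rm /path/to/brick/images/124/vm-124-disk-1.qcow2
>    # also remove the matching gfid hard link under the brick's .glusterfs/
>    # directory (its path is derived from the gfid printed above), then stat
>    # the file from the mount so it heals from the good copy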
>
>
>
> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri
> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>
> I just responded to your earlier mail about what the log looks like.
> The log appears in the mount's logfile.
>
> Pranith
>
> On 08/05/2014 12:41 PM, Roman wrote:
>> OK, so I think I've waited long enough. There was no traffic on the
>> switch ports between the servers, and I could not find any log
>> message about a completed self-heal (I waited about 30 minutes).
>> This time I unplugged the other server's UTP cable and ended up in
>> the same situation:
>> root at gluster-test1:~# cat /var/log/dmesg
>> -bash: /bin/cat: Input/output error
>>
>> brick logs:
>> [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify]
>> 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom
>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>> [2014-08-05 07:09:03.005530] I
>> [server-helpers.c:729:server_connection_put]
>> 0-HA-fast-150G-PVE1-server: Shutting down connection
>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>> [2014-08-05 07:09:03.005560] I
>> [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server:
>> fd cleanup on /images/124/vm-124-disk-1.qcow2
>> [2014-08-05 07:09:03.005797] I
>> [server-helpers.c:617:server_connection_destroy]
>> 0-HA-fast-150G-PVE1-server: destroyed connection of
>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>
>>
>>
>>
>>
>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>
>> Do you think it is possible for you to run these tests on the
>> latest version, 3.5.2? 'gluster volume heal <volname> info'
>> would give you that information in versions > 3.5.1.
>> Otherwise you will have to check it either from the logs
>> (there will be a self-heal-completed message in the mount logs)
>> or by observing 'getfattr -d -m . -e hex <image-file-on-bricks>'.
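>>
>> For example, on 3.5.2 it would be something like the following (the
>> volume name is taken from your brick logs; adjust as needed):
>>
>>    gluster volume heal HA-fast-150G-PVE1 info
>>    # "Number of entries: 0" for every brick means nothing is left to heal
>>    gluster volume heal HA-fast-150G-PVE1 info split-brain
>>    # lists entries currently in split-brain, if any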
>>
>> Pranith
>>
>>
>> On 08/05/2014 12:09 PM, Roman wrote:
>>> OK, I understand. I will try this shortly.
>>> How can I be sure that the healing process is done if I am not
>>> able to see its status?
>>>
>>>
>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
>>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>>
>>> The mounts will do the healing, not the self-heal daemon.
>>> The point, I feel, is that whichever process does the healing
>>> must have the latest information about which bricks are good
>>> in this use case. Since for the VM use case the mounts have
>>> the latest information, we should let the mounts do the
>>> healing. If the mount accesses the VM image, either through
>>> someone doing operations inside the VM or through an explicit
>>> stat on the file, it should do the healing.
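>>>
>>> In other words, a quick way to kick off the heal from the client
>>> side would be something like this (the mount path is only a guess
>>> at your Proxmox storage path):
>>>
>>>    # run on the Proxmox node, against the FUSE mount
>>>    stat /mnt/pve/<storage-id>/images/124/vm-124-disk-1.qcow2
>>>    # the lookup from the mount notices the pending changelog and
>>>    # triggers a background self-heal, logged in the mount's logfile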
>>>
>>> Pranith.
>>>
>>>
>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>> Hmm, you told me to turn it off. Did I misunderstand
>>>> something? After I issued the command you sent me, I was not
>>>> able to watch the healing process; it said the file won't be
>>>> healed because healing is turned off.
>>>>
>>>>
>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
>>>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>>>
>>>> You didn't mention anything about self-healing. Did
>>>> you wait until the self-heal was complete?
>>>>
>>>> Pranith
>>>>
>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>> Hi!
>>>>> The result is pretty much the same. I set the switch port down
>>>>> for the 1st server, and it was OK. Then I brought it back up
>>>>> and set the other server's port down, and that triggered an
>>>>> I/O error on two virtual machines: one with a local root FS
>>>>> but network-mounted storage, and the other with a network root
>>>>> FS. The 1st gave an error when copying to or from the mounted
>>>>> network disk; the other gave me an error even when reading log
>>>>> files.
>>>>>
>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>> Then I reset the KVM VM and it told me there was no boot
>>>>> device. Next I virtually powered it off and then back on, and
>>>>> it booted.
>>>>>
>>>>> By the way, did I have to start/stop the volume?
>>>>>
>>>>> >> Could you do the following and test it again?
>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>
>>>>> >> Pranith
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri
>>>>> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>>>>
>>>>>
>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>> Hello!
>>>>>>
>>>>>> Facing the same problem as mentioned here:
>>>>>>
>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>
>>>>>> My setup is up and running, so I'm ready to
>>>>>> help you back with feedback.
>>>>>>
>>>>>> Setup:
>>>>>> a Proxmox server as the client
>>>>>> 2 physical Gluster servers
>>>>>>
>>>>>> Both the server side and the client side are currently
>>>>>> running GlusterFS 3.4.4 from the Gluster repo.
>>>>>>
>>>>>> The problem is:
>>>>>>
>>>>>> 1. created replica bricks.
>>>>>> 2. mounted the volume in Proxmox (tried both Proxmox
>>>>>> ways: via the GUI and via fstab with a backup volume
>>>>>> line; see the example fstab line after this list). By
>>>>>> the way, while mounting via fstab I'm unable to launch
>>>>>> a VM without cache, even though direct-io-mode is
>>>>>> enabled in the fstab line.
>>>>>> 3. installed a VM.
>>>>>> 4. brought one brick down - OK.
>>>>>> 5. brought it back up and waited for the sync to finish.
>>>>>> 6. brought the other brick down - got I/O errors in the
>>>>>> VM guest and was not able to restore the VM after I
>>>>>> reset it via the host. It says "no bootable media".
>>>>>> After I shut it down (forced) and bring it back up, it
>>>>>> boots.
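>>>>>>
>>>>>> The fstab line mentioned in step 2 looks roughly like this
>>>>>> (hostnames and the mount point are placeholders; the volume
>>>>>> name is the one from my logs):
>>>>>>
>>>>>>    gluster1:/HA-fast-150G-PVE1 /mnt/pve/gluster glusterfs defaults,_netdev,backupvolfile-server=gluster2,direct-io-mode=enable 0 0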
>>>>> Could you do the following and test it again?
>>>>> gluster volume set <volname> cluster.self-heal-daemon off
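>>>>>
>>>>> For completeness, a couple of related commands (to verify the
>>>>> setting afterwards and to turn it back on later):
>>>>>
>>>>>    gluster volume info <volname>    # the option shows up under "Options Reconfigured"
>>>>>    gluster volume set <volname> cluster.self-heal-daemon on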
>>>>>
>>>>> Pranith
>>>>>>
>>>>>> I need help. I've tried 3.4.3 and 3.4.4.
>>>>>> Packages for 3.4.5 and 3.5.2 are still missing for
>>>>>> Debian (3.5.1 always gives a healing error for
>>>>>> some reason).
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Roman.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org <mailto:Gluster-users at gluster.org>
>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Roman.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>
>
>
>
> --
> Best regards,
> Roman.