[Gluster-users] libgfapi failover problem on replica bricks
Pranith Kumar Karampuri
pkarampu at redhat.com
Tue Aug 5 08:49:03 UTC 2014
On 08/05/2014 02:06 PM, Roman wrote:
> Well, it seems like it doesn't see that changes were made to the volume?
> I created two files, 200 MB and 100 MB (from /dev/zero), after I
> disconnected the first brick. Then I connected it back and got these logs:
>
> [2014-08-05 08:30:37.830150] I
> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in
> volfile, continuing
> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
> 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv]
> 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
> [2014-08-05 08:30:37.831024] I
> [client-handshake.c:1659:select_server_supported_programs]
> 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num
> (1298437), Version (330)
> [2014-08-05 08:30:37.831375] I
> [client-handshake.c:1456:client_setvolume_cbk]
> 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153,
> attached to remote volume '/exports/fast-test/150G'.
> [2014-08-05 08:30:37.831394] I
> [client-handshake.c:1468:client_setvolume_cbk]
> 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are
> not same, reopening the fds
> [2014-08-05 08:30:37.831566] I
> [client-handshake.c:450:client_set_lk_version_cbk]
> 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>
>
> [2014-08-05 08:30:37.830150] I
> [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in
> volfile, continuing
> this line seems weird to me, to be honest.
> I do not see any traffic on the switch interfaces between the gluster
> servers, which means there is no syncing between them.
> I tried to ls -l the files on the client and the servers to trigger the
> healing, but it seems there was no success. Should I wait longer?
Yes, it should take around 10-15 minutes. Could you provide the output of
'getfattr -d -m. -e hex <file-on-brick>' from both bricks?
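For example (just a sketch; the brick path below is pieced together from
your earlier logs, so adjust it to your actual layout), run this on each
server against the brick-side path of the image, not the mount:

    getfattr -d -m. -e hex /exports/fast-test/150G/images/124/vm-124-disk-1.qcow2

The trusted.afr.<volname>-client-* values in that output are the pending
counters; once the self-heal has finished they should be all zeros on both
bricks.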
Pranith
>
>
> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri
> <pkarampu at redhat.com>:
>
>
> On 08/05/2014 01:10 PM, Roman wrote:
>> Aha! For some reason I was not able to start the VM anymore;
>> Proxmox VE told me that it was not able to read the qcow2 header
>> because permission was denied for some reason. So I just deleted
>> that file and created a new VM. The next message I got was
>> this:
> It seems these are the messages from when you took down the bricks
> before self-heal completed. Could you restart the run, waiting for
> self-heals to complete before taking down the next brick?
>
> Pranith
>
>>
>>
>> [2014-08-05 07:31:25.663412] E
>> [afr-self-heal-common.c:197:afr_sh_print_split_brain_log]
>> 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of
>> '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please
>> delete the file from all but the preferred subvolume.- Pending
>> matrix: [ [ 0 60 ] [ 11 0 ] ]
>> [2014-08-05 07:31:25.663955] E
>> [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk]
>> 0-HA-fast-150G-PVE1-replicate-0: background data self-heal
>> failed on /images/124/vm-124-disk-1.qcow2
>>
>>
>>
>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri
>> <pkarampu at redhat.com>:
>>
>> I just responded to your earlier mail about what the log
>> looks like. The log appears in the mount's logfile.
>>
>> Pranith
>>
>> On 08/05/2014 12:41 PM, Roman wrote:
>>> OK, so I've waited long enough, I think. There was no traffic on
>>> the switch ports between the servers. I could not find any
>>> log message about a completed self-heal (I waited about 30
>>> minutes). I pulled out the other server's UTP cable this time
>>> and got into the same situation:
>>> root at gluster-test1:~# cat /var/log/dmesg
>>> -bash: /bin/cat: Input/output error
>>>
>>> brick logs:
>>> [2014-08-05 07:09:03.005474] I
>>> [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server:
>>> disconnecting connectionfrom
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>> [2014-08-05 07:09:03.005530] I
>>> [server-helpers.c:729:server_connection_put]
>>> 0-HA-fast-150G-PVE1-server: Shutting down connection
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>> [2014-08-05 07:09:03.005560] I
>>> [server-helpers.c:463:do_fd_cleanup]
>>> 0-HA-fast-150G-PVE1-server: fd cleanup on
>>> /images/124/vm-124-disk-1.qcow2
>>> [2014-08-05 07:09:03.005797] I
>>> [server-helpers.c:617:server_connection_destroy]
>>> 0-HA-fast-150G-PVE1-server: destroyed connection of
>>> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>
>>>
>>>
>>>
>>>
>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
>>> <pkarampu at redhat.com>:
>>>
>>> Do you think it is possible for you to run these tests on
>>> the latest version, 3.5.2? 'gluster volume heal <volname>
>>> info' would give you that information in versions > 3.5.1.
>>> Otherwise you will have to check it either from the
>>> logs (there will be a 'self-heal completed' message in the
>>> mount logs) or by observing 'getfattr -d -m. -e hex
>>> <image-file-on-bricks>'.
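>>> For example, on 3.5.2 something like the following (volume name taken
>>> from your logs) would show whether any entries are still pending heal:
>>>
>>>     gluster volume heal HA-fast-150G-PVE1 info
>>>
>>> When it reports 'Number of entries: 0' for both bricks, the heal has
>>> completed.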
>>>
>>> Pranith
>>>
>>>
>>> On 08/05/2014 12:09 PM, Roman wrote:
>>>> OK, I understand. I will try this shortly.
>>>> How can I be sure that the healing process is done if I
>>>> am not able to see its status?
>>>>
>>>>
>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri
>>>> <pkarampu at redhat.com>:
>>>>
>>>> The mounts will do the healing, not the
>>>> self-heal daemon. The point, I feel, is that
>>>> whichever process does the healing must have the latest
>>>> information about the good bricks in this use case.
>>>> Since for the VM use case the mounts should have the latest
>>>> information, we should let the mounts do the
>>>> healing. If the mount accesses the VM image, either
>>>> through someone doing operations inside the VM or
>>>> through an explicit stat on the file, it should do the healing.
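>>>> For example, something like this from the proxmox node should be
>>>> enough to kick it off (the mount path here is only an illustration;
>>>> use wherever the volume is actually mounted):
>>>>
>>>>     stat /mnt/pve/<storage-id>/images/124/vm-124-disk-1.qcow2
>>>>
>>>> It is the lookup coming from the mount that triggers the self-heal
>>>> on that file.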
>>>>
>>>> Pranith.
>>>>
>>>>
>>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>>> Hmmm, you told me to turn it off. Did I understand
>>>>> something wrong? After I issued the command you
>>>>> sent me, I was not able to watch the healing
>>>>> process; it said it won't be healed because it's
>>>>> turned off.
>>>>>
>>>>>
>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri
>>>>> <pkarampu at redhat.com>:
>>>>>
>>>>> You didn't mention anything about
>>>>> self-healing. Did you wait until the self-heal
>>>>> was complete?
>>>>>
>>>>> Pranith
>>>>>
>>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>>> Hi!
>>>>>> The result is pretty much the same. I set the switch port
>>>>>> down for the 1st server, and it was OK. Then I set it
>>>>>> back up and set the other server's port off, and
>>>>>> it triggered an IO error on two virtual
>>>>>> machines: one with a local root FS but
>>>>>> network-mounted storage, and the other with a network root
>>>>>> FS. The 1st gave an error on copying to or from
>>>>>> the mounted network disk; the other gave me
>>>>>> an error even for reading log files.
>>>>>>
>>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>>> Then I reset the KVM VM and it told me there
>>>>>> is no boot device. Next I virtually powered
>>>>>> it off and then back on, and it booted.
>>>>>>
>>>>>> By the way, did I have to start/stop the volume?
>>>>>>
>>>>>> >> Could you do the following and test it again?
>>>>>> >> gluster volume set <volname>
>>>>>> cluster.self-heal-daemon off
>>>>>>
>>>>>> >>Pranith
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar
>>>>>> Karampuri <pkarampu at redhat.com>:
>>>>>>
>>>>>>
>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>> Hello!
>>>>>>>
>>>>>>> Facing the same problem as mentioned here:
>>>>>>>
>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>>
>>>>>>> My setup is up and running, so I'm
>>>>>>> ready to help you with feedback.
>>>>>>>
>>>>>>> setup:
>>>>>>> proxmox server as client
>>>>>>> 2 gluster physical servers
>>>>>>>
>>>>>>> Both the server side and the client side are currently
>>>>>>> running GlusterFS 3.4.4 from the Gluster repo.
>>>>>>>
>>>>>>> the problem is:
>>>>>>>
>>>>>>> 1. Created replica bricks.
>>>>>>> 2. Mounted in Proxmox (tried both Proxmox
>>>>>>> ways: via the GUI and via fstab (with a backup
>>>>>>> volume line); by the way, while mounting via
>>>>>>> fstab I'm unable to launch a VM without
>>>>>>> cache, even though direct-io-mode is
>>>>>>> enabled in the fstab line) - see the example
>>>>>>> fstab line after this list.
>>>>>>> 3. Installed a VM.
>>>>>>> 4. Brought one brick down - OK.
>>>>>>> 5. Brought it back up and waited for the sync to finish.
>>>>>>> 6. Brought the other brick down - getting IO
>>>>>>> errors on the VM guest and not able to
>>>>>>> restore the VM after I reset the VM via the
>>>>>>> host. It says "no bootable media". After
>>>>>>> I shut it down (forced) and bring it back
>>>>>>> up, it boots.
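>>>>>>>
>>>>>>> For reference, the fstab line I mean looks roughly like this
>>>>>>> (the second IP and the mount point here are placeholders, not my
>>>>>>> exact config):
>>>>>>>
>>>>>>>     10.250.0.1:/HA-fast-150G-PVE1 /mnt/pve/gluster-fast glusterfs defaults,_netdev,backupvolfile-server=10.250.0.2,direct-io-mode=enable 0 0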
>>>>>> Could you do the following and test it again?
>>>>>> gluster volume set <volname>
>>>>>> cluster.self-heal-daemon off
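>>>>>>
>>>>>> With your volume name that would be, for example:
>>>>>>
>>>>>>     gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off
>>>>>>
>>>>>> 'gluster volume info HA-fast-150G-PVE1' should then list the option
>>>>>> under 'Options Reconfigured' as 'cluster.self-heal-daemon: off'.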
>>>>>>
>>>>>> Pranith
>>>>>>>
>>>>>>> Need help. Tried 3.4.3 and 3.4.4.
>>>>>>> Packages for 3.4.5 and 3.5.2 are still missing
>>>>>>> for Debian (3.5.1 always gives a healing
>>>>>>> error for some reason).
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> Roman.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gluster-users mailing list
>>>>>>> Gluster-users at gluster.org
>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Roman.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Roman.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Roman.
>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Roman.
>>
>>
>>
>>
>> --
>> Best regards,
>> Roman.
>
>
>
>
> --
> Best regards,
> Roman.