[Gluster-users] libgfapi failover problem on replica bricks

Roman romeo.r at gmail.com
Tue Aug 5 07:20:57 UTC 2014


Sorry, the "gluster-users" fell out of the receivers list somehow, so I'm
replying to it with the full history.
I'm watching the mount's logfile with tail -f command and am not able to
see such logs... seems like for ever? What is the  optimal time for
self-heal to complete? The mount is almost empty, there is a stripped file
with VM image only.
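
To be precise, this is roughly what I'm tailing (the file names come from my
setup, so treat the paths as examples; the mount log name is derived from the
mount point):

  # on the client
  tail -f /var/log/glusterfs/mnt-pve-HA-fast-150G-PVE1.log
  # on the gluster servers
  tail -f /var/log/glusterfs/bricks/*.log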

The only logs I see are:

[2014-08-05 07:12:03.808352] I [server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
stor2-31563-2014/08/05-06:10:19:381800-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)
[2014-08-05 07:12:04.547935] I [server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
sisemon-262292-2014/08/04-13:27:19:221777-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)
[2014-08-05 07:12:06.761596] I [server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)
[2014-08-05 07:12:09.151322] I [server-handshake.c:567:server_setvolume]
0-HA-fast-150G-PVE1-server: accepted client from
pve1-27476-2014/08/04-13:27:19:838805-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)



2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:

>  I just responded to your earlier mail about what the log looks like. The
> message shows up in the mount's log file.
>
> Pranith
>
> On 08/05/2014 12:41 PM, Roman wrote:
>
> Ok, so I think I've waited long enough. There was no traffic on the switch
> ports between the servers, and I could not find any log message about a
> completed self-heal (I waited about 30 minutes). This time I unplugged the
> other server's UTP cable and ended up in the same situation:
> root at gluster-test1:~# cat /var/log/dmesg
> -bash: /bin/cat: Input/output error
>
>  brick logs:
>  [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify]
> 0-HA-fast-150G-PVE1-server: disconnecting connectionfrom
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
> [2014-08-05 07:09:03.005530] I
> [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server:
> Shutting down connection
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
> [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup]
> 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
> [2014-08-05 07:09:03.005797] I
> [server-helpers.c:617:server_connection_destroy]
> 0-HA-fast-150G-PVE1-server: destroyed connection of
> pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>
>
>
>
>
> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>
>>  Do you think it is possible for you to do these tests on the latest
>> version 3.5.2? 'gluster volume heal <volname> info' would give you that
>> information in versions > 3.5.1.
>> Otherwise you will have to check it either from the logs (there will be a
>> self-heal completed message in the mount logs) or by observing 'getfattr
>> -d -m. -e hex <image-file-on-bricks>'.
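>>
>> For example, something along these lines on each brick (the brick path is
>> only an illustration; the volume and image names are the ones from your
>> logs):
>>
>>   getfattr -d -m. -e hex /path/to/brick/images/124/vm-124-disk-1.qcow2
>>
>> Among other attributes you should see the changelog entries, e.g.
>>
>>   trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>   trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>
>> Once these are all zeroes for the file on both bricks, there is nothing
>> left to heal for it.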
>>
>> Pranith
>>
>>
>> On 08/05/2014 12:09 PM, Roman wrote:
>>
>> Ok, I understand. I will try this shortly.
>> How can I be sure that the healing process is done if I am not able to see
>> its status?
>>
>>
>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>
>>>  Mounts will do the healing, not the self-heal daemon. The issue, I
>>> feel, is that whichever process does the healing needs to have the latest
>>> information about the good bricks in this use case. Since for the VM use
>>> case the mounts have the latest information, we should let the mounts do
>>> the healing. If the mount accesses the VM image, either through someone
>>> doing operations inside the VM or through an explicit stat on the file,
>>> it should do the healing.
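>>>
>>> For instance, an explicit stat on the image from the client side should
>>> be enough to kick off the heal for that file (the mount point below is
>>> only an example):
>>>
>>>   stat /mnt/pve/HA-fast-150G-PVE1/images/124/vm-124-disk-1.qcow2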
>>>
>>> Pranith.
>>>
>>>
>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>
>>> Hmmm, you told me to turn it off. Did I misunderstand something?
>>> After I issued the command you sent me, I was not able to watch the
>>> healing process; it said the file won't be healed because healing is
>>> turned off.
>>>
>>>
>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com>:
>>>
>>>>  You didn't mention anything about self-healing. Did you wait until the
>>>> self-heal was complete?
>>>>
>>>> Pranith
>>>>
>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>
>>>>  Hi!
>>>> The result is pretty much the same. I set the switch port down for the
>>>> 1st server; that was ok. Then I brought it back up and took the other
>>>> server's port down, and that triggered I/O errors on two virtual
>>>> machines: one with a local root FS but network-mounted storage, and the
>>>> other with a network root FS. The 1st gave an error when copying to or
>>>> from the mounted network disk; the other gave an error even when just
>>>> reading log files.
>>>>
>>>>  cat: /var/log/alternatives.log: Input/output error
>>>>  Then I reset the KVM VM and it told me there is no boot device. Next I
>>>> powered it off virtually and then back on, and it booted.
>>>>
>>>>  By the way, did I have to stop/start the volume?
>>>>
>>>>  >> Could you do the following and test it again?
>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>
>>>> >> Pranith
>>>>
>>>>
>>>>
>>>>
>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <pkarampu at redhat.com
>>>> >:
>>>>
>>>>>
>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>
>>>>>  Hello!
>>>>>
>>>>>  Facing the same problem as mentioned here:
>>>>>
>>>>>
>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>
>>>>>  My setup is up and running, so I'm ready to help you back with
>>>>> feedback.
>>>>>
>>>>>  Setup:
>>>>> Proxmox server as the client
>>>>>  2 physical GlusterFS servers
>>>>>
>>>>>  Both the server side and the client side are currently running
>>>>> glusterfs 3.4.4 from the gluster repo.
>>>>>
>>>>>  The problem is:
>>>>>
>>>>>  1. Created replica bricks.
>>>>> 2. Mounted them in Proxmox (tried both Proxmox ways: via the GUI and via
>>>>> fstab with a backup volume line; roughly the fstab line I use is shown
>>>>> after this list). Btw, while mounting via fstab I'm unable to launch a
>>>>> VM without cache, even though direct-io-mode is enabled in the fstab
>>>>> line.
>>>>> 3. Installed a VM.
>>>>> 4. Brought one brick (server) down - ok.
>>>>>  5. Brought it back up and waited until the sync was done.
>>>>> 6. Brought the other brick down - got I/O errors in the VM guest and was
>>>>> not able to restore the VM after resetting it via the host. It says "no
>>>>> bootable media". After I shut it down (forced) and bring it back up, it
>>>>> boots.
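>>>>>
>>>>>  For reference, the fstab line I use is roughly the following (hostnames
>>>>> and mount point are from my setup, adjust to yours):
>>>>>
>>>>>   stor1:/HA-fast-150G-PVE1 /mnt/pve/HA-fast-150G-PVE1 glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0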
>>>>>
>>>>>  Could you do the following and test it again?
>>>>> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>
>>>>> Pranith
>>>>>
>>>>>
>>>>>  Need help. I tried 3.4.3 and 3.4.4.
>>>>> Packages for 3.4.5 and 3.5.2 are still missing for Debian (3.5.1 always
>>>>> gives a healing error for some reason).
>>>>>
>>>>>  --
>>>>> Best regards,
>>>>> Roman.
>>>>>
>>>>>
>>>>>  _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://supercolony.gluster.org/pipermail/gluster-users
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>> Best regards,
>>>> Roman.
>>>>
>>>>
>>>>
>>>
>>>
>>>  --
>>> Best regards,
>>> Roman.
>>>
>>>
>>>
>>
>>
>>  --
>> Best regards,
>> Roman.
>>
>>
>>
>
>
>  --
> Best regards,
> Roman.
>
>
>


-- 
Best regards,
Roman.

