[Gluster-users] Self-heal still not finished after 2 days

John Gardeniers jgardeniers at objectmastery.com
Mon Jun 30 04:49:36 UTC 2014


Hi Pranith,

I suppose then that this fits into the "or replace" option. As long as
the output can be reliably parsed for the count, this will meet our
monitoring requirements. Thanks.
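
For reference, a minimal sketch of the kind of check we have in mind
(untested; "gluster-rhev" is our volume name here, and the exact output
format may differ between gluster versions):

    #!/bin/bash
    # Nagios-style probe: sum the "Number of entries" lines from heal info.
    # Exit 0 (OK) when nothing is pending, 2 (CRITICAL) otherwise.
    VOL="gluster-rhev"    # our volume name; adjust as needed
    count=$(gluster volume heal "$VOL" info 2>/dev/null \
            | awk '/^Number of entries:/ {sum += $NF} END {print sum+0}')
    if [ "$count" -eq 0 ]; then
        echo "OK - no entries pending heal on $VOL"
        exit 0
    else
        echo "CRITICAL - $count entries pending heal on $VOL"
        exit 2
    fi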

regards,
John


On 30/06/14 14:45, Pranith Kumar Karampuri wrote:
>
> On 06/30/2014 09:49 AM, John Gardeniers wrote:
>> Hi Pranith,
>>
>> I strongly urge the team to reconsider this. It's akin to removing error
>> messages for real error conditions, which of course makes no sense. If
>> the command is causing confusion then the command should be fixed or
>> replaced, not removed.
>>
>> May I suggest that rather than deprecating 'gluster volume heal <volname>
>> info heal-failed', it would be vastly more sensible and useful to have
>> it return only the current count, or perhaps only the failures within the
>> last minute or so if that's the best that can be achieved. For monitoring
>> we care about only two things, but both are extremely important -
>> split-brain errors and heal-failed errors.
>>
>> Deprecating the command means we can only monitor one of the two
>> potentially disastrous problems, which means a system could be in a very
>> poor state without the operator being aware of it. The repercussions are
>> immense, e.g. an operator wants to take one server down for any reason
>> and is unaware that the replica is not in a fit state to be used because
>> there are multiple unhealed errors. The end result will be unreliable
>> data and an almost assured split-brain when the first server is brought
>> back on-line, very possibly making *both* copies useless.
>
> Nope. The operator will *definitely* know about it because 'gluster volume
> heal <volname> info' shows those entries in the output.
> Like I said, from 3.5.1 onwards it shows only the files that
> need self-heal, excluding ones where only I/O is going on. So
> instead of showing constantly changing output it will show that there are
> '0' entries to be healed when heal is complete and both bricks are
> good.
> That is when the operator can go ahead with taking the brick down.
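>
> For example, before taking a brick down one could wait for the count to
> drop to zero with something along these lines (a rough, untested sketch
> using your volume name):
>
>     VOL="gluster-rhev"
>     # Poll heal info until no brick reports pending entries.
>     while gluster volume heal "$VOL" info \
>             | grep -q '^Number of entries: [1-9]'; do
>         echo "heal still in progress on $VOL, waiting..."
>         sleep 30
>     done
>     echo "all bricks report 0 entries; safe to take the brick down"
>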
> I agree that split-brain is an extremely important error. But heal
> failures are generally transient.
> If, for some reason other than split-brain, the heal keeps on
> failing, we can always look at the logs, which have the *complete*
> history/information of why the heal is failing.
>
> So we are not losing *any* information, but we avoid unnecessary panic.
>
> Let me know what you think about it.
>
> Pranith
>>
>> regards,
>> John
>>
>>
>> On 30/06/14 14:03, Pranith Kumar Karampuri wrote:
>>> On 06/30/2014 09:17 AM, John Gardeniers wrote:
>>>> Hi Pranith,
>>>>
>>>> On 30/06/14 13:37, Pranith Kumar Karampuri wrote:
>>>>> On 06/30/2014 08:48 AM, John Gardeniers wrote:
>>>>>> Hi again Pranith,
>>>>>>
>>>>>> On 30/06/14 11:58, Pranith Kumar Karampuri wrote:
>>>>>>> Oops, I see you are the same user who posted about VM files
>>>>>>> self-heal.
>>>>>>> Sorry I couldn't get back in time. So you are using 3.4.2.
>>>>>>> Could you post logfiles of mount, bricks please. That should
>>>>>>> help us
>>>>>>> to find more information about any issues.
>>>>>>>
>>>>>> When you say the log for the mount, which log is that? There are
>>>>>> none
>>>>>> that I can identify with the mount.
>>>>>>
>>>>>>> gluster volume heal <volname> info heal-failed records the last
>>>>>>> 1024
>>>>>>> failures. It also prints the timestamp of when the failures
>>>>>>> occurred.
>>>>>>> Even after the heal is successful it keeps showing the errors. So the
>>>>>>> timestamp of when the heal failed is important. Because some of these
>>>>>>> commands are causing such confusion, we deprecated these commands in
>>>>>>> the upcoming releases (3.6).
>>>>>>>
>>>>>> So far I've been focusing on the heal-failed count, which I fully,
>>>>>> and I
>>>>>> believe understandably, expect to show zero when there are no
>>>>>> errors.
>>>>>> Now that I look at the timestamps of those errors I realise they
>>>>>> are all
>>>>>> from *before* the slave brick was added back in. May I assume then
>>>>>> that
>>>>>> in reality there are no unhealed files? If this is correct, I must
>>>>>> point
>>>>>> out that if errors are reported when there are none that is a
>>>>>> massive
>>>>>> design flaw. It means things like nagios checks, such as the one we
>>>>>> use,
>>>>>> are useless. This makes monitoring near enough to impossible.
>>>>> Exactly, that is why we deprecated it. The goal is to show only the
>>>>> files that need to be healed, which is achieved in 3.5.1.
>>>>> Just "gluster volume heal info". It shows the exact number of
>>>>> files/directories that need to be healed. Once it becomes zero,
>>>>> we know the healing is complete. But all of these are useful only
>>>>> when
>>>>> the brick is not erased. We still need to improve monitoring for the
>>>>> case when the brick is erased and we trigger a full volume self-heal
>>>>> using "gluster volume heal <volname> full" like you did.
>>>>>
>>>>> Raised the following bug:
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1114415 to address the
>>>>> same.
>>>>>
>>>>> Thanks a lot for your inputs, John. We shall fix these with priority.
>>>>>
>>>> If I run "watch gluster volume heal gluster-rhev info" I get
>>>> constantly changing output similar to the below, except the numbers
>>>> and the files keep changing. I believe this is normal, as it is what I
>>>> have seen even when everything was running normally (before the
>>>> problem started).
>>>> This is also why the nagios check uses "gluster volume heal
>>>> gluster-rhev info heal-failed". If that command is removed and not
>>>> replaced with something else, it removes any possibility of monitoring
>>>> heal failures.
>>> That generally is good news. But we can still try to make sure that
>>> all the heals are complete.
>>> Most of the self-heal failures are intermittent, and subsequent
>>> attempts at healing will be successful.
>>> We found that the only thing the 'gluster volume heal <volname> info
>>> heal-failed'
>>> command achieved was unnecessary panic among users. That is the main
>>> reason we deprecated it.
>>>
>>> The main things we need to know are the files which need to be healed
>>> and the files which are in split-brain. For the rest
>>> of the things we generally need more info for debugging, so logs are
>>> preferable. That is going to be the monitoring
>>> story going forward. We still need to improve on the 'heal full'
>>> monitoring story though. I hope things will progressively
>>> improve.
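>>>
>>> As a rough illustration of that monitoring story (an untested sketch
>>> only; it assumes the 'info split-brain' sub-command and uses your
>>> volume name as a placeholder):
>>>
>>>     VOL="gluster-rhev"
>>>     # Count entries reported in split-brain; warn if any are found.
>>>     sb=$(gluster volume heal "$VOL" info split-brain \
>>>          | awk '/^Number of entries:/ {sum += $NF} END {print sum+0}')
>>>     [ "$sb" -gt 0 ] && echo "WARNING: $sb split-brain entries on $VOL"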
>>>
>>> Pranith.
>>>> Brick jupiter.om.net:/gluster_brick_1
>>>> Number of entries: 11
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/c9de61dc-286a-456a-bc3b-e2755ca5c8b3/ac3f2166-61af-4fc0-99c4-a76e9b63000e
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/0483fad0-2dfc-4dbe-98d4-62dbdbe120f3/1d3812af-caa4-4d99-a598-04dfd78eb6cb
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/bc3b165b-619a-4afa-9d5c-5ea599a99c61/2203cc21-db54-4f69-bbf1-4884a86c05d0
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/dd344276-04ef-48d1-9953-02bee5cc3b87/786b1a84-bc9c-48c7-859b-844a383e47ec
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/3d37825c-c798-421b-96ff-ce07128ee3ad/5119ad56-f0c9-4b3f-8f84-71e5f4b6b693
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/ca843e93-447e-4155-83fc-e0d7586b4b50/215e0914-4def-4230-9f2a-a9ece61f2038
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/51927b5f-2a3c-4c04-a90b-4500be0a526c/d14a842e-5fd9-4f6f-b08e-f5895b8b72fd
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/f06bcb57-4a0a-446d-b71d-773595bb0e2f/4dc55bae-5881-4a04-9815-18cdeb8bcfc8
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/cc11aab2-530e-4ffa-84ee-2989d39efeb8/49b2ff17-096b-45cf-a973-3d1466e16066
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/903c3ef4-feaa-4262-9654-69ef118a43ce/8c6732d3-4ce8-422e-a74e-48151e7f7102
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/66dcc584-36ce-43e6-8ce4-538fc6ff03d1/44192148-0708-4bd3-b8da-30baa85b89bf
>>>>
>>>>
>>>>
>>>> Brick nix.om.net:/gluster_brick_1
>>>> Number of entries: 11
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/29b3b7cf-7b21-44e7-bede-b86e12d2b69a/7fbce4ad-185b-42d8-a093-168560a3df89
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/903c3ef4-feaa-4262-9654-69ef118a43ce/8c6732d3-4ce8-422e-a74e-48151e7f7102
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/bc3b165b-619a-4afa-9d5c-5ea599a99c61/2203cc21-db54-4f69-bbf1-4884a86c05d0
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/c9de61dc-286a-456a-bc3b-e2755ca5c8b3/ac3f2166-61af-4fc0-99c4-a76e9b63000e
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/3d37825c-c798-421b-96ff-ce07128ee3ad/5119ad56-f0c9-4b3f-8f84-71e5f4b6b693
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/66dcc584-36ce-43e6-8ce4-538fc6ff03d1/44192148-0708-4bd3-b8da-30baa85b89bf
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/0483fad0-2dfc-4dbe-98d4-62dbdbe120f3/1d3812af-caa4-4d99-a598-04dfd78eb6cb
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/dd344276-04ef-48d1-9953-02bee5cc3b87/786b1a84-bc9c-48c7-859b-844a383e47ec
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/ca843e93-447e-4155-83fc-e0d7586b4b50/215e0914-4def-4230-9f2a-a9ece61f2038
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/51927b5f-2a3c-4c04-a90b-4500be0a526c/d14a842e-5fd9-4f6f-b08e-f5895b8b72fd
>>>>
>>>>
>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/f06bcb57-4a0a-446d-b71d-773595bb0e2f/4dc55bae-5881-4a04-9815-18cdeb8bcfc8
>>>>
>>>>
>>>>
>>>>
>>>>> Do you think it is possible for you to come to #gluster IRC on
>>>>> freenode?
>>>> I'll see what I can do. I've never used IRC before and first need to
>>>> find out how. :)
>>>>
>>>>> Pranith
>>>>>
>>>>>>> This is probably a stupid question but let me ask it anyway. When a
>>>>>>> brick's contents are erased from the backend
>>>>>>> we need to make sure of the following two things:
>>>>>>> 1) The extended attributes of the brick root show pending operations
>>>>>>> on the brick that was erased (see the sketch below)
>>>>>>> 2) Execute "gluster volume heal <volname> full"
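>>>>>>>
>>>>>>> For (1), something along these lines should show those pending
>>>>>>> counters on the brick root (a sketch only; the exact trusted.afr.*
>>>>>>> attribute names depend on the volume name and client indices):
>>>>>>>
>>>>>>>     # Run on the good server, against the brick root directory.
>>>>>>>     getfattr -d -m . -e hex /gluster_brick_1
>>>>>>>     # Non-zero trusted.afr.<volname>-client-N values indicate
>>>>>>>     # operations still pending towards the other (erased) brick.
>>>>>>>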
>>>>>> 1) While gluster was stopped I merely did an rm -rf on both the data
>>>>>> sub-directory and the .gluster sub-directory. How do I show that
>>>>>> there
>>>>>> are pending operations?
>>>>>> 2) Yes, I did run that.
>>>>>>
>>>>>>> Did you do the steps above?
>>>>>>>
>>>>>>> Since you are on 3.4.2, I think the best way to check what files are
>>>>>>> healed is by using the extended attributes in the backend. Could you
>>>>>>> please post them again.
>>>>>> I don't quite understand what you're asking for. I understand
>>>>>> attributes
>>>>>> as belonging to files and directories, not operations. Please
>>>>>> elaborate.
>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>> On 06/30/2014 07:12 AM, Pranith Kumar Karampuri wrote:
>>>>>>>> On 06/30/2014 04:03 AM, John Gardeniers wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> We have 2 servers, each with one 5TB brick, configured as
>>>>>>>>> replica 2.
>>>>>>>>> After a series of events that caused the 2 bricks to become way out
>>>>>>>>> of step, gluster was turned off on one server and its brick was
>>>>>>>>> wiped of everything, but the attributes were left untouched.
>>>>>>>>>
>>>>>>>>> This weekend we stopped the client and gluster and made a backup
>>>>>>>>> of the
>>>>>>>>> remaining brick, just to play safe. Gluster was then turned back
>>>>>>>>> on,
>>>>>>>>> first on the "master" and then on the "slave". Self-heal
>>>>>>>>> kicked in
>>>>>>>>> and
>>>>>>>>> started rebuilding the second brick. However, after 2 full
>>>>>>>>> days all
>>>>>>>>> files in the volume are still showing heal failed errors.
>>>>>>>>>
>>>>>>>>> The rebuild was, in my opinion at least, very slow, taking most
>>>>>>>>> of a
>>>>>>>>> day
>>>>>>>>> even though the system is on a 10Gb LAN. The data is a little
>>>>>>>>> under
>>>>>>>>> 1.4TB committed, 2TB allocated.
>>>>>>>> How much more to be healed? 0.6TB?
>>>>>>>>> Once the 2 bricks were very close to having the same amount of
>>>>>>>>> space
>>>>>>>>> used things slowed right down. For the last day both bricks
>>>>>>>>> show a
>>>>>>>>> very
>>>>>>>>> slow increase in used space, even though there are no changes
>>>>>>>>> being
>>>>>>>>> written by the client. By slow I mean just a few KB per minute.
>>>>>>>> Is the I/O still in progress on the mount? In 3.4.x, self-heal
>>>>>>>> doesn't happen on files where I/O is going on at the mount. So that
>>>>>>>> could be the reason if I/O is going on.
>>>>>>>>> The logs are confusing, to say the least. In
>>>>>>>>> etc-glusterfs-glusterd.vol.log on both servers there are
>>>>>>>>> thousands of
>>>>>>>>> entries such as (possibly because I was using watch to monitor
>>>>>>>>> self-heal
>>>>>>>>> progress):
>>>>>>>>>
>>>>>>>>> [2014-06-29 21:41:11.289742] I
>>>>>>>>> [glusterd-volume-ops.c:478:__glusterd_handle_cli_heal_volume]
>>>>>>>>> 0-management: Received heal vol req for volume gluster-rhev
>>>>>>>> What version of gluster are you using?
>>>>>>>>> That timestamp is the latest on either server; that's about 9 hours
>>>>>>>>> ago as I type this. I find that a bit disconcerting. I have
>>>>>>>>> requested volume heal-failed info since then.
>>>>>>>>>
>>>>>>>>> The brick log on the "master" server (the one from which we are
>>>>>>>>> rebuilding the new brick) contains no entries since before the
>>>>>>>>> rebuild
>>>>>>>>> started.
>>>>>>>>>
>>>>>>>>> On the "slave" server the brick log shows a lot of entries
>>>>>>>>> such as:
>>>>>>>>>
>>>>>>>>> [2014-06-28 08:49:47.887353] E
>>>>>>>>> [marker.c:2140:marker_removexattr_cbk]
>>>>>>>>> 0-gluster-rhev-marker: Numerical result out of range occurred
>>>>>>>>> while
>>>>>>>>> creating symlinks
>>>>>>>>> [2014-06-28 08:49:47.887382] I
>>>>>>>>> [server-rpc-fops.c:745:server_removexattr_cbk]
>>>>>>>>> 0-gluster-rhev-server:
>>>>>>>>> 10311315: REMOVEXATTR
>>>>>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/02d4bd3c-b057-4f04-ada5-838f83d0b761/d962466d-1894-4716-b5d0-3a10979145ec
>>>>>>>>> (1c1f53ac-afe2-420d-8c93-b1eb53ffe8b1) of key  ==> (Numerical result
>>>>>>>>> out of range)
>>>>>>>> CC Raghavendra who knows about marker translator.
>>>>>>>>> Those entries are around the time the rebuild was starting. The
>>>>>>>>> final
>>>>>>>>> entries in that same log (immediately after those listed above)
>>>>>>>>> are:
>>>>>>>>>
>>>>>>>>> [2014-06-29 12:47:28.473999] I
>>>>>>>>> [server-rpc-fops.c:243:server_inodelk_cbk] 0-gluster-rhev-server:
>>>>>>>>> 2869:
>>>>>>>>> INODELK (null) (c67e9bbe-5956-4c61-b650-2cd5df4c4df0) ==> (No
>>>>>>>>> such
>>>>>>>>> file
>>>>>>>>> or directory)
>>>>>>>>> [2014-06-29 12:47:28.489527] I
>>>>>>>>> [server-rpc-fops.c:1572:server_open_cbk]
>>>>>>>>> 0-gluster-rhev-server: 2870: OPEN (null)
>>>>>>>>> (c67e9bbe-5956-4c61-b650-2cd5df4c4df0) ==> (No such file or
>>>>>>>>> directory)
>>>>>>>> These logs are harmless and were fixed in 3.5 I think. Are you on
>>>>>>>> 3.4.x?
>>>>>>>>
>>>>>>>>> As I type it's 2014-06-30 08:31.
>>>>>>>>>
>>>>>>>>> What do they mean and how can I rectify this?
>>>>>>>>>
>>>>>>>>> regards,
>>>>>>>>> John
>>>>>>>>>
>>>
>
>



