[Gluster-users] Self-heal still not finished after 2 days

Pranith Kumar Karampuri pkarampu at redhat.com
Mon Jun 30 04:52:40 UTC 2014


On 06/30/2014 10:19 AM, John Gardeniers wrote:
> Hi Pranith,
>
> I suppose then that this fits into the "or replace" option. As long as
> the output can be reliably parsed for the count, this will meet the
> monitoring requirements. Thanks.
hi John,
Yes. Monitoring should now be based on 'gluster volume heal
<volname> info' instead of 'gluster volume heal <volname> info heal-failed'.
Its output also has a 'Number of entries' field, which can easily be
parsed to meet those needs.

Pranith
>
> regards,
> John
>
>
> On 30/06/14 14:45, Pranith Kumar Karampuri wrote:
>> On 06/30/2014 09:49 AM, John Gardeniers wrote:
>>> Hi Pranith,
>>>
>>> I strongly urge the team to reconsider this. It's akin to removing error
>>> messages for real error conditions, which of course makes no sense. If
>>> the command is causing confusion then the command should be fixed or
>>> replaced, not removed.
>>>
>>> May I suggest that rather than deprecate 'gluster volume heal <volname>
>>> info heal-failed', it would be vastly more sensible and useful to have
>>> it return only the current count, or perhaps only the failures within
>>> the last minute or so if that's the best that can be achieved. For
>>> monitoring we care about only two things, but both are extremely
>>> important: split-brain errors and heal-failed errors.
>>>
>>> Deprecating the command means we can monitor only one of the two
>>> potentially disastrous problems, which means a system could be in a very
>>> poor state without the operator being aware of it. The repercussions are
>>> immense. E.g. an operator wants to take one server down for any reason
>>> and is unaware that the replica is not in a fit state to be used because
>>> there are multiple unhealed errors. The end result will be unreliable
>>> data and an almost assured split-brain when the first server is brought
>>> back on-line, very possibly making *both* copies useless.
>> Nope, the operator will *definitely* know about it, because 'gluster
>> volume heal <volname> info' shows those entries in its output.
>> As I said, from 3.5.1 onwards it shows only the files that
>> need self-heal, excluding those where only I/O is going on. So
>> instead of constantly changing output, it will show '0' entries to be
>> healed once the heal is complete and both bricks are good.
>> That is when the operator can go ahead with taking the brick down.
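>> For instance, before maintenance one could wait until that count drops
>> to zero (a rough bash sketch; the volume name is just an example):
>>
>>     # Loop until 'heal info' reports 0 entries on every brick; only
>>     # then is it safe to take a brick down for maintenance.
>>     until gluster volume heal gluster-rhev info \
>>         | awk '/^Number of entries/ {s += $NF} END {exit (s + 0 > 0)}'
>>     do
>>         sleep 60   # entries still pending; check again in a minute
>>     done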
>> I agree that split-brain is an extremely important error, but heal
>> failures are generally transient. If the heal keeps failing for some
>> reason other than split-brain, we can always look at the logs, which
>> have the *complete* history of why the heal is failing.
>>
>> So we are not losing *any* information, and we avoid the unnecessary
>> panic.
>>
>> Let me know what you think about it.
>>
>> Pranith
>>> regards,
>>> John
>>>
>>>
>>> On 30/06/14 14:03, Pranith Kumar Karampuri wrote:
>>>> On 06/30/2014 09:17 AM, John Gardeniers wrote:
>>>>> Hi Pranith,
>>>>>
>>>>> On 30/06/14 13:37, Pranith Kumar Karampuri wrote:
>>>>>> On 06/30/2014 08:48 AM, John Gardeniers wrote:
>>>>>>> Hi again Pranith,
>>>>>>>
>>>>>>> On 30/06/14 11:58, Pranith Kumar Karampuri wrote:
>>>>>>>> Oops, I see you are the same user who posted about VM files
>>>>>>>> self-heal.
>>>>>>>> Sorry I couldn't get back to you in time. So you are using 3.4.2.
>>>>>>>> Could you please post the log files of the mount and the bricks?
>>>>>>>> That should help us find more information about any issues.
>>>>>>>>
>>>>>>> When you say the log for the mount, which log is that? There are
>>>>>>> none
>>>>>>> that I can identify with the mount.
>>>>>>>
>>>>>>>> gluster volume heal <volname> info heal-failed records the last
>>>>>>>> 1024 failures. It also prints the timestamp of when each failure
>>>>>>>> occurred. Even after a heal succeeds, it keeps showing the old
>>>>>>>> errors, so the timestamp of when the heal failed is important.
>>>>>>>> Because some of these commands are causing such confusion, we
>>>>>>>> deprecated them in the upcoming release (3.6).
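>>>>>>>> In the meantime, stale entries can be told apart by filtering on
>>>>>>>> those timestamps (a rough sketch, assuming each entry carries a
>>>>>>>> 'YYYY-MM-DD hh:mm:ss' timestamp; the cut-off date is an example):
>>>>>>>>
>>>>>>>>     # Count only failures reported after a given point in time.
>>>>>>>>     gluster volume heal <volname> info heal-failed \
>>>>>>>>         | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9:]{8}' \
>>>>>>>>         | awk -v since='2014-06-28 00:00:00' '$0 >= since' \
>>>>>>>>         | wc -l   # ISO-style timestamps compare correctly as strings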
>>>>>>>>
>>>>>>> So far I've been focusing on the heal-failed count, which I fully,
>>>>>>> and I believe understandably, expect to show zero when there are no
>>>>>>> errors. Now that I look at the timestamps of those errors, I realise
>>>>>>> they are all from *before* the slave brick was added back in. May I
>>>>>>> assume then that in reality there are no unhealed files? If this is
>>>>>>> correct, I must point out that if errors are reported when there are
>>>>>>> none, that is a massive design flaw. It means things like nagios
>>>>>>> checks, such as the one we use, are useless. This makes monitoring
>>>>>>> near enough to impossible.
>>>>>> Exactly, that is why we deprecated it. The goal is to show only the
>>>>>> files that need to be healed, which is achieved in 3.5.1:
>>>>>> just 'gluster volume heal <volname> info'. It shows the exact number
>>>>>> of files/directories that need to be healed. Once that becomes zero,
>>>>>> we know the healing is complete. But all of this is useful only when
>>>>>> the brick has not been erased. We still need to improve monitoring
>>>>>> for the case where the brick is erased and a full volume self-heal is
>>>>>> triggered with "gluster volume heal <volname> full", like you did.
>>>>>>
>>>>>> Raised the following bug:
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1114415 to address the
>>>>>> same.
>>>>>>
>>>>>> Thanks a lot for your inputs, John. We shall fix these with priority.
>>>>>>
>>>>> If I run "watch gluster volume heal gluster-rhev info" I get
>>>>> constantly changing output similar to the below, except that the
>>>>> numbers and the files keep changing. I believe this is normal, as it
>>>>> is what I saw even when everything was running normally (before the
>>>>> problem started). This is also why the nagios check uses "gluster
>>>>> volume heal gluster-rhev info heal-failed". If that command is
>>>>> removed and not replaced with something else, it removes any
>>>>> possibility of monitoring heal failures.
>>>> That is generally good news, but we can still try to make sure that
>>>> all the heals are complete.
>>>> Most self-heal failures are intermittent, and subsequent healing
>>>> attempts will succeed.
>>>> We found that the only thing the 'gluster volume heal <volname> info
>>>> heal-failed' command achieved was unnecessary panic among users. That
>>>> is the main reason we deprecated it.
>>>>
>>>> The main things we need to know are the files which need to be healed
>>>> and the files which are in split-brain. For everything else we
>>>> generally need more information for debugging, so logs are preferable.
>>>> That is going to be the monitoring story going forward. We still need
>>>> to improve the 'heal full' monitoring story, though. I hope things
>>>> will progressively improve.
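>>>> Concretely, that leaves two commands worth wiring into monitoring (a
>>>> sketch; 'info split-brain' is a separate subcommand of the same
>>>> command):
>>>>
>>>>     gluster volume heal <volname> info              # files pending heal
>>>>     gluster volume heal <volname> info split-brain  # files in split-brain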
>>>>
>>>> Pranith.
>>>>> Brick jupiter.om.net:/gluster_brick_1
>>>>> Number of entries: 11
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/c9de61dc-286a-456a-bc3b-e2755ca5c8b3/ac3f2166-61af-4fc0-99c4-a76e9b63000e
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/0483fad0-2dfc-4dbe-98d4-62dbdbe120f3/1d3812af-caa4-4d99-a598-04dfd78eb6cb
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/bc3b165b-619a-4afa-9d5c-5ea599a99c61/2203cc21-db54-4f69-bbf1-4884a86c05d0
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/dd344276-04ef-48d1-9953-02bee5cc3b87/786b1a84-bc9c-48c7-859b-844a383e47ec
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/3d37825c-c798-421b-96ff-ce07128ee3ad/5119ad56-f0c9-4b3f-8f84-71e5f4b6b693
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/ca843e93-447e-4155-83fc-e0d7586b4b50/215e0914-4def-4230-9f2a-a9ece61f2038
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/51927b5f-2a3c-4c04-a90b-4500be0a526c/d14a842e-5fd9-4f6f-b08e-f5895b8b72fd
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/f06bcb57-4a0a-446d-b71d-773595bb0e2f/4dc55bae-5881-4a04-9815-18cdeb8bcfc8
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/cc11aab2-530e-4ffa-84ee-2989d39efeb8/49b2ff17-096b-45cf-a973-3d1466e16066
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/903c3ef4-feaa-4262-9654-69ef118a43ce/8c6732d3-4ce8-422e-a74e-48151e7f7102
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/66dcc584-36ce-43e6-8ce4-538fc6ff03d1/44192148-0708-4bd3-b8da-30baa85b89bf
>>>>>
>>>>> Brick nix.om.net:/gluster_brick_1
>>>>> Number of entries: 11
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/29b3b7cf-7b21-44e7-bede-b86e12d2b69a/7fbce4ad-185b-42d8-a093-168560a3df89
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/903c3ef4-feaa-4262-9654-69ef118a43ce/8c6732d3-4ce8-422e-a74e-48151e7f7102
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/bc3b165b-619a-4afa-9d5c-5ea599a99c61/2203cc21-db54-4f69-bbf1-4884a86c05d0
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/c9de61dc-286a-456a-bc3b-e2755ca5c8b3/ac3f2166-61af-4fc0-99c4-a76e9b63000e
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/3d37825c-c798-421b-96ff-ce07128ee3ad/5119ad56-f0c9-4b3f-8f84-71e5f4b6b693
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/66dcc584-36ce-43e6-8ce4-538fc6ff03d1/44192148-0708-4bd3-b8da-30baa85b89bf
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/0483fad0-2dfc-4dbe-98d4-62dbdbe120f3/1d3812af-caa4-4d99-a598-04dfd78eb6cb
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/dd344276-04ef-48d1-9953-02bee5cc3b87/786b1a84-bc9c-48c7-859b-844a383e47ec
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/ca843e93-447e-4155-83fc-e0d7586b4b50/215e0914-4def-4230-9f2a-a9ece61f2038
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/51927b5f-2a3c-4c04-a90b-4500be0a526c/d14a842e-5fd9-4f6f-b08e-f5895b8b72fd
>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/f06bcb57-4a0a-446d-b71d-773595bb0e2f/4dc55bae-5881-4a04-9815-18cdeb8bcfc8
>>>>>
>>>>>> Do you think it is possible for you to come to #gluster IRC on
>>>>>> freenode?
>>>>> I'll see what I can do. I've never used IRC before and first need to
>>>>> find out how. :)
>>>>>
>>>>>> Pranith
>>>>>>
>>>>>>>> This is probably a stupid question but let me ask it anyway. When a
>>>>>>>> brick's contents are erased from the backend, we need to make sure
>>>>>>>> of the following two things:
>>>>>>>> 1) The extended attributes on the root of the surviving brick show
>>>>>>>> pending operations on the brick that was erased (a quick check is
>>>>>>>> sketched below)
>>>>>>>> 2) Execute "gluster volume heal <volname> full"
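>>>>>>>> One way to check (1) from the backend (a sketch; the exact
>>>>>>>> trusted.afr.* attribute names depend on the volume name and
>>>>>>>> client indices):
>>>>>>>>
>>>>>>>>     # Run on the root of the surviving (good) brick.
>>>>>>>>     getfattr -d -m . -e hex /gluster_brick_1
>>>>>>>>     # Non-zero trusted.afr.<volname>-client-N values here indicate
>>>>>>>>     # operations pending towards the erased brick.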
>>>>>>> 1) While gluster was stopped I merely did an rm -rf on both the data
>>>>>>> sub-directory and the .gluster sub-directory. How do I show that
>>>>>>> there
>>>>>>> are pending operations?
>>>>>>> 2) Yes, I did run that.
>>>>>>>
>>>>>>>> Did you do the steps above?
>>>>>>>>
>>>>>>>> Since you are on 3.4.2, I think the best way to check which files
>>>>>>>> are healed is to use the extended attributes on the backend. Could
>>>>>>>> you please post them again?
>>>>>>> I don't quite understand what you're asking for. I understand
>>>>>>> attributes
>>>>>>> as belonging to files and directories, not operations. Please
>>>>>>> elaborate.
>>>>>>>
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> On 06/30/2014 07:12 AM, Pranith Kumar Karampuri wrote:
>>>>>>>>> On 06/30/2014 04:03 AM, John Gardeniers wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We have 2 servers, each with one 5TB brick, configured as
>>>>>>>>>> replica 2.
>>>>>>>>>> After a series of events that caused the 2 bricks to become way
>>>>>>>>>> out of step, gluster was turned off on one server and its brick
>>>>>>>>>> was wiped of everything, but the extended attributes were left
>>>>>>>>>> untouched.
>>>>>>>>>>
>>>>>>>>>> This weekend we stopped the client and gluster and made a backup
>>>>>>>>>> of the remaining brick, just to play it safe. Gluster was then
>>>>>>>>>> turned back on, first on the "master" and then on the "slave".
>>>>>>>>>> Self-heal kicked in and started rebuilding the second brick.
>>>>>>>>>> However, after 2 full days all files in the volume are still
>>>>>>>>>> showing heal-failed errors.
>>>>>>>>>>
>>>>>>>>>> The rebuild was, in my opinion at least, very slow, taking most
>>>>>>>>>> of a day even though the system is on a 10Gb LAN. The data is a
>>>>>>>>>> little under 1.4TB committed, 2TB allocated.
>>>>>>>>> How much more remains to be healed? 0.6TB?
>>>>>>>>>> Once the 2 bricks were very close to having the same amount of
>>>>>>>>>> space used, things slowed right down. For the last day both
>>>>>>>>>> bricks have shown a very slow increase in used space, even though
>>>>>>>>>> no changes are being written by the client. By slow I mean just a
>>>>>>>>>> few KB per minute.
>>>>>>>>> Is I/O still in progress on the mount? In 3.4.x, self-heal does
>>>>>>>>> not happen on files where I/O is going on through the mounts, so
>>>>>>>>> that could be the reason if I/O is going on.
>>>>>>>>>> The logs are confusing, to say the least. In
>>>>>>>>>> etc-glusterfs-glusterd.vol.log on both servers there are
>>>>>>>>>> thousands of
>>>>>>>>>> entries such as (possibly because I was using watch to monitor
>>>>>>>>>> self-heal
>>>>>>>>>> progress):
>>>>>>>>>>
>>>>>>>>>> [2014-06-29 21:41:11.289742] I
>>>>>>>>>> [glusterd-volume-ops.c:478:__glusterd_handle_cli_heal_volume]
>>>>>>>>>> 0-management: Received heal vol req for volume gluster-rhev
>>>>>>>>> What version of gluster are you using?
>>>>>>>>>> That timestamp is the latest on either server, which is about 9
>>>>>>>>>> hours ago as I type this. I find that a bit disconcerting. I have
>>>>>>>>>> requested volume heal-failed info since then.
>>>>>>>>>>
>>>>>>>>>> The brick log on the "master" server (the one from which we are
>>>>>>>>>> rebuilding the new brick) contains no entries since before the
>>>>>>>>>> rebuild
>>>>>>>>>> started.
>>>>>>>>>>
>>>>>>>>>> On the "slave" server the brick log shows a lot of entries
>>>>>>>>>> such as:
>>>>>>>>>>
>>>>>>>>>> [2014-06-28 08:49:47.887353] E [marker.c:2140:marker_removexattr_cbk]
>>>>>>>>>> 0-gluster-rhev-marker: Numerical result out of range occurred while
>>>>>>>>>> creating symlinks
>>>>>>>>>> [2014-06-28 08:49:47.887382] I [server-rpc-fops.c:745:server_removexattr_cbk]
>>>>>>>>>> 0-gluster-rhev-server: 10311315: REMOVEXATTR
>>>>>>>>>> /44d30b24-1ed7-48a0-b905-818dc0a006a2/images/02d4bd3c-b057-4f04-ada5-838f83d0b761/d962466d-1894-4716-b5d0-3a10979145ec
>>>>>>>>>> (1c1f53ac-afe2-420d-8c93-b1eb53ffe8b1) of key  ==> (Numerical result
>>>>>>>>>> out of range)
>>>>>>>>> CC Raghavendra who knows about marker translator.
>>>>>>>>>> Those entries are around the time the rebuild was starting. The
>>>>>>>>>> final
>>>>>>>>>> entries in that same log (immediately after those listed above)
>>>>>>>>>> are:
>>>>>>>>>>
>>>>>>>>>> [2014-06-29 12:47:28.473999] I
>>>>>>>>>> [server-rpc-fops.c:243:server_inodelk_cbk] 0-gluster-rhev-server:
>>>>>>>>>> 2869: INODELK (null) (c67e9bbe-5956-4c61-b650-2cd5df4c4df0) ==>
>>>>>>>>>> (No such file or directory)
>>>>>>>>>> [2014-06-29 12:47:28.489527] I
>>>>>>>>>> [server-rpc-fops.c:1572:server_open_cbk] 0-gluster-rhev-server:
>>>>>>>>>> 2870: OPEN (null) (c67e9bbe-5956-4c61-b650-2cd5df4c4df0) ==>
>>>>>>>>>> (No such file or directory)
>>>>>>>>> These log messages are harmless and were fixed in 3.5, I think.
>>>>>>>>> Are you on 3.4.x?
>>>>>>>>>
>>>>>>>>>> As I type it's 2014-06-30 08:31.
>>>>>>>>>>
>>>>>>>>>> What do they mean and how can I rectify it?
>>>>>>>>>>
>>>>>>>>>> regards,
>>>>>>>>>> John
>>>>>>>>>>



