[Gluster-devel] Geo-replication fills up inode by saving processed changelog files

Mon Dec 15 07:27:24 UTC 2014

On 12/15/2014 11:46 AM, Aravinda wrote:
> On 12/15/2014 11:21 AM, Vijay Bellur wrote:
>> On 12/15/2014 11:13 AM, Aravinda wrote:
>>> On 12/15/2014 10:39 AM, Vijay Bellur wrote:
>>>> On 12/11/2014 11:40 AM, Aravinda wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> While geo-replication is running, it keeps processed changelog files in
>>>>> $WORKING_DIR/.processed or $WORKING_DIR/.history/.processed. These
>>>>> changelog files are useful for debugging processed changelogs, but
>>>>> they eat up space and available inodes. The changelog files saved in
>>>>> the processed directory are duplicates of the changelogs available in
>>>>> the brick backend (the difference is in the format: changelogs in the
>>>>> processed dir are parsed and human readable).
>>>>
>>>> Do we consume 2 inodes per changelog - one in $WORKING_DIR and another
>>>> in the brick? If yes, how do we avoid consuming inodes in the
>>>> filesystem that contains the brick?
>>> Changelogs generated by the changelog translator are saved in the brick.
>>> When a consumer registers a working dir through the libgfchangelog API,
>>> the library processes the brick changelogs and copies them to the
>>> working dir.
>>>>
>>
>> How are the changelogs in the bricks cleaned up?
> Changelogs in the bricks do not get cleaned up; they remain with the
> data (even in snapshots).
> With these changelogs we can get historical changes whenever required,
> using the changelog history API.

Can we archive even the changelogs in the bricks and make them available 
only when needed?

>>
>>
>>>>>
>>>>> How about keeping only a reference to the changelog file after it is
>>>>> processed?
>>>>> For debugging there will be an additional step to look up the changelog
>>>>> in the backend ($BRICK/.glusterfs/changelogs) using this reference.
>>>>>
>>>>> After syncing data to the slave (in geo-replication):
>>>>> echo "$changelog_filename" >> "$WORKING_DIR/.processed_files"
>>>>> rm "$WORKING_DIR/.processing/$changelog_filename"
>>>>>
>>>>> We need to modify the `gf_changelog_done` and `gf_history_changelog_done`
>>>>> functions in
>>>>> libgfchangelog ($GLUSTER_SRC/xlators/features/changelog/lib/src).
>>>>>
>>>>> Any thoughts?
>>>>
>>>> Archiving retired changelogs may be an option. What would be the
>>>> scenario when there are multiple changelog consumers (apart from
>>>> geo-replication)?
>>> Each consumer registers its own working dir, so archiving will not affect
>>> other consumers. The only issue could be that backend changelogs are not
>>> copied into sosreports (it may be difficult to debug without a copy of
>>> the changelog in the working dir).
>>
>> The sosreport plugin does pick up everything in the working directory.
>> If the changelogs get archived in the working directory, this should not
>> be a problem, right?
> Yeah, if we archive then they will be available in sosreports. But as I
> mentioned in my first mail, if I delete the changelog in the working
> dir (after syncing to the slave, in the processed dir) and keep only
> references in the working dir, then sosreport will not have it.

OK, seems like one more reason to not delete the changelogs in 
working_directory :).
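
To make the archiving option concrete, here is a rough sketch of what it
could look like: all processed changelogs are packed into a single tar file
inside the working directory, so they cost one inode instead of one per
changelog and are still picked up by sosreport. This is only an
illustration and not existing gsyncd or libgfchangelog code; the function
name `archive_processed` and the archive name `processed_changelogs.tar`
are made up for this example, and only the `.processed` directory comes
from this thread.

```python
import os
import tarfile


def archive_processed(working_dir, archive_name="processed_changelogs.tar"):
    """Pack the files in <working_dir>/.processed into one tar archive.

    Hypothetical helper: returns the number of changelogs archived.
    """
    processed_dir = os.path.join(working_dir, ".processed")
    archive_path = os.path.join(working_dir, archive_name)

    # Only regular files are archived; subdirectories are left alone.
    names = sorted(
        name for name in os.listdir(processed_dir)
        if os.path.isfile(os.path.join(processed_dir, name))
    )
    if not names:
        return 0

    # Mode "a" appends to an uncompressed tar and creates it if missing.
    with tarfile.open(archive_path, "a") as tar:
        for name in names:
            tar.add(os.path.join(processed_dir, name), arcname=name)

    # Delete the originals only after the archive was written and closed.
    for name in names:
        os.remove(os.path.join(processed_dir, name))
    return len(names)
```

An uncompressed tar is used because it can be appended to on every run;
a compressed archive would have to be rewritten each time.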

-Vijay
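
For comparison, the reference-only proposal from the first mail (the
`echo` / `rm` steps) could be sketched as below. This is a minimal
illustration under assumptions, not the actual `gf_changelog_done`
implementation: the helper names are hypothetical, and it assumes the file
in the working dir has the same basename as the changelog under
$BRICK/.glusterfs/changelogs.

```python
import os


def mark_done_by_reference(working_dir, changelog_filename):
    """Record a processed changelog by name only, then delete the copy.

    Hypothetical helper mirroring the echo/rm steps from the proposal.
    """
    processing_path = os.path.join(working_dir, ".processing", changelog_filename)
    index_path = os.path.join(working_dir, ".processed_files")

    # Append the reference before deleting, so the name is already
    # recorded if the removal fails.
    with open(index_path, "a") as index:
        index.write(changelog_filename + "\n")
    os.remove(processing_path)


def backend_path(brick, changelog_filename):
    """Map a recorded reference back to the brick backend for debugging.

    Assumes the reference is the same basename as the brick changelog.
    """
    return os.path.join(brick, ".glusterfs", "changelogs", changelog_filename)
```

With this variant the working dir holds a single `.processed_files` index,
which is why sosreport would no longer contain the changelogs themselves.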