[Gluster-users] geo-replication breaks on CentOS 6.5 + gluster 3.6.0 beta3

M S Vishwanath Bhat vbhat at redhat.com
Mon Oct 20 07:10:30 UTC 2014


On 18/10/14 12:46, James Payne wrote:
> Not in my particular use case, which is where a new folder or file is created in Windows through Explorer. The new folder is created by Windows with the name 'New Folder', which the user will almost certainly then rename. The same goes for newly created files in Explorer.
>
> Does this mean the issue shouldn't be there in a replicate-only scenario?
Yes. The issue shouldn't be seen in a pure replicate volume.

Best Regards,
Vishwanath

>
> Regards
> James
>
> --- Original Message ---
>
> From: "M S Vishwanath Bhat" <vbhat at redhat.com>
> Sent: 17 October 2014 20:53
> To: "Kingsley" <gluster at gluster.dogwind.com>, "James Payne" <jimqwerty4 at hotmail.com>
> Cc: gluster-users at gluster.org
> Subject: Re: [Gluster-users] geo-replication breaks on CentOS 6.5 + gluster 3.6.0 beta3
>
> Hi,
>
> Right now, distributed geo-rep has a bunch of known issues with deletes
> and renames. Part of the problem was solved by a patch sent upstream
> recently, but that doesn't solve the complete issue.
>
> So, long story short, dist-geo-rep still has issues with short-lived
> renames where the renamed file is hashed to a different subvolume
> (brick). If the renamed file is hashed to the same brick, then the issue
> should not be seen (hopefully).
>
> Using volume set, we can force the renamed file to be hashed to the same
> brick: "gluster volume set <volname> cluster.extra-hash-regex
> <regex_of_the_renamed_files>"
>
> For example, if you open a file in vi, it will rename the file to
> filename.txt~, so the regex should be:
> gluster volume set VOLNAME cluster.extra-hash-regex '^(.+)~$'
>
> But for this to work, the format of the files created by your
> application has to be identified. Does your application create files in
> an identifiable format which can be specified in a regex? Is this a
> possibility?
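>
> As a purely illustrative sketch (the pattern below is hypothetical, not
> taken from your application): if an application wrote files as name.tmp
> and then renamed them to name, you could point extra-hash-regex at the
> temporary names so they hash to the same brick as the final name:
>
>     # hypothetical pattern; adjust to your application's naming scheme
>     gluster volume set VOLNAME cluster.extra-hash-regex '^(.+)\.tmp$'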
>
>
> Best Regards,
> Vishwanath
>
> On 15/10/14 15:41, Kingsley wrote:
>> I have added a comment to that bug report (a paste of my original
>> email).
>>
>> Cheers,
>> Kingsley.
>>
>> On Tue, 2014-10-14 at 22:10 +0100, James Payne wrote:
>>> Just adding that I have verified this as well with the 3.6 beta; I added a
>>> log to the ticket regarding this.
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1141379
>>>
>>> Please feel free to add to the bug report; I think we are seeing the same
>>> issue. It isn't present in the 3.4 series, which is the one I'm testing
>>> currently (no distributed geo-rep though).
>>>
>>> Regards
>>> James
>>>
>>> -----Original Message-----
>>> From: Kingsley [mailto:gluster at gluster.dogwind.com]
>>> Sent: 13 October 2014 16:51
>>> To: gluster-users at gluster.org
>>> Subject: [Gluster-users] geo-replication breaks on CentOS 6.5 + gluster
>>> 3.6.0 beta3
>>>
>>> Hi,
>>>
>>> I have a small script to simulate file activity for an application we have.
>>> It breaks geo-replication within about 15 - 20 seconds when I try it.
>>>
>>> This is on a small Gluster test environment in some VMs running
>>> CentOS 6.5 with gluster 3.6.0 beta3. I have 6 VMs - test1, test2,
>>> test3, test4, test5 and test6. test1, test2, test3 and test4 are gluster
>>> servers, while test5 and test6 are the clients. test3 is actually not used
>>> in this test.
>>>
>>>
>>> Before the test, I had a single gluster volume as follows:
>>>
>>> test1# gluster volume status
>>> Status of volume: gv0
>>> Gluster process                                Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick test1:/data/brick/gv0                    49168   Y       12017
>>> Brick test2:/data/brick/gv0                    49168   Y       11835
>>> NFS Server on localhost                        2049    Y       12032
>>> Self-heal Daemon on localhost                  N/A     Y       12039
>>> NFS Server on test4                            2049    Y       7934
>>> Self-heal Daemon on test4                      N/A     Y       7939
>>> NFS Server on test3                            2049    Y       11768
>>> Self-heal Daemon on test3                      N/A     Y       11775
>>> NFS Server on test2                            2049    Y       11849
>>> Self-heal Daemon on test2                      N/A     Y       11855
>>>
>>> Task Status of Volume gv0
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>>
>>> I created a new volume and set up geo-replication as follows (as these are
>>> test machines I only have one file system on each, hence using "force" to
>>> create the bricks in the root FS):
>>>
>>> test4# date ; gluster volume create gv0-slave test4:/data/brick/gv0-slave force ; date
>>> Mon Oct 13 15:03:14 BST 2014
>>> volume create: gv0-slave: success: please start the volume to access data
>>> Mon Oct 13 15:03:15 BST 2014
>>>
>>> test4# date ; gluster volume start gv0-slave ; date
>>> Mon Oct 13 15:03:36 BST 2014
>>> volume start: gv0-slave: success
>>> Mon Oct 13 15:03:39 BST 2014
>>>
>>> test4# date ; gluster volume geo-replication gv0 test4::gv0-slave create push-pem force ; date
>>> Mon Oct 13 15:05:59 BST 2014
>>> Creating geo-replication session between gv0 & test4::gv0-slave has been successful
>>> Mon Oct 13 15:06:11 BST 2014
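>>>
>>> (Not shown above: the geo-replication session also has to be started
>>> before it begins syncing, with something along the lines of
>>> "gluster volume geo-replication gv0 test4::gv0-slave start".)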
>>>
>>>
>>> I then mount volume gv0 on one of the client machines. I can create files
>>> within the gv0 volume and can see the changes being replicated to the
>>> gv0-slave volume, so I know that geo-replication is working at the start.
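>>>
>>> (The client mount is a normal glusterfs mount, something like
>>> "mount -t glusterfs test1:/gv0 /mnt/gv0"; the server and mount point
>>> shown here are just examples.)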
>>>
>>> When I run my script (which quickly creates, deletes and renames files),
>>> geo-replication breaks within a very short time. The test script output is
>>> in http://gluster.dogwind.com/files/georep20141013/test6_script-output.log
>>> (I interrupted the script once I saw that geo-replication was broken).
>>> Note that when it deletes a file, it renames any later-numbered file so that
>>> the file numbering remains sequential with no gaps; this simulates a
>>> real-world application that we use.
>>>
>>> If you want a copy of the test script, it's here:
>>> http://gluster.dogwind.com/files/georep20141013/test_script.tar.gz
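>>>
>>> For a rough idea of what it does without downloading it, a simplified
>>> shell sketch of the same kind of churn (only an approximation, not the
>>> actual test.pl; the path and counts are made up) would be:
>>>
>>>     #!/bin/bash
>>>     # Approximation of the workload: keep creating numbered files,
>>>     # occasionally delete one and rename the later ones down so the
>>>     # numbering stays sequential with no gaps.
>>>     dir=/mnt/gv0/mailstore      # illustrative path on a gluster mount
>>>     mkdir -p "$dir"
>>>     n=0
>>>     while true; do
>>>         n=$((n + 1))
>>>         echo "message $n" > "$dir/msg.$n"
>>>         if [ "$n" -gt 1 ] && [ $((RANDOM % 5)) -eq 0 ]; then
>>>             victim=$((RANDOM % n + 1))          # pick a file to delete
>>>             rm -f "$dir/msg.$victim"
>>>             for ((i = victim + 1; i <= n; i++)); do
>>>                 mv "$dir/msg.$i" "$dir/msg.$((i - 1))"   # close the gap
>>>             done
>>>             n=$((n - 1))
>>>         fi
>>>     done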
>>>
>>>
>>> The various gluster log files can be downloaded from here:
>>> http://gluster.dogwind.com/files/georep20141013/ - each log file has the
>>> actual log file path at the top of the file.
>>>
>>> If you want to run the test script on your own system, edit test.pl so that
>>> @mailstores contains a directory path to a gluster volume.
>>>
>>> My systems' timezone is BST (GMT+1 / UTC+1) so any timestamps outside of
>>> gluster logs are in this timezone.
>>>
>>> Let me know if you need any more info.
>>>
>>> --
>>> Cheers,
>>> Kingsley.
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users


