[Gluster-users] frequent split-brain with Gluster + Samba + Win client
Pranith Kumar Karampuri
pkarampu at redhat.com
Thu Aug 7 10:01:06 UTC 2014
On 08/07/2014 03:23 PM, Pranith Kumar Karampuri wrote:
>
> On 08/07/2014 03:18 PM, Tiemen Ruiten wrote:
>> Hello Pranith,
>>
>> Thanks for your reply. I'm using 3.5.2.
>>
>> Is it possible that Windows doesn't release the files after a write
>> happens?
>>
>> I ask because the self-heal often never happens. Just this morning we
>> discovered that when a web server read from the other node, some
>> files that had been changed days ago still had content from before
>> the edit.
>>
>> How can I ensure that everything syncs reliably and consistently when
>> mounting from SMB? Is Samba VFS more reliable in this respect?
> It should happen automatically. Even the mount *must* serve reads from
> the good copy. In what scenario did you observe reads being served from
> the stale brick?
> Could you give 'getfattr -d -m. -e hex <path-of-file-on-brick>' output
> from both the bricks?
Sorry, I was not clear here. Please give the output of the above command,
from both bricks, for the file where you observed the 'stale read'.
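For example, if the stale file were
/LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx (just
borrowing a path from your logs for illustration), run this on each node,
against the brick directory rather than the mount:

getfattr -d -m. -e hex /export/glu/web/flash/webroot/LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx

The trusted.gfid and trusted.afr.fl-webroot-client-* attributes in that
output are what we need to compare between the two bricks.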
Pranith
>
> Is it possible to provide self-heal-daemon logs so that we can inspect
> what is happening?
>
> Pranith
>>
>> Tiemen
>>
>> On 7 August 2014 03:14, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>
>> hi Tiemen,
>> From the logs you have pasted, it doesn't look like there are any
>> split-brains; it is just performing self-heals. What version of
>> glusterfs are you using? Self-heals sometimes don't happen while data
>> operations from the mount are in progress, because those operations
>> are given higher priority. Missing files should be created once the
>> self-heal completes on the parent directory of those files.
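>> If you want to nudge things along, you can also trigger the pending
>> heals manually and re-check afterwards, e.g.:
>>
>>     gluster volume heal fl-webroot
>>     gluster volume heal fl-webroot info
>>
>> (volume name taken from your paste below; 'gluster volume heal
>> fl-webroot full' crawls the whole volume if entries appear to be
>> missed).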
>>
>> Pranith
>>
>>
>> On 08/07/2014 01:40 AM, Tiemen Ruiten wrote:
>>> Sorry, I seem to have messed up the subject.
>>>
>>> I should add, I'm mounting these volumes through GlusterFS FUSE,
>>> not the Samba VFS plugin.
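>>> (So each share just points at a local FUSE mountpoint, i.e. something
>>> like 'mount -t glusterfs <node>:/fl-webroot /mnt/fl-webroot' with
>>> path = /mnt/fl-webroot in smb.conf; the mountpoint name here is only
>>> an example. The VFS route would instead use vfs objects = glusterfs
>>> plus glusterfs:volume = fl-webroot in the share definition, which we
>>> are not doing.)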
>>>
>>> On 06-08-14 21:47, Tiemen Ruiten wrote:
>>>> Hello,
>>>>
>>>> I'm running into some serious problems with Gluster + CTDB and
>>>> Samba. What I have:
>>>>
>>>> A two-node replicated Gluster cluster set up to share volumes via
>>>> Samba, configured according to this guide:
>>>> https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf
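>>>> (I.e. the usual CTDB arrangement: a small replicated lock volume
>>>> mounted on both nodes, CTDB's recovery lock file on it via
>>>> CTDB_RECOVERY_LOCK, clustering = yes in smb.conf, and the data
>>>> volumes exported as ordinary Samba shares.)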
>>>>
>>>> When we edit or copy files into the volume via SMB (from a
>>>> Windows client accessing a Samba file share), this inevitably
>>>> leads to a split-brain scenario. For example:
>>>>
>>>> gluster> volume heal fl-webroot info
>>>> Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/
>>>> <gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5>
>>>> <gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8>
>>>> Number of entries: 2
>>>>
>>>> Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/
>>>> /LandingPage_Saturn_Production/images
>>>> /LandingPage_Saturn_Production
>>>> /LandingPage_Saturn_Production/Services/v2
>>>> /LandingPage_Saturn_Production/images/country/be
>>>> /LandingPage_Saturn_Production/bin
>>>> /LandingPage_Saturn_Production/Services
>>>> /LandingPage_Saturn_Production/images/generic
>>>> /LandingPage_Saturn_Production/aspnet_client/system_web
>>>> /LandingPage_Saturn_Production/images/country
>>>> /LandingPage_Saturn_Production/Scripts
>>>> /LandingPage_Saturn_Production/aspnet_client
>>>> /LandingPage_Saturn_Production/images/country/fr
>>>> Number of entries: 12
>>>>
>>>> gluster> volume heal fl-webroot info
>>>> Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/
>>>> <gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5>
>>>> <gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8>
>>>> Number of entries: 2
>>>>
>>>> Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/
>>>> /LandingPage_Saturn_Production/images
>>>> /LandingPage_Saturn_Production
>>>> /LandingPage_Saturn_Production/Services/v2
>>>> /LandingPage_Saturn_Production/images/country/be
>>>> /LandingPage_Saturn_Production/bin
>>>> /LandingPage_Saturn_Production/Services
>>>> /LandingPage_Saturn_Production/images/generic
>>>> /LandingPage_Saturn_Production/aspnet_client/system_web
>>>> /LandingPage_Saturn_Production/images/country
>>>> /LandingPage_Saturn_Production/Scripts
>>>> /LandingPage_Saturn_Production/aspnet_client
>>>> /LandingPage_Saturn_Production/images/country/fr
>>>>
>>>>
>>>>
>>>> Sometimes self-heal works, sometimes it doesn't:
>>>>
>>>> [2014-08-06 19:32:17.986790] E
>>>> [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status]
>>>> 0-fl-webroot-replicate-0: entry self heal failed, on
>>>> /LandingPage_Saturn_Production/Services/v2
>>>> [2014-08-06 19:32:18.008330] W
>>>> [client-rpc-fops.c:2772:client3_3_lookup_cbk]
>>>> 0-fl-webroot-client-0: remote operation failed: No such file or
>>>> directory. Path: <gfid:a89d7a07-2e3d-41ee-adcc-cb2fba3d2282>
>>>> (a89d7a07-2e3d-41ee-adcc-cb2fba3d2282)
>>>> [2014-08-06 19:32:18.024057] I
>>>> [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status]
>>>> 0-fl-webroot-replicate-0: gfid or missing entry self heal is
>>>> started, metadata self heal is successfully completed,
>>>> backgroung data self heal is successfully completed, data
>>>> self heal from fl-webroot-client-1 to sinks
>>>> fl-webroot-client-0, with 0 bytes on fl-webroot-client-0, 168
>>>> bytes on fl-webroot-client-1, data - Pending matrix: [ [ 0 0
>>>> ] [ 1 0 ] ] metadata self heal from source fl-webroot-client-1
>>>> to fl-webroot-client-0, metadata - Pending matrix: [ [ 0 0 ]
>>>> [ 2 0 ] ], on
>>>> /LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx
>>>>
>>>> *More seriously, some files are simply missing on one of the
>>>> nodes, without any error in the logs and without showing up in
>>>> gluster volume heal $volume info.*
>>>>
>>>> Of course I can provide any log file necessary.
>>
>>
>
>
>