[Gluster-users] spli

Tiemen Ruiten t.ruiten at rdmedia.com
Wed Aug 6 19:47:01 UTC 2014


Hello,

I'm running into some serious problems with Gluster + CTDB and Samba. What
I have:

A two-node replicated Gluster cluster that shares its volumes over Samba,
set up according to this guide:
https://download.gluster.org/pub/gluster/glusterfs/doc/Gluster_CTDB_setup.v1.pdf

When we edit files on the volume or copy files into it via SMB (from a
Windows client connected to a Samba file share), this inevitably leads to a
split-brain scenario. For example:

gluster> volume heal fl-webroot info
Brick ankh.int.rdmedia.com:/export/glu/web/flash/webroot/
<gfid:0b162618-e46f-4921-92d0-c0fdb5290bf5>
<gfid:a259de7d-69fc-47bd-90e7-06a33b3e6cc8>
Number of entries: 2

Brick morpork.int.rdmedia.com:/export/glu/web/flash/webroot/
/LandingPage_Saturn_Production/images
/LandingPage_Saturn_Production
/LandingPage_Saturn_Production/Services/v2
/LandingPage_Saturn_Production/images/country/be
/LandingPage_Saturn_Production/bin
/LandingPage_Saturn_Production/Services
/LandingPage_Saturn_Production/images/generic
/LandingPage_Saturn_Production/aspnet_client/system_web
/LandingPage_Saturn_Production/images/country
/LandingPage_Saturn_Production/Scripts
/LandingPage_Saturn_Production/aspnet_client
/LandingPage_Saturn_Production/images/country/fr
Number of entries: 12

Sometimes self-heal works, sometimes it doesn't:

[2014-08-06 19:32:17.986790] E
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status]
0-fl-webroot-replicate-0:  entry self heal  failed,   on
/LandingPage_Saturn_Production/Services/v2
[2014-08-06 19:32:18.008330] W
[client-rpc-fops.c:2772:client3_3_lookup_cbk] 0-fl-webroot-client-0: remote
operation failed: No such file or directory. Path:
<gfid:a89d7a07-2e3d-41ee-adcc-cb2fba3d2282>
(a89d7a07-2e3d-41ee-adcc-cb2fba3d2282)
[2014-08-06 19:32:18.024057] I
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status]
0-fl-webroot-replicate-0:  gfid or missing entry self heal  is started,
metadata self heal  is successfully completed, backgroung data self heal
is successfully completed,  data self heal from fl-webroot-client-1  to
sinks  fl-webroot-client-0, with 0 bytes on fl-webroot-client-0, 168 bytes
on fl-webroot-client-1,  data - Pending matrix:  [ [ 0 0 ] [ 1 0 ] ]
metadata self heal from source fl-webroot-client-1 to fl-webroot-client-0,
metadata - Pending matrix:  [ [ 0 0 ] [ 2 0 ] ], on
/LandingPage_Saturn_Production/Services/v2/PartnerApiService.asmx
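
For readers unfamiliar with the "Pending matrix" values in the log above,
here is a rough way to read them. This is a simplified sketch, not Gluster's
actual AFR logic: it assumes entry [i][j] is the number of operations that
brick i has recorded as still pending on brick j, that a brick nobody else
accuses is a heal source, and that two bricks accusing each other means
split-brain. The function name classify is hypothetical.

```python
def classify(pending):
    """Classify bricks from an AFR-style pending matrix (simplified).

    Assumption: pending[i][j] is the count of operations that brick i
    has recorded as pending on brick j.
    """
    n = len(pending)
    # A brick is "accused" if any other brick holds a non-zero counter against it.
    accused = [
        any(pending[i][j] for i in range(n) if i != j)
        for j in range(n)
    ]
    # Two bricks that accuse each other cannot agree on a source: split-brain.
    split_brain = any(
        pending[i][j] and pending[j][i]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return {
        "sources": [j for j in range(n) if not accused[j]],
        "sinks": [j for j in range(n) if accused[j]],
        "split_brain": split_brain,
    }


# Matrices taken from the log above (index 0 = fl-webroot-client-0).
data = classify([[0, 0], [1, 0]])
metadata = classify([[0, 0], [2, 0]])

# Hypothetical matrix, not from the log: both bricks accuse each other.
mutual = classify([[0, 1], [1, 0]])
```

Under these assumptions, both matrices from the log give brick 1 as the
source and brick 0 as the sink, which agrees with the "from
fl-webroot-client-1 to sinks fl-webroot-client-0" wording in the message.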

*More seriously, some files are simply missing on one of the nodes, with no
error in the logs and no entry in the output of gluster volume heal $volume
info.*
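
One way to find such silently missing files is to list the files on each
brick directly and compare the listings. The following is a minimal sketch
under some assumptions: it is run against the brick directories themselves
(not the mounted volume), it skips the .glusterfs metadata directory at the
brick root, and the helper names relative_files and missing_between are
hypothetical. Since the bricks live on different nodes, each listing would
have to be produced on its own node and then brought together for the
comparison.

```python
import os


def relative_files(brick_root):
    """Return the set of file paths under brick_root, relative to it."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip Gluster's internal metadata directory at the top of the brick.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, brick_root))
    return found


def missing_between(listing_a, listing_b):
    """Return (files missing on A, files missing on B) as sorted lists."""
    missing_on_a = sorted(listing_b - listing_a)
    missing_on_b = sorted(listing_a - listing_b)
    return missing_on_a, missing_on_b
```

This only compares file names, not contents or extended attributes, so it
would show files absent from one brick but not files that differ.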

Of course I can provide any log file necessary.

-- 
Tiemen Ruiten
Systems Engineer
R&D Media