[Gluster-users] Data consistency with Gluster 3.2.5
Sean Fulton
sean at gcnpublishing.com
Mon Mar 12 13:56:51 UTC 2012
I have set up a replicated, four-node Gluster configuration for a web farm. The
idea is that each web node is its own Gluster server and holds its own copy
of the entire web root locally. Each node then mounts the volume from itself.
We're running it over two bonded GigE NICs.
The problem I am having is that when we switch live traffic to nodes in the
cluster, they almost immediately get out of sync. The issue seems to be
with cache files that are read and written frequently. Here is a log excerpt
showing the errors on our OpenX banner cache files:
[2012-02-25 18:53:04.198326] E
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal failed on
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.199191] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97,
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.199210] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97,
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.199219] W
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid different on subvolume
[2012-02-25 18:53:04.199236] I [afr-common.c:1038:afr_launch_self_heal]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal triggered. path:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.200752] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97,
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.200971] I
[afr-self-heal-common.c:963:afr_sh_missing_entries_done]
0-web-pub-replicate-0: split brain found, aborting selfheal of
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.200986] E
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal failed on
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.202159] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.202178] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.202188] W
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid different on subvolume
[2012-02-25 18:53:04.202204] I [afr-common.c:1038:afr_launch_self_heal]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal triggered. path:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.203463] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.203678] I
[afr-self-heal-common.c:963:afr_sh_missing_entries_done]
0-web-pub-replicate-0: split brain found, aborting selfheal of
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.203693] E
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal failed on
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.204759] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.204781] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.204800] W
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid different on subvolume
[2012-02-25 18:53:04.204818] I [afr-common.c:1038:afr_launch_self_heal]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal triggered. path:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.206150] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.206384] I
[afr-self-heal-common.c:963:afr_sh_missing_entries_done]
0-web-pub-replicate-0: split brain found, aborting selfheal of
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.206400] E
[afr-self-heal-common.c:2074:afr_self_heal_completion_cbk]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal failed on
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.207725] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.207746] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.207756] W
[afr-common.c:882:afr_detect_self_heal_by_iatt] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid different on subvolume
[2012-02-25 18:53:04.207772] I [afr-common.c:1038:afr_launch_self_heal]
0-web-pub-replicate-0: background meta-data data missing-entry
self-heal triggered. path:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
[2012-02-25 18:53:04.209217] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 0 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
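Every failing file in the excerpt follows the same cycle: AFR sees that the file's gfid (the per-file ID GlusterFS keeps as an extended attribute on each brick) differs between replicas, triggers a background self-heal, reports split brain, and aborts. To see how many cache files are affected across a whole log, rather than just this excerpt, the messages can be grouped by path. The script below is a hypothetical helper written for illustration, not a GlusterFS tool; its regular expressions assume the 3.2.x message wording shown above, and `SAMPLE` is a few entries copied from this excerpt.

```python
import re
from collections import defaultdict

# Hypothetical helper (not part of GlusterFS). The patterns assume the 3.2.x
# AFR message wording from the excerpt; \s+ also absorbs mail line-wrapping.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
GFID_RE = re.compile(
    r"afr_conflicting_iattrs\]\s+\S+:\s+(?P<path>\S+):\s+"
    r"gfid differs on subvolume \d+ "
    r"\((?P<a>" + UUID + r"),\s*(?P<b>" + UUID + r")\)"
)
SPLIT_RE = re.compile(r"split brain found, aborting selfheal of\s+(?P<path>\S+)")

# A few entries copied from the log excerpt, wrapping included.
SAMPLE = """\
[2012-02-25 18:53:04.199191] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php:
gfid differs on subvolume 0 (53fa373a-3830-4c5e-aa22-6ed35c947d97,
c12e0cdd-9b6c-4988-b793-819db0472780)
[2012-02-25 18:53:04.200971] I
[afr-self-heal-common.c:963:afr_sh_missing_entries_done]
0-web-pub-replicate-0: split brain found, aborting selfheal of
/cust/site1/www/openx/var/cache/deliverycache_f8e7a8862cb80b4933c58acdf65aaef5.php
[2012-02-25 18:53:04.202159] W
[afr-common.c:1121:afr_conflicting_iattrs] 0-web-pub-replicate-0:
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php:
gfid differs on subvolume 1 (375e1754-0420-4e26-9176-bb2128c6596b,
3e9eca35-3351-450e-b8ab-c62785968953)
[2012-02-25 18:53:04.203678] I
[afr-self-heal-common.c:963:afr_sh_missing_entries_done]
0-web-pub-replicate-0: split brain found, aborting selfheal of
/cust/site1/www/openx/var/cache/deliverycache_f901ff39b456df599289c590ed89b19d.php
"""


def summarize(log_text):
    """Map each path to its distinct (gfid, gfid) pairs and split-brain count."""
    report = defaultdict(lambda: {"gfid_pairs": set(), "split_brain": 0})
    for m in GFID_RE.finditer(log_text):
        report[m["path"]]["gfid_pairs"].add((m["a"], m["b"]))
    for m in SPLIT_RE.finditer(log_text):
        report[m["path"]]["split_brain"] += 1
    return dict(report)


if __name__ == "__main__":
    for path, info in summarize(SAMPLE).items():
        print(path)
        print("  split-brain aborts:", info["split_brain"])
        for a, b in sorted(info["gfid_pairs"]):
            print("  conflicting gfids:", a, b)
```

To scan a real log, pass the contents of the client log file (typically under /var/log/glusterfs/) to `summarize()` instead of `SAMPLE`.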
The nodes and the network are fine. I have tried mounting the volume with both
the native Gluster client and Gluster's built-in NFS server, and I get the
same results. It's killing performance.
Here is the config:
volume web-pub-client-0
    type protocol/client
    option remote-host web-web1
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-1
    type protocol/client
    option remote-host web-web2
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-2
    type protocol/client
    option remote-host web-web3
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-client-3
    type protocol/client
    option remote-host web-web4
    option remote-subvolume /glusterfs/pub
    option transport-type tcp
end-volume

volume web-pub-replicate-0
    type cluster/replicate
    subvolumes web-pub-client-0 web-pub-client-1 web-pub-client-2 web-pub-client-3
end-volume

volume web-pub-write-behind
    type performance/write-behind
    subvolumes web-pub-replicate-0
end-volume

volume web-pub-read-ahead
    type performance/read-ahead
    subvolumes web-pub-write-behind
end-volume

volume web-pub-io-cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes web-pub-read-ahead
end-volume

volume web-pub-quick-read
    type performance/quick-read
    option cache-size 256MB
    subvolumes web-pub-io-cache
end-volume

volume web-pub
    type debug/io-stats
    option latency-measurement off
    option count-fop-hits off
    subvolumes web-pub-quick-read
end-volume

volume nfs-server
    type nfs/server
    option nfs.dynamic-volumes on
    option rpc-auth.addr.web-pub.allow *
    option nfs3.web-pub.volume-id ac556d2e-e8a9-4857-bd17-cab603820fcb
    subvolumes web-pub
end-volume
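The volume file describes a translator graph: each `volume` block names a translator (`type`) and the `subvolumes` it sits on. Walking that graph from the top shows the order in which requests pass through the stack, with write-behind, read-ahead, io-cache and quick-read all layered above cluster/replicate. The parser below is a minimal, hypothetical sketch for reading such files (it is not a GlusterFS API); it assumes one statement per line and no quoting, and `EXCERPT` is a trimmed copy of the client-side part of the config.

```python
# Hypothetical helper (not part of GlusterFS): parse a volfile and print
# its translator stack. Assumes one statement per line, as in the config above.

def parse_volfile(text):
    """Return {volume_name: {"type": str, "options": dict, "subvolumes": list}}."""
    graph = {}
    current = None
    for raw in text.splitlines():
        words = raw.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0]
        if keyword == "volume":
            current = {"type": None, "options": {}, "subvolumes": []}
            graph[words[1]] = current
        elif keyword == "end-volume":
            current = None
        elif keyword == "type":
            current["type"] = words[1]
        elif keyword == "option":
            current["options"][words[1]] = " ".join(words[2:])
        elif keyword == "subvolumes":
            current["subvolumes"].extend(words[1:])
    return graph


def roots(graph):
    """Volumes that no other volume lists as a subvolume (the top of the stack)."""
    used = {sub for node in graph.values() for sub in node["subvolumes"]}
    return [name for name in graph if name not in used]


def stack_lines(graph, name, depth=0):
    """Render the translator tree below `name`, one indented line per volume."""
    node = graph[name]
    lines = ["  " * depth + f"{name} ({node['type']})"]
    for sub in node["subvolumes"]:
        lines.extend(stack_lines(graph, sub, depth + 1))
    return lines


# Trimmed excerpt of the client-side part of the config above.
EXCERPT = """
volume web-pub-client-0
    type protocol/client
    option remote-host web-web1
end-volume
volume web-pub-client-1
    type protocol/client
    option remote-host web-web2
end-volume
volume web-pub-client-2
    type protocol/client
    option remote-host web-web3
end-volume
volume web-pub-client-3
    type protocol/client
    option remote-host web-web4
end-volume
volume web-pub-replicate-0
    type cluster/replicate
    subvolumes web-pub-client-0 web-pub-client-1 web-pub-client-2 web-pub-client-3
end-volume
volume web-pub-write-behind
    type performance/write-behind
    subvolumes web-pub-replicate-0
end-volume
volume web-pub-read-ahead
    type performance/read-ahead
    subvolumes web-pub-write-behind
end-volume
volume web-pub-io-cache
    type performance/io-cache
    option cache-size 256MB
    subvolumes web-pub-read-ahead
end-volume
volume web-pub-quick-read
    type performance/quick-read
    option cache-size 256MB
    subvolumes web-pub-io-cache
end-volume
volume web-pub
    type debug/io-stats
    subvolumes web-pub-quick-read
end-volume
"""

if __name__ == "__main__":
    graph = parse_volfile(EXCERPT)
    for top in roots(graph):
        print("\n".join(stack_lines(graph, top)))
```

Run against the full file, `roots()` would return `nfs-server`, since the NFS server translator sits on top of `web-pub`.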
Any ideas or help would be greatly appreciated.
sean
--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(203) 665-6211, x203