[Gluster-devel] Gluster 3.5.0 geo-replication with multiple bricks
system admin
sysbcn74 at gmail.com
Mon Jun 30 15:06:43 UTC 2014
Hi all.
I've recently installed three Gluster 3.5 servers: two masters and one
geo-replication slave, each with 2 bricks. After some configuration
problems it seems that everything is working OK, but I've found some problems
with geo-replication.
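For reference, the geo-replication session was created with the standard
commands, more or less as below (exact create options may have differed
slightly on my side):

# roughly how the session was set up; exact create options may have differed
gluster volume geo-replication jbpre01vol filepre05::jbpre01slvol create push-pem
gluster volume geo-replication jbpre01vol filepre05::jbpre01slvol start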
First I'd like to ask one question, because I couldn't find the answer
either in the documentation or on any mailing list.
This is the volume configuration:
root@filepre03:/gluster/jbossbricks/pre01/disk01/b01/.glusterfs/changelogs# gluster v info
(master)
Volume Name: jbpre01vol
Type: Distributed-Replicate
Volume ID: 316231f7-20bf-44f6-9d9b-20d4e3b27c2c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: filepre03:/gluster/jbossbricks/pre01/disk01/b01
Brick2: filepre04:/gluster/jbossbricks/pre01/disk01/b01
Brick3: filepre03:/gluster/jbossbricks/pre01/disk02/b02
Brick4: filepre04:/gluster/jbossbricks/pre01/disk02/b02
Options Reconfigured:
diagnostics.brick-log-level: WARNING
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
(geo-replica)
Volume Name: jbpre01slvol
Type: Distribute
Volume ID: 0a4d2f3e-c803-4cfe-971b-2f8107180a69
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: filepre05:/gluster/jbossbricks/pre01/disk01/b01
Brick2: filepre05:/gluster/jbossbricks/pre01/disk02/b02
Options Reconfigured:
diagnostics.brick-log-level: WARNING
And geo-replication is running on bricks b01 and b02:
root@filepre03:/gluster/jbossbricks/pre01/disk01/b01/.glusterfs/changelogs# gluster v g jbpre01vol filepre05::jbpre01slvol status
MASTER NODE    MASTER VOL    MASTER BRICK                             SLAVE                      STATUS     CHECKPOINT STATUS    CRAWL STATUS
----------------------------------------------------------------------------------------------------------------------------------------------
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Passive    N/A                  N/A
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Passive    N/A                  N/A
Tests are done from another server that mounts both the master and the slave volumes:
root@testgluster:/mnt/gluster# mount | grep gluster
filepre03:/jbpre01vol on /mnt/gluster/pre01filepre03 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
filepre04:/jbpre01vol on /mnt/gluster/pre01filepre04 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
filepre05:/jbpre01slvol on /mnt/gluster/pre01filepre05 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
My question is about directory dates on the geo-replica: in all my tests,
the directory date on the remote server shows the time when replication was
executed, not the original date. Is this the usual behaviour?
For example:
root@testgluster:/mnt/gluster# mkdir /mnt/gluster/pre01filepre03/TESTDIR1
root@testgluster:/mnt/gluster# echo TEST > pre01filepre03/TESTDIR1/TESTING1
After a while, Gluster has created the directory and the file, but the
directory's date on the geo-replica is the current date, not the original one:
root@testgluster:/mnt/gluster# ls -d --full-time pre01filepre0*/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:55:18.651528230 +0200 pre01filepre03/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:55:18.652637248 +0200 pre01filepre04/TESTDIR1
drwxr-xr-x 2 root root 8192 2014-06-30 11:56:14.087626822 +0200 pre01filepre05/TESTDIR1   (geo-replica)
However, the file is replicated with its original date:
root@testgluster:/mnt/gluster# find . -type f -exec ls --full-time {} \;
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.664637725 +0200 ./pre01filepre04/TESTDIR1/TESTING1
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.663528750 +0200 ./pre01filepre03/TESTDIR1/TESTING1
-rw-r--r-- 1 root root 5 2014-06-30 11:55:18.000000000 +0200 ./pre01filepre05/TESTDIR1/TESTING1   (geo-replica)
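The mtime difference is easy to see side by side with stat, run from the same
client mount point (any matching master/slave directory pair shows the same thing):

# master copy vs geo-replica copy of the same directory
stat -c '%n  mtime: %y' pre01filepre03/TESTDIR1 pre01filepre05/TESTDIR1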
This makes it difficult to validate syncing between the masters and the slave
with tools like rsync, because the directories always show as different:
root@testgluster:/mnt/gluster# rsync -avn pre01filepre03/TESTDIR1/ pre01filepre05
sending incremental file list
./
TESTING1
sent 49 bytes  received 18 bytes  134.00 bytes/sec
total size is 5  speedup is 0.07 (DRY RUN)
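A possible workaround for validation runs seems to be telling rsync to skip
directory times with -O / --omit-dir-times and comparing the same directory on
both sides (not tested in depth, just from the rsync man page):

# dry run, ignoring directory mtimes (-O), same TESTDIR1 on master and slave mounts
rsync -avn -O pre01filepre03/TESTDIR1/ pre01filepre05/TESTDIR1/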
Next, I'd like to ask whether anyone has seen the following problem when a
brick on the remote server goes down.
After checking the current status, I killed one brick process on the remote server:
root@filepre03:~# gluster v status jbpre01vol
Status of volume: jbpre01vol
Gluster process                                          Port     Online    Pid
------------------------------------------------------------------------------
Brick filepre03:/gluster/jbossbricks/pre01/disk01/b01    49169    Y         8167
Brick filepre04:/gluster/jbossbricks/pre01/disk01/b01    49172    Y         7027
Brick filepre03:/gluster/jbossbricks/pre01/disk02/b02    49170    Y         8180
Brick filepre04:/gluster/jbossbricks/pre01/disk02/b02    49173    Y         7040
NFS Server on localhost                                  2049     Y         2088
Self-heal Daemon on localhost                            N/A      Y         30873
NFS Server on filepre04                                  2049     Y         9171
Self-heal Daemon on filepre04                            N/A      Y         7061
NFS Server on filepre05                                  2049     Y         1128
Self-heal Daemon on filepre05                            N/A      Y         1137
root@filepre03:~# gluster v status jbpre01slvol
Status of volume: jbpre01slvol
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick filepre05:/gluster/jbossbricks/pre01/disk01/b01 49152 Y 6321
Brick filepre05:/gluster/jbossbricks/pre01/disk02/b02 49155 Y 6375
NFS Server on localhost 2049 Y 2088
NFS Server on filepre04 2049 Y 9171
NFS Server on filepre05 2049 Y 1128
root@filepre03:~# gluster v g status
MASTER NODE    MASTER VOL    MASTER BRICK                             SLAVE                      STATUS     CHECKPOINT STATUS    CRAWL STATUS
----------------------------------------------------------------------------------------------------------------------------------------------
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Passive    N/A                  N/A
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Passive    N/A                  N/A
root@filepre05:/gluster/jbossbricks/pre01# kill -9 6375   (slave: brick 02 process)
root@filepre03:~# gluster v status jbpre01slvol
Gluster process                                          Port     Online    Pid
------------------------------------------------------------------------------
Brick filepre05:/gluster/jbossbricks/pre01/disk01/b01    49152    Y         6321
Brick filepre05:/gluster/jbossbricks/pre01/disk02/b02    N/A      N         6375
NFS Server on localhost                                  2049     Y         2088
NFS Server on filepre04                                  2049     Y         9171
NFS Server on filepre05                                  2049     Y         1128
If I kill only one brick process, geo-replication doesn't show any problem,
and it doesn't detect anything wrong when a client writes to that brick.
I write some files:
root@testgluster:/mnt/gluster# echo "TEST2" >pre01filepre03/TESTFILE2
root@testgluster:/mnt/gluster# echo "TEST3" >pre01filepre03/TESTFILE3
root@testgluster:/mnt/gluster# echo "TEST4" >pre01filepre03/TESTFILE4
root@testgluster:/mnt/gluster# echo "TEST5" >pre01filepre03/TESTFILE5
Then, I check where they are:
root@filepre03:~# find /gluster -name "TESTFILE*" -exec ls -l {} \;
-rw-r--r-- 2 root root 6 Jun 30 16:11 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE3
-rw-r--r-- 2 root root 6 Jun 30 16:11 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE4
-rw-r--r-- 2 root root 6 Jun 30 16:09 /gluster/jbossbricks/pre01/disk01/b01/TESTFILE2
-rw-r--r-- 2 root root 6 Jun 30 16:12 /gluster/jbossbricks/pre01/disk02/b02/TESTFILE5   <--- brick 02
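As a cross-check, which brick a file hashes to can also be read from the
client mount through the pathinfo virtual xattr (the mount path below is just
my test client's):

# shows the backend brick path(s) for the file, as seen from the FUSE mount
getfattr -n trusted.glusterfs.pathinfo /mnt/gluster/pre01filepre03/TESTFILE5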
Finally, I check the geo-replication status:
root@filepre03:~# gluster v g jbpre01vol filepre05::jbpre01slvol status detail
MASTER NODE    MASTER VOL    MASTER BRICK                             SLAVE                      STATUS     CHECKPOINT STATUS    CRAWL STATUS       FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl    3738           0                0                0                  0
filepre03      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Active     N/A                  Changelog Crawl    3891           0                0                0                  0
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk01/b01    filepre05::jbpre01slvol    Passive    N/A                  N/A                0              0                0                0                  0
filepre04      jbpre01vol    /gluster/jbossbricks/pre01/disk02/b02    filepre05::jbpre01slvol    Passive    N/A                  N/A                0              0                0                0                  0
No problem is shown and no files are pending sync, but on the remote server:
root@testgluster:/mnt/gluster# ll pre01filepre05
total 14
drwxr-xr-x 4 root root 4096 Jun 30 16:12 ./
drwxr-xr-x 5 root root 4096 Jun 26 12:57 ../
-rw-r--r-- 1 root root 6 Jun 30 16:09 TESTFILE2
-rw-r--r-- 1 root root 6 Jun 30 16:11 TESTFILE3
-rw-r--r-- 1 root root 6 Jun 30 16:11 TESTFILE4
As you can see, only the files written to brick b01 have been replicated;
TESTFILE5, which lives on brick b02, is missing on the slave.
I can't find any gluster command that returns a failed status.
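Is grepping the gsyncd worker logs the only way to spot this? On my install
they seem to live under paths like these (locations may differ per
distribution/version):

# master side (per master volume)
tail -n 50 /var/log/glusterfs/geo-replication/jbpre01vol/*.log
# slave side
ls /var/log/glusterfs/geo-replication-slaves/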
One more comment: if I kill all the brick processes on the remote server,
then the geo-replication status changes to Faulty, as I expected.
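(For completeness: as far as I know, a forced volume start should bring the
killed brick processes back after these tests:)

# restarts any brick processes of the volume that are not running
gluster volume start jbpre01slvol force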
Any additional info will be appreciated.
Thank you
Eva