[Gluster-users] slow write to non-hosted replica in distributed-replicated volume
Rowley, Shane K
shane.k.rowley at boeing.com
Wed Oct 17 20:47:16 UTC 2012
I have four servers, absolutely identical, connected to the same switches. One interface is on a 100Mb switch, the other is on a 1Gb switch. I access the nodes via the 100Mb port, gluster is configured on the 1Gb port. The nodes are all loaded with Scientific Linux 6.3, Virtualization Host, with glusterfs-3.2.7 from EPEL. The nodes are 2-socket quad-core AMD (so 8 cores total) servers with 6x 300GB internal drives. I'm using LVM on top of h/w RAID0, and have a 1.5TB xfs brick on each node. I have libvirtd running, but no VMs created yet.
I initially configured each pair of servers as a separate cluster with a 1x2 replicated volume. Mounting the volumes as glusterfs from localhost, dd tests give me ~90MB/s, pretty decent for a 1Gb network (max ~125MB/s). So I tear that all down, join all four nodes together, and create a 2x2 distributed-replicated volume. This is where it gets interesting. A dd test from the first node is still full speed. From the second node it's half speed. The third node is back to full speed. The fourth node is back to half speed. When I look in the bricks directly, I see that the slow nodes had their file placed in a replica pair that did not include their own local brick.
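For reference, the dd test is along these lines (the mount point is illustrative; the snippet falls back to a temp directory so it can be run standalone):

```shell
# Illustrative write test; point TESTDIR at the gluster mount
# (e.g. /mnt/vol1). Falls back to a temp dir so it runs anywhere.
TESTDIR="${TESTDIR:-$(mktemp -d)}"

# Write 256MB of zeros in 1MB blocks; conv=fsync makes dd flush
# before reporting, so the MB/s figure reflects real write speed.
dd if=/dev/zero of="$TESTDIR/file1" bs=1M count=256 conv=fsync
```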
gluster volume create vol1 replica 2 transport tcp server1:/brick1 server2:/brick2 server3:/brick3 server4:/brick4
server1:/brick1 and server2:/brick2 are the first replica pair
server3:/brick3 and server4:/brick4 are the second replica pair
server1.. file1 goes into brick1/brick2 - fast
server2.. file2 goes into brick3/brick4 - slow
server3.. file3 goes into brick3/brick4 - fast
server4.. file4 goes into brick1/brick2 - slow
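For what it's worth, I'm checking placement by listing the bricks directly on each node, along these lines (brick paths as above; each server only exports its own brick):

```shell
# Run on each node; each server exports a single brick (/brickN).
# A file should only show up under the two bricks of its replica pair.
for b in /brick1 /brick2 /brick3 /brick4; do
    [ -d "$b" ] || continue          # skip bricks this node doesn't have
    if [ -e "$b/file1" ]; then
        echo "file1 present in $b on $(hostname)"
    fi
done
```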
So I delete that volume and create another:
gluster volume create vol2 replica 2 transport tcp server2:/brick2 server3:/brick3 server4:/brick4 server1:/brick1
server2:/brick2 and server3:/brick3 are the first replica pair
server4:/brick4 and server1:/brick1 are the second replica pair
server2.. file2 goes into brick2/brick3 - fast
server3.. file3 goes into brick2/brick3 - fast
server4.. file4 goes into brick4/brick1 - fast
server1.. file1 goes into brick4/brick1 - fast
So now I'm thinking, seriously, WTF. I remove all the output files and try four consecutive tests from the same node, writing file1, file2, file3, file4. Sure enough, two of them are fast and two are slow: the fast ones land in that node's own replica pair and the slow ones land in the other pair. And every time I delete and recreate the files, each one ends up in the same replica pair, no matter what order I create them in. I've tried this with NFS mounts (instead of glusterfs) as well, and the results are the same.
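It smells like placement is a pure function of the file name: the distribute layer hashes the name to pick a replica pair, no matter which client writes the file. A toy sketch of that idea (crc32 stands in here; it is not gluster's actual hash, and the pair labels are just illustrative):

```python
import zlib

# Toy sketch of name-hash placement -- crc32 stands in for the real
# DHT hash, and the pair labels are illustrative.
REPLICA_PAIRS = ["brick1+brick2", "brick3+brick4"]

def pick_pair(filename: str) -> str:
    """The pair depends only on the name, never on the writing node."""
    h = zlib.crc32(filename.encode("utf-8"))
    return REPLICA_PAIRS[h % len(REPLICA_PAIRS)]

# Recreating a file with the same name always yields the same pair:
for name in ("file1", "file2", "file3", "file4"):
    print(name, "->", pick_pair(name))
```

Which concrete pair each name maps to will differ from the real hash, but the key property is the same: delete and recreate a file under the same name and it lands in the same replica pair every time, which matches what I'm seeing.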
Has anyone seen this behavior before? Is this a known issue or a misconfiguration?