[Gluster-users] Gluster volume heal statistics aren't changing.
Ernie Dunbar
maillist at lightspeed.ca
Thu Apr 14 21:51:34 UTC 2016
Hi everyone.
So, a few days ago, I added another Gluster server to our cluster to
prevent split-brains. I told the new server to do a self-heal operation,
then sat back and waited while the cluster's performance dropped
dramatically and our customers lost patience with us over the course of
several days.
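For reference, I believe I kicked the heal off with the standard command
(I may have the exact invocation slightly off):

gluster volume heal gv2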
Now I see that the disk on the new node has filled somewhat, but
apparently the self-heal process has stalled. This is what I see when I
run the "volume heal statistics heal-count" command:
root at nfs3:/home/ernied# date
Thu Apr 14 13:14:00 PDT 2016
root at nfs3:/home/ernied# gluster volume heal gv2 statistics heal-count
Gathering count of entries to be healed on volume gv2 has been successful
Brick nfs1:/brick1/gv2
Number of entries: 475
Brick nfs2:/brick1/gv2
Number of entries: 190
Brick nfs3:/brick1/gv2
Number of entries: 36
root at nfs3:/home/ernied# date
Thu Apr 14 14:35:00 PDT 2016
root at nfs3:/home/ernied# gluster volume heal gv2 statistics heal-count
Gathering count of entries to be healed on volume gv2 has been successful
Brick nfs1:/brick1/gv2
Number of entries: 475
Brick nfs2:/brick1/gv2
Number of entries: 190
Brick nfs3:/brick1/gv2
Number of entries: 36
After an hour and 20 minutes, I see zero progress. How do I give this
thing a kick in the pants to get moving?
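The only candidate I've found in the docs so far is a full heal. I
assume something like the commands below would check that the self-heal
daemon is actually running, force a full re-crawl, and let me watch
progress, but I don't know whether that would just make the performance
problem worse:

gluster volume status gv2       # confirm the Self-heal Daemon is online on all three nodes
gluster volume heal gv2 full    # force a full crawl instead of relying on the index heal
gluster volume heal gv2 info    # list the entries still pending heal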
Also, after reading a bit about Gluster tuning, I suspect I may have
made a mistake when creating the bricks. I've read that we should have
pairs of bricks for faster access, but we've only got one brick
replicated across 3 servers (or maybe that's 3 bricks that all happen to
share the same path; I'm not sure). Here's what the "volume info"
command shows:
root at nfs1:/home/ernied# gluster volume info
Volume Name: gv2
Type: Replicate
Volume ID: 3969e9cc-a2bf-4819-8c02-bf51ec0c905f
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: nfs1:/brick1/gv2
Brick2: nfs2:/brick1/gv2
Brick3: nfs3:/brick1/gv2
Options Reconfigured:
cluster.server-quorum-type: none
cluster.server-quorum-ratio: 51
We currently have about 618 GB of data shared across three 6 TB RAID
arrays. The data is nearly all e-mail, so it's a lot of small files, and
IMAP does a lot of random read/write operations. Customers are not
pleased with the speed of our webmail right now. Would creating a larger
number of smaller bricks speed up our backend performance? Is there a
way to do that non-destructively?
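For what it's worth, this is roughly what I imagine converting to a
distributed-replicated layout would look like. The /brick2 paths are
just placeholders, and I have no idea whether this is safe to do on a
live volume:

gluster volume add-brick gv2 replica 3 nfs1:/brick2/gv2 nfs2:/brick2/gv2 nfs3:/brick2/gv2   # placeholder brick paths; keeps replica count at 3
gluster volume rebalance gv2 start    # spread existing data across the new bricks
gluster volume rebalance gv2 status   # watch rebalance progress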