[Gluster-users] Unexpected behaviour adding a third server

Joe Julian joe at julianfamily.org
Wed Jan 27 06:40:09 UTC 2016


As an aside, increasing your replica count from 3 to 4 gets you very 
little in increased reliability. If each server's uptime is 4 nines 
(99.99%, a weak SLA for a cloud provider), replica 2 gets you:

1 - (1 - 0.9999)^2 = 99.999999% (8 nines), or about 315 milliseconds of 
downtime annually.

Not much point in increasing reliability beyond that, is there? But 
replica 3 gives us 1 - (1 - 0.9999)^3 = 99.9999999999% (12 nines), or 
about 0.03 milliseconds of downtime per year.
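
(To spell out the arithmetic behind those figures: the chance of all 
replicas being unavailable at the same time is the per-server 
unavailability raised to the replica count, and a year is roughly 
31,536,000 seconds.)

replica 2: (1 - 0.9999)^2 = 1e-08 -> 1e-08 * 31,536,000 s ~= 0.315 s (315 ms)
replica 3: (1 - 0.9999)^3 = 1e-12 -> 1e-12 * 31,536,000 s ~= 0.0000315 s (0.03 ms)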

Do you need more replicas than that? As you increase your replica count, 
you're decreasing your performance (generally), and you're not really 
doing anything for reliability.

The only reason you should be growing your replica count, imho, is if 
you've got so much demand for a single file that you're maxing out the 
read capabilities of your existing replicas. Then, adding another 
replica to increase read capacity makes sense.

On 01/26/2016 10:17 PM, Anuradha Talur wrote:
>
> ----- Original Message -----
>> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> To: "Steve Spence" <steve at pixeldynamo.com>, gluster-users at gluster.org
>> Cc: "Anuradha Talur" <atalur at redhat.com>
>> Sent: Monday, January 25, 2016 8:24:41 PM
>> Subject: Re: [Gluster-users] Unexpected behaviour adding a third server
>>
>>
>> On 01/23/2016 02:17 PM, Steve Spence wrote:
>>> We've a simple two-server one volume arrangement, replicating ~340k
>>> files (15GB) between our web servers.
>>>
>>> The servers are in AWS, sat in different availability zones. One of
>>> the operations for this weekend is to add another pair of machines,
>>> one in each AZ.
>>>
>>> I've deployed the same OS image of the gluster server (3.6) and was
>>> under the impression I could add a brick to the existing replica
>>> simply by issuing the below:
>>>
>>> gluster volume add-brick volume1 replica 3 pd-wfe3:/gluster-store
>>>
>>> And then presumably would add the fourth server by repeating the above
>>> with "replica 4" and the fourth server name.
>>>
>>> The operation appeared to succeed, the brick appears alongside the others:
>>>
>>> Status: Started
>>> Number of Bricks: 1 x 3 = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: pd-wfe1:/gluster-store
>>> Brick2: pd-wfe2:/gluster-store
>>> Brick3: pd-wfe3:/gluster-store
>>>
>>> but almost immediately pd-wfe1 crept up to 100% CPU with the gluster
>>> processes, and nginx began timing out serving content from the volume.
>> Could you disable client-side healing?
>> "gluster volume set <volname> cluster.entry-self-heal off
>> "gluster volume set <volname> cluster.data-self-heal off
>> "gluster volume set <volname> cluster.metadata-self-heal off
>>
>> We are in the process of making this experience smoother for 3.8 by
>> introducing throttling of self-heal traffic and automatic healing.
>> +Anuradha,
>>          Could you give him the steps he needs to perform after doing
>> add-brick, until the patch you sent is merged?
> Hi Steve,
>
> Once you add bricks to a replicate volume such that the replica count is increased (2 to 3, and then 3 to 4, in the case you mentioned), files need to be healed from the pre-existing bricks to the newly added ones.
>
> The process of triggering heals from the old bricks to the new ones is not automatic yet; we have a patch for it under review.
> Meanwhile, you can follow the steps below to trigger heals from the self-heal daemon (a concrete example of the whole sequence is given after the steps):
>
> I'm assuming you have added the third brick and are yet to add the fourth one.
>
> 1) Turn off client-side self-healing using the steps given by Pranith.
> 2) Kill the 3rd brick (the newly added one).
> 3) On the new brick, i.e., pd-wfe3:/gluster-store, run: setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 <path-to-brick>
> 4) Say the mount point used for this volume is /mnt; perform:
>        a) touch /mnt/<non-existent-file>
>        b) setfattr -n <user.non-existent-xattr> -v 1 /mnt
>        c) rm /mnt/<non-existent-file>
>        d) setfattr -x <user.non-existent-xattr> /mnt
> These operations will set pending xattrs on the newly added brick so that heal is triggered.
> 5) Bring the brick back up with: gluster volume start <volname> force
> 6) Run gluster volume heal <volname> from one of the servers
>
> You can monitor whether the files are being healed or not with: gluster volume heal <volname> info.
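>
> To make that concrete: assuming the volume is named volume1 (as in your
> add-brick command), the new brick's path is /gluster-store, and a client
> has the volume mounted at /mnt, the whole sequence would look roughly
> like the below. The dummy file and xattr names are arbitrary placeholders:
>
> # on pd-wfe3: stop the newly added brick's process
> # (gluster volume status volume1 lists the brick PIDs)
> kill <pid-of-pd-wfe3-brick-process>
>
> # on pd-wfe3: mark the new brick dirty
> setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /gluster-store
>
> # from a client with the volume mounted at /mnt: create and remove dummy entries
> touch /mnt/dummy-heal-trigger
> setfattr -n user.dummy-heal-trigger -v 1 /mnt
> rm /mnt/dummy-heal-trigger
> setfattr -x user.dummy-heal-trigger /mnt
>
> # from any server: restart the brick, trigger the heal, and monitor it
> gluster volume start volume1 force
> gluster volume heal volume1
> gluster volume heal volume1 info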
>
> Let me know if there is any clarification required.
>
>> Pranith
>>> The glusterfs-glusterd-vol log is filled with this error at pd-wfe1:
>>>
>>> [2016-01-23 08:43:28.459215] W [socket.c:620:__socket_rwv]
>>> 0-management: readv on
>>> /var/run/c8bc2f99e7584cb9cf077c4f98d1db2e.socket failed (Invalid argument)
>>>
>>> while I see this error for the log named by the mount point:
>>>
>>> [2016-01-23 08:43:28.986379] W
>>> [client-rpc-fops.c:306:client3_3_mkdir_cbk] 2-volume1-client-2: remote
>>> operation failed: Permission denied. Path: (null)
>>>
>>> Does anyone have any suggestions how to proceed? I would appreciate
>>> any input on this one.
>>>
>>> Steve
>>>
>>>
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>


