[Gluster-users] Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)

Mon Oct 26 11:37:57 UTC 2015


On 10/23/2015 10:10 AM, Ravishankar N wrote:
>
>
> On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:
>> Hello,
>>
>> I'm trying to track down a problem with my setup (version 3.7.3 on 
>> Debian stable).
>>
>> I have a couple of volumes set up in a 3-node configuration, each with
>> 1 brick acting as an arbiter.
>>
>> There are 4 volumes set up in cross-over across 3 physical servers, 
>> like this:
>>
>>
>>
>>               +----------------[ GigabitEthernet switch ]-----------------+
>>               |                             |                             |
>>               V                             V                             V
>> /---------------------------\ /---------------------------\ /---------------------------\
>> | web-rep                   | | cluster-rep               | | mail-rep                  |
>> |                           | |                           | |                           |
>> | vols:                     | | vols:                     | | vols:                     |
>> | system_www1               | | system_www1               | | system_www1(arbiter)      |
>> | data_www1                 | | data_www1                 | | data_www1(arbiter)        |
>> | system_mail1(arbiter)     | | system_mail1              | | system_mail1              |
>> | data_mail1(arbiter)       | | data_mail1                | | data_mail1                |
>> \---------------------------/ \---------------------------/ \---------------------------/
>>
>>
>> Now, after a fresh boot-up, everything seems to be running fine.
>> Then I start copying big files (KVM disk images) from local disk to 
>> gluster mounts.
>> In the beginning it seems to be running fine (although iowait seems
>> to go so high that it clogs up I/O operations at some moments, but
>> that's an issue for later). After some time the transfer freezes, then
>> after some (long) time it advances in a short burst, only to freeze
>> again. Another interesting thing is that I see a constant flow of
>> network traffic on the interfaces dedicated to gluster, even when
>> there's a "freeze".
>>
>> I ran "gluster volume statedump" during the transfer (the file was
>> being copied from local disk on cluster-rep onto a local mount of the
>> "system_www1" volume). I observed the following section in the dump
>> for the cluster-rep node:
>>
>> [xlator.features.locks.system_www1-locks.inode]
>> path=/images/101/vm-101-disk-1.qcow2
>> mandatory=0
>> inodelk-count=12
>> lock-dump.domain.domain=system_www1-replicate-0:self-heal
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
>> lock-dump.domain.domain=system_www1-replicate-0
>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:37:45
>> inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1, pid = 18446744073709551610, owner=c811600cd67f0000, client=0x7fbe100df280, connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0, granted at 2015-10-21 11:36:22
>
> From the statedump, it looks like the self-heal daemon has taken locks
> to heal the file, which is why the locks attempted by the client (mount)
> are in the BLOCKED state.
> In arbiter volumes the client (mount) takes full locks (start=0,
> len=0) for every write(), as opposed to normal replica volumes, which
> take range locks (i.e. appropriate start, len values) for that write().
> This is done to avoid network split-brains.
> So in normal replica volumes, clients can still write to a file while
> a heal is going on, as long as the offsets don't overlap. This is not
> the case with arbiter volumes.
> You can look at the client or glustershd logs to see if there are
> messages that indicate healing of a file, something along the lines of
> "Completed data selfheal on xxx".
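
Ravi's explanation above can be illustrated with a small sketch of
byte-range overlap. This is not GlusterFS code: the InodeLock class and
the conflicts() helper are hypothetical names, and the model assumes that
len=0 means "to end of file" and that two WRITE locks in the same lock
domain conflict whenever their ranges overlap. Lock owners and domains
are deliberately ignored. The start/len values are taken from the
statedump quoted in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InodeLock:
    """Hypothetical, simplified model of one inodelk entry."""
    start: int
    length: int  # assumption: 0 means "from start to end of file"

    def end(self) -> float:
        # Last byte covered by the lock; unbounded when length is 0.
        return float("inf") if self.length == 0 else self.start + self.length - 1


def conflicts(a: InodeLock, b: InodeLock) -> bool:
    """Two WRITE locks conflict when their byte ranges overlap."""
    return a.start <= b.end() and b.start <= a.end()


# Range lock held while healing one 128 KiB chunk (from the statedump).
heal_chunk = InodeLock(start=2195849216, length=131072)

# Arbiter volume: the client takes a full lock for every write().
arbiter_write = InodeLock(start=0, length=0)

# Normal replica volume: the client locks only the range it writes.
replica_write = InodeLock(start=0, length=131072)

print(conflicts(heal_chunk, arbiter_write))  # full lock overlaps the heal chunk
print(conflicts(heal_chunk, replica_write))  # disjoint ranges, no conflict
```

Under these assumptions the full lock overlaps any range held for the
heal, while a range lock at a different offset does not, which matches
the BLOCKED entries with start=0, len=0 in the dump.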
Hi Adrian,
       Thanks for taking the time to send this mail. I raised this as
bug https://bugzilla.redhat.com/show_bug.cgi?id=1275247 and a fix is
posted for review at http://review.gluster.com/#/c/12426/

Pranith
>
>> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=c4fd2d78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=dc752e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=34832e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=d44d2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=306f2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=8c902e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=782c2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=1c0b2e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>> inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 0, owner=24332e78487f0000, client=0x7fbe100e1380, connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0, blocked at 2015-10-21 11:37:45
>>
>> There seem to be multiple locks in the BLOCKED state, which doesn't
>> look normal to me. The other 2 nodes have only 2 ACTIVE locks at the
>> same time.
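
The ACTIVE/BLOCKED tally described above can be produced from a statedump
file with a short script. The following is a minimal sketch, assuming the
dump has one inodelk entry per line (unwrapped) in the format quoted in
this thread. The function name count_lock_states and the SAMPLE text
(abbreviated from the dump above, with the trailing fields dropped) are
illustrative only.

```python
import re
from collections import Counter

# Patterns based on the statedump lines quoted in this thread.
DOMAIN_RE = re.compile(r"lock-dump\.domain\.domain=(\S+)")
LOCK_RE = re.compile(r"inodelk\.inodelk\[\d+\]\((ACTIVE|BLOCKED)\)=")


def count_lock_states(dump_text: str) -> Counter:
    """Count inodelk entries per (lock domain, state)."""
    counts: Counter = Counter()
    domain = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        domain_match = DOMAIN_RE.match(line)
        if domain_match:
            domain = domain_match.group(1)
            continue
        lock_match = LOCK_RE.match(line)
        if lock_match:
            counts[(domain, lock_match.group(1))] += 1
    return counts


# Abbreviated sample; real entries carry pid, owner, client and more.
SAMPLE = """\
lock-dump.domain.domain=system_www1-replicate-0:self-heal
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0
lock-dump.domain.domain=system_www1-replicate-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=2195849216, len=131072
inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0, start=9223372036854775805, len=1
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0
inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0, len=0
"""

for (domain, state), n in sorted(count_lock_states(SAMPLE).items()):
    print(f"{domain:45s} {state:8s} {n}")
```

Running it against the dump from each node would make it easy to compare
how many locks are BLOCKED on cluster-rep versus the other two nodes.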
>>
>> Below is "gluster volume info" output.
>>
>> # gluster volume info
>>
>> Volume Name: data_mail1
>> Type: Replicate
>> Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: cluster-rep:/GFS/data/mail1
>> Brick2: mail-rep:/GFS/data/mail1
>> Brick3: web-rep:/GFS/data/mail1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-count: 2
>> cluster.quorum-type: fixed
>> cluster.server-quorum-ratio: 51%
>>
>> Volume Name: data_www1
>> Type: Replicate
>> Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: cluster-rep:/GFS/data/www1
>> Brick2: web-rep:/GFS/data/www1
>> Brick3: mail-rep:/GFS/data/www1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-type: fixed
>> cluster.quorum-count: 2
>> cluster.server-quorum-ratio: 51%
>>
>> Volume Name: system_mail1
>> Type: Replicate
>> Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: cluster-rep:/GFS/system/mail1
>> Brick2: mail-rep:/GFS/system/mail1
>> Brick3: web-rep:/GFS/system/mail1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-type: none
>> cluster.quorum-count: 2
>> cluster.server-quorum-ratio: 51%
>>
>> Volume Name: system_www1
>> Type: Replicate
>> Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: cluster-rep:/GFS/system/www1
>> Brick2: web-rep:/GFS/system/www1
>> Brick3: mail-rep:/GFS/system/www1
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-type: none
>> cluster.quorum-count: 2
>> cluster.server-quorum-ratio: 51%
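
To compare the quorum options across volumes at a glance, the
"gluster volume info" text can be parsed into a dictionary. This is a
rough sketch that assumes the "Key: value" layout shown above; the
function name parse_volume_info is hypothetical, and the SAMPLE text is
an abbreviated copy of two of the volumes from the listing.

```python
def parse_volume_info(text: str) -> dict:
    """Parse 'gluster volume info' style text into {volume: {key: value}}."""
    volumes: dict = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, so brick paths stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Volume Name":
            current = volumes.setdefault(value, {})
        elif current is not None and value:
            # Section headers such as "Bricks:" have no value and are skipped.
            current[key] = value
    return volumes


# Abbreviated from the listing above.
SAMPLE = """\
Volume Name: data_www1
Type: Replicate
Bricks:
Brick1: cluster-rep:/GFS/data/www1
Options Reconfigured:
cluster.quorum-type: fixed
cluster.quorum-count: 2

Volume Name: system_www1
Type: Replicate
Bricks:
Brick1: cluster-rep:/GFS/system/www1
Options Reconfigured:
cluster.quorum-type: none
cluster.quorum-count: 2
"""

info = parse_volume_info(SAMPLE)
for name, options in info.items():
    print(name, options.get("cluster.quorum-type"), options.get("cluster.quorum-count"))
```

On the listing above this shows that the data_* volumes use
cluster.quorum-type "fixed" while the system_* volumes use "none".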
>>
>> The issue does not occur when I get rid of the 3rd (arbiter) brick.
>
> What do you mean by 'getting rid of'? Killing the 3rd brick process of 
> the volume?
>
> Regards,
> Ravi
>>
>> If there's any additional information that is missing and I could 
>> provide, please let me know.
>>
>> Greetings,
>> Adrian
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
