[Gluster-users] Copy operation freezes. Lots of locks in state BLOCKED (3-node setup with 1 arbiter)

Pranith Kumar Karampuri pkarampu at redhat.com
Wed Nov 4 23:18:46 UTC 2015



On 11/04/2015 09:10 PM, Adrian Gruntkowski wrote:
> Hello,
>
> I have applied Pranith's patch myself on the current 3.7.5 release and 
> rebuilt the packages. Unfortunately, the issue is still there :( It 
> behaves exactly the same.
Could you get the statedumps of the bricks again? I will take a look. 
Maybe the hang I observed is different from what you are observing and 
I only fixed the one I observed.
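
In case it helps, a statedump can be triggered per volume; a minimal 
sketch, assuming the default statedump directory (the one reported by 
"gluster --print-statedumpdir", usually /var/run/gluster):

    # trigger a statedump of every brick process of the volume
    gluster volume statedump system_www1
    # the dump files then appear on each brick node in the statedump directory
    ls /var/run/gluster/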

Pranith
>
> Regards,
> Adrian
>
> 2015-10-28 12:02 GMT+01:00 Pranith Kumar Karampuri 
> <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>
>
>
>     On 10/28/2015 04:27 PM, Adrian Gruntkowski wrote:
>>     Hello Pranith,
>>
>>     Thank you for the prompt reaction. I didn't get back to this
>>     until now, because I had other problems to deal with.
>>
>>     Is there a chance that it will get released this or next month?
>>     If not, I will probably have to resort to compiling it on my own.
>     I am planning to get this into 3.7.6, which is to be released by
>     the end of this month, I guess in 4-5 days :-). I will update you.
>
>     Pranith
>
>>
>>     Regards,
>>     Adrian
>>
>>
>>     2015-10-26 12:37 GMT+01:00 Pranith Kumar Karampuri
>>     <pkarampu at redhat.com <mailto:pkarampu at redhat.com>>:
>>
>>
>>
>>         On 10/23/2015 10:10 AM, Ravishankar N wrote:
>>>
>>>
>>>         On 10/21/2015 05:55 PM, Adrian Gruntkowski wrote:
>>>>         Hello,
>>>>
>>>>         I'm trying to track down a problem with my setup (version
>>>>         3.7.3 on Debian stable).
>>>>
>>>>         I have a couple of volumes setup in 3-node configuration
>>>>         with 1 brick as an arbiter for each.
>>>>
>>>>         There are 4 volumes set up in cross-over across 3 physical
>>>>         servers, like this:
>>>>
>>>>
>>>>
>>>>                     -------------->[ GigabitEthernet switch ]<-------------
>>>>                     |                          ^                          |
>>>>                     V                          V                          V
>>>>         /------------------------\ /------------------------\ /------------------------\
>>>>         | web-rep                | | cluster-rep            | | mail-rep               |
>>>>         |                        | |                        | |                        |
>>>>         | vols:                  | | vols:                  | | vols:                  |
>>>>         | system_www1            | | system_www1            | | system_www1(arbiter)   |
>>>>         | data_www1              | | data_www1              | | data_www1(arbiter)     |
>>>>         | system_mail1(arbiter)  | | system_mail1           | | system_mail1           |
>>>>         | data_mail1(arbiter)    | | data_mail1             | | data_mail1             |
>>>>         \------------------------/ \------------------------/ \------------------------/
>>>>
>>>>
>>>>         Now, after a fresh boot-up, everything seems to be running
>>>>         fine.
>>>>         Then I start copying big files (KVM disk images) from local
>>>>         disk to gluster mounts.
>>>>         In the beginning it seems to run fine (although iowait goes
>>>>         so high that it clogs up io operations at some moments, but
>>>>         that's an issue for later). After some time the transfer
>>>>         freezes, then after some (long) time it advances in a short
>>>>         burst, only to freeze again. Another interesting thing is
>>>>         that I see a constant flow of network traffic on the
>>>>         interfaces dedicated to gluster, even when there's a
>>>>         "freeze".
>>>>
>>>>         I have done "gluster volume statedump" at that point in the
>>>>         transfer (the file is copied from the local disk on
>>>>         cluster-rep onto a local mount of the "system_www1" volume).
>>>>         I've observed the following section in the dump for the
>>>>         cluster-rep node:
>>>>
>>>>         [xlator.features.locks.system_www1-locks.inode]
>>>>         path=/images/101/vm-101-disk-1.qcow2
>>>>         mandatory=0
>>>>         inodelk-count=12
>>>>         lock-dump.domain.domain=system_www1-replicate-0:self-heal
>>>>         inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 18446744073709551610, owner=c811600cd67f0000,
>>>>         client=0x7fbe100df280,
>>>>         connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
>>>>         granted at 2015-10-21 11:36:22
>>>>         lock-dump.domain.domain=system_www1-replicate-0
>>>>         inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0,
>>>>         start=2195849216, len=131072, pid = 18446744073709551610,
>>>>         owner=c811600cd67f0000, client=0x7fbe100df280,
>>>>         connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
>>>>         granted at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[1](ACTIVE)=type=WRITE, whence=0,
>>>>         start=9223372036854775805, len=1, pid =
>>>>         18446744073709551610, owner=c811600cd67f0000,
>>>>         client=0x7fbe100df280,
>>>>         connection-id=cluster-vm-3603-2015/10/21-10:35:54:596929-system_www1-client-0-0-0,
>>>>         granted at 2015-10-21 11:36:22
>>>
>>>         From the statedump, it looks like the self-heal daemon had
>>>         taken locks to heal the file, due to which the locks
>>>         attempted by the client (mount) are in the blocked state.
>>>         In Arbiter volumes the client (mount) takes full locks
>>>         (start=0, len=0) for every write() as opposed to normal
>>>         replica volumes which take range locks (i.e. appropriate
>>>         start,len values) for that write(). This is done to avoid
>>>         network split-brains.
>>>         So in normal replica volumes, clients can still write to a
>>>         file while heal is going on, as long as the offsets don't
>>>         overlap. This is not the case with arbiter volumes.
>>>         You can look at the client or glustershd logs to see if
>>>         there are messages that indicate healing of a file,
>>>         something along the lines of "Completed data selfheal on xxx"
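>>>
>>>         For example, something along these lines (assuming the
>>>         default log location under /var/log/glusterfs on the brick
>>>         nodes) should show whether a heal was running on that file:
>>>
>>>             # look for completed self-heals in the self-heal daemon log
>>>             grep "Completed data selfheal" /var/log/glusterfs/glustershd.log
>>>             # list entries that still need healing on the volume
>>>             gluster volume heal system_www1 info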
>>         Hi Adrian,
>>               Thanks for taking the time to send this mail. I raised
>>         this as a bug
>>         @ https://bugzilla.redhat.com/show_bug.cgi?id=1275247; the fix
>>         is posted for review @ http://review.gluster.com/#/c/12426/
>>
>>         Pranith
>>
>>>
>>>>         inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=c4fd2d78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[3](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=dc752e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[4](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=34832e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[5](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=d44d2e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[6](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=306f2e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[7](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=8c902e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[8](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=782c2e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[9](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=1c0b2e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>         inodelk.inodelk[10](BLOCKED)=type=WRITE, whence=0, start=0,
>>>>         len=0, pid = 0, owner=24332e78487f0000,
>>>>         client=0x7fbe100e1380,
>>>>         connection-id=cluster-vm-3846-2015/10/21-10:36:03:123909-system_www1-client-0-0-0,
>>>>         blocked at 2015-10-21 11:37:45
>>>>
>>>>         There seem to be multiple locks in the BLOCKED state, which
>>>>         doesn't look normal to me. The other 2 nodes have only 2
>>>>         ACTIVE locks at the same time.
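>>>>
>>>>         A quick way to count them, assuming the brick statedump
>>>>         files land under the default /var/run/gluster directory:
>>>>
>>>>             # granted vs. blocked inode locks in the brick statedumps
>>>>             grep -c "(ACTIVE)" /var/run/gluster/*.dump.*
>>>>             grep -c "(BLOCKED)" /var/run/gluster/*.dump.*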
>>>>
>>>>         Below is "gluster volume info" output.
>>>>
>>>>         # gluster volume info
>>>>
>>>>         Volume Name: data_mail1
>>>>         Type: Replicate
>>>>         Volume ID: fc3259a1-ddcf-46e9-ae77-299aaad93b7c
>>>>         Status: Started
>>>>         Number of Bricks: 1 x 3 = 3
>>>>         Transport-type: tcp
>>>>         Bricks:
>>>>         Brick1: cluster-rep:/GFS/data/mail1
>>>>         Brick2: mail-rep:/GFS/data/mail1
>>>>         Brick3: web-rep:/GFS/data/mail1
>>>>         Options Reconfigured:
>>>>         performance.readdir-ahead: on
>>>>         cluster.quorum-count: 2
>>>>         cluster.quorum-type: fixed
>>>>         cluster.server-quorum-ratio: 51%
>>>>
>>>>         Volume Name: data_www1
>>>>         Type: Replicate
>>>>         Volume ID: 0c37a337-dbe5-4e75-8010-94e068c02026
>>>>         Status: Started
>>>>         Number of Bricks: 1 x 3 = 3
>>>>         Transport-type: tcp
>>>>         Bricks:
>>>>         Brick1: cluster-rep:/GFS/data/www1
>>>>         Brick2: web-rep:/GFS/data/www1
>>>>         Brick3: mail-rep:/GFS/data/www1
>>>>         Options Reconfigured:
>>>>         performance.readdir-ahead: on
>>>>         cluster.quorum-type: fixed
>>>>         cluster.quorum-count: 2
>>>>         cluster.server-quorum-ratio: 51%
>>>>
>>>>         Volume Name: system_mail1
>>>>         Type: Replicate
>>>>         Volume ID: 0568d985-9fa7-40a7-bead-298310622cb5
>>>>         Status: Started
>>>>         Number of Bricks: 1 x 3 = 3
>>>>         Transport-type: tcp
>>>>         Bricks:
>>>>         Brick1: cluster-rep:/GFS/system/mail1
>>>>         Brick2: mail-rep:/GFS/system/mail1
>>>>         Brick3: web-rep:/GFS/system/mail1
>>>>         Options Reconfigured:
>>>>         performance.readdir-ahead: on
>>>>         cluster.quorum-type: none
>>>>         cluster.quorum-count: 2
>>>>         cluster.server-quorum-ratio: 51%
>>>>
>>>>         Volume Name: system_www1
>>>>         Type: Replicate
>>>>         Volume ID: 147636a2-5c15-4d9a-93c8-44d51252b124
>>>>         Status: Started
>>>>         Number of Bricks: 1 x 3 = 3
>>>>         Transport-type: tcp
>>>>         Bricks:
>>>>         Brick1: cluster-rep:/GFS/system/www1
>>>>         Brick2: web-rep:/GFS/system/www1
>>>>         Brick3: mail-rep:/GFS/system/www1
>>>>         Options Reconfigured:
>>>>         performance.readdir-ahead: on
>>>>         cluster.quorum-type: none
>>>>         cluster.quorum-count: 2
>>>>         cluster.server-quorum-ratio: 51%
>>>>
>>>>         The issue does not occur when I get rid of the 3rd (arbiter) brick.
>>>
>>>         What do you mean by 'getting rid of'? Killing the 3rd brick
>>>         process of the volume?
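>>>
>>>         For reference, the PID of each brick process, including the
>>>         arbiter brick, shows up in the volume status output:
>>>
>>>             # per-brick status with PIDs and online/offline state
>>>             gluster volume status system_www1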
>>>
>>>         Regards,
>>>         Ravi
>>>>
>>>>         If there's any additional information that is missing and I
>>>>         could provide, please let me know.
>>>>
>>>>         Greetings,
>>>>         Adrian
>>>>
>>>>
>>
>>
>
>
