[Gluster-users] Volume locking up when used over IB (was Re: Peer Probe)

Joe Julian joe at julianfamily.org
Thu Feb 28 19:50:32 UTC 2013

Where are you copying from? localhost mount, or some other client?

Check your client log for clues. If you can't find any, continue with
these diagnostic techniques.

kill -USR1 <glusterfs pid>

That will write a dump of the client's state to a file in /tmp. If you can
determine which brick it's stuck on, getting a dump of that brick's 
glusterfsd could be useful as well. Include that with a bug report. I 
still haven't figured out how to read that myself though.
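
As a rough sketch of the step above, the signal can be wrapped in a small
shell function. The function name dump_client_state is made up for
illustration. The /tmp listing only assumes the dump location mentioned
above; the exact dump file name may differ between versions.

```shell
# Sketch only: ask a glusterfs client process for a state dump.
# dump_client_state is a hypothetical helper name, not a gluster command.
dump_client_state() {
    pid="$1"
    # Refuse anything that is not a plain numeric pid.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: dump_client_state <glusterfs pid>" >&2
            return 2
            ;;
    esac
    # SIGUSR1 triggers the dump, as described above.
    kill -USR1 "$pid" || return 1
    # Give the process a moment to write the file, then show the newest
    # entries in /tmp (assumed dump location) so the dump is easy to spot.
    sleep 1
    ls -lt /tmp | head -n 5
}
```

To find the pid, something like ps ax | grep '[g]lusterfs' should list the
candidates; the client is the one whose command line names your mount point.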

Attach strace and/or gdb to the client and/or brick server and see 
where it's locked up. Be sure to backtrace all threads in gdb. Again, 
include those with a bug report.
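
A minimal sketch of that, assuming strace, gdb and coreutils timeout are
installed. The function name trace_gluster_pid, the 30 second window and the
output file names under /tmp are illustrative choices, not anything gluster
requires.

```shell
# Sketch only: record where a hung glusterfs or glusterfsd process is stuck.
# trace_gluster_pid is a hypothetical helper name.
trace_gluster_pid() {
    pid="$1"
    # Refuse anything that is not a plain numeric pid.
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: trace_gluster_pid <pid>" >&2
            return 2
            ;;
    esac
    # Attach to the process and all of its threads (-f), timestamp each
    # system call (-tt) and write the trace to a file (-o). timeout stops
    # strace after 30 seconds, at which point strace detaches.
    timeout 30 strace -f -tt -p "$pid" -o "/tmp/strace.$pid.txt"
    # Attach gdb non-interactively and backtrace every thread.
    gdb -p "$pid" -batch -ex "thread apply all bt" \
        > "/tmp/gdb-bt.$pid.txt" 2>&1
}
```

Run it as root against the client pid, the brick's glusterfsd pid, or both,
and attach the two output files to the bug report.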

Finally, of course, there's the possibility that it's your IB drivers.

On 02/28/2013 11:32 AM, Tony Saenz wrote:
> No, they're XFS and thanks for the other tip! Is there something missing? It's pretty consistent. Reads are fine, but as soon as I transfer/copy files the mount starts hanging and the server locks up.
> On Feb 28, 2013, at 10:55 AM, Joe Julian <joe at julianfamily.org> wrote:
>> Are your bricks formatted ext4?
>> On 02/28/2013 10:32 AM, Tony Saenz wrote:
>>> I finally got everything up. However, when transferring files the server locks up: df hangs, etc. To get things working I have to kill off processes and unmount before the box starts responding again. When I put everything back on the NICs, transferring files works as expected.
>>> Any ideas?
>>> On Feb 26, 2013, at 1:36 AM, Brian Candler <B.Candler at pobox.com> wrote:
>>>> On Mon, Feb 25, 2013 at 06:28:01PM +0000, Tony Saenz wrote:
>>>>> Any help, please? The regular NICs, which are what it currently sees, work fine, but I'd like to move them over to the InfiniBand cards.
>>>> ...
>>>>>> [root@fpsgluster testvault]# gluster peer probe fpsgluster2ib
>>>>>> Probe on host fpsgluster2ib port 0 already in peer list
>>>> Probing only works in one direction. The HTML admin guide has been taken
>>>> down so I can only point you to the PDF:
>>>> http://www.gluster.org/wp-content/uploads/2012/05/Gluster_File_System-3.3.0-Administration_Guide-en-US.pdf
>>>> "use the probe command from a storage server that is already part of the
>>>> trusted storage pool."
>>>> That is, probe from existing cluster node to new node, not from new node to
>>>> cluster.
