[Gluster-users] Extremely high load after 100% full bricks
Dan Bretherton
d.a.bretherton at reading.ac.uk
Thu Nov 1 16:24:17 UTC 2012
Dear All-
The excessive CPU load problem seems to have been caused by the
problematic upgrade and subsequent downgrade I reported in the following
thread:
http://www.gluster.org/pipermail/gluster-users/2012-November/034643.html
When downgrading to 3.3.0 using yum, removal of the
glusterfs-server-3.3.1-1 packages failed because of an RPM script error.
On CentOS-5 servers "yum remove glusterfs-3.3.1-1.el5" did the trick,
but on CentOS-6 I had to forcibly remove the package with "rpm -e
--noscripts glusterfs-server-3.3.1-1.el6.x86_64". I later discovered
that the UUID value in /var/lib/glusterd/glusterd.info on the CentOS-6
servers had changed, and that those servers were listing themselves in
the output of "gluster peer status".
I found the original UUIDs for the CentOS-6 servers by looking at the
file names in /var/lib/glusterd/peers on other servers, like this:
[root at remus peers]# grep romulus /var/lib/glusterd/peers/*
/var/lib/glusterd/peers/cb21050d-05c2-42b3-8660-230954bab324:hostname1=romulus.nerc-essc.ac.uk
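With several servers to sort out, a small loop does the same job in one
go; the host names below are just placeholders for the affected CentOS-6
servers:

for h in romulus host2 host3; do
    # the matching file names are the servers' original UUIDs
    grep -H "hostname1=$h" /var/lib/glusterd/peers/*
done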
With glusterd stopped on all servers, I changed the "UUID=" line in
/var/lib/glusterd/glusterd.info back to the original value for each
server. With glusterd running again on all the servers, everything
seemed to go back to normal, except for a lot of self-heal activity on
the servers that had been suffering from the excessive load problem. I
presume a lot of xattr errors had been caused by those servers not
talking to the others properly while the load was so high.
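For the record, the change on each affected server amounts to something
like this, with glusterd stopped everywhere first as described above.
The UUID is just the example value from the grep output above, and sed
is only one way to make the edit:

cp /var/lib/glusterd/glusterd.info /var/lib/glusterd/glusterd.info.bak
# put back the UUID the rest of the pool has on file for this server
sed -i 's/^UUID=.*/UUID=cb21050d-05c2-42b3-8660-230954bab324/' \
    /var/lib/glusterd/glusterd.info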
While looking back at what I did in order to write this message, I have
just discovered another UUID-related problem. On some servers the files
in /var/lib/glusterd/peers contain the wrong UUID. The "UUID=" line in
each of those files should match the file name, but on some servers it
doesn't. I haven't noticed any adverse effects yet, except for not being
able to run "gluster volume status" on any of the CentOS-6 servers that
were messed up by the problematic downgrade to 3.3.0. I suppose I will
have to stop glusterd everywhere again and manually correct these errors
on all the servers. I have 21 of them so it will take a while, but it
could be worse I suppose. I would be interested to know if there is a
quicker way to recover from a mess like this; any suggestions?
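In the meantime, an untested sketch like this, run on each server, should
at least show where the manual edits are needed. I have used a
case-insensitive grep because I am not sure off-hand whether the key in
the peers files is written "uuid=" or "UUID=":

cd /var/lib/glusterd/peers
for f in *; do
    uuid=$(grep -i '^uuid=' "$f" | cut -d= -f2)
    # the file name should be the peer's UUID
    [ "$uuid" = "$f" ] || echo "$(hostname): $f contains uuid=$uuid"
done

Wrapped in an ssh loop over the 21 servers it would at least save logging
in to each one by hand.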
-Dan.
On 10/25/2012 04:34 PM, Dan Bretherton wrote:
> Dear All-
> I'm not sure this excessive server load has anything to do with the
> bricks having been full. I noticed the full bricks while I was
> investigating the excessive load, and assumed the two were related.
> However, despite there being plenty of room on all the bricks, the load
> on this particular pair of servers has been consistently between 60
> and 80 all week, and this is causing serious problems for users who
> are getting repeated I/O errors. The servers are responding so slowly
> that GlusterFS isn't working properly, and CLI commands like "gluster
> volume stop" just time out when issued on any server. Restarting
> glusterd on all servers has no effect.
>
> Is there any way to limit the load imposed by GlusterFS on a server?
> I desperately need to reduce it to a level where GlusterFS can work
> properly and talk to the other servers without timing out.
>
> -Dan.
>
>
> On 10/22/2012 02:03 PM, Dan Bretherton wrote:
>> Dear All-
>> A replicated pair of servers in my GlusterFS 3.3.0 cluster has been
>> experiencing extremely high load for the past few days, after a
>> replicated brick pair became 100% full. The GlusterFS-related load
>> on one of the servers was fluctuating at around 60, and this high
>> load would periodically swap to the other server. When I noticed the
>> full bricks I quickly extended the volume by creating new bricks on
>> another server, and manually moved some data off the full bricks to
>> create space for write operations. The fix-layout operation seemed
>> to start normally, but the load then increased even further. The
>> server with the high load (by then up to about 80) became very slow
>> to respond, and I noticed a lot of errors like the following in the
>> VOLNAME-rebalance.log files.
>>
>> [2012-10-22 00:35:52.070364] W
>> [socket.c:1512:__socket_proto_state_machine] 0-atmos-client-10:
>> reading from socket failed. Error (Transport endpoint is not
>> connected), peer (192.171.166.92:24052)
>> [2012-10-22 00:35:52.070446] E [rpc-clnt.c:373:saved_frames_unwind]
>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xe7) [0x2b3fd905c547]
>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb2)
>> [0x2b3fd905bf42]
>> (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
>> [0x2b3fd905bbfe]))) 0-atmos-client-10: forced unwinding frame
>> type(GlusterFS 3.1) op(INODELK(29)) called at 2012-10-22
>> 00:35:45.454529 (xid=0x285951x)
>>
>> There have also been occasional errors like the following, referring
>> to the pair of bricks that became 100% full.
>>
>> [2012-10-22 01:32:52.827044] W
>> [client3_1-fops.c:5517:client3_1_readdir] 0-atmos-client-15:
>> (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
>> [2012-10-22 09:49:21.103066] W
>> [client3_1-fops.c:5628:client3_1_readdirp] 0-atmos-client-14:
>> (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
>>
>> The log files from the bricks that were 100% full contain a lot of
>> these errors from the period after I freed up some space on them.
>>
>> [2012-10-22 00:40:56.246075] E [server.c:176:server_submit_reply]
>> (-->/usr/lib64/libglusterfs.so.0(default_inodelk_cbk+0xa4)
>> [0x361da23e84]
>> (-->/usr/lib64/glusterfs/3.3.0/xlator/debug/io-stats.so(io_stats_inodelk_cbk+0xd8)
>> [0x2aaaabd74d48]
>> (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/server.so(server_inodelk_cbk+0x10b)
>> [0x2aaaabf9742b]))) 0-: Reply submission failed
>> [2012-10-22 00:40:56.246117] I
>> [server-helpers.c:629:server_connection_destroy] 0-atmos-server:
>> destroyed connection of
>> bdan10.nerc-essc.ac.uk-13609-2012/10/21-23:04:53:323865-atmos-client-15-0
>>
>> All these errors have only occurred on the replicated pair of servers
>> that had suffered from 100% full bricks. I don't know if the errors
>> are being caused by the high load (resulting in poor communication
>> with other peers, for example) or if the high load is the result of
>> replication and/or distribution errors. I have tried various things
>> to bring the load down, including unmounting the volume and stopping
>> the fix-layout operation, but the only thing that works is stopping
>> the volume. Obviously I can't do that for long because people need to
>> use the data, but with the load as high as it is, data access is very
>> slow and users are experiencing a lot of temporary I/O errors.
>> Bricks from several volumes are on those servers so everybody in the
>> department is being affected by this problem. I thought at first
>> that the load was being caused by self-heal operations fixing errors
>> caused by write failures that occurred when the bricks were full, but
>> it is glusterfs threads that are causing the high load, not glustershd.
>>
>> Can anyone suggest a way to bring the load down so people can access
>> the data properly again? Also, can I trust GlusterFS to eventually
>> self-heal the errors causing the above error messages?
>>
>> Regards,
>> -Dan.