[Gluster-users] Extremely high load after 100% full bricks
Dan Bretherton
d.a.bretherton at reading.ac.uk
Thu Nov 1 16:24:17 UTC 2012
Dear All-
The excessive CPU load problem seems to have been caused by the
problematic upgrade and subsequent downgrade I reported in the following
thread:
http://www.gluster.org/pipermail/gluster-users/2012-November/034643.html
When downgrading to 3.3.0 using yum, removal of the
glusterfs-server-3.3.1-1 packages failed because of an RPM script error.
On CentOS-5 servers "yum remove glusterfs-3.3.1-1.el5" did the trick,
but on CentOS-6 I had to forcibly remove the package with "rpm -e
--noscripts glusterfs-server-3.3.1-1.el6.x86_64". I later discovered
that the UUID value in /var/lib/glusterd/glusterd.info on the CentOS-6
servers had changed, and that those servers were listing themselves in
the output of "gluster peer status".
I found the original UUIDs for the CentOS-6 servers by looking at the
file names in /var/lib/glusterd/peers on other servers, like this:
[root at remus peers]# grep romulus /var/lib/glusterd/peers/*
/var/lib/glusterd/peers/cb21050d-05c2-42b3-8660-230954bab324:hostname1=romulus.nerc-essc.ac.uk
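With several servers to sort out, a small loop does the same job in one
go; the host names below are just placeholders for the affected CentOS-6
servers:

for h in romulus host2 host3; do
    # the matching file names are the servers' original UUIDs
    grep -H "hostname1=$h" /var/lib/glusterd/peers/*
done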
With glusterd stopped on all servers, I changed the "UUID=" line in
/var/lib/glusterd/glusterd.info back to the original value for each
server. With glusterd running again on all the servers, everything
seemed to go back to normal, except for a lot of self-heal activity on
the servers that had been suffering from the excessive load problem. I
presume a lot of xattr errors had been caused by those servers not
talking to the others properly while the load was so high.
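For the record, the change on each affected server amounts to something
like this, with glusterd stopped everywhere first as described above.
The UUID is just the example value from the grep output above, and sed
is only one way to make the edit:

cp /var/lib/glusterd/glusterd.info /var/lib/glusterd/glusterd.info.bak
# put back the UUID the rest of the pool has on file for this server
sed -i 's/^UUID=.*/UUID=cb21050d-05c2-42b3-8660-230954bab324/' \
    /var/lib/glusterd/glusterd.info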
While looking back at what I did in order to write this message, I have
just discovered another UUID-related problem. On some servers the files
in /var/lib/glusterd/peers contain the wrong UUID. The "UUID=" line in
each of those files should match the file name, but on some servers it
doesn't. I haven't noticed any adverse effects yet, except for not being
able to run "gluster volume status" on any of the CentOS-6 servers that
were messed up by the problematic downgrade to 3.3.0. I suppose I will
have to stop glusterd everywhere again and manually correct these errors
on all the servers. I have 21 of them so it will take a while, but it
could be worse I suppose. I would be interested to know if there is a
quicker way to recover from a mess like this; any suggestions?
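In the meantime, an untested sketch like this, run on each server, should
at least show where the manual edits are needed. I have used a
case-insensitive grep because I am not sure off-hand whether the key in
the peers files is written "uuid=" or "UUID=":

cd /var/lib/glusterd/peers
for f in *; do
    uuid=$(grep -i '^uuid=' "$f" | cut -d= -f2)
    # the file name should be the peer's UUID
    [ "$uuid" = "$f" ] || echo "$(hostname): $f contains uuid=$uuid"
done

Wrapped in an ssh loop over the 21 servers it would at least save logging
in to each one by hand.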
-Dan.
On 10/25/2012 04:34 PM, Dan Bretherton wrote:
> Dear All-
> I'm not sure this excessive server load has anything to do with the
> bricks having been full. I noticed the full bricks while I was
> investigating the excessive load, and assumed the two were related.
> However, despite there being plenty of room on all the bricks, the load
> on this particular pair of servers has been consistently between 60
> and 80 all week, and this is causing serious problems for users who
> are getting repeated I/O errors. The servers are responding so slowly
> that GlusterFS isn't working properly, and CLI commands like "gluster
> volume stop" just time out when issued on any server. Restarting
> glusterd on all servers has no effect.
>
> Is there any way to limit the load imposed by GlusterFS on a server?
> I desperately need to reduce it to a level where GlusterFS can work
> properly and talk to the other servers without timing out.
>
> -Dan.
>
>
> On 10/22/2012 02:03 PM, Dan Bretherton wrote:
>> Dear All-
>> A replicated pair of servers in my GlusterFS 3.3.0 cluster has been
>> experiencing extremely high load for the past few days, after a
>> replicated brick pair became 100% full. The GlusterFS-related load
>> on one of the servers was fluctuating at around 60, and this high
>> load would periodically swap to the other server. When I noticed the
>> full bricks I quickly extended the volume by creating new bricks on
>> another server, and manually moved some data off the full bricks to
>> create space for write operations. The fix-layout operation seemed
>> to start normally, but the load then increased even further. The
>> server with the high load (by then up to about 80) became very slow
>> to respond, and I noticed a lot of errors like the following in the
>> VOLNAME-rebalance.log files.
>>
>> [2012-10-22 00:35:52.070364] W
>> [socket.c:1512:__socket_proto_state_machine] 0-atmos-client-10:
>> reading from socket failed. Error (Transport endpoint is not
>> connected), peer (192.171.166.92:24052)
>> [2012-10-22 00:35:52.070446] E [rpc-clnt.c:373:saved_frames_unwind]
>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xe7) [0x2b3fd905c547]
>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb2)
>> [0x2b3fd905bf42]
>> (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)
>> [0x2b3fd905bbfe]))) 0-atmos-client-10: forced unwinding frame
>> type(GlusterFS 3.1) op(INODELK(29)) called at 2012-10-22
>> 00:35:45.454529 (xid=0x285951x)
>>
>> There have also been occasional errors like the following, referring
>> to the pair of bricks that became 100% full.
>>
>> [2012-10-22 01:32:52.827044] W
>> [client3_1-fops.c:5517:client3_1_readdir] 0-atmos-client-15:
>> (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
>> [2012-10-22 09:49:21.103066] W
>> [client3_1-fops.c:5628:client3_1_readdirp] 0-atmos-client-14:
>> (00000000-0000-0000-0000-000000000000) remote_fd is -1. EBADFD
>>
>> The log files from the bricks that were 100% full contain a lot of
>> these errors from the period after I freed up some space on them.
>>
>> [2012-10-22 00:40:56.246075] E [server.c:176:server_submit_reply]
>> (-->/usr/lib64/libglusterfs.so.0(default_inodelk_cbk+0xa4)
>> [0x361da23e84]
>> (-->/usr/lib64/glusterfs/3.3.0/xlator/debug/io-stats.so(io_stats_inodelk_cbk+0xd8)
>> [0x2aaaabd74d48]
>> (-->/usr/lib64/glusterfs/3.3.0/xlator/protocol/server.so(server_inodelk_cbk+0x10b)
>> [0x2aaaabf9742b]))) 0-: Reply submission failed
>> [2012-10-22 00:40:56.246117] I
>> [server-helpers.c:629:server_connection_destroy] 0-atmos-server:
>> destroyed connection of
>> bdan10.nerc-essc.ac.uk-13609-2012/10/21-23:04:53:323865-atmos-client-15-0
>>
>> All these errors have only occurred on the replicated pair of servers
>> that had suffered from 100% full bricks. I don't know if the errors
>> are being caused by the high load (resulting in poor communication
>> with other peers, for example) or if the high load is the result of
>> replication and/or distribution errors. I have tried various things
>> to bring the load down, including unmounting the volume and stopping
>> the fix-layout operation, but the only thing that works is stopping
>> the volume. Obviously I can't do that for long because people need to
>> use the data, but with the load as high as it is, data access is very
>> slow and users are experiencing a lot of temporary I/O errors.
>> Bricks from several volumes are on those servers so everybody in the
>> department is being affected by this problem. I thought at first
>> that the load was being caused by self-heal operations fixing errors
>> caused by write failures that occurred when the bricks were full, but
>> it is glusterfs threads that are causing the high load, not glustershd.
>>
>> Can anyone suggest a way to bring the load down so people can access
>> the data properly again? Also, can I trust GlusterFS to eventually
>> self-heal the errors causing the above error messages?
>>
>> Regards,
>> -Dan.