[Gluster-users] Gluster NFS performance issue upgrading from 3.2.5 to 3.2.6/3.3.0

Tue Jun 12 01:45:02 UTC 2012

For what it is worth, I had weird performance issues when I moved from 
3.2.5 to 3.3.0 - I saw increased CPU utilization, as well as drastically 
increased network utilization between the nodes with the same workload. 
I could never really quantify the difference, other than I noticed my 
systems mounting the volumes via NFS had more problems when a brick went 
offline in an unclean way (e.g. network disappeared). I have six nodes 
which run 4-way or 2-way replicas and mount the volumes locally, so it's 
a pretty self-contained configuration.

Today I rolled back to 3.2.5 and ran for ~8 hours. My traffic dropped 
back to what it looked like previously, and my load dropped to next to 
nothing. I just upgraded to 3.2.7, and at least over the last couple of 
hours network utilization is about the same as with 3.3.0. CPU usage is 
actually worse with 3.2.7 than with 3.3.0.

Does anyone have a 'good' test of Gluster performance? Most of my 
operations take <10s, so when they average 5.4s with 3.3.0 over 100 
runs and 4.6s with 3.2.5, it's hard to tell whether that is a meaningful 
15% difference or just bad statistics. I'd like to understand what is 
different between 3.2.5 and 3.3.0 or 3.2.7, but I really need a good way 
to quantify it.
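
One way to separate a real difference from noise is a permutation test on 
the raw per-run timings rather than on the averages alone. The following 
is only a minimal sketch, assuming the individual run times for each 
version have been recorded; the function name permutation_test and the 
sample numbers in the demo are made up for illustration and are not 
measurements from any Gluster setup.

```python
import random


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed, p_value), where observed = mean(b) - mean(a).
    """
    rng = random.Random(seed)
    n_a = len(a)
    n_b = len(b)
    observed = sum(b) / n_b - sum(a) / n_a
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Reassign the timings to the two groups at random
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / n_b - sum(pooled[:n_a]) / n_a
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Synthetic timings in seconds, purely for illustration
    rng = random.Random(1)
    old = [rng.gauss(4.6, 1.5) for _ in range(100)]
    new = [rng.gauss(5.4, 1.5) for _ in range(100)]
    diff, p = permutation_test(old, new)
    print(f"difference in means: {diff:+.2f}s, p ~ {p:.4f}")
```

A small p-value suggests that a gap this large would rarely show up if 
both versions performed the same; a large one means 100 runs were not 
enough to tell the two apart.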

On 6/11/12 10:15 AM, Simon Detheridge wrote:
> Hi,
>
> I have a situation where I'm mounting a gluster volume on several web servers via NFS. The web servers run Rails applications off the gluster NFS mounts. The whole thing is running on EC2.
>
> On 3.2.5, starting a Rails application on the web server was sluggish but acceptable. However, after upgrading to 3.2.6 the length of time taken to start a Rails application has increased by over 10 times, to something that's not really suitable for a production environment. The situation still occurs with 3.3.0 as well.
>
> If I attach strace to the rails process as it's starting up, I see that it's looking for a very large number of nonexistent files. I think this is something that Rails does that can't be helped - it checks whether a file exists for many things, and acts accordingly if it does.
>
> Has something changed between 3.2.5 and 3.2.6 that could negatively affect the length of time it takes to stat a nonexistent file over an NFS mount to a gluster volume? Is there any way I can get the old behaviour without downgrading?
>
> -- I don't currently have proof that it's the nonexistent files that are causing the problem, but it seems highly likely, as performance for the other tasks that the servers carry out appears unaffected.
>
> Sorry this is slightly vague. I can run some more tests/benchmarks to try and figure out what's going on in more detail, but thought I would ask here first in case this is related to a known issue.
>
> Thanks,
> Simon
>
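
The suspicion about nonexistent files could be checked in isolation by 
timing stat() calls on paths that are known not to exist, once on the 
NFS mount under each Gluster version. This is a rough sketch only: the 
function name time_missing_stats and the "missing-" file prefix are 
hypothetical, and the directory to test is passed on the command line.

```python
import os
import sys
import time
import uuid


def time_missing_stats(directory, count=1000):
    """Time `count` stat() calls on paths that should not exist.

    Returns a list of per-call durations in seconds.
    """
    durations = []
    for _ in range(count):
        # A fresh random name each time, so every lookup is for a new path
        path = os.path.join(directory, "missing-" + uuid.uuid4().hex)
        start = time.perf_counter()
        try:
            os.stat(path)
        except FileNotFoundError:
            pass
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Directory to probe; defaults to the current directory
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    samples = sorted(time_missing_stats(target))
    median_ms = samples[len(samples) // 2] * 1e3
    print(f"{len(samples)} lookups in {target}")
    print(f"median {median_ms:.3f} ms, max {samples[-1] * 1e3:.3f} ms")
```

Running it against the same mount point before and after an upgrade 
gives two sets of timings that can be compared directly, without Rails 
in the picture.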