[Gluster-users] Performance optimization tips Gluster 3.3? (small files / directory listings)
olav johansen
luxis2012 at gmail.com
Fri Jun 8 04:19:58 UTC 2012
Hi All,
Thanks for the great feedback. I had changed IPs, and when checking the logs
I noticed one server wasn't connecting correctly.
To rule out any mistakes on my side I've re-done the bricks from scratch with
clean configurations (mount info attached below), but it's still not
performing 'great' compared to a single NFS mount.
In our application the files never change once written; we only add and
delete files, so I'd like directory / file metadata cached as much as
possible.
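Based on the docs, these are the cache-related volume options I plan to
experiment with first (just a sketch; the 1GB / 60 / 32 values are guesses
on my part, not tested recommendations):
gluster> volume set data-storage performance.cache-size 1GB
gluster> volume set data-storage performance.cache-refresh-timeout 60
gluster> volume set data-storage performance.io-thread-count 32
gluster> volume set data-storage performance.stat-prefetch on
If I understand it right, cache-refresh-timeout only helps repeated reads of
unchanged file data, so it may not do much for the listing itself.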
Config info:
gluster> volume info data-storage
Volume Name: data-storage
Type: Replicate
Volume ID: cc91c107-bdbb-4179-a097-cdd3e9d5ac93
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: fs1:/data/storage
Brick2: fs2:/data/storage
gluster>
On my web1 node I mounted:
# mount -t glusterfs fs1:/data-storage /storage
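Since files never change once written, I'm also considering remounting with
longer FUSE attribute / entry timeouts (untested; the option names are what I
found for mount.glusterfs, and 600 seconds is an arbitrary value):
# umount /storage
# mount -t glusterfs -o attribute-timeout=600,entry-timeout=600 fs1:/data-storage /storage
My understanding is that the trade-off is this client could see stale
metadata for up to that long if another client deletes or changes something.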
I've copied my data over to it again; running ls several times takes ~0.5
seconds per run:
[@web1 files]# time ls -all|wc -l
1989
real 0m0.485s
user 0m0.022s
sys 0m0.109s
[@web1 files]# time ls -all|wc -l
1989
real 0m0.489s
user 0m0.016s
sys 0m0.116s
[@web1 files]# time ls -all|wc -l
1989
real 0m0.493s
user 0m0.018s
sys 0m0.115s
Doing the same thing directly on the underlying brick filesystem on one node
takes ~0.021s:
[@fs2 files]# time ls -all|wc -l
1989
real 0m0.021s
user 0m0.007s
sys 0m0.015s
[@fs2 files]# time ls -all|wc -l
1989
real 0m0.020s
user 0m0.008s
sys 0m0.013s
A full recursive directory listing is far slower still:
[@web1 files]# time ls -alR|wc -l
2242956
real 74m0.660s
user 0m20.117s
sys 1m24.734s
[@web1 files]# time ls -alR|wc -l
2242956
real 26m27.159s
user 0m17.387s
sys 1m11.217s
[@web1 files]# time ls -alR|wc -l
2242956
real 27m38.163s
user 0m18.333s
sys 1m19.824s
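To see where the time is going, my next step is to capture per-fop latency
with volume profiling while one of these listings runs (sketch only, I
haven't run this yet):
gluster> volume profile data-storage start
  (run the slow ls -alR on web1 while profiling is active)
gluster> volume profile data-storage info
gluster> volume profile data-storage stop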
Just as a crazy reference, on another single server with SSDs (RAID 10) I
get:
files# time ls -alR|wc -l
2260484
real 0m15.761s
user 0m5.170s
sys 0m7.670s
That's for the same operation (and this server has even more files).
My goal is to get this directory listing as fast as possible. I don't have
the hardware / budget to test an SSD configuration, but would an SSD setup
give me a ~1 minute directory listing time (assuming it stays roughly 4x
slower than a single node)?
If I added two more bricks to the volume (keeping replica 2), would that
double read speed?
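In case it matters, what I have in mind is adding a second replica pair to
make it distributed-replicate (2 x 2); fs3 / fs4 below are hypothetical hosts
I don't have yet:
gluster> volume add-brick data-storage fs3:/data/storage fs4:/data/storage
gluster> volume rebalance data-storage start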
Thanks for any insight!
-------------------- storage.log from web1 on mount ---------------------
[2012-06-07 20:47:45.584320] I [glusterfsd.c:1666:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.3.0
[2012-06-07 20:47:45.624548] I [io-cache.c:1549:check_cache_size_ok]
0-data-storage-quick-read: Max cache size is 8252092416
[2012-06-07 20:47:45.624612] I [io-cache.c:1549:check_cache_size_ok]
0-data-storage-io-cache: Max cache size is 8252092416
[2012-06-07 20:47:45.628148] I [client.c:2142:notify]
0-data-storage-client-0: parent translators are ready, attempting connect
on transport
[2012-06-07 20:47:45.631059] I [client.c:2142:notify]
0-data-storage-client-1: parent translators are ready, attempting connect
on transport
Given volfile:
+------------------------------------------------------------------------------+
1: volume data-storage-client-0
2: type protocol/client
3: option remote-host fs1
4: option remote-subvolume /data/storage
5: option transport-type tcp
6: end-volume
7:
8: volume data-storage-client-1
9: type protocol/client
10: option remote-host fs2
11: option remote-subvolume /data/storage
12: option transport-type tcp
13: end-volume
14:
15: volume data-storage-replicate-0
16: type cluster/replicate
17: subvolumes data-storage-client-0 data-storage-client-1
18: end-volume
19:
20: volume data-storage-write-behind
21: type performance/write-behind
22: subvolumes data-storage-replicate-0
23: end-volume
24:
25: volume data-storage-read-ahead
26: type performance/read-ahead
27: subvolumes data-storage-write-behind
28: end-volume
29:
30: volume data-storage-io-cache
31: type performance/io-cache
32: subvolumes data-storage-read-ahead
33: end-volume
34:
35: volume data-storage-quick-read
36: type performance/quick-read
37: subvolumes data-storage-io-cache
38: end-volume
39:
40: volume data-storage-md-cache
41: type performance/md-cache
42: subvolumes data-storage-quick-read
43: end-volume
44:
45: volume data-storage
46: type debug/io-stats
47: option latency-measurement off
48: option count-fop-hits off
49: subvolumes data-storage-md-cache
50: end-volume
+------------------------------------------------------------------------------+
[2012-06-07 20:47:45.642625] I [rpc-clnt.c:1660:rpc_clnt_reconfig]
0-data-storage-client-0: changing port to 24009 (from 0)
[2012-06-07 20:47:45.648604] I [rpc-clnt.c:1660:rpc_clnt_reconfig]
0-data-storage-client-1: changing port to 24009 (from 0)
[2012-06-07 20:47:49.592729] I
[client-handshake.c:1636:select_server_supported_programs]
0-data-storage-client-0: Using Program GlusterFS 3.3.0, Num (1298437),
Version (330)
[2012-06-07 20:47:49.595099] I
[client-handshake.c:1636:select_server_supported_programs]
0-data-storage-client-1: Using Program GlusterFS 3.3.0, Num (1298437),
Version (330)
[2012-06-07 20:47:49.608455] I
[client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-0:
Connected to 10.1.80.81:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.608489] I
[client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.608572] I [afr-common.c:3627:afr_notify]
0-data-storage-replicate-0: Subvolume 'data-storage-client-0' came back up;
going online.
[2012-06-07 20:47:49.608837] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-0:
Server lk version = 1
[2012-06-07 20:47:49.616381] I
[client-handshake.c:1433:client_setvolume_cbk] 0-data-storage-client-1:
Connected to 10.1.80.82:24009, attached to remote volume '/data/storage'.
[2012-06-07 20:47:49.616434] I
[client-handshake.c:1445:client_setvolume_cbk] 0-data-storage-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2012-06-07 20:47:49.621808] I [fuse-bridge.c:4193:fuse_graph_setup]
0-fuse: switched to graph 0
[2012-06-07 20:47:49.622793] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-data-storage-client-1:
Server lk version = 1
[2012-06-07 20:47:49.622873] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel
7.13
[2012-06-07 20:47:49.623440] I
[afr-common.c:1964:afr_set_root_inode_on_first_lookup]
0-data-storage-replicate-0: added root inode
-------------------- End storage.log --------------------
On Thu, Jun 7, 2012 at 9:46 AM, Pranith Kumar Karampuri <pkarampu at redhat.com
> wrote:
> hi Brian,
> The 'stat' command comes in as the fop (file operation) 'lookup' at the
> gluster mount, which triggers self-heal. So the behavior is still the same.
> I was referring to the fop 'stat', which will be performed on only one of
> the bricks.
> Unfortunately, most of the commands and fops have the same name.
> Following are some of the examples of read-fops:
> .access
> .stat
> .fstat
> .readlink
> .getxattr
> .fgetxattr
> .readv
>
> Pranith.
> ----- Original Message -----
> From: "Brian Candler" <B.Candler at pobox.com>
> To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> Cc: "olav johansen" <luxis2012 at gmail.com>, gluster-users at gluster.org,
> "Fernando Frediani (Qube)" <fernando.frediani at qubenet.net>
> Sent: Thursday, June 7, 2012 7:06:26 PM
> Subject: Re: [Gluster-users] Performance optimization tips Gluster 3.3?
> (small files / directory listings)
>
> On Thu, Jun 07, 2012 at 08:34:56AM -0400, Pranith Kumar Karampuri wrote:
> > Brian,
> > Small correction: 'sending queries to *both* servers to check they are
> in sync - even read accesses.' Read fops like stat/getxattr etc are sent to
> only one brick.
>
> Is that new behaviour for 3.3? My understanding was that stat() was a
> healing operation.
>
> http://gluster.org/community/documentation/index.php/Gluster_3.2:_Triggering_Self-Heal_on_Replicate
>
> If this is no longer true, then I'd like to understand what happens after a
> node has been down and comes up again. I understand there's a self-healing
> daemon in 3.3, but what if you try to access a file which has not yet been
> healed?
>
> I'm interested in understanding this, especially the split-brain scenarios
> (better to understand them *before* you're stuck in a problem :-)
>
> BTW I'm in the process of building a 2-node 3.3 test cluster right now.
>
> Cheers,
>
> Brian.
>