[Gluster-users] Cascading errors and very bad write performance

Thu Aug 6 23:32:03 UTC 2015

Hi,

Does anyone have an idea how to fix this issue? (huge logs, write performance divided by 4, etc.)

For comparison, here are two volumes:
	- home: distributed over 4 bricks on 2 nodes (and replicated on 4 other bricks on 2 other nodes):
# ddt -t 35g /home
Writing to /home/ddt.24172 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.24172 ... done.
33792MiB    KiB/s  CPU%
Write      103659     1
Read       391955     3

	- workdir: distributed over 4 bricks on 2 nodes (on the same RAID volumes and servers as home):
# ddt -t 35g /workdir
Writing to /workdir/ddt.24717 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /workdir/ddt.24717 ... done.
35840MiB    KiB/s  CPU%
Write      738314     4
Read       536497     4

For reference, with the previous version (3.5.3-2), I obtained roughly 1.1 GB/s for the workdir volume and ~550-600 MB/s for home.

All my tests (cp, rsync, etc.) give the same result: a write throughput between 100 MB/s and 150 MB/s.
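
ddt reports throughput in KiB/s, while the 3.5.3-2 figures above are in MB/s and GB/s. A minimal sketch for putting them on the same scale, assuming decimal megabytes (10^6 bytes); the helper name kib_to_mb is hypothetical and not part of ddt or GlusterFS:

```shell
#!/bin/sh
# Hypothetical helper: convert a KiB/s figure as printed by ddt into
# decimal MB/s (1 KiB = 1024 bytes, 1 MB = 1000000 bytes).
kib_to_mb() {
    awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib * 1024 / 1000000 }'
}

# Write figures copied from the two ddt runs above
echo "home write:    $(kib_to_mb 103659) MB/s"
echo "workdir write: $(kib_to_mb 738314) MB/s"
```

With these values, home writes at about 106 MB/s against about 756 MB/s for workdir on the same RAID volumes, i.e. roughly 7 times slower, whereas 3.5.3-2 gave about 550-600 MB/s for home.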

Thanks.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr

On 5 Aug 2015 at 10:40, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:

> Hello,
> 
> In addition, have you noticed the size of the brick logs and their line counts, given that I re-enabled logging (brick-log-level = INFO instead of CRITICAL) only for the duration of the file creation (i.e. a few minutes)?
> # ls -lh storage*
> -rw-------  1 letessier  staff    18M  5 aoû 00:54 storage1__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 storage1__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    15M  5 aoû 00:56 storage2__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 storage2__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    47M  5 aoû 00:55 storage3__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54 storage3__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    47M  5 aoû 00:55 storage4__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:55 storage4__export-brick_home-brick2-data.log
> 
> # wc -l storage*
>    55381 storage1__export-brick_home-brick1-data.log
>       17 storage1__export-brick_home-brick2-data.log
>    41636 storage2__export-brick_home-brick1-data.log
>       17 storage2__export-brick_home-brick2-data.log
>   270360 storage3__export-brick_home-brick1-data.log
>       17 storage3__export-brick_home-brick2-data.log
>   270358 storage4__export-brick_home-brick1-data.log
>       17 storage4__export-brick_home-brick2-data.log
>   637803 total
> 
> If I leave brick-log-level at INFO, the brick log files on each server will fill my entire /var partition within a few hours or days…
> 
> Thanks in advance,
> Geoffrey
> 
> On 5 Aug 2015 at 01:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr> wrote:
> 
>> Hello,
>> 
>> Since the problem mentioned previously (all the errors in the brick log files), I have been seeing very poor performance: my write throughput is divided by 4 compared to before, and it was not that good to begin with.
>> Now, when writing a 33 GB file, my write throughput is around 150 MB/s (over InfiniBand), whereas before it was around 550-600 MB/s; this happens with both the RDMA and TCP transports.
>> 
>> During this test, more than 40,000 error lines like the following were added to the brick log files:
>> [2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) [0x7f021c6f7410] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
>> 
>> 
>> All brick log files are attached.
>> 
>> Thanks in advance for your help and any fix,
>> Best,
>> Geoffrey
>> 
>> PS: is it possible to easily downgrade GlusterFS from 3.7 to a previous version (for example v3.5)?
>> 
>> <bricks-logs.tgz>
> 
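
To see how much of each brick log consists of the dict_copy_with_ref error quoted above, here is a sketch that prints, per file, the total line count and the number of matching lines. It assumes the logs from bricks-logs.tgz are extracted into the current directory under the storage*__export-brick_home-*-data.log names listed above; the function name count_dict_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for each brick log given as argument, print the total
# number of lines and how many of them mention dict_copy_with_ref.
count_dict_errors() {
    for log in "$@"; do
        # Skip glob patterns that matched no file
        [ -f "$log" ] || continue
        # tr removes the padding some wc implementations add
        total=$(wc -l < "$log" | tr -d ' ')
        errors=$(grep -c 'dict_copy_with_ref' "$log")
        printf '%s: %s lines, %s dict_copy_with_ref errors\n' "$log" "$total" "$errors"
    done
}

count_dict_errors storage*__export-brick_home-brick*-data.log
```

The brick log verbosity itself is set per volume with `gluster volume set <VOLNAME> diagnostics.brick-log-level <LEVEL>`, for example back to CRITICAL once the errors have been captured.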
