[Gluster-users] Cascading errors and very bad write performance

Mathieu Chateau mathieu.chateau at lotp.fr
Sat Aug 8 08:02:18 UTC 2015


Maybe this is related to the insecure-ports issue that was reported?

Try with:

gluster volume set xxx server.allow-insecure on
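
If the clients still get disconnected after that, the glusterd-side counterpart is often needed as well (a sketch; the file path and service name may differ on your distribution):

# add to /etc/glusterfs/glusterd.vol on every server:
#   option rpc-auth-allow-insecure on
# then restart the management daemon:
service glusterd restart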

Best regards,
Mathieu CHATEAU
http://www.lotp.fr

2015-08-07 23:47 GMT+02:00 Geoffrey Letessier <geoffrey.letessier at cnrs.fr>:

> I’m not really sure I fully understand your answer.
>
> I tried setting inode-lru-limit to 1, but I cannot see any improvement.
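> The change was applied with the usual volume-set command (a sketch, assuming the server-side option name network.inode-lru-limit and my volume vol_home):
> gluster volume set vol_home network.inode-lru-limit 1
> # to go back to the default later:
> gluster volume reset vol_home network.inode-lru-limit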
>
> When I re-run the ddt application, I see 2 kinds of messages:
> [2015-08-07 21:29:21.792156] W
> [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is
> NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
> [2015-08-07 21:29:21.792176] W
> [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is
> NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
>
> and/or:
> [2015-08-07 21:44:19.279971] E [marker-quota.c:2990:mq_start_quota_txn_v2]
> 0-vol_home-marker: contribution node list is empty
> (31d7bf88-b63a-4731-a737-a3dce73b8cd1)
> [2015-08-07 21:41:26.177095] E [dict.c:1418:dict_copy_with_ref]
> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
> [0x7f85e9a6a410]
> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
> [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
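>
> Since both messages point at the quota/marker translator, the quota accounting can be cross-checked (a sketch, assuming quota is enabled on vol_home; the directory path is illustrative):
> gluster volume quota vol_home list
> # on a brick, inspect the quota/marker xattrs of a suspect directory:
> getfattr -d -m . -e hex /export/brick_home/brick1/data/some_dir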
>
> And what about the bad I/O performance?
>
> [letessier at node031 ~]$ ddt -t 35g /home/admin_team/letessier/
> Writing to /home/admin_team/letessier/ddt.25259 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/admin_team/letessier/ddt.25259 ... done.
> 35840MiB    KiB/s  CPU%
> Write      277451     3
> Read       188682     1
> [letessier at node031 ~]$ logout
> [root at node031 ~]# ddt -t 35g /home/
> Writing to /home/ddt.25559 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.25559 ... done.
> 35840MiB    KiB/s  CPU%
> Write      196539     2
> Read       438944     3
> Notice the read/write throughput differences when I’m root and when I’m a
> regular user.
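> To rule out the benchmark tool itself, the same root-vs-user comparison can be reproduced with plain dd (a sketch; block size and test file path are illustrative):
> dd if=/dev/zero of=/home/admin_team/letessier/dd_test bs=1M count=35840 conv=fsync
> # drop caches or remount before reading back, to avoid the page cache:
> dd if=/home/admin_team/letessier/dd_test of=/dev/null bs=1M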
>
> Thanks.
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 7 August 2015, at 14:57, Vijaikumar M <vmallika at redhat.com> wrote:
>
>
>
> On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:
>
> Hi Vijay,
>
> My brick-log issue and big performance problem began when I upgraded
> Gluster to version 3.7.3; before that, write throughput was good enough
> (~500MB/s) -though not as good as with GlusterFS 3.5.3 (especially with
> distributed volumes)- and I didn’t notice these problems in the brick logs.
>
> OK… an update, in real time:
>
> I just disabled quota on my home volume and now my performance appears
> to be somewhat better (around 300MB/s), but I still see the logs (from
> storage1 and its replica storage2) growing with only this kind of
> line:
> [2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref]
> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
> [0x7f85e9a6a410]
> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
> [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
>
> We have root-caused the log issue; bug# 1244613 tracks it.
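> Until the fix is available, the flood can usually be contained by lowering the brick log level (a sketch, assuming the volume vol_home):
> gluster volume set vol_home diagnostics.brick-log-level WARNING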
>
>
> After a few minutes, my write throughput now seems correct (~550MB/s),
> but the logs are still growing (not to say exploding). So one part of the
> problem looks like it originates in the quota management.
> … after a few more minutes (and still only 1 client connected), it is now the
> read operation which is very, very slow… -I’m going crazy! :/-
> # ddt -t 50g /home/
> Writing to /home/ddt.11293 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11293 ... done.
> 35840MiB    KiB/s  CPU%
> Write      568201     5
> Read       567008     4
> # ddt -t 50g /home/
> Writing to /home/ddt.11397 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11397 ... done.
> 51200MiB    KiB/s  CPU%
> Write      573631     5
> Read       164716     1
>
> and my logs are still exploding…
>
> After re-enabling quota on my volume:
> # ddt -t 50g /home/
> Writing to /home/ddt.11817 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11817 ... done.
> 51200MiB    KiB/s  CPU%
> Write      269608     3
> Read       160219     1
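>
> The quota toggling used for the comparison above is the standard pair of commands (a sketch, assuming the volume vol_home):
> gluster volume quota vol_home disable
> # run the benchmark, then re-enable:
> gluster volume quota vol_home enable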
>
> Thanks
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 7 August 2015, at 06:28, Vijaikumar M <vmallika at redhat.com> wrote:
>
> Hi Geoffrey,
>
> Some performance improvements have been made to quota in glusterfs-3.7.3.
> Could you upgrade to glusterfs-3.7.3 and see if this helps?
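> The installed version can be double-checked on servers and clients before and after the upgrade (a sketch):
> glusterfs --version
> gluster --version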
>
> Thanks,
> Vijay
>
>
> On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
>
> Hi,
>
> Does anyone have an idea to help me fix this issue? (huge logs, write
> performance divided by 4, etc.)
>
> For comparison, here are two volumes:
> - home: distributed over 4 bricks / 2 nodes (and replicated on 4 other
> bricks / 2 other nodes):
> # ddt -t 35g /home
> Writing to /home/ddt.24172 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.24172 ... done.
> 33792MiB    KiB/s  CPU%
> Write      103659     1
> Read       391955     3
>
> - workdir: distributed over 4 bricks / 2 nodes (on the same RAID volumes
> and servers as home):
> # ddt -t 35g /workdir
> Writing to /workdir/ddt.24717 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /workdir/ddt.24717 ... done.
> 35840MiB    KiB/s  CPU%
> Write      738314     4
> Read       536497     4
>
> For information, previously on version 3.5.3-2, I obtained roughly 1.1GB/s
> for the workdir volume and ~550-600MB/s for home.
>
> All my tests (cp, rsync, etc.) give me the same result (write
> throughput between 100MB/s and 150MB/s).
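>
> To see which options differ between the two volumes beyond the layout (quota, performance translators, …), their configuration can be compared directly; gluster volume info lists the reconfigured options of every volume (a sketch):
> gluster volume info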
>
> Thanks.
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 5 August 2015, at 10:40, Geoffrey Letessier <geoffrey.letessier at cnrs.fr>
> wrote:
>
> Hello,
>
> In addition, given that I reactivated logging (brick-log-level = INFO
> instead of CRITICAL) only for the duration of the file creation (i.e. a few
> minutes), have you noticed the log sizes and the number of lines inside?
> # ls -lh storage*
> -rw-------  1 letessier  staff    18M  5 aoû 00:54
> storage1__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54
> storage1__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    15M  5 aoû 00:56
> storage2__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54
> storage2__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    47M  5 aoû 00:55
> storage3__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:54
> storage3__export-brick_home-brick2-data.log
> -rw-------  1 letessier  staff    47M  5 aoû 00:55
> storage4__export-brick_home-brick1-data.log
> -rw-------  1 letessier  staff   2,1K  5 aoû 00:55
> storage4__export-brick_home-brick2-data.log
>
> # wc -l storage*
>    55381 storage1__export-brick_home-brick1-data.log
>       17 storage1__export-brick_home-brick2-data.log
>    41636 storage2__export-brick_home-brick1-data.log
>       17 storage2__export-brick_home-brick2-data.log
>   270360 storage3__export-brick_home-brick1-data.log
>       17 storage3__export-brick_home-brick2-data.log
>   270358 storage4__export-brick_home-brick1-data.log
>       17 storage4__export-brick_home-brick2-data.log
>   637803 total
>
> If I leave brick-log-level at INFO, the brick log files on each server
> will consume all my /var partition capacity within only a few hours/days…
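>
> Until the verbosity itself is fixed, log rotation can at least bound the disk usage (a sketch of a logrotate rule, assuming the brick logs live under /var/log/glusterfs/bricks/):
> # /etc/logrotate.d/glusterfs-bricks (illustrative)
> /var/log/glusterfs/bricks/*.log {
>     daily
>     rotate 7
>     compress
>     missingok
>     notifempty
>     copytruncate
> }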
>
> Thanks in advance,
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
>
> On 5 August 2015, at 01:12, Geoffrey Letessier <geoffrey.letessier at cnrs.fr>
> wrote:
>
> Hello,
>
> Since the problem mentioned previously (all the errors noticed in the brick
> log files), I have seen very, very bad performance: my write performance is
> divided by 4 compared to before -knowing it was not that good to begin with.
> Now, writing a 33GB file, my write throughput is around 150MB/s (over
> InfiniBand); before, it was around 550-600MB/s -and this with both the RDMA
> and TCP protocols.
>
> During this test, more than 40,000 error lines (like the following) were
> added to the brick log files.
> [2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref]
> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
> [0x7f021c6f7410]
> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
> [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
>
>
> All brick log files are in attachments.
>
> Thanks in advance for your help and a fix,
> Best,
> Geoffrey
>
> PS: a question: is it possible to easily downgrade GlusterFS from 3.7 to a
> previous version (for example, v3.5)?
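> For reference, the cluster op-version currently in use (which may constrain a downgrade after running 3.7) can be checked on each server with (a sketch, path as shipped by the packages):
> grep operating-version /var/lib/glusterd/glusterd.info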
>
> ------------------------------------------------------
> Geoffrey Letessier
> IT manager & systems engineer
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
> <bricks-logs.tgz>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

