[Gluster-users] Cascading errors and very bad write performance
Vijaikumar M
vmallika at redhat.com
Fri Aug 7 12:57:08 UTC 2015
On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:
> Hi Vijay,
>
> My brick log issue and the big performance problem began when I
> upgraded Gluster to version 3.7.3. Before that, write throughput was good
> enough (~500 MB/s), though not as good as with GlusterFS 3.5.3 (especially
> with distributed volumes), and I did not notice these problems with the
> brick logs.
>
> OK, live results:
>
> I just disabled quota on my home volume and performance now
> appears to be somewhat better (around 300 MB/s), but I still see the
> logs (from storage1 and its replica storage2) growing with only
> this kind of line:
> [2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref]
> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
> [0x7f85e9a6a410]
> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
> [0x7f85e9a6a188]
> -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
> [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
>
We have root-caused the log issue; bug #1244613 tracks it.
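While that fix is pending, it may help to quantify how much of a brick log consists of this one message. Below is a minimal Python sketch, assuming each entry in the real log file starts on its own line in the `[timestamp] E [file:line:function]` form quoted above (the mail client wrapped it here). The helper name `count_dict_errors` and the second sample line are illustrative placeholders, not part of GlusterFS.

```python
import re
from typing import Iterable

# Header of an error-level entry emitted from dict_copy_with_ref, e.g.
# [2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref]
ENTRY = re.compile(r"^\[[^\]]+\] E \[dict\.c:\d+:dict_copy_with_ref\]")


def count_dict_errors(lines: Iterable[str]) -> tuple[int, int]:
    """Return (matching_entries, total_lines) for a brick log."""
    matching = total = 0
    for line in lines:
        total += 1
        if ENTRY.match(line):
            matching += 1
    return matching, total


# First line abridged from the log quoted above; the second is a made-up
# placeholder standing in for any other log line.
sample = [
    "[2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref] "
    "0-dict: invalid argument: dict",
    "placeholder: any other log line",
]
print(count_dict_errors(sample))
```

On a real brick log the same function can be fed an open file object, for example `count_dict_errors(open(path, errors="replace"))`, where `path` is whichever brick log you want to inspect.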
> After a few minutes, my write throughput now seems correct
> (~550 MB/s), but the logs are still growing (not to say exploding). So
> part of the problem appears to originate in quota management.
> … after a few more minutes (still with only 1 client connected), it is
> now the read operation that is very slow… I'm going crazy! :/
> # ddt -t 50g /home/
> Writing to /home/ddt.11293 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11293 ... done.
> 35840MiB KiB/s CPU%
> Write 568201 5
> Read 567008 4
> # ddt -t 50g /home/
> Writing to /home/ddt.11397 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11397 ... done.
> 51200MiB KiB/s CPU%
> Write 573631 5
> Read 164716 1
>
> and my logs are still exploding…
>
> After having re-enabled the quota on my volume:
> # ddt -t 50g /home/
> Writing to /home/ddt.11817 ... syncing ... done.
> sleeping 10 seconds ... done.
> Reading from /home/ddt.11817 ... done.
> 51200MiB KiB/s CPU%
> Write 269608 3
> Read 160219 1
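To make the quota-on versus quota-off comparison easier to read, here is a small Python sketch that pulls the Write/Read rates out of a ddt result table and converts them to MB/s. It assumes the middle column is KiB/s, as the ddt header line indicates, and that 1 KiB is 1024 bytes; the function names `parse_ddt` and `kib_s_to_mb_s` are illustrative only.

```python
KIB = 1024  # bytes per KiB, the unit ddt reports per second


def kib_s_to_mb_s(kib_per_s: float) -> float:
    """Convert KiB/s (as printed by ddt) to decimal MB/s."""
    return kib_per_s * KIB / 1_000_000


def parse_ddt(output: str) -> dict[str, float]:
    """Extract the Write/Read rates (KiB/s) from a ddt result table."""
    rates = {}
    for line in output.splitlines():
        fields = line.split()
        # Only the result rows start with exactly "Write" or "Read".
        if len(fields) >= 2 and fields[0] in ("Write", "Read"):
            rates[fields[0].lower()] = float(fields[1])
    return rates


# Figures copied from the two 50g runs above.
quota_off = parse_ddt("Write 573631 5\nRead 164716 1")
quota_on = parse_ddt("Write 269608 3\nRead 160219 1")

write_off = kib_s_to_mb_s(quota_off["write"])
write_on = kib_s_to_mb_s(quota_on["write"])
print(f"write, quota off: {write_off:.0f} MB/s")
print(f"write, quota on:  {write_on:.0f} MB/s")
print(f"write slowdown with quota: {write_off / write_on:.2f}x")
```

With these figures the read rate is nearly the same in both runs, so the quota setting mainly shows up on the write side.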
>
> Thanks
> Geoffrey
> ------------------------------------------------------
> Geoffrey Letessier
> Responsable informatique & ingénieur système
> UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
> Institut de Biologie Physico-Chimique
> 13, rue Pierre et Marie Curie - 75005 Paris
> Tel: 01 58 41 50 93 - eMail: geoffrey.letessier at ibpc.fr
> <mailto:geoffrey.letessier at ibpc.fr>
>
> On 7 August 2015 at 06:28, Vijaikumar M <vmallika at redhat.com> wrote:
>
>> Hi Geoffrey,
>>
>> Some performance improvements have been made to quota in glusterfs-3.7.3.
>> Could you upgrade to glusterfs-3.7.3 and see if this helps?
>>
>> Thanks,
>> Vijay
>>
>>
>> On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
>>> Hi,
>>>
>>> Any ideas to help me fix this issue (huge logs, write
>>> performance divided by 4, etc.)?
>>>
>>> For comparison, here are two volumes:
>>> - home: distributed on 4 bricks / 2 nodes (and replicated on 4
>>> other bricks / 2 other nodes):
>>> # ddt -t 35g /home
>>> Writing to /home/ddt.24172 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /home/ddt.24172 ... done.
>>> 33792MiB KiB/s CPU%
>>> Write 103659 1
>>> Read 391955 3
>>>
>>> - workdir: distributed on 4 bricks / 2 nodes (on the same RAID
>>> volumes and servers as home):
>>> # ddt -t 35g /workdir
>>> Writing to /workdir/ddt.24717 ... syncing ... done.
>>> sleeping 10 seconds ... done.
>>> Reading from /workdir/ddt.24717 ... done.
>>> 35840MiB KiB/s CPU%
>>> Write 738314 4
>>> Read 536497 4
>>>
>>> For reference, on version 3.5.3-2 I previously obtained roughly
>>> 1.1 GB/s for the workdir volume and ~550-600 MB/s for home.
>>>
>>> All my tests (cp, rsync, etc.) give the same result (write
>>> throughput between 100 MB/s and 150 MB/s).
>>>
>>> Thanks.
>>> Geoffrey
>>>
>>> On 5 August 2015 at 10:40, Geoffrey Letessier
>>> <geoffrey.letessier at cnrs.fr> wrote:
>>>
>>>> Hello,
>>>>
>>>> In addition, bearing in mind that I re-enabled logging (brick-log-level =
>>>> INFO instead of CRITICAL) only for the duration of the file creation
>>>> (i.e. a few minutes), have you noticed the log sizes and the number of
>>>> lines they contain?
>>>> # ls -lh storage*
>>>> -rw------- 1 letessier staff  18M 5 aoû 00:54 storage1__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage1__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  15M 5 aoû 00:56 storage2__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage2__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  47M 5 aoû 00:55 storage3__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K 5 aoû 00:54 storage3__export-brick_home-brick2-data.log
>>>> -rw------- 1 letessier staff  47M 5 aoû 00:55 storage4__export-brick_home-brick1-data.log
>>>> -rw------- 1 letessier staff 2,1K 5 aoû 00:55 storage4__export-brick_home-brick2-data.log
>>>>
>>>> # wc -l storage*
>>>> 55381 storage1__export-brick_home-brick1-data.log
>>>> 17 storage1__export-brick_home-brick2-data.log
>>>> 41636 storage2__export-brick_home-brick1-data.log
>>>> 17 storage2__export-brick_home-brick2-data.log
>>>> 270360 storage3__export-brick_home-brick1-data.log
>>>> 17 storage3__export-brick_home-brick2-data.log
>>>> 270358 storage4__export-brick_home-brick1-data.log
>>>> 17 storage4__export-brick_home-brick2-data.log
>>>> 637803 total
>>>>
>>>> If I leave brick-log-level at INFO, the brick log files on each
>>>> server will fill my entire /var partition within only a
>>>> few hours or days…
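
For a rough idea of how quickly that could happen, the growth rate can be extrapolated with a few lines of Python. This is only a back-of-the-envelope sketch: the 47M figure is the storage3 brick1 log size listed above (treated as MiB), while the 5-minute window and the 20 GiB of free space on /var are illustrative assumptions, since neither value is given in this thread.

```python
def hours_until_full(log_bytes: float, window_s: float, free_bytes: float) -> float:
    """Hours until free_bytes is consumed at the observed log growth rate."""
    if log_bytes <= 0 or window_s <= 0:
        raise ValueError("log_bytes and window_s must be positive")
    rate = log_bytes / window_s  # bytes written per second
    return free_bytes / rate / 3600


MIB = 1024 ** 2
GIB = 1024 ** 3

# 47 MiB comes from the listing above; the 5-minute window and the
# 20 GiB of free space are assumptions chosen only for illustration.
eta = hours_until_full(47 * MIB, 5 * 60, 20 * GIB)
print(f"/var full in about {eta:.1f} hours")
```

Substituting the real test duration and the actual free space on /var gives an estimate for a single brick log; several bricks logging on the same server would shorten it accordingly.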
>>>>
>>>> Thanks in advance,
>>>> Geoffrey
>>>>
>>>> On 5 August 2015 at 01:12, Geoffrey Letessier
>>>> <geoffrey.letessier at cnrs.fr> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Since the problem mentioned previously (all the errors seen in the
>>>>> brick log files), I have noticed very bad performance: my write
>>>>> throughput is 4 times lower than before, and it was not that good
>>>>> to begin with.
>>>>> Now, when writing a 33 GB file, my write throughput is around 150 MB/s
>>>>> (over InfiniBand); before, it was around 550-600 MB/s. This holds for
>>>>> both the RDMA and TCP transports.
>>>>>
>>>>> During this test, more than 40,000 error lines (like the following)
>>>>> were added to the brick log files.
>>>>> [2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref]
>>>>> (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60)
>>>>> [0x7f021c6f7410]
>>>>> -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88)
>>>>> [0x7f021c6f7188]
>>>>> -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4)
>>>>> [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
>>>>>
>>>>>
>>>>> All brick log files are attached.
>>>>>
>>>>> Thanks in advance for your help and a fix,
>>>>> Best,
>>>>> Geoffrey
>>>>>
>>>>> PS: a question: is it possible to easily downgrade GlusterFS from
>>>>> 3.7 to an earlier version (for example, v3.5)?
>>>>>
>>>>> <bricks-logs.tgz>