[Gluster-devel] [Gluster-users] high load when copy directory with many files

Xavi Hernandez jahernan at redhat.com
Wed Apr 21 21:46:02 UTC 2021


Hi Marco,

sorry for the late reply.

I've run some tests and I don't see any big difference between ls, stat and
getfattr. Can you provide more details about the test you ran?

It would also help to provide the profile info for each test:

To start profile info: gluster volume profile <volname> start
Before each test: gluster volume profile <volname> info clear
After the test: gluster volume profile <volname> info >/some/file
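
As a rough sketch (not a tested recipe), the steps above could be wrapped in
a small shell helper. The function names, the volume name, the mount path and
the output file are placeholders made up for illustration; only the
gluster volume profile commands come from the steps above.

```shell
#!/bin/sh
# Sketch only: run_parallel and profile_test are hypothetical helper names.

# Run the given command N times in parallel and wait for all copies to finish.
# Usage: run_parallel <parallelism> <command...>
run_parallel() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        i=$((i + 1))
    done
    wait
}

# Clear the profile counters, run one parallel test, then save the profile.
# Usage: profile_test <volname> <outfile> <parallelism> <command...>
profile_test() {
    vol=$1
    out=$2
    par=$3
    shift 3
    gluster volume profile "$vol" info clear >/dev/null
    run_parallel "$par" "$@"
    gluster volume profile "$vol" info >"$out"
}

# Example (assumed volume name and mount path, profiling already started
# with "gluster volume profile myvol start"):
#   profile_test myvol /tmp/profile-stat.txt 30 stat /mnt/myvol/missing-file
#   profile_test myvol /tmp/profile-xattr.txt 30 getfattr -d -m . /mnt/myvol/missing-file
```

Using one output file per command keeps the ls, stat and getfattr profiles
separate, so the fop counts can be compared side by side.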

Regards,

Xavi

On Mon, Apr 12, 2021 at 9:01 AM Xavi Hernandez <jahernan at redhat.com> wrote:

> On Sun, Apr 11, 2021 at 10:29 AM Amar Tumballi <amar at kadalu.io> wrote:
>
>> Hi Marco, this is a really good test and useful info. Thanks.
>>
>> One more thing to capture while you are running such tests is 'gluster
>> profile info', so that the bottleneck fop is listed.
>>
>> Mohit, Xavi, with these parallel operations, could the load be high due to
>> the inodelk used in the mds xattr update in dht? Or do you suspect
>> something else?
>>
>
> The profile info would be very useful to see which fop gets the most requests.
> I think inodelk by itself shouldn't be an issue (I guess we are setting mds
> only once, right?). In theory we shouldn't be sending any operation on an
> inode without a previous successful lookup, and in this case lookups should
> fail, so I don't clearly see what the difference is compared to a stat.
>
> We should investigate this. I'll try to do some experiments (not sure if
> this week, though).
>
> Regards,
>
> Xavi
>
>
>> Regards
>> Amar
>>
>> On Sat, 10 Apr, 2021, 11:45 pm Marco Lerda - FOREACH S.R.L., <
>> marco.lerda at foreach.it> wrote:
>>
>>> Hi,
>>> we have isolated the problem (meanwhile, some hardware upgrades and code
>>> optimization helped to limit it).
>>> It happens when many requests (HTTP through Apache) arrive for a
>>> non-existent file.
>>> 30 concurrent requests to the same non-existent file make the load
>>> climb without limit.
>>> The same requests on existing files work fine.
>>> I have tried to simulate the Apache file access without Apache, by
>>> running repeated commands on files with the same parallelism (30):
>>> - with ls it works fine, whether the file exists or not
>>> - with stat it works fine, whether the file exists or not
>>> - with xattr the load goes up, whether the file exists or not
>>>
>>> thank you
>>>
>>>
>>> On 05/10/2020 19.45, Marco Lerda - FOREACH S.R.L. wrote:
>>> > hi,
>>> > we use glusterfs for a PHP application that has many small PHP files,
>>> > images, etc.
>>> > We use glusterfs in replication mode.
>>> > We have 2 nodes connected over fiber at 100MBps with less than 1 ms
>>> > latency.
>>> > We also have an arbiter on a slower network (but the issue is also there
>>> > without the arbiter).
>>> > When we copy a directory (cp command) with many files, CPU usage and
>>> > load explode rapidly, and
>>> > our application becomes inaccessible until the copy ends.
>>> >
>>> > I wonder whether that is normal or we have done something wrong.
>>> > I know that glusterfs is not recommended for many small files, and I
>>> > know that it slows down,
>>> > but I want to avoid a simple directory copy taking down
>>> > our application.
>>> >
>>> > Any suggestion?
>>> >
>>> > Thanks a lot
>>> >
>>> >
>>> >
>>> > ________
>>> >
>>> >
>>> >
>>> > Community Meeting Calendar:
>>> >
>>> > Schedule -
>>> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>>> > Bridge: https://bluejeans.com/441850968
>>> >
>>> > Gluster-users mailing list
>>> > Gluster-users at gluster.org
>>> > https://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>> --
>>>
>>> ------------------------------------------------------
>>> Marco Lerda
>>> FOREACH S.R.L.
>>> Via Laghi di Avigliana 115, 12022 - Busca (CN)
>>> Telefono: 0171-1984102
>>> Centralino/Fax: 0171-1984100
>>> Email:  marco.lerda at foreach.it
>>> Web: http://www.foreach.it
>>>
>>>