[Gluster-users] [External] Re: Problems with gluster distributed mode and numpy memory mapped files

Amar Tumballi amarts at gmail.com
Tue Jan 7 14:28:30 UTC 2020


This is very helpful info, Jewell. Thanks for sharing the update.

-Amar

On Tue, Jan 7, 2020 at 7:51 PM Jewell, Paul <
Paul.Jewell at pennmedicine.upenn.edu> wrote:

> Hi All,
>
> Just as a follow-up on this: I changed the mount options line to look like:
>
> server:/gv0 /data glusterfs defaults,_netdev,direct-io-mode=disable 0 0
>
> The change was adding the part "direct-io-mode=disable".
>
> After this and a complete reboot of the cluster, it appears the problem
> has been solved.
>
> So I guess I would recommend using this "direct-io-mode=disable" option
> when working with numpy memory-mapped files.
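>
> A quick way to sanity-check the mount change is to time a memmap slice
> write directly on the gluster volume. This is only a rough sketch; the
> path and array shape below are made up for illustration:
>
> import time
> import numpy as np
>
> # build a ~100MB test array on the gluster mount (placeholder path), then
> # time one contiguous ~50MB slice write through a memory map
> np.save("/data/mmap_test.npy", np.zeros((20000, 625), dtype=np.float64))
> arr = np.load("/data/mmap_test.npy", mmap_mode='r+')
> start = time.time()
> arr[0:10000, :] = 1.0   # 10000 x 625 float64 values, roughly 50MB
> arr.flush()             # push the dirty pages back to the file
> print("write took %f seconds" % (time.time() - start))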
>
> Thanks,
> -Paul
> ------------------------------
> *From:* gluster-users-bounces at gluster.org <
> gluster-users-bounces at gluster.org> on behalf of Jewell, Paul <
> Paul.Jewell at pennmedicine.upenn.edu>
> *Sent:* Thursday, December 12, 2019 10:52 AM
> *To:* gluster-users at gluster.org <gluster-users at gluster.org>
> *Subject:* [External] Re: [Gluster-users] Problems with gluster
> distributed mode and numpy memory mapped files
>
> Hi All,
>
> I am using gluster to share data between four development servers. They are
> just four machines, used primarily for compute, attached to a switch. Each
> one has a 10TB hardware RAID5 volume, for a total of about 40TB. Each of
> the machines is both a gluster client and a server, mounted to "itself"
> (for example, machine1 mounts the gluster volume as machine1:gvfs). The
> volume is a standard "distributed" volume (just JBOD of all the space, no
> replication).
>
> This works well in general. Recently I have been developing an application
> for our research that makes use of numpy's "mmap" mode. Basically, on each
> iteration of an outer loop, every file in a list is opened as a memory map
> and an array of data is written into one contiguous slice of that file,
> roughly like this:
>
> import numpy as np
>
> for chunk_idx, data in enumerate(chunks):
>     for i in range(10):
>         # re-open the memory map for file i and write one contiguous slice
>         new_coverage = np.load("coverage_file_%d" % i, mmap_mode='r+')
>         new_coverage[(chunk_idx * 10000):((chunk_idx + 1) * 10000), :] = data
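>
> For what it's worth, the same access pattern can also be written so that
> each file's memory map is opened once up front and flushed explicitly at
> the end, instead of being re-opened for every chunk. A minimal sketch of
> that variant (the file count and slice size are placeholders) would be:
>
> import numpy as np
>
> # open each coverage file's memory map once instead of re-opening it per chunk
> maps = [np.load("coverage_file_%d" % i, mmap_mode='r+') for i in range(10)]
>
> for chunk_idx, data in enumerate(chunks):
>     for m in maps:
>         m[(chunk_idx * 10000):((chunk_idx + 1) * 10000), :] = data
>
> for m in maps:
>     m.flush()   # write the dirty pages back to the files once at the end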
>
> That is to say, a slice of about ~50MB of data is written to each file, one
> at a time. On the first iteration of the outer loop the first ~50MB is
> written, on the second iteration the second ~50MB, and so on.
>
> I have timed these iterations, and something very strange is happening.
> Below is example output (in seconds) for one iteration of the outer loop.
> The small numbers represent a reasonable amount of time to write this much
> data, but for some reason the write time for some files is enormous for the
> same small amount of data.
>
> Writing new coverage for majiq file 1 took 0.591965
> Writing new coverage for majiq file 2 took 0.601540
> Writing new coverage for majiq file 3 took 0.989093
> Writing new coverage for majiq file 4 took 612.667724
> Writing new coverage for majiq file 5 took 612.630379
> Writing new coverage for majiq file 6 took 612.099524
> Writing new coverage for majiq file 7 took 612.666857
> Writing new coverage for majiq file 8 took 612.633965
> Writing new coverage for majiq file 9 took 612.634483
> Writing new coverage for majiq file 10 took 612.096660
> Writing new coverage for majiq file 11 took 612.634655
> Writing new coverage for majiq file 12 took 612.632943
> Writing new coverage for majiq file 13 took 612.631088
> Writing new coverage for majiq file 14 took 612.675435
> Writing new coverage for majiq file 15 took 341.445473
> Writing new coverage for majiq file 16 took 0.550073
> Writing new coverage for majiq file 17 took 0.655051
> Writing new coverage for majiq file 18 took 0.641738
> Writing new coverage for majiq file 19 took 0.808618
> Writing new coverage for majiq file 20 took 0.583891
> Writing new coverage for majiq file 21 took 0.617392
> Writing new coverage for majiq file 22 took 0.749011
> Writing new coverage for majiq file 23 took 0.609862
>
> At first I thought the fast files must be on the same disk as the server
> where the program is running, while the slow files were distributed to the
> other machines, but it turns out that the slow group of files changes on
> every iteration of the outer loop. In top, the python process is stuck in
> the D (uninterruptible sleep) state throughout these writes. I have tested
> my algorithm on my development machine, as well as on a scratch disk (a
> single SSD) on the same servers where I ran the slow test above, and all of
> those writes are under 1 second.
>
> It is an extremely strange problem I have never seen before.
>
> Because all of the high values are almost exactly the same amount of time,
> it seems like some kind of timeout might be happening? What would cause it
> to happen this often?
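>
> In case it helps to reproduce this outside of our application, a minimal
> standalone test along these lines (the path and array shape are just
> placeholders) exercises the same access pattern; running it once against
> the gluster mount and once against a local disk should show whether the
> slowdown is specific to the gluster volume:
>
> import sys
> import time
> import numpy as np
>
> # usage: python mmap_bench.py /data/bench.npy   (path should end in .npy)
> path = sys.argv[1]
> np.save(path, np.zeros((100000, 625), dtype=np.float64))   # ~500MB test file
>
> for i in range(10):
>     arr = np.load(path, mmap_mode='r+')
>     start = time.time()
>     arr[(i * 10000):((i + 1) * 10000), :] = 1.0   # one ~50MB contiguous slice
>     arr.flush()
>     print("slice %d took %f seconds" % (i, time.time() - start))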
>
> Thanks for reading!
>

