<div dir="ltr"><div dir="ltr">This is very helpful info Jewell. Thanks for sharing the update.</div><div dir="ltr"><br></div><div dir="ltr"><div>-Amar</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 7, 2020 at 7:51 PM Jewell, Paul <<a href="mailto:Paul.Jewell@pennmedicine.upenn.edu">Paul.Jewell@pennmedicine.upenn.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Hi All, </div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
Just following up on this. I changed the mount options line to:</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
server:/gv0 /data glusterfs defaults,_netdev,direct-io-mode=disable 0 0<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
The change was adding the part "<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">direct-io-mode=disable". </span></div>
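<div>For anyone applying the same change, one quick sanity check is to split the entry into its fields and look at the options field. This is only a sketch: the helper name fstab_mount_options is made up for illustration, and it inspects the text of the entry, not what the mounted client is actually doing.</div>
<pre>
```python
def fstab_mount_options(line: str) -> list[str]:
    """Return the mount options (4th field) of one fstab-style entry."""
    fields = line.split()
    # An entry has device, mount point, type, options, and optionally
    # the dump and fsck-order fields.
    if len(fields) not in (4, 5, 6):
        raise ValueError("expected 4 to 6 whitespace-separated fields")
    return fields[3].split(",")


# The entry quoted in this message.
entry = "server:/gv0 /data glusterfs defaults,_netdev,direct-io-mode=disable 0 0"
options = fstab_mount_options(entry)
has_fix = "direct-io-mode=disable" in options
print(options, has_fix)
```
</pre>
<div>The same check could be run over every glusterfs line of a real fstab file before rebooting.</div>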
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">After this and a complete reboot of the cluster, it appears
the problem has been solved. </span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">So I would recommend using the "<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">direct-io-mode=disable"
mount option when working with numpy memory-mapped files on a gluster mount. </span></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><br>
</span></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">Thanks, </span></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline"><span style="font-family:Calibri,Arial,Helvetica,sans-serif;background-color:rgb(34,36,37);display:inline">-Paul</span></span></div>
<div id="gmail-m_-6962186958588945804appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_-6962186958588945804divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> <a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a> <<a href="mailto:gluster-users-bounces@gluster.org" target="_blank">gluster-users-bounces@gluster.org</a>> on behalf of Jewell, Paul <<a href="mailto:Paul.Jewell@pennmedicine.upenn.edu" target="_blank">Paul.Jewell@pennmedicine.upenn.edu</a>><br>
<b>Sent:</b> Thursday, December 12, 2019 10:52 AM<br>
<b>To:</b> <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject:</b> [External] Re: [Gluster-users] Problems with gluster distributed mode and numpy memory mapped files</font>
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
Hi All, </div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
I am using gluster to share data between four development servers. The setup is just four machines, used primarily for compute, attached to a switch. Each one has a 10TB hardware RAID 5 volume, for a total of about 40TB. Each machine is both a gluster client and a gluster server,
and mounts the volume from "itself" (for example, machine1 mounts the gluster volume as machine1:gvfs). The volume is a standard "distributed" volume (all of the space is simply pooled, JBOD-style, with no duplication).</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
This works well in general. Recently I have been developing an application for our research that makes use of the numpy "mmap" mode. At the end of each iteration of an outer loop, every file in a list is opened in turn and an array of data is written into
one contiguous slice of that file, roughly like this:</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
for i, data in enumerate(chunks):&nbsp; # i is the chunk (outer iteration) index</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
&nbsp;&nbsp;&nbsp;&nbsp;for d in range(10):&nbsp; # d is the file index</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;new_coverage = np.load("coverage_file_%d" % d, mmap_mode='r+')</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;new_coverage[(i*10000):((i+1)*10000), :] = data</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
That is to say, a slice of about 50MB of data is written to each file, one file at a time. On the first iteration the first ~50MB is written, on the second iteration the second ~50MB is written, and so on. </div>
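<div>The write pattern described above can be reproduced in a self-contained way with something like the following. It is a sketch under assumptions, not the actual application code: the sizes are shrunk, the column count, the number of chunks, the .npy file names and the write_chunks helper are all hypothetical, and the explicit flush() is added here so that each timing includes pushing the data out to the file.</div>
<pre>
```python
import os
import tempfile
import time

import numpy as np

ROWS_PER_CHUNK = 100  # the post uses 10000-row slices; shrunk for a quick run
N_COLS = 8            # hypothetical column count
N_FILES = 3           # hypothetical; the timing log in the post lists 23 files
N_CHUNKS = 4          # hypothetical number of outer iterations


def write_chunks(directory, chunks):
    """Write chunk i into rows [i*R, (i+1)*R) of every file, timing each write."""
    timings = []
    for i, data in enumerate(chunks):
        for d in range(N_FILES):
            path = os.path.join(directory, "coverage_file_%d.npy" % d)
            start = time.perf_counter()
            cov = np.load(path, mmap_mode="r+")  # returns a numpy.memmap
            cov[i * ROWS_PER_CHUNK:(i + 1) * ROWS_PER_CHUNK, :] = data
            cov.flush()  # write the modified pages back to the file
            del cov      # drop the mapping before moving to the next file
            timings.append((i, d, time.perf_counter() - start))
    return timings


with tempfile.TemporaryDirectory() as tmp:
    # Pre-create the files that the loop will later open in 'r+' mode.
    for d in range(N_FILES):
        np.save(os.path.join(tmp, "coverage_file_%d.npy" % d),
                np.zeros((ROWS_PER_CHUNK * N_CHUNKS, N_COLS)))
    chunks = [np.full((ROWS_PER_CHUNK, N_COLS), float(i + 1))
              for i in range(N_CHUNKS)]
    timings = write_chunks(tmp, chunks)
    result = np.load(os.path.join(tmp, "coverage_file_0.npy"))

for i, d, seconds in timings:
    print("chunk %d, file %d took %f" % (i, d, seconds))
```
</pre>
<div>Pointing the directory at a gluster mount instead of a temporary directory, and restoring the real sizes, would be one way to compare per-file timings with and without the mount option.</div>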
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
I have timed these iterations, and something very strange is happening. Below is example output from one iteration of the outer loop. The small numbers are a reasonable amount of time to write this little data, but for some files the write time is absolutely huge. </div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<br>
</div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px;font-family:monospace"><span style="margin:0px;background-color:white"></span><span style="margin:0px;background-color:white">Writing new coverage for majiq file 1 took 0.591965<span style="margin:0px"> </span></span><br>
<span style="margin:0px;background-color:white">Writing new coverage for majiq file 2 took 0.601540<span style="margin:0px"> </span></span><br>
<span style="margin:0px;background-color:white"></span><span style="margin:0px;background-color:white">Writing new coverage for majiq file 3 took 0.989093</span><br>
Writing new coverage for majiq file 4 took 612.667724<br>
Writing new coverage for majiq file 5 took 612.630379<br>
Writing new coverage for majiq file 6 took 612.099524<br>
Writing new coverage for majiq file 7 took 612.666857<br>
Writing new coverage for majiq file 8 took 612.633965<br>
Writing new coverage for majiq file 9 took 612.634483<br>
Writing new coverage for majiq file 10 took 612.096660<br>
Writing new coverage for majiq file 11 took 612.634655<br>
Writing new coverage for majiq file 12 took 612.632943<br>
Writing new coverage for majiq file 13 took 612.631088<br>
Writing new coverage for majiq file 14 took 612.675435<br>
Writing new coverage for majiq file 15 took 341.445473<br>
Writing new coverage for majiq file 16 took 0.550073<br>
Writing new coverage for majiq file 17 took 0.655051<br>
Writing new coverage for majiq file 18 took 0.641738<br>
Writing new coverage for majiq file 19 took 0.808618<br>
Writing new coverage for majiq file 20 took 0.583891<br>
Writing new coverage for majiq file 21 took 0.617392<br>
Writing new coverage for majiq file 22 took 0.749011<br>
Writing new coverage for majiq file 23 took 0.609862 </span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px;font-family:monospace"><br>
</span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px;font-family:monospace"><span style="margin:0px;font-family:Calibri,Arial,Helvetica,sans-serif;background-color:white">At first I thought the fast files must be on the same disk as the server where the program
is running, and the slow files were distributed to other machines, but it turns out that the group of slow files changes on every iteration of the outer loop. In top, the python process is stuck in D state (uninterruptible sleep) throughout these writes. </span></span><span style="margin:0px">I
have tested my algorithm on my development machine, as well as on a scratch volume (a single SSD) on the same servers where I ran the very slow test above, and all of the writes take under 1 second. </span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px">It is an extremely strange problem I have never seen before. </span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px">Because all of the high values are almost exactly the same (about 612), this looks like some kind of timeout. What would cause it to happen that often?</span></div>
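<div>The observation that the slow writes cluster around one value can be checked mechanically from the log. A rough sketch, using a few of the lines quoted above; the 60.0 cut-off and the split_timings helper are arbitrary choices made for illustration.</div>
<pre>
```python
import re

# A few lines copied from the timing output quoted in this message.
LOG = """\
Writing new coverage for majiq file 3 took 0.989093
Writing new coverage for majiq file 4 took 612.667724
Writing new coverage for majiq file 5 took 612.630379
Writing new coverage for majiq file 6 took 612.099524
Writing new coverage for majiq file 16 took 0.550073
"""

PATTERN = re.compile(r"file (\d+) took ([\d.]+)")
SLOW_CUTOFF = 60.0  # hypothetical threshold separating fast from slow writes


def split_timings(log, cutoff=SLOW_CUTOFF):
    """Return two dicts (fast, slow) mapping file number to write time."""
    fast, slow = {}, {}
    for match in PATTERN.finditer(log):
        file_no, took = int(match.group(1)), float(match.group(2))
        (slow if took >= cutoff else fast)[file_no] = took
    return fast, slow


fast, slow = split_timings(LOG)
# A small spread among the slow values is what suggests a fixed timeout.
spread = max(slow.values()) - min(slow.values())
print(sorted(slow), round(spread, 3))
```
</pre>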
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px"><br>
</span></div>
<div style="margin:0px;font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif;color:rgb(32,31,30);background-color:white">
<span style="margin:0px">Thanks for reading!</span></div>
<br>
</div>
<div id="gmail-m_-6962186958588945804x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_-6962186958588945804x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Jewell, Paul <<a href="mailto:Paul.Jewell@pennmedicine.upenn.edu" target="_blank">Paul.Jewell@pennmedicine.upenn.edu</a>><br>
<b>Sent:</b> Monday, December 9, 2019 1:40 PM<br>
<b>To:</b> <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject:</b> Re: Problems with gluster distributed mode and numpy memory mapped files</font>
<div> </div>
</div>
<div dir="ltr">
<br>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
<div id="gmail-m_-6962186958588945804x_x_appendonsend"></div>
<hr style="display:inline-block;width:98%">
<div id="gmail-m_-6962186958588945804x_x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Jewell, Paul<br>
<b>Sent:</b> Thursday, December 5, 2019 4:18 PM<br>
<b>To:</b> <a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a> <<a href="mailto:gluster-users@gluster.org" target="_blank">gluster-users@gluster.org</a>><br>
<b>Subject:</b> Problems with gluster distributed mode and numpy memory mapped files</font>
<div> </div>
</div>
<div dir="ltr">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<br>
</div>
</div>
</div>
</div>
</div>
</blockquote></div></div>