<div dir="ltr">Hi Kotresh, Any update on this bug status?<div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 22, 2016 at 6:57 PM, Amudhan P <span dir="ltr"><<a href="mailto:amudhan83@gmail.com" target="_blank">amudhan83@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Kotresh,</div><div><br></div><div>I have raised bug.</div><div><br></div><a href="https://bugzilla.redhat.com/show_bug.cgi?id=1378466" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1378466</a><br><div><br></div><div>Thanks</div><span class="HOEnZb"><font color="#888888"><div>Amudhan</div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 22, 2016 at 2:45 PM, Kotresh Hiremath Ravishankar <span dir="ltr"><<a href="mailto:khiremat@redhat.com" target="_blank">khiremat@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Amudhan,<br>
<br>
It's as of now, hard coded based on some testing results. That part is not tune-able yet.<br>
Only scrubber throttling is tune-able. As I have told you, because brick process has<br>
an open fd, bitrot signer process is not picking it up for scrubbing. Please raise<br>
a bug. We will take a look at it.<br>
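
For reference, the scrubber knobs that are tunable today are its throttle and
frequency; a minimal sketch of the CLI (values as per the 3.8 docs):

# scrubber CPU impact: lazy | normal | aggressive
gluster volume bitrot glsvol1 scrub-throttle normal

# how often the scrubber runs: hourly | daily | weekly | biweekly | monthly
gluster volume bitrot glsvol1 scrub-frequency daily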

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Amudhan P" <amudhan83@gmail.com>
> To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> Cc: "Gluster Users" <gluster-users@gluster.org>
> Sent: Thursday, September 22, 2016 2:37:25 PM
> Subject: Re: 3.8.3 Bitrot signature process
>
> Hi Kotresh,
>
> It's the same behaviour on a replicated volume also; the file fd opens
> after 120 seconds in the brick pid.
>
> Calculating the signature for a 100MB file took 15m57s.
>
>
> How can I increase CPU usage? In your earlier mail you said "To limit
> the usage of CPU, throttling is done using token bucket algorithm".
> Is there any possibility of increasing the bitrot hash calculation speed?
>
>
> Thanks,
> Amudhan
>
>
> On Thu, Sep 22, 2016 at 11:44 AM, Kotresh Hiremath Ravishankar <
> khiremat@redhat.com> wrote:
>
> > Hi Amudhan,
> >
> > Thanks for the confirmation. If that's the case, please try with a
> > dist-rep volume and see if you observe similar behavior.
> >
> > In any case, please raise a bug for the same with your observations.
> > We will work on it.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Amudhan P" <amudhan83@gmail.com>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> > > Cc: "Gluster Users" <gluster-users@gluster.org>
> > > Sent: Thursday, September 22, 2016 11:25:28 AM
> > > Subject: Re: 3.8.3 Bitrot signature process
> > >
> > > Hi Kotresh,
> > >
> > > 2280 is a brick process; I have not tried with a dist-rep volume.
> > >
> > > I have not seen any fd in the bitd process on any of the nodes, and the
> > > bitd process CPU usage is always 0%, randomly going up to 0.3%.
> > >
> > >
> > >
> > > Thanks,
> > > Amudhan
> > >
> > > On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <
> > > khiremat@redhat.com> wrote:
> > > > Hi Amudhan,
> > > >
> > > > No, the bitrot signer is a separate process by itself and is not part
> > > > of the brick process. I believe process 2280 is a brick process? Did
> > > > you check with a dist-rep volume? Is the same behavior observed there
> > > > as well? We need to figure out why the brick process is holding that
> > > > fd for such a long time.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > >> From: "Amudhan P" <amudhan83@gmail.com>
> > > >> To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> > > >> Sent: Wednesday, September 21, 2016 8:15:33 PM
> > > >> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >>
> > > >> Hi Kotresh,
> > > >>
> > > >> As soon as the fd closes in the brick1 pid, I can see the bitrot
> > > >> signature for the file in the brick.
> > > >>
> > > >> So it looks like the fd is opened by the brick process to calculate
> > > >> the signature.
> > > >>
> > > >> Output for the file:
> > > >>
> > > >> -rw-r--r-- 2 root root 250M Sep 21 18:32
> > > >> /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>
> > > >> getfattr: Removing leading '/' from absolute path names
> > > >> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >> trusted.bit-rot.signature=0x010200000000000000e9474e4cc673c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
> > > >> trusted.bit-rot.version=0x020000000000000057d6af3200012a13
> > > >> trusted.ec.config=0x0000080501000200
> > > >> trusted.ec.size=0x000000003e800000
> > > >> trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > >>
> > > >> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
> > > >>   File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
> > > >>   Size: 262144000   Blocks: 512000   IO Block: 4096   regular file
> > > >> Device: 811h/2065d   Inode: 402653311   Links: 2
> > > >> Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > >> Access: 2016-09-21 18:34:43.722712751 +0530
> > > >> Modify: 2016-09-21 18:32:41.650712946 +0530
> > > >> Change: 2016-09-21 19:14:41.698708914 +0530
> > > >>  Birth: -
> > > >>
> > > >>
> > > >> On the other 2 bricks in the same set, the signature is still not
> > > >> updated for the same file.
> > > >>
> > > >>
> > > >> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P <amudhan83@gmail.com> wrote:
> > > >>
> > > >> > Hi Kotresh,
> > > >> >
> > > >> > I am very sure no read was going on from the mount point.
> > > >> >
> > > >> > I did the same test again, but this time after writing the data to
> > > >> > the mount point I unmounted it.
> > > >> >
> > > >> > After 120 seconds I see this file's fd entry in the brick 1 pid:
> > > >> >
> > > >> > getfattr -m. -e hex -d test59-bs10
> > > >> > # file: test59-bs10M-c100.nul
> > > >> > trusted.bit-rot.version=0x020000000000000057bed574000ed534
> > > >> > trusted.ec.config=0x0000080501000200
> > > >> > trusted.ec.size=0x000000003e800000
> > > >> > trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> > trusted.gfid=0x4c091145429448468fffe358482c63e1
> > > >> >
> > > >> >
> > > >> > ls -l /proc/2280/fd
> > > >> > lr-x------ 1 root root 64 Sep 21 13:08 19 -> /media/disk1/brick1/.glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1
> > > >> >
> > > >> > The volume is EC 4+1.
> > > >> >
> > > >> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
> > > >> > khiremat@redhat.com> wrote:
> > > >> >
> > > >> >> Hi Amudhan,
> > > >> >>
> > > >> >> If you see the ls output, some process has an fd opened in the
> > > >> >> backend. That is the reason bitrot is not considering the file for
> > > >> >> signing. Could you please observe whether the signing happens after
> > > >> >> 120 secs of closure of
> > > >> >> "/media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764"?
> > > >> >> If so, we need to figure out who holds this fd for such a long
> > > >> >> time, and also whether this issue is specific to EC volumes.
> > > >> >>
> > > >> >> Thanks and Regards,
> > > >> >> Kotresh H R
> > > >> >>
> > > >> >> ----- Original Message -----
> > > >> >> > From: "Amudhan P" <amudhan83@gmail.com>
> > > >> >> > To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> > > >> >> > Cc: "Gluster Users" <gluster-users@gluster.org>
> > > >> >> > Sent: Wednesday, September 21, 2016 4:56:40 PM
> > > >> >> > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> >
> > > >> >> > Hi Kotresh,
> > > >> >> >
> > > >> >> >
> > > >> >> > Writing a new file:
> > > >> >> >
> > > >> >> > getfattr -m. -e hex -d /media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> > getfattr: Removing leading '/' from absolute path names
> > > >> >> > # file: media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> > trusted.bit-rot.version=0x020000000000000057da8b23000b120e
> > > >> >> > trusted.ec.config=0x0000080501000200
> > > >> >> > trusted.ec.size=0x000000003e800000
> > > >> >> > trusted.ec.version=0x0000000000001f400000000000001f40
> > > >> >> > trusted.gfid=0x6e7c49e6094e443585bff21f99fd8764
> > > >> >> >
> > > >> >> >
> > > >> >> > Running ls -l on the brick 2 pid:
> > > >> >> >
> > > >> >> > ls -l /proc/30162/fd
> > > >> >> >
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:22 59 -> /media/disk2/brick2/.glusterfs/quanrantine
> > > >> >> > lrwx------ 1 root root 64 Sep 21 16:22 6 -> /var/lib/glusterd/vols/glsvol1/run/10.1.2.2-media-disk2-brick2.pid
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:25 60 -> /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > lr-x------ 1 root root 64 Sep 21 16:22 61 -> /media/disk2/brick2/.glusterfs/quanrantine
> > > >> >> >
> > > >> >> >
> > > >> >> > find /media/disk2/ -samefile /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
> > > >> >> > /media/disk2/brick2/data/G/test58-bs10M-c100.nul
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Wed, Sep 21, 2016 at 3:28 PM, Kotresh Hiremath Ravishankar <
> > > >> >> > khiremat@redhat.com> wrote:
> > > >> >> >
> > > >> >> > > Hi Amudhan,
> > > >> >> > >
> > > >> >> > > Don't grep for the filename; glusterfs maintains a hardlink in the
> > > >> >> > > .glusterfs directory for each file. Just check 'ls -l
> > > >> >> > > /proc/<respective brick pid>/fd' for any fds opened on a file in
> > > >> >> > > .glusterfs and check if it's the same file.
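> > > >> >> > >
> > > >> >> > > For example, something like this (a sketch; 2280 stands in for your
> > > >> >> > > brick pid from 'gluster volume status', and the gfid path is the one
> > > >> >> > > from your earlier mail):
> > > >> >> > >
> > > >> >> > > # list fds the brick process holds on gfid hardlinks under .glusterfs
> > > >> >> > > ls -l /proc/2280/fd | grep '\.glusterfs/'
> > > >> >> > > # map a gfid hardlink back to its data file on the brick
> > > >> >> > > find /media/disk1/brick1 -samefile \
> > > >> >> > >     /media/disk1/brick1/.glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1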
> > > >> >> > >
> > > >> >> > > Thanks and Regards,
> > > >> >> > > Kotresh H R
> > > >> >> > >
> > > >> >> > > ----- Original Message -----
> > > >> >> > > > From: "Amudhan P" <amudhan83@gmail.com>
> > > >> >> > > > To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> > > >> >> > > > Cc: "Gluster Users" <gluster-users@gluster.org>
> > > >> >> > > > Sent: Wednesday, September 21, 2016 1:33:10 PM
> > > >> >> > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > >
> > > >> >> > > > Hi Kotresh,
> > > >> >> > > >
> > > >> >> > > > I have used the below command to verify any open fd for a file:
> > > >> >> > > >
> > > >> >> > > > "ls -l /proc/*/fd | grep filename"
> > > >> >> > > >
> > > >> >> > > > As soon as the write completes there are no open fds. If there is
> > > >> >> > > > any alternate option, please let me know and I will also try that.
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > Also, below is the scrub status in my test setup. The number of
> > > >> >> > > > skipped files is slowly reducing day by day. I think files are
> > > >> >> > > > skipped because the bitrot signature process is not completed yet.
> > > >> >> > > >
> > > >> >> > > > Where can I see the files skipped by the scrubber?
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > Volume name : glsvol1
> > > >> >> > > > State of scrub: Active (Idle)
> > > >> >> > > > Scrub impact: normal
> > > >> >> > > > Scrub frequency: daily
> > > >> >> > > > Bitrot error log location: /var/log/glusterfs/bitd.log
> > > >> >> > > > Scrubber error log location: /var/log/glusterfs/scrub.log
> > > >> >> > > >
> > > >> >> > > > =========================================================
> > > >> >> > > > Node: localhost
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:59:58
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:26
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > =========================================================
> > > >> >> > > > Node: 10.1.2.3
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 10:50:00
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:38:17
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > =========================================================
> > > >> >> > > > Node: 10.1.2.4
> > > >> >> > > > Number of Scrubbed files: 981
> > > >> >> > > > Number of Skipped files: 1664
> > > >> >> > > > Last completed scrub time: 2016-09-20 12:38:01
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:35:19
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > =========================================================
> > > >> >> > > > Node: 10.1.2.1
> > > >> >> > > > Number of Scrubbed files: 1263
> > > >> >> > > > Number of Skipped files: 1382
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:57:21
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:37:17
> > > >> >> > > > Error count: 0
> > > >> >> > > >
> > > >> >> > > > =========================================================
> > > >> >> > > > Node: 10.1.2.2
> > > >> >> > > > Number of Scrubbed files: 1644
> > > >> >> > > > Number of Skipped files: 1001
> > > >> >> > > > Last completed scrub time: 2016-09-20 11:59:25
> > > >> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:18
> > > >> >> > > > Error count: 0
> > > >> >> > > > =========================================================
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > Thanks
> > > >> >> > > > Amudhan
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > On Wed, Sep 21, 2016 at 11:45 AM, Kotresh Hiremath Ravishankar <
> > > >> >> > > > khiremat@redhat.com> wrote:
> > > >> >> > > >
> > > >> >> > > > > Hi Amudhan,
> > > >> >> > > > >
> > > >> >> > > > > I don't think it's a limitation with reading data from the brick.
> > > >> >> > > > > To limit the usage of CPU, throttling is done using a token
> > > >> >> > > > > bucket algorithm; the log message you showed is related to it.
> > > >> >> > > > > But even then, I think it should not take 12 minutes for the
> > > >> >> > > > > checksum calculation unless there is an fd open (might be
> > > >> >> > > > > internal). Could you please cross-verify whether there are any
> > > >> >> > > > > fds opened on that file by looking into /proc? I will also test
> > > >> >> > > > > it out in the mean time and get back to you.
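> > > >> >> > > > >
> > > >> >> > > > > For intuition, a minimal sketch of token-bucket pacing (an
> > > >> >> > > > > illustration only, not gluster's actual code; it assumes one
> > > >> >> > > > > token corresponds to one byte hashed):
> > > >> >> > > > >
> > > >> >> > > > > rate=131072        # tokens added per second (from your bitd.log)
> > > >> >> > > > > maxlimit=524288    # bucket capacity (from your bitd.log)
> > > >> >> > > > > tokens=$maxlimit
> > > >> >> > > > > need=131072        # bytes one hashing step wants to consume
> > > >> >> > > > > while :; do
> > > >> >> > > > >     while [ "$tokens" -lt "$need" ]; do
> > > >> >> > > > >         sleep 1    # wait for the bucket to refill
> > > >> >> > > > >         tokens=$(( tokens + rate > maxlimit ? maxlimit : tokens + rate ))
> > > >> >> > > > >     done
> > > >> >> > > > >     tokens=$(( tokens - need ))
> > > >> >> > > > >     # ... hash the next $need bytes of the file here ...
> > > >> >> > > > > done
> > > >> >> > > > >
> > > >> >> > > > > Under that byte-per-token assumption, a 100MB file needs roughly
> > > >> >> > > > > 104857600 / 131072 = 800 seconds, i.e. about 13 minutes, which is
> > > >> >> > > > > in the same range as the 12-16 minutes being observed here.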
> > > >> >> > > > >
> > > >> >> > > > > Thanks and Regards,
> > > >> >> > > > > Kotresh H R
> > > >> >> > > > >
> > > >> >> > > > > ----- Original Message -----
> > > >> >> > > > > > From: "Amudhan P" <amudhan83@gmail.com>
> > > >> >> > > > > > To: "Kotresh Hiremath Ravishankar" <khiremat@redhat.com>
> > > >> >> > > > > > Cc: "Gluster Users" <gluster-users@gluster.org>
> > > >> >> > > > > > Sent: Tuesday, September 20, 2016 3:19:28 PM
> > > >> >> > > > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > > > >
> > > >> >> > > > > > Hi Kotresh,
> > > >> >> > > > > >
> > > >> >> > > > > > Please correct me if I am wrong: once a file write completes and
> > > >> >> > > > > > as soon as it closes its fds, bitrot waits for 120 seconds and
> > > >> >> > > > > > then starts hashing and updates the signature for the file in
> > > >> >> > > > > > the brick.
> > > >> >> > > > > >
> > > >> >> > > > > > But what I am feeling is that bitrot takes too much time to
> > > >> >> > > > > > complete the hashing.
> > > >> >> > > > > >
> > > >> >> > > > > > Below is a test result I would like to share.
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > Writing data to the below path using dd:
> > > >> >> > > > > >
> > > >> >> > > > > > /mnt/gluster/data/G (mount point)
> > > >> >> > > > > > -rw-r--r-- 1 root root 10M Sep 20 12:19 test53-bs10M-c1.nul
> > > >> >> > > > > > -rw-r--r-- 1 root root 100M Sep 20 12:19 test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > No other write or read process is going on.
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > Checking the file data on one of the bricks:
> > > >> >> > > > > >
> > > >> >> > > > > > -rw-r--r-- 2 root root 2.5M Sep 20 12:23 test53-bs10M-c1.nul
> > > >> >> > > > > > -rw-r--r-- 2 root root 25M Sep 20 12:23 test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > The files' stat and getfattr info from the brick, after the
> > > >> >> > > > > > write process completed:
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > > >> >> > > > > >   File: ‘test53-bs10M-c1.nul’
> > > >> >> > > > > >   Size: 2621440   Blocks: 5120   IO Block: 4096   regular file
> > > >> >> > > > > > Device: 821h/2081d   Inode: 536874168   Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > >> >> > > > > > Access: 2016-09-20 12:23:28.798886647 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:23:28.998886646 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > > >> >> > > > > >   File: ‘test54-bs10M-c10.nul’
> > > >> >> > > > > >   Size: 26214400   Blocks: 51200   IO Block: 4096   regular file
> > > >> >> > > > > > Device: 821h/2081d   Inode: 536874169   Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > >> >> > > > > > Access: 2016-09-20 12:23:42.902886624 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > >> >> > > > > > # file: test53-bs10M-c1.nul
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000000a00000
> > > >> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
> > > >> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > >> >> > > > > > # file: test54-bs10M-c10.nul
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000006400000
> > > >> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
> > > >> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > The files' stat and getfattr info from the brick, after the
> > > >> >> > > > > > bitrot signature was updated:
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > > >> >> > > > > >   File: ‘test53-bs10M-c1.nul’
> > > >> >> > > > > >   Size: 2621440   Blocks: 5120   IO Block: 4096   regular file
> > > >> >> > > > > > Device: 821h/2081d   Inode: 536874168   Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > >> >> > > > > > Access: 2016-09-20 12:25:31.494886450 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:27:00.994886307 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > >> >> > > > > > # file: test53-bs10M-c1.nul
> > > >> >> > > > > > trusted.bit-rot.signature=0x0102000000000000006de7493c5c90f643357c268fbaaf461c1567e0334e4948023ce17268403aa37a
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000000a00000
> > > >> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
> > > >> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > > >> >> > > > > >   File: ‘test54-bs10M-c10.nul’
> > > >> >> > > > > >   Size: 26214400   Blocks: 51200   IO Block: 4096   regular file
> > > >> >> > > > > > Device: 821h/2081d   Inode: 536874169   Links: 2
> > > >> >> > > > > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > >> >> > > > > > Access: 2016-09-20 12:25:47.510886425 +0530
> > > >> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > >> >> > > > > > Change: 2016-09-20 12:38:05.954885243 +0530
> > > >> >> > > > > >  Birth: -
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > >> >> > > > > > # file: test54-bs10M-c10.nul
> > > >> >> > > > > > trusted.bit-rot.signature=0x010200000000000000394c345f0b0c63ee652627a62eed069244d35c4d5134e4f07d4eabb51afda47e
> > > >> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > >> >> > > > > > trusted.ec.config=0x0000080501000200
> > > >> >> > > > > > trusted.ec.size=0x0000000006400000
> > > >> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
> > > >> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > (Actual time taken for reading the file from the brick for md5sum:)
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test53-bs10M-c1.nul
> > > >> >> > > > > > 8354dcaa18a1ecb52d0895bf00888c44  test53-bs10M-c1.nul
> > > >> >> > > > > >
> > > >> >> > > > > > real    0m0.045s
> > > >> >> > > > > > user    0m0.007s
> > > >> >> > > > > > sys     0m0.003s
> > > >> >> > > > > >
> > > >> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test54-bs10M-c10.nul
> > > >> >> > > > > > bed3c0a4a1407f584989b4009e9ce33f  test54-bs10M-c10.nul
> > > >> >> > > > > >
> > > >> >> > > > > > real    0m0.166s
> > > >> >> > > > > > user    0m0.062s
> > > >> >> > > > > > sys     0m0.011s
> > > >> >> > > > > >
> > > >> >> > > > > > As you can see, the 'test54-bs10M-c10.nul' file took around 12
> > > >> >> > > > > > minutes to update the bitrot signature (please refer to the stat
> > > >> >> > > > > > output for the file).
> > > >> >> > > > > >
> > > >> >> > > > > > What would be the cause of such a slow read? Is there any
> > > >> >> > > > > > limitation on reading data from the brick?
> > > >> >> > > > > >
> > > >> >> > > > > > Also, I am seeing this line in bitd.log; what does it mean?
> > > >> >> > > > > > [bit-rot.c:1784:br_rate_limit_signer] 0-glsvol1-bit-rot-0: [Rate Limit
> > > >> >> > > > > > Info] "tokens/sec (rate): 131072, maxlimit: 524288
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > Thanks
> > > >> >> > > > > > Amudhan P
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > On Mon, Sep 19, 2016 at 1:00 PM, Kotresh Hiremath Ravishankar <
> > > >> >> > > > > > khiremat@redhat.com> wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > > Hi Amudhan,
> > > >> >> > > > > > >
> > > >> >> > > > > > > Thanks for testing out the bitrot feature, and sorry for the
> > > >> >> > > > > > > delayed response. Please find the answers inline.
> > > >> >> > > > > > >
> > > >> >> > > > > > > Thanks and Regards,
> > > >> >> > > > > > > Kotresh H R
> > > >> >> > > > > > >
> > > >> >> > > > > > > ----- Original Message -----
> > > >> >> > > > > > > > From: "Amudhan P" <amudhan83@gmail.com>
> > > >> >> > > > > > > > To: "Gluster Users" <gluster-users@gluster.org>
> > > >> >> > > > > > > > Sent: Friday, September 16, 2016 4:14:10 PM
> > > >> >> > > > > > > > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Hi,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Can anyone reply to this mail?
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P <amudhan83@gmail.com> wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Hi,
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > I am testing the bitrot feature in Gluster 3.8.3 with a
> > > >> >> > > > > > > > disperse EC volume 4+1.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > When I write a single small file (< 10MB), after 2 seconds I
> > > >> >> > > > > > > > can see the bitrot signature in the bricks for the file, but
> > > >> >> > > > > > > > when I write multiple files of different sizes (> 10MB) it
> > > >> >> > > > > > > > takes a long time (> 24hrs) to see the bitrot signature on
> > > >> >> > > > > > > > all the files.
> > > >> >> > > > > > >
> > > >> >> > > > > > > The default timeout for signing to happen is 120 seconds. So
> > > >> >> > > > > > > the signing will happen 120 secs after the last fd gets closed
> > > >> >> > > > > > > on that file. So if the file is being written continuously, it
> > > >> >> > > > > > > will not be signed until 120 secs after its last fd is closed.
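> > > >> >> > > > > > >
> > > >> >> > > > > > > A quick way to watch this on a test volume (a sketch; the mount
> > > >> >> > > > > > > and brick paths are the ones from this thread, and sign-test.nul
> > > >> >> > > > > > > is just an example name):
> > > >> >> > > > > > >
> > > >> >> > > > > > > dd if=/dev/zero of=/mnt/gluster/data/G/sign-test.nul bs=10M count=1
> > > >> >> > > > > > > sleep 130   # 120 sec signing expiry plus a little slack
> > > >> >> > > > > > > getfattr -m. -e hex -d \
> > > >> >> > > > > > >     /media/disk2/brick2/data/G/sign-test.nul | grep signature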
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > My questions are:
> > > >> >> > > > > > > > 1. I have enabled scrub schedule as hourly and throttle as
> > > >> >> > > > > > > > normal; does this make any impact in delaying the bitrot
> > > >> >> > > > > > > > signature?
> > > >> >> > > > > > > No.
> > > >> >> > > > > > > > 2. Other than "bitd.log", where else can I watch the current
> > > >> >> > > > > > > > status of bitrot, like the number of files added for
> > > >> >> > > > > > > > signature and file status?
> > > >> >> > > > > > > Signing will happen after 120 sec of the last fd closure, as
> > > >> >> > > > > > > said above. There is no status command which tracks the
> > > >> >> > > > > > > signature of the files. But there is a bitrot status command
> > > >> >> > > > > > > which tracks the number of files scrubbed.
> > > >> >> > > > > > >
> > > >> >> > > > > > > #gluster vol bitrot <volname> scrub status
> > > >> >> > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 3. Where can I confirm that all the files in the brick are
> > > >> >> > > > > > > > bitrot signed?
> > > >> >> > > > > > >
> > > >> >> > > > > > > As said, signing information of all the files is not tracked.
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 4. Is there any file read size limit in bitrot?
> > > >> >> > > > > > >
> > > >> >> > > > > > > I didn't get it. Could you please elaborate?
> > > >> >> > > > > > >
> > > >> >> > > > > > > > 5. Are there options for tuning bitrot for faster signing of
> > > >> >> > > > > > > > files?
> > > >> >> > > > > > >
> > > >> >> > > > > > > The bitrot feature is mainly to detect silent corruption
> > > >> >> > > > > > > (bitflips) of files due to long-term storage. Hence the
> > > >> >> > > > > > > default: signing happens 120 sec after the last fd closure.
> > > >> >> > > > > > > There is a tunable which can change the default 120 sec, but
> > > >> >> > > > > > > that's only for testing purposes and we don't recommend it.
> > > >> >> > > > > > >
> > > >> >> > > > > > > gluster vol get master features.expiry-time
> > > >> >> > > > > > >
> > > >> >> > > > > > > For testing purposes, you can change this default and test.
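> > > >> >> > > > > > >
> > > >> >> > > > > > > For example (testing only; a sketch that shrinks the signing
> > > >> >> > > > > > > wait from 120 sec to 10 sec on this thread's volume):
> > > >> >> > > > > > >
> > > >> >> > > > > > > gluster volume set glsvol1 features.expiry-time 10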
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Thanks
> > > >> >> > > > > > > > Amudhan
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > _______________________________________________
> > > >> >> > > > > > > > Gluster-users mailing list
> > > >> >> > > > > > > > Gluster-users@gluster.org
> > > >> >> > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-users