[Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

Raghavendra G raghavendra at gluster.com
Thu Nov 3 07:40:36 UTC 2016


On Thu, Nov 3, 2016 at 10:27 AM, Raghavendra G <raghavendra at gluster.com>
wrote:

>
>
> On Thu, Nov 3, 2016 at 7:16 AM, Lian, George (Nokia - CN/Hangzhou) <
> george.lian at nokia.com> wrote:
>
>> >Yes. I was assuming that the previous results were tested with:
>> >1. write-behind on with the fix
>> >2. quick-read and readdir-ahead off
>> # gluster volume info log
>>
>> performance.quick-read: off
>> performance.readdir-ahead: off
>> performance.stat-prefetch: on
>> performance.write-behind: on
>>
>>
>> with the above configuration and write-behind.so with patch 2, the "tail
>> truncated" issue still be there.
>>
>> # tail -f syslog >/dev/null
>> tail: syslog: file truncated
>> tail: syslog: file truncated
>>
>> FYI,
>>
>
> Thanks George. I'll take a look.
>

Can you please test with following configuration?

1. write-behind on with my fix
2. readdir-ahead and quick-read off
3. performance.stat-prefetch on
4. performance.force-readdirp off
5. dht.force-readdirp off
6. Also mount glusterfs with option "use-readdirp=no"

[root at booradley glusterfs]# mount -t glusterfs -o use-readdirp=no
booradley:/newptop /mnt

[root at booradley glusterfs]# ps ax | grep -i mnt
14418 ?        Ssl    0:00 /usr/local/sbin/glusterfs --use-readdirp=no
--volfile-server=booradley --volfile-id=/newptop /mnt

[root at booradley glusterfs]# gluster volume set newptop
performance.write-behind on
volume set: success

[root at booradley glusterfs]# gluster volume set newptop
performance.quick-read off
volume set: success

[root at booradley glusterfs]# gluster volume set newptop
performance.stat-prefetch on
volume set: success

[root at booradley glusterfs]# gluster volume set newptop
performance.force-readdirp off
volume set: success

[root at booradley glusterfs]# gluster volume set newptop dht.force-readdirp
off
volume set: success

[root at booradley glusterfs]# gluster volume set newptop
performance.readdir-ahead off
volume set: success

[root at booradley glusterfs]# gluster volume info newptop

Volume Name: newptop
Type: Distribute
Volume ID: 092756e1-e095-4e05-9f14-3e9a6aed908c
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: booradley:/home/export/newptop
Options Reconfigured:
dht.force-readdirp: off
performance.force-readdirp: off
performance.stat-prefetch: on
performance.write-behind: on
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: off
nfs.disable: on



>
>>
>> Best Regards,
>> George
>>
>>
>> -----Original Message-----
>> From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
>> Sent: Wednesday, November 02, 2016 5:41 PM
>> To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
>> Cc: Raghavendra G <raghavendra at gluster.com>; Gluster-devel at gluster.org;
>> Zizka, Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>; Zhang, Bingxuan
>> (Nokia - CN/Hangzhou) <bingxuan.zhang at nokia.com>
>> Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
>> the really size of the syslog file
>>
>>
>>
>> ----- Original Message -----
>> > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
>> > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> > Cc: "Raghavendra G" <raghavendra at gluster.com>,
>> Gluster-devel at gluster.org, "Jan Zizka (Nokia - CZ/Prague)"
>> > <jan.zizka at nokia.com>, "Bingxuan Zhang (Nokia - CN/Hangzhou)" <
>> bingxuan.zhang at nokia.com>
>> > Sent: Wednesday, November 2, 2016 1:38:44 PM
>> > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
>> the really size of the syslog file
>> >
>> > Yes, I confirm use the Patch 2.
>> >
>> > One update: the issue is occurred when readdir-ahead off and
>> write-behind on.
>> > Seems gone when write-behind and readdir-ahead and quick-read all off.
>> > Not verified with readdir-ahead and quick-read both off and
>> write-behind on
>> > till now.
>> >
>> > Need I test it with write-behind on and readdir-ahead and quick-read
>> both
>> > off?
>>
>> Yes. I was assuming that the previous results were tested with:
>> 1. write-behind on with the fix
>> 2. quick-read and readdir-ahead off
>>
>> If not, test results with this configuration will help.
>>
>> >
>> > Best Regards,
>> > George
>> >
>> > -----Original Message-----
>> > From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
>> > Sent: Wednesday, November 02, 2016 4:04 PM
>> > To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
>> > Cc: Raghavendra G <raghavendra at gluster.com>; Gluster-devel at gluster.org;
>> > Zizka, Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>; Zhang, Bingxuan
>> (Nokia
>> > - CN/Hangzhou) <bingxuan.zhang at nokia.com>
>> > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
>> the
>> > really size of the syslog file
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
>> > > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>
>> > > Cc: "Raghavendra G" <raghavendra at gluster.com>,
>> Gluster-devel at gluster.org,
>> > > "Jan Zizka (Nokia - CZ/Prague)"
>> > > <jan.zizka at nokia.com>, "Bingxuan Zhang (Nokia - CN/Hangzhou)"
>> > > <bingxuan.zhang at nokia.com>
>> > > Sent: Wednesday, November 2, 2016 1:29:13 PM
>> > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
>> than the
>> > > really size of the syslog file
>> > >
>> > > Hi,
>> > >
>> > > When those 3 options turn off, the issue seems gone in about 3 hours,
>> > > otherwise, the issue will be occurred about every 10 minutes.
>> >
>> > That's a good news. IIRC, you mentioned that you saw the issue with just
>> > write-behind on, with fix applied (readdir-ahead and quick-read off).
>> Can
>> > you please confirm you had patcset 2 of http://review.gluster.org/15757
>> ?
>> > patchset 1 had some issues that I corrected in 2.
>> >
>> > >
>> > > Best Regards,
>> > > George
>> > >
>> > > -----Original Message-----
>> > > From: Raghavendra Gowdappa [mailto:rgowdapp at redhat.com]
>> > > Sent: Wednesday, November 02, 2016 1:07 PM
>> > > To: Lian, George (Nokia - CN/Hangzhou) <george.lian at nokia.com>
>> > > Cc: Raghavendra G <raghavendra at gluster.com>;
>> Gluster-devel at gluster.org;
>> > > Zizka, Jan (Nokia - CZ/Prague) <jan.zizka at nokia.com>; Zhang, Bingxuan
>> > > (Nokia
>> > > - CN/Hangzhou) <bingxuan.zhang at nokia.com>
>> > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less
>> than the
>> > > really size of the syslog file
>> > >
>> > > Can you try with following xlators turned off?
>> > >
>> > > 1. write-behind
>> > > 2. readdir-ahead
>> > > 3. quick-read
>> > >
>> > > regards,
>> > > Raghavendra
>> > >
>> > > ----- Original Message -----
>> > > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian at nokia.com>
>> > > > To: "Raghavendra Gowdappa" <rgowdapp at redhat.com>, "Raghavendra G"
>> > > > <raghavendra at gluster.com>
>> > > > Cc: Gluster-devel at gluster.org, "Jan Zizka (Nokia - CZ/Prague)"
>> > > > <jan.zizka at nokia.com>, "Bingxuan Zhang (Nokia -
>> > > > CN/Hangzhou)" <bingxuan.zhang at nokia.com>
>> > > > Sent: Wednesday, November 2, 2016 10:26:08 AM
>> > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
>> than
>> > > > the
>> > > > really size of the syslog file
>> > > >
>> > > > Hi,
>> > > >
>> > > > Sorry, it seems the issue still there with readdir-ahead off.
>> > > >
>> > > > # tail -f syslog >syslog.2.log
>> > > > tail: syslog: file truncated
>> > > > tail: syslog: file truncated
>> > > > tail: syslog: file truncated
>> > > > tail: syslog: file truncated
>> > > > tail: syslog: file truncated
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > tail: syslog: file truncated
>> > > >
>> > > > The last line is new truncated error message after the following
>> set with
>> > > > option readdir-ahead.
>> > > >
>> > > >
>> > > > # gluster volume set log performance.readdir-ahead off
>> > > > volume set: success
>> > > > [root at SN-0(RNC-124) /root]
>> > > > # gluster volume info log
>> > > >
>> > > > Volume Name: log
>> > > > Type: Replicate
>> > > > Volume ID: 9d987a23-1d64-4857-980a-548f1dd0d1e6
>> > > > Status: Started
>> > > > Number of Bricks: 1 x 2 = 2
>> > > > Transport-type: tcp
>> > > > Bricks:
>> > > > Brick1: 169.254.0.12:/mnt/bricks/log/brick
>> > > > Brick2: 169.254.0.13:/mnt/bricks/log/brick
>> > > > Options Reconfigured:
>> > > > performance.readdir-ahead: off
>> > > >
>> > > > BRs
>> > > > George
>> > > >
>> > > > -----Original Message-----
>> > > > From: gluster-devel-bounces at gluster.org
>> > > > [mailto:gluster-devel-bounces at gluster.org] On Behalf Of Raghavendra
>> > > > Gowdappa
>> > > > Sent: Tuesday, November 01, 2016 9:29 PM
>> > > > To: Raghavendra G <raghavendra at gluster.com>
>> > > > Cc: Gluster-devel at gluster.org; Zizka, Jan (Nokia - CZ/Prague)
>> > > > <jan.zizka at nokia.com>; Zhang, Bingxuan (Nokia - CN/Hangzhou)
>> > > > <bingxuan.zhang at nokia.com>
>> > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less
>> than
>> > > > the
>> > > > really size of the syslog file
>> > > >
>> > > >
>> > > >
>> > > > ----- Original Message -----
>> > > > > From: "Raghavendra G" <raghavendra at gluster.com>
>> > > > > To: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
>> > > > > Cc: Gluster-devel at gluster.org, "Jan Zizka (Nokia - CZ/Prague)"
>> > > > > <jan.zizka at nokia.com>, "Bingxuan Zhang (Nokia -
>> > > > > CN/Hangzhou)" <bingxuan.zhang at nokia.com>
>> > > > > Sent: Tuesday, November 1, 2016 6:55:31 PM
>> > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is
>> less than
>> > > > > the
>> > > > > really size of the syslog file
>> > > > >
>> > > > > Hi George,
>> > > > >
>> > > > > You can try your tests in two scenarios:
>> > > > >
>> > > > > 1. Switch off following performance translators:
>> > > > > a. write-behind
>> > > > > b. quick-read
>> > > > > c. readdir-ahead
>> > > > >
>> > > > > Once you see that the issue goes away in the above scenario, you
>> can
>> > > > > apply
>> > > > > write-behind fix [1] and turn write-behind on.
>> > > > >
>> > > > > We have a fix in quick-read at [2].
>> > > > >
>> > > > > You can also try to test with the quick-read fix and turning
>> quick-read
>> > > > > on.
>> > > > > However, since your issue is stale stat I doubt quick-read is not
>> the
>> > > >
>> > > > s/quick-read is not the culprit/quick-read is the culprit/
>> > > >
>> > > > > culprit here (The issue with quick-read is stale data).
>> > > > >
>> > > > > We see that readdir-ahead is not updating/flushing the stats
>> stored
>> > > > > along
>> > > > > with dentries when write or file modifications happen. That seem
>> to be
>> > > > > the
>> > > > > issue with readdir-ahead. For now, during both of the above two
>> > > > > scenarios,
>> > > > > please keep readdir-ahead off.
>> > > > >
>> > > > > [1] http://review.gluster.org/15757/2
>> > > > > [2] http://review.gluster.org/#/c/15759/2
>> > > > >
>> > > > > regards,
>> > > > > Raghavendra
>> > > > >
>> > > > > On Tue, Nov 1, 2016 at 7:55 AM, Pranith Kumar Karampuri <
>> > > > > pkarampu at redhat.com
>> > > > > > wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Nov 1, 2016 at 7:32 AM, Lian, George (Nokia -
>> CN/Hangzhou) <
>> > > > > george.lian at nokia.com > wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > >
>> > > > >
>> > > > > I will test it with your patches and update to you when I have
>> result.
>> > > > >
>> > > > > hi George,
>> > > > > Please use http://review.gluster.org/#/c/15757/2 i.e. second
>> version of
>> > > > > Raghavendra's patch. I tested it and it worked fine. We are still
>> > > > > trying
>> > > > > to
>> > > > > figure out quick-read and readdir-ahead as I type this mail.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > Thanks a lots
>> > > > >
>> > > > >
>> > > > >
>> > > > > Best Regards,
>> > > > >
>> > > > > George
>> > > > >
>> > > > >
>> > > > >
>> > > > > From: Pranith Kumar Karampuri [mailto: pkarampu at redhat.com ]
>> > > > > Sent: Monday, October 31, 2016 11:23 AM
>> > > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian at nokia.com >
>> > > > > Cc: Raghavendra Gowdappa < rgowdapp at redhat.com >; Zhang, Bingxuan
>> > > > > (Nokia
>> > > > > -
>> > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >;
>> Gluster-devel at gluster.org ;
>> > > > > Zizka, Jan (Nokia - CZ/Prague) < jan.zizka at nokia.com >
>> > > > >
>> > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is
>> less than
>> > > > > the
>> > > > > really size of the syslog file
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > Removing I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com , it is
>> > > > > causing
>> > > > > mail delivery problems for me.
>> > > > >
>> > > > >
>> > > > > George,
>> > > > >
>> > > > >
>> > > > > Raghavendra and I made some progress on this issue. We were in
>> parallel
>> > > > > working on another issue which is similar where elastic search
>> indices
>> > > > > are
>> > > > > getting corrupted because of wrong stat sizes in our opinion. So
>> I have
>> > > > > been
>> > > > > running different translator stacks in identifying the problematic
>> > > > > xlators
>> > > > > which are leading to indices corruption.
>> > > > >
>> > > > >
>> > > > > We found the list to be 1) Write-behind, 2) Quick-read, 3)
>> > > > > Readdir-ahead.
>> > > > > Raghavendra and I just had a chat and we are suspecting that lack
>> of
>> > > > > lookup/readdirp implementation in write-behind could be the
>> reason for
>> > > > > this
>> > > > > problem. Similar problems may exist in other two xlators too. But
>> we
>> > > > > are
>> > > > > working on write-behind with priority.
>> > > > >
>> > > > >
>> > > > > Our theory is this:
>> > > > >
>> > > > >
>> > > > > If we do a 4KB write for example and it is cached in write-behind
>> and
>> > > > > we
>> > > > > do
>> > > > > a
>> > > > > lookup on the file/do a readdirp on the directory with this file
>> we
>> > > > > send
>> > > > > out
>> > > > > wrong stat value to the kernel. There are different caches between
>> > > > > kernel
>> > > > > and gluster which may lead to fstat never coming till
>> write-behind. So
>> > > > > we
>> > > > > need to make sure that we don't get into this situation.
>> > > > >
>> > > > >
>> > > > > Action items:
>> > > > >
>> > > > >
>> > > > > At the moment Raghavendra is working on a patch to implement
>> > > > > lookup/readdirp
>> > > > > in write-behind. I am going to test the same for elastic search.
>> Will
>> > > > > it
>> > > > > be
>> > > > > possible for you to test your application against the same patch
>> and
>> > > > > confirm
>> > > > > that the patch fixes the problem?
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Fri, Oct 28, 2016 at 12:08 PM, Pranith Kumar Karampuri <
>> > > > > pkarampu at redhat.com > wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > hi George,
>> > > > >
>> > > > >
>> > > > > It would help if we can identify the bare minimum xlators which
>> are
>> > > > > contributing to the issue like Raghavendra was mentioning
>> earlier. We
>> > > > > were
>> > > > > wondering if it is possible for you to help us in identifying the
>> issue
>> > > > > by
>> > > > > running the workload on a modified setup? We can suggest testing
>> out
>> > > > > using
>> > > > > custom volfiles so that we can slowly build the graph which could
>> be
>> > > > > causing
>> > > > > this issue. We would like you guys to try out this problem with
>> just
>> > > > > posix-xlator and fuse and nothing else.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Oct 27, 2016 at 1:40 PM, Lian, George (Nokia -
>> CN/Hangzhou) <
>> > > > > george.lian at nokia.com > wrote:
>> > > > >
>> > > > >
>> > > > >
>> > > > > Hi, Raghavendra,
>> > > > >
>> > > > > Could you please give some suggestion for this issue? we try to
>> find
>> > > > > the
>> > > > > clue
>> > > > > for this issue for a long time, but it has no progress:(
>> > > > >
>> > > > > Thanks & Best Regards,
>> > > > > George
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: Lian, George (Nokia - CN/Hangzhou)
>> > > > > Sent: Wednesday, October 19, 2016 4:40 PM
>> > > > > To: 'Raghavendra Gowdappa' < rgowdapp at redhat.com >
>> > > > > Cc: Gluster-devel at gluster.org ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
>> <
>> > > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >; Zhang,
>> Bingxuan
>> > > > > (Nokia
>> > > > > -
>> > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >; Zizka, Jan (Nokia -
>> > > > > CZ/Prague)
>> > > > > <
>> > > > > jan.zizka at nokia.com >
>> > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is
>> less than
>> > > > > the
>> > > > > really size of the syslog file
>> > > > >
>> > > > > Hi, Raghavendra
>> > > > >
>> > > > > Just now, we test it with glusterfs log with debug-level "TRACE",
>> and
>> > > > > let
>> > > > > some application trigger "glusterfs" produce large log, in that
>> case,
>> > > > > when
>> > > > > we set write-behind and stat-prefetch both OFF,
>> > > > > Tail the glusterfs log such like mnt-{VOLUME-NAME}.log, it still
>> failed
>> > > > > with
>> > > > > "file truncated",
>> > > > >
>> > > > > So that means if file's IO in huge amount, the issue will still be
>> > > > > there
>> > > > > even
>> > > > > write-behind and stat-prefetch both OFF.
>> > > > >
>> > > > > Best Regards,
>> > > > > George
>> > > > >
>> > > > > -----Original Message-----
>> > > > > From: Raghavendra Gowdappa [mailto: rgowdapp at redhat.com ]
>> > > > >
>> > > > >
>> > > > > Sent: Wednesday, October 19, 2016 2:54 PM
>> > > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian at nokia.com >
>> > > > > Cc: Gluster-devel at gluster.org ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
>> <
>> > > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >; Zhang,
>> Bingxuan
>> > > > > (Nokia
>> > > > > -
>> > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >; Zizka, Jan (Nokia -
>> > > > > CZ/Prague)
>> > > > > <
>> > > > > jan.zizka at nokia.com >
>> > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is
>> less than
>> > > > > the
>> > > > > really size of the syslog file
>> > > > >
>> > > > >
>> > > > >
>> > > > > ----- Original Message -----
>> > > > > > From: "George Lian (Nokia - CN/Hangzhou)" <
>> george.lian at nokia.com >
>> > > > > > To: "Raghavendra Gowdappa" < rgowdapp at redhat.com >
>> > > > > > Cc: Gluster-devel at gluster.org , "I_EXT_MBB_WCDMA_SWD3_DA1_MATR
>> IX_GMS"
>> > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >,
>> "Bingxuan Zhang
>> > > > > > (Nokia - CN/Hangzhou)"
>> > > > > > < bingxuan.zhang at nokia.com >, "Jan Zizka (Nokia - CZ/Prague)" <
>> > > > > > jan.zizka at nokia.com >
>> > > > > > Sent: Wednesday, October 19, 2016 12:05:01 PM
>> > > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > than
>> > > > > > the
>> > > > > > really size of the syslog file
>> > > > > >
>> > > > > > Hi, Raghavendra,
>> > > > > >
>> > > > > > Thanks a lots for your quickly update!
>> > > > > > In my case, there are so many process(write) is writing to the
>> syslog
>> > > > > > file,
>> > > > > > it do involve the writer is in the same host and writing in same
>> > > > > > mount
>> > > > > > point
>> > > > > > while the tail(reader) is reading it.
>> > > > > >
>> > > > > > The bug I just guess is:
>> > > > > > When a writer write the data with write-behind, it call the
>> call-back
>> > > > > > function " mdc_writev_cbk" and called
>> "mdc_inode_iatt_set_validate"
>> > > > > > to
>> > > > > > validate the "iatt" data, but with the code I mentioned last
>> mail, it
>> > > > > > do
>> > > > > > nothing.
>> > > > >
>> > > > > mdc_inode_iatt_set_validate has following code
>> > > > >
>> > > > > <snippet>
>> > > > > if (!iatt || !iatt->ia_ctime) {
>> > > > > mdc->ia_time = 0;
>> > > > > goto unlock;
>> > > > > }
>> > > > > </snippet>
>> > > > >
>> > > > > Which means a NULL iatt sets mdc->ia_time to 0. This results in
>> > > > > subsequent
>> > > > > lookup/stat calls to be NOT served from md-cache. Instead, the
>> stat is
>> > > > > served from backend bricks. So, I don't see an issue here.
>> > > > >
>> > > > > However, one case where a NULL iatt is different from a valid iatt
>> > > > > (which
>> > > > > differs from the value stored in md-cache) is that the latter
>> results
>> > > > > in
>> > > > > a
>> > > > > call to inode_invalidate. This invalidation propagates to kernel
>> and
>> > > > > all
>> > > > > dentry and page cache corresponding to file is purged. So, I am
>> > > > > suspecting
>> > > > > whether the stale stat you saw was served from kernel cache (not
>> from
>> > > > > glusterfs). If this is the case, having mount options
>> > > > > "attribute-timeout=0"
>> > > > > and "entry-timeout=0" should've helped.
>> > > > >
>> > > > > I am still at loss to point out the RCA for this issue.
>> > > > >
>> > > > >
>> > > > > > And in same time, the reader(tail) read the "iatt" data, but in
>> case
>> > > > > > of
>> > > > > > the
>> > > > > > cache-time is not timeout, it will return the "iatt" data
>> without the
>> > > > > > last
>> > > > > > change.
>> > > > > >
>> > > > > > Do your think it is a possible bug?
>> > > > > >
>> > > > > > Thanks & Best Regards,
>> > > > > > George
>> > > > > >
>> > > > > > -----Original Message-----
>> > > > > > From: Raghavendra Gowdappa [mailto: rgowdapp at redhat.com ]
>> > > > > > Sent: Wednesday, October 19, 2016 2:06 PM
>> > > > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian at nokia.com
>> >
>> > > > > > Cc: Gluster-devel at gluster.org ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRI
>> X_GMS
>> > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >; Zhang,
>> Bingxuan
>> > > > > > (Nokia -
>> > > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >; Zizka, Jan (Nokia -
>> > > > > > CZ/Prague)
>> > > > > > < jan.zizka at nokia.com >
>> > > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > than
>> > > > > > the
>> > > > > > really size of the syslog file
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > ----- Original Message -----
>> > > > > > > From: "George Lian (Nokia - CN/Hangzhou)" <
>> george.lian at nokia.com >
>> > > > > > > To: "Raghavendra Gowdappa" < rgowdapp at redhat.com >
>> > > > > > > Cc: Gluster-devel at gluster.org ,
>> > > > > > > "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
>> > > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >,
>> "Bingxuan
>> > > > > > > Zhang
>> > > > > > > (Nokia
>> > > > > > > - CN/Hangzhou)"
>> > > > > > > < bingxuan.zhang at nokia.com >, "Jan Zizka (Nokia - CZ/Prague)"
>> > > > > > > < jan.zizka at nokia.com >
>> > > > > > > Sent: Wednesday, October 19, 2016 10:51:24 AM
>> > > > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > > than
>> > > > > > > the
>> > > > > > > really size of the syslog file
>> > > > > > >
>> > > > > > > Hi, Raghavendra,
>> > > > > > >
>> > > > > > > When we disable md-cache(gluster volume set log
>> > > > > > > performance.md-cache-timeout
>> > > > > > > 0), the issue seems gone.
>> > > > > > > (we can't disable with " gluster volume set log
>> > > > > > > performance.md-cache
>> > > > > > > off"
>> > > > > > > why?)
>> > > > > >
>> > > > > > Please use
>> > > > > > #gluster volume set log performance.stat-prefetch off
>> > > > > >
>> > > > > > >
>> > > > > > > So I double confuse that the code I abstract in last mail
>> maybe
>> > > > > > > have
>> > > > > > > some
>> > > > > > > issue for this case.
>> > > > > > > Could you please share your comments?
>> > > > > >
>> > > > > > Please find my comments below.
>> > > > > >
>> > > > > > >
>> > > > > > > Thanks & Best Regards,
>> > > > > > > George
>> > > > > > >
>> > > > > > > -----Original Message-----
>> > > > > > > From: Lian, George (Nokia - CN/Hangzhou)
>> > > > > > > Sent: Friday, October 14, 2016 1:44 PM
>> > > > > > > To: 'Raghavendra Gowdappa' < rgowdapp at redhat.com >
>> > > > > > > Cc: Gluster-devel at gluster.org ;
>> I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
>> > > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >; Zhang,
>> > > > > > > Bingxuan
>> > > > > > > (Nokia
>> > > > > > > -
>> > > > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >; Zizka, Jan (Nokia
>> -
>> > > > > > > CZ/Prague)
>> > > > > > > < jan.zizka at nokia.com >
>> > > > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > > than
>> > > > > > > the
>> > > > > > > really size of the syslog file
>> > > > > > >
>> > > > > > > Hi, Raghavendra,
>> > > > > > >
>> > > > > > > Our version of GlusterFS is 3.6.9, and I also check the
>> newest code
>> > > > > > > of
>> > > > > > > main
>> > > > > > > branch, the function of " mdc_inode_iatt_set_validate" is
>> almost
>> > > > > > > same,
>> > > > > > > from
>> > > > > > > the following code of this function,
>> > > > > > > We could see a "TODO" comments inline, does it mean if we
>> enhance
>> > > > > > > write-behind feature, the "iatt" field in callback will be
>> NULL, so
>> > > > > > > that
>> > > > > > > inode_invalidate will not be called? So the size of file will
>> not
>> > > > > > > update
>> > > > > > > since "write behind" enabled ?
>> > > > > > > Is it the root cause for "tail" application failed with "file
>> > > > > > > truncated"
>> > > > > > > issue ?
>> > > > > > >
>> > > > > > > LOCK (&mdc->lock);
>> > > > > > > {
>> > > > > > > if (!iatt || !iatt->ia_ctime) {
>> > > > > > > mdc->ia_time = 0;
>> > > > > > > goto unlock;
>> > > > > > > }
>> > > > > > >
>> > > > > > > /*
>> > > > > > > * Invalidate the inode if the mtime or ctime has changed
>> > > > > > > * and the prebuf doesn't match the value we have cached.
>> > > > > > > * TODO: writev returns with a NULL iatt due to
>> > > > > > > * performance/write-behind, causing invalidation on writes.
>> > > > > > > */
>> > > > > >
>> > > > > > The issue explained in this comment is hit only when writes are
>> done.
>> > > > > > But,
>> > > > > > in
>> > > > > > your use-case only "tail" is the application running on the
>> mount (If
>> > > > > > I
>> > > > > > am
>> > > > > > not wrong, the writer is running on a different mountpoint).
>> So, I
>> > > > > > doubt
>> > > > > > you are hitting this issue. But, you are saying that the issue
>> goes
>> > > > > > away
>> > > > > > when write-behind/md-cache is turned off pointing to some
>> interaction
>> > > > > > between md-cache and write-behind causing the issue. I need
>> more time
>> > > > > > to
>> > > > > > look into this issue. Can you file a bug on this?
>> > > > > >
>> > > > > > > if (IA_ISREG(inode->ia_type) &&
>> > > > > > > ((iatt->ia_mtime != mdc->md_mtime) ||
>> > > > > > > (iatt->ia_ctime != mdc->md_ctime)))
>> > > > > > > if (!prebuf || (prebuf->ia_ctime != mdc->md_ctime) ||
>> > > > > > > (prebuf->ia_mtime != mdc->md_mtime))
>> > > > > > > inode_invalidate(inode);
>> > > > > > >
>> > > > > > > mdc_from_iatt (mdc, iatt);
>> > > > > > >
>> > > > > > > time (&mdc->ia_time);
>> > > > > > > }
>> > > > > > >
>> > > > > > > Best Regards,
>> > > > > > > George
>> > > > > > > -----Original Message-----
>> > > > > > > From: Raghavendra Gowdappa [mailto: rgowdapp at redhat.com ]
>> > > > > > > Sent: Thursday, October 13, 2016 8:58 PM
>> > > > > > > To: Lian, George (Nokia - CN/Hangzhou) <
>> george.lian at nokia.com >
>> > > > > > > Cc: Gluster-devel at gluster.org ;
>> I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
>> > > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >; Zhang,
>> > > > > > > Bingxuan
>> > > > > > > (Nokia
>> > > > > > > -
>> > > > > > > CN/Hangzhou) < bingxuan.zhang at nokia.com >; Zizka, Jan (Nokia
>> -
>> > > > > > > CZ/Prague)
>> > > > > > > < jan.zizka at nokia.com >
>> > > > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > > than
>> > > > > > > the
>> > > > > > > really size of the syslog file
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > ----- Original Message -----
>> > > > > > > > From: "George Lian (Nokia - CN/Hangzhou)" <
>> george.lian at nokia.com
>> > > > > > > > >
>> > > > > > > > To: Gluster-devel at gluster.org
>> > > > > > > > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
>> > > > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX at internal.nsn.com >,
>> "Bingxuan
>> > > > > > > > Zhang
>> > > > > > > > (Nokia
>> > > > > > > > - CN/Hangzhou)" < bingxuan.zhang at nokia.com >, "Jan Zizka
>> (Nokia -
>> > > > > > > > CZ/Prague)"
>> > > > > > > > < jan.zizka at nokia.com >
>> > > > > > > > Sent: Thursday, October 13, 2016 2:33:53 PM
>> > > > > > > > Subject: [Gluster-devel] Issue about the size of fstat is
>> less
>> > > > > > > > than
>> > > > > > > > the
>> > > > > > > > really size of the syslog file
>> > > > > > > >
>> > > > > > > > Hi, Dear Expert,
>> > > > > > > > We have use glusterfs as a network filesystem, and syslog
>> store
>> > > > > > > > in
>> > > > > > > > there,
>> > > > > > > > some clients on different host may write the syslog file via
>> > > > > > > > “glusterfs”
>> > > > > > > > mount point.
>> > > > > > > > Now we encounter an issue when we “tail” the syslog file,
>> it will
>> > > > > > > > occasional
>> > > > > > > > failed with error “ file truncated ”
>> > > > > > > > As we study and trace with the “tail” source code, it
>> failed with
>> > > > > > > > the
>> > > > > > > > following code:
>> > > > > > > > if ( S_ISREG (mode) && stats.st_size < f[i].size )
>> > > > > > > > {
>> > > > > > > > error (0, 0, _("%s: file truncated"), quotef (name));
>> > > > > > > > /* Assume the file was truncated to 0,
>> > > > > > > > and therefore output all "new" data. */
>> > > > > > > > xlseek (fd, 0, SEEK_SET, name);
>> > > > > > > > f[i].size = 0;
>> > > > > > > > }
>> > > > > > > > When stats.st_size < f[i].size, what mean the size report by
>> > > > > > > > fstat
>> > > > > > > > is
>> > > > > > > > less
>> > > > > > > > than “tail” had read, it lead to “file truncated”, we also
>> use
>> > > > > > > > “strace”
>> > > > > > > > tools to trace the tail application, the related tail
>> strace log
>> > > > > > > > as
>> > > > > > > > the
>> > > > > > > > below:
>> > > > > > > > nanosleep({1, 0}, NULL) = 0
>> > > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
>> > > > > > > > nanosleep({1, 0}, NULL) = 0
>> > > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
>> > > > > > > > nanosleep({1, 0}, NULL) = 0
>> > > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
>> > > > > > > > nanosleep({1, 0}, NULL) = 0
>> > > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544549, ...}) = 0
>> > > > > > > > read(3, " Data … -"..., 8192) = 1444
>> > > > > > > > read(3, " Data.. "..., 8192) = 720
>> > > > > > > > read(3, "", 8192) = 0
>> > > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544789, ...}) = 0
>> > > > > > > > write(1, “DATA…..” ) = 2164
>> > > > > > > > write(2, "tail: ", 6tail: ) = 6
>> > > > > > > > write(2, "/mnt/log/master/syslog: file tru"...,
>> > > > > > > > 38/mnt/log/master/syslog:
>> > > > > > > > file truncated) = 38
>> > > > > > > > as the above strace log, tail has read 1444+720=2164 bytes,
>> > > > > > > > but fstat tell “tail” 192544789 – 192543105 = 1664 which
>> less
>> > > > > > > > than
>> > > > > > > > 2164,
>> > > > > > > > so
>> > > > > > > > it lead to “tail” application “file truncated”.
>> > > > > > > > And if we turn off “write-behind” feature, the issue will
>> not be
>> > > > > > > > reproduced
>> > > > > > > > any more.
>> > > > > > >
>> > > > > > > That seems strange. There are no writes happening on the
>> fd/inode
>> > > > > > > through
>> > > > > > > which tail is reading/stating from. So, it seems strange that
>> > > > > > > write-behind
>> > > > > > > is involved here. I suspect whether any of
>> > > > > > > md-cache/read-ahead/io-cache
>> > > > > > > is
>> > > > > > > causing the issue. Can you,
>> > > > > > >
>> > > > > > > 1. Turn off md-cache, read-ahead, io-cache xlators
>> > > > > > > 2. mount glusterfs with --attribute-timeout=0
>> > > > > > > 3. set write-behind on
>> > > > > > >
>> > > > > > > and rerun the tests? If you don't hit the issue, you can
>> experiment
>> > > > > > > by
>> > > > > > > turning on/off of md-cache, read-ahead and io-cache
>> translators and
>> > > > > > > see
>> > > > > > > what
>> > > > > > > are the minimal number of xlators that need to be turned off
>> to not
>> > > > > > > hit
>> > > > > > > the
>> > > > > > > issue (with write-behind on)?
>> > > > > > >
>> > > > > > > regards,
>> > > > > > > Raghavendra
>> > > > > > >
>> > > > > > > > So we think it may be related to cache consistence issue
>> due to
>> > > > > > > > performance
>> > > > > > > > consider, but we still have concern that:
>> > > > > > > > The syslog file is used only with “Append” mode, so the
>> size of
>> > > > > > > > file
>> > > > > > > > shouldn’t be reduced, when a client read the file, why
>> “fstat”
>> > > > > > > > can’t
>> > > > > > > > return
>> > > > > > > > the really size match to the cache?
>> > > > > > > > From current investigation, we doubt that the current
>> implement
>> > > > > > > > of
>> > > > > > > > “glusterfs” has a bug on “fstat” when cache is on.
>> > > > > > > > Your comments is our highly appreciated!
>> > > > > > > > Thanks & Best Regards
>> > > > > > > > George
>> > > > > > > >
>> > > > > > > > _______________________________________________
>> > > > > > > > Gluster-devel mailing list
>> > > > > > > > Gluster-devel at gluster.org
>> > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
>> > > > > > >
>> > > > > >
>> > > > > _______________________________________________
>> > > > > Gluster-devel mailing list
>> > > > > Gluster-devel at gluster.org
>> > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > >
>> > > > > Pranith
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > >
>> > > > >
>> > > > > Pranith
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Pranith
>> > > > >
>> > > > > _______________________________________________
>> > > > > Gluster-devel mailing list
>> > > > > Gluster-devel at gluster.org
>> > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Raghavendra G
>> > > > >
>> > > > > _______________________________________________
>> > > > > Gluster-devel mailing list
>> > > > > Gluster-devel at gluster.org
>> > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
>> > > > _______________________________________________
>> > > > Gluster-devel mailing list
>> > > > Gluster-devel at gluster.org
>> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
>> > > >
>> > >
>> >
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Raghavendra G
>



-- 
Raghavendra G
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20161103/f29e9c91/attachment-0001.html>


More information about the Gluster-devel mailing list