[Gluster-Maintainers] Build failed in Jenkins: regression-test-burn-in #203

Raghavendra Gowdappa rgowdapp at redhat.com
Thu Jan 7 16:30:47 UTC 2016



----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu at redhat.com>
> To: "Xavier Hernandez" <xhernandez at datalab.es>, "Vijay Bellur" <vbellur at redhat.com>, maintainers at gluster.org
> Sent: Thursday, January 7, 2016 6:21:17 PM
> Subject: Re: [Gluster-Maintainers] Build failed in Jenkins: regression-test-burn-in #203
> 
> 
> 
> On 01/07/2016 06:00 PM, Xavier Hernandez wrote:
> > The problem seems to be that the inode is not valid (i.e. a previous
> > lookup has not been fully completed) before calling a setattr fop. As
> > a result, some information needed by the fop is not available, and
> > the core is generated by a failed GF_ASSERT().
> >
> > EC needs additional information to handle the encoding/decoding of
> > regular files. This information is stored in special xattrs that are
> > only retrieved for inodes with ia_type == IA_IFREG.
> >
> > I've seen that sometimes an inode corresponding to a regular file is
> > received with its ia_type == IA_INVAL. Sometimes the type is set while
> > processing the request, in which case a log message is written
> > ("Unable to get size xattr") and the fop fails. However, in other
> > cases the ia_type is set later (or not set at all), causing a later
> > assert to fail.
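
To make the failure mode concrete, here is a minimal, self-contained C model of what Xavi describes above. Every name in it is illustrative, not the actual EC code, which lives in xlators/cluster/ec and operates on gluster's inode_t/iatt types:

/* Simplified model of the failure mode; all names are hypothetical. */
#include <assert.h>
#include <stdio.h>

typedef enum { IA_INVAL = 0, IA_IFREG, IA_IFDIR } ia_type_t;

typedef struct {
    ia_type_t ia_type;   /* filled in by a completed lookup */
    long long ec_size;   /* would come from EC's size xattr */
    int       have_size; /* set once that xattr has been fetched */
} inode_model_t;

/* EC only fetches its size/version xattrs for regular files, so a
 * setattr arriving before lookup completes sees IA_INVAL and no size. */
static int ec_setattr(inode_model_t *inode)
{
    if (inode->ia_type != IA_IFREG) {
        fprintf(stderr, "Unable to get size xattr\n"); /* the logged case */
        return -1;
    }
    assert(inode->have_size); /* the GF_ASSERT() that produces the core */
    return 0;
}

int main(void)
{
    inode_model_t racy = { .ia_type = IA_INVAL };  /* lookup not finished */
    return ec_setattr(&racy) == 0 ? 0 : 1;
}

Run as-is, this takes the logged-failure path; an inode whose type is set later, after the xattr fetch was skipped, would instead trip the assert and dump core.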
> >
> > There's a patch that could solve this particular problem
> > (http://review.gluster.org/13039) but it's only a hack to avoid a
> > worse problem that could reappear in other places.
> >
> > I talked with Pranith about this, and we agreed that a better solution
> > should be implemented. It doesn't seem acceptable that any fop
> > receives an inode that has invalid information.
> Xavi, that solution will take a bit of time to implement, IMO. I will
> definitely start the conversation about what we discussed on
> gluster-devel;

Tier also needs some sort of split of the current lookup into lookup + resolve/heal. The main problem there is that inode_link is done at the dht/tier layer without the parent xlators witnessing a lookup call on the ancestors of a particular inode. So, we need a proper framework that lets only the root xlators of a graph (fuse/nfs/protocol-server/gfapi, etc.) do inode management, with the rest of them only "facilitating"/"requesting" actions like heal.
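
A minimal, self-contained sketch of that framework idea, with every name hypothetical (the real pieces would be inode_link() in libglusterfs and the resolvers at the graph root): lower xlators only flag that an inode's ancestry needs attention, and the root is the single place where linking happens.

/* Hypothetical model of root-only inode management. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool linked;       /* set only by the graph root */
    bool needs_heal;   /* set by lower xlators instead of linking */
} model_inode_t;

/* Lower xlator (dht/tier): request an action, never link. */
static void lower_request_heal(model_inode_t *in)
{
    in->needs_heal = true;
}

/* Graph root (fuse/nfs/protocol-server/gfapi): sole owner of
 * inode management, so parent xlators always witness the lookup. */
static void root_resolve_and_link(model_inode_t *in)
{
    if (in->needs_heal) {
        printf("root: healing ancestry before linking\n");
        in->needs_heal = false;
    }
    in->linked = true;   /* stands in for the real inode_link() */
}

int main(void)
{
    model_inode_t in = { false, false };
    lower_request_heal(&in);     /* what tier/dht would do */
    root_resolve_and_link(&in);  /* what the graph root would do */
    return in.linked ? 0 : 1;
}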

> meanwhile, can we accept this patch?
> 
> Raghavendra G,
>              How long do you think it will take for us to implement the
> solution we talked about?

I am bad at timelines :). I can see the following work:

1. In dht, split into plain lookup and heal. This code already exists for the most part, so maybe a dedicated week for both implementation and review.
2. In the root of the xlator graph (fuse/nfs/protocol-server/gfapi), we need to change the resolver code to block fops until heal is complete (see the sketch after this list). Again, a week for both implementation and review? This can be parallelized and owned by different people.
3. Splitting of heal and lookup in other translators too (EC, AFR?); you would be in a better position to comment on this.
4. On the bricks, storage/posix also does inode linking as part of build-ancestry. That can probably be looked into too; not a necessity, but good to have.
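
For item 2, a hypothetical sketch of the blocking behaviour using a plain condition variable. The actual resolver code is callback-driven rather than thread-blocking, so treat this only as a model of the ordering guarantee:

/* Hypothetical model: the resolver at the graph root holds incoming
 * fops until the heal phase of lookup has completed, so nothing below
 * ever operates on a partially initialized inode. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int heal_done = 0;

/* Invoked once lookup + heal have both finished. */
static void resolver_heal_complete(void)
{
    pthread_mutex_lock(&lock);
    heal_done = 1;
    pthread_cond_broadcast(&cond);   /* release every queued fop */
    pthread_mutex_unlock(&lock);
}

/* Every fop passes through this gate before being sent onward. */
static void *fop_worker(void *name)
{
    pthread_mutex_lock(&lock);
    while (!heal_done)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    printf("%s proceeds on a fully resolved inode\n", (const char *)name);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, fop_worker, "setattr");
    resolver_heal_complete();   /* heal finishes; the fop unblocks */
    pthread_join(t, NULL);
    return 0;
}

Build with gcc -pthread; the queued fop only proceeds after resolver_heal_complete() runs.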

> 
> Pranith
> 
> >
> > Xavi
> >
> > On 07/01/16 03:08, Vijay Bellur wrote:
> >> On 01/05/2016 09:07 PM, Vijay Bellur wrote:
> >>> On 01/05/2016 04:55 PM, jenkins at build.gluster.org wrote:
> >>>> See <http://build.gluster.org/job/regression-test-burn-in/203/>
> >>>
> >>>> ++ ls /build/install/cores/core.9754
> >>>> + CORELIST=/build/install/cores/core.9754
> >>>> + for corefile in '$CORELIST'
> >>>> + getliblistfromcore /build/install/cores/core.9754
> >>>> + rm -f /build/install/cores/gdbout.txt
> >>>> + gdb -c /build/install/cores/core.9754 -q -ex 'info sharedlibrary'
> >>>> -ex q
> >>>> + set +x
> >>>> + rm -f /build/install/cores/gdbout.txt
> >>>> + sort /build/install/cores/liblist.txt
> >>>> + uniq
> >>>> + cat /build/install/cores/liblist.txt.tmp
> >>>> + grep -v /build/install
> >>>> + tar -cf
> >>>> /archives/archived_builds/build-install-20160105:20:48:04.tar
> >>>> /build/install/sbin /build/install/bin /build/install/lib
> >>>> /build/install/libexec /build/install/cores
> >>>> tar: Removing leading `/' from member names
> >>>> + tar -rhf
> >>>> /archives/archived_builds/build-install-20160105:20:48:04.tar -T
> >>>> /build/install/cores/liblist.txt
> >>>> tar: Removing leading `/' from member names
> >>>> + bzip2 /archives/archived_builds/build-install-20160105:20:48:04.tar
> >>>> + rm -f /build/install/cores/liblist.txt
> >>>> + rm -f /build/install/cores/liblist.txt.tmp
> >>>> + echo Cores and build archived in
> >>>> http://slave21.cloud.gluster.org/archived_builds/build-install-20160105:20:48:04.tar.bz2
> >>>>
> >>>>
> >>>>
> >>>> Cores and build archived in
> >>>> http://slave21.cloud.gluster.org/archived_builds/build-install-20160105:20:48:04.tar.bz2
> >>>>
> >>>>
> >>>>
> >>>> + echo Open core using the following command to get a proper stack...
> >>>> Open core using the following command to get a proper stack...
> >>>> + echo Example: From root of extracted tarball
> >>>> Example: From root of extracted tarball
> >>>> + echo 'gdb -ex '\''set sysroot ./'\'' -ex '\''core-file
> >>>> ./build/install/cores/core.xxx'\'' <target, say
> >>>> ./build/install/sbin/glusterd>'
> >>>> gdb -ex 'set sysroot ./' -ex 'core-file
> >>>> ./build/install/cores/core.xxx' <target, say
> >>>> ./build/install/sbin/glusterd>
> >>>> + RET=1
> >>>> + '[' 1 -ne 0 ']'
> >>>> + filename=logs/glusterfs-logs-20160105:20:48:04.tgz
> >>>> + tar -czf /archives/logs/glusterfs-logs-20160105:20:48:04.tgz
> >>>> /var/log/glusterfs /var/log/messages /var/log/messages-20151129
> >>>> /var/log/messages-20151206 /var/log/messages-20151213
> >>>> /var/log/messages-20160104
> >>>> tar: Removing leading `/' from member names
> >>>> + echo Logs archived in
> >>>> http://slave21.cloud.gluster.org/logs/glusterfs-logs-20160105:20:48:04.tgz
> >>>>
> >>>>
> >>>>
> >>>> Logs archived in
> >>>> http://slave21.cloud.gluster.org/logs/glusterfs-logs-20160105:20:48:04.tgz
> >>>>
> >>>>
> >>>>
> >>>> + exit 1
> >>>> + RET=1
> >>>> + '[' 1 = 0 ']'
> >>>> + V=-1
> >>>> + VERDICT=FAILED
> >>>
> >>>
> >>> This run has failed due to a core in ec.
> >>>
> >>> Pranith, Xavi - can you please take a look?
> >>>
> >>
> >> Another regression run failed due to this core:
> >>
> >> https://build.gluster.org/job/regression-test-burn-in/206/consoleFull
> >>
> >> Can we please expedite resolution of this crash?
> >>
> >> Thanks,
> >> Vijay
> >>
> 
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> http://www.gluster.org/mailman/listinfo/maintainers
> 

