[Gluster-users] VM going down

Niels de Vos ndevos at redhat.com
Thu May 11 15:18:45 UTC 2017


On Thu, May 11, 2017 at 06:05:59PM +0530, Ravishankar N wrote:
> On 05/11/2017 05:49 PM, Niels de Vos wrote:
> > On Wed, May 10, 2017 at 09:08:03PM +0530, Pranith Kumar Karampuri wrote:
> > > On Wed, May 10, 2017 at 7:11 PM, Niels de Vos <ndevos at redhat.com> wrote:
> > > 
> > > > On Wed, May 10, 2017 at 04:08:22PM +0530, Pranith Kumar Karampuri wrote:
> > > > > On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:
> > > > > 
> > > > > > ...
> > > > > > > > client from
> > > > > > > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > > > > > > (version: 3.8.11)
> > > > > > > > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> > > > > > > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > > > > > > device or address]
> > > > > > The SEEK procedure translates to lseek() in the posix xlator. This can
> > > > > > return with "No such device or address" (ENXIO) in only one case:
> > > > > > 
> > > > > >      ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is
> > > > > >               beyond the end of the file.
> > > > > > 
> > > > > > This means that an lseek() was executed where the current offset of the
> > > > > > filedescriptor was higher than the size of the file. I'm not sure how
> > > > > > that could happen... Sharding prevents using SEEK at all atm.
> > > > > > 
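For reference, this single ENXIO case is easy to reproduce outside of
Gluster with a standalone test program (hypothetical snippet, file path
made up):

    #define _GNU_SOURCE            /* SEEK_DATA/SEEK_HOLE are GNU extensions */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/seek-test", O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd < 0)
            return 1;
        if (write(fd, "data", 4) != 4)  /* file size is now 4 bytes */
            return 1;

        /* SEEK_DATA with an offset beyond EOF is the documented ENXIO case */
        if (lseek(fd, 4096, SEEK_DATA) == (off_t) -1 && errno == ENXIO)
            printf("lseek: %s\n", strerror(errno)); /* "No such device or address" */

        close(fd);
        return 0;
    }
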
> > > > > > ...
> > > > > > > > The strange part is that I cannot seem to find any other error.
> > > > > > > > If I restart the VM everything works as expected (it stopped at ~9.51
> > > > > > > > UTC and was started at ~10.01 UTC).
> > > > > > > > 
> > > > > > > > This is not the first time that this happened, and I do not see any
> > > > > > > > problems with networking or the hosts.
> > > > > > > > 
> > > > > > > > Gluster version is 3.8.11
> > > > > > > > this is the offending volume (though it happened on a different one
> > > > > > > > too)
> > > > > > > > Volume Name: datastore2
> > > > > > > > Type: Replicate
> > > > > > > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > > > > > > Status: Started
> > > > > > > > Snapshot Count: 0
> > > > > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > > > > Transport-type: tcp
> > > > > > > > Bricks:
> > > > > > > > Brick1: srvpve2g:/data/brick2/brick
> > > > > > > > Brick2: srvpve3g:/data/brick2/brick
> > > > > > > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > > > > > > Options Reconfigured:
> > > > > > > > nfs.disable: on
> > > > > > > > performance.readdir-ahead: on
> > > > > > > > transport.address-family: inet
> > > > > > > > 
> > > > > > > > Any hint on how to dig more deeply into the reason would be greatly
> > > > > > > > appreciated.
> > > > > > Probably the problem is with SEEK support in the arbiter functionality.
> > > > > > Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> > > > > > succeed on bricks where the files with content are located. It does not
> > > > > > look like arbiter handles SEEK, so the offset in lseek() will likely be
> > > > > > higher than the size of the file on the brick (an empty, zero-size file). I
> > > > > > don't know how the replication xlator responds to an error return from
> > > > > > SEEK on one of the bricks, but I doubt it likes it.
> > > > > > 
> > > > > inode-read fops don't get sent to the arbiter brick, so this won't happen.
> > > > Yes, I see that the arbiter xlator returns on reads without going to the
> > > > bricks. Should that not be done for seek as well? It's the first time I
> > > > actually looked at the code of the arbiter xlator, so I might well be
> > > > misunderstanding how it works :)
> > > > 
> > > inode-read fops are the fops that read some information from the inode,
> > > like stat/getxattr/read. Even seek falls in that category. It is not sent
> > > to the arbiter brick...
> > What confuses me is that the arbiter xlator defines the following FOPs
> > in xlators/features/arbiter/src/arbiter.c:
> AFR has a list of readable subvols on which all read-related FOPs are wound.
> For arbiter volumes, we mark the arbiter as non-readable during the lookup
> cbk, so any read FOP is no longer wound to the arbiter. This change was made
> at a later stage; arbiter_readv was initially coded to send an error. So in
> the current code, arbiter_readv should never get hit.

Aha! Thanks, that explains it well.
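
For anyone else reading along, the mechanism boils down to something like
this toy model (hypothetical names, not AFR's actual code):

    #include <stdbool.h>
    #include <stdio.h>

    struct subvol {
        const char *name;
        bool        readable;   /* cleared for the arbiter in the lookup cbk */
    };

    /* wind a read fop (readv/stat/seek/...) only to a readable subvol */
    static struct subvol *pick_read_subvol(struct subvol *subs, int n)
    {
        for (int i = 0; i < n; i++)
            if (subs[i].readable)
                return &subs[i];
        return NULL;            /* no readable copy: the fop fails instead */
    }

    int main(void)
    {
        struct subvol subs[] = {
            { "client-0", true  },
            { "client-1", true  },
            { "client-2", false },  /* the arbiter, marked non-readable */
        };
        struct subvol *s = pick_read_subvol(subs, 3);
        printf("read fop wound to %s\n", s ? s->name : "(none)");
        return 0;
    }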

> >      struct xlator_fops fops = {
> >              .lookup = arbiter_lookup,
> >              .readv  = arbiter_readv,
> >              .truncate = arbiter_truncate,
> >              .writev = arbiter_writev,
> >              .ftruncate = arbiter_ftruncate,
> >              .fallocate = arbiter_fallocate,
> >              .discard = arbiter_discard,
> >              .zerofill = arbiter_zerofill,
> >      };
> > 
> > 
> > To go back to the error message:
> > 
> >    [posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such device or address]
> > 
> > We need to know on which brick this occurs to confirm that it was not
> > sent to the arbiter brick somehow.
> 
> This is what Alessandro said earlier in the thread:
> 
> "Also the seek errors where there before when there was no arbiter (only 2
> replica)."

Ok, I missed that detail. We then just need to figure out why QEMU and
FUSE try to do an lseek() with an offset of 42957209600 while the file
is not that large...
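
For what it's worth, callers that probe sparse files with SEEK_DATA treat
ENXIO as an expected answer, not a failure (it is also returned when the
offset falls in a hole that runs to EOF). A hedged sketch of that common
pattern, not QEMU's actual code:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <unistd.h>

    /* Returns 1 if data exists at/after 'offset' (start in *data_start),
     * 0 if 'offset' is past EOF or inside a hole that runs to EOF,
     * -1 on any other error. */
    int probe_data(int fd, off_t offset, off_t *data_start)
    {
        off_t d = lseek(fd, offset, SEEK_DATA);
        if (d == (off_t) -1) {
            if (errno == ENXIO)
                return 0;   /* expected when probing at or beyond EOF */
            return -1;
        }
        *data_start = d;
        return 1;
    }

If the client probes at or beyond EOF on purpose and handles the ENXIO
itself, the E-level log entry on the brick may be the only real surprise
here; but that is just a guess.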

Any ideas how that can happen?

Niels