[Gluster-users] VM going down

Krutika Dhananjay kdhananj at redhat.com
Thu May 11 07:05:42 UTC 2017


Niels,

Allesandro's configuration does not have shard enabled, so this definitely
has nothing to do with shard not supporting the SEEK fop.

Copy-pasting volume-info output from the first mail:

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet


-Krutika


On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:

> ...
> > > client from
> > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > (version: 3.8.11)
> > > [2017-05-08 10:01:06.237433] E [MSGID: 113107]
> [posix.c:1079:posix_seek]
> > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > device or address]
>
> The SEEK procedure translates to lseek() in the posix xlator. This can
> return with "No suck device or address" (ENXIO) in only one case:
>
>     ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is
>              beyond the end of the file.
>
> This means that an lseek() was executed where the current offset of the
> file descriptor was higher than the size of the file. I'm not sure how
> that could happen... Sharding prevents the use of SEEK entirely at the
> moment.
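
For reference, the ENXIO case described above is easy to reproduce outside
of Gluster. A minimal sketch (not taken from this thread; the test file path
is illustrative) that asks for data at an offset past the end of a file:

    /*
     * Minimal sketch (not from the original report): trigger the same ENXIO
     * that posix_seek() logs, by seeking for data at an offset beyond EOF.
     * The path is illustrative; requires a kernel/filesystem with
     * SEEK_DATA support.
     */
    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/seek-demo", O_CREAT | O_RDWR | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Make the file 4 KiB long ... */
        if (ftruncate(fd, 4096) < 0) {
            perror("ftruncate");
            return 1;
        }

        /* ... then look for data starting at 1 MiB, i.e. past EOF. */
        if (lseek(fd, 1048576, SEEK_DATA) == (off_t) -1)
            printf("lseek: %s\n", strerror(errno));  /* "No such device or address" */

        close(fd);
        return 0;
    }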
>
> ...
> > > The strange part is that I cannot seem to find any other error.
> > > If I restart the VM everything works as expected (it stopped at ~9.51
> > > UTC and was started at ~10.01 UTC) .
> > >
> > > This is not the first time that this happened, and I do not see any
> > > problems with networking or the hosts.
> > >
> > > Gluster version is 3.8.11
> > > this is the incriminated volume (though it happened on a different one
> too)
> > >
> > > Volume Name: datastore2
> > > Type: Replicate
> > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: srvpve2g:/data/brick2/brick
> > > Brick2: srvpve3g:/data/brick2/brick
> > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > Options Reconfigured:
> > > nfs.disable: on
> > > performance.readdir-ahead: on
> > > transport.address-family: inet
> > >
> > > Any hint on how to dig more deeply into the reason would be greatly
> > > appreciated.
>
> Probably the problem is with SEEK support in the arbiter functionality.
> Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> succeed on bricks where the files with content are located. It does not
> look like arbiter handles SEEK, so the offset in lseek() will likely be
> higher than the size of the file on the brick (empty, 0 size file). I
> don't know how the replication xlator responds to an error return from
> SEEK on one of the bricks, but I doubt it likes it.
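
If that theory is right, it should be visible directly on the bricks: the
image file has its real size on the data bricks (srvpve2g/srvpve3g) but is
0 bytes on the arbiter (srvpve1g). A small sketch, assuming direct access to
the brick filesystems (the helper itself is hypothetical, not part of
Gluster), to print the on-brick size of a given path:

    /*
     * Hypothetical check, not from the thread: print the on-brick size of a
     * file. Run it against the image path under /data/brick2/brick on a data
     * brick host and on the arbiter host; on the arbiter the size should be
     * 0, so any SEEK_DATA/SEEK_HOLE offset that is valid on the data bricks
     * lands past EOF there.
     */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(int argc, char **argv)
    {
        struct stat st;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <path-on-brick>\n", argv[0]);
            return 1;
        }
        if (stat(argv[1], &st) < 0) {
            perror("stat");
            return 1;
        }
        printf("%s: %lld bytes\n", argv[1], (long long) st.st_size);
        return 0;
    }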
>
> We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> SEEK for sharding. I suggest you open a bug for getting SEEK in the
> arbiter xlator as well.
>
> HTH,
> Niels
>