[Gluster-users] VM going down

Thu May 11 14:10:15 UTC 2017

On Thu, May 11, 2017 at 5:39 PM, Niels de Vos <ndevos at redhat.com> wrote:

> On Thu, May 11, 2017 at 12:35:42PM +0530, Krutika Dhananjay wrote:
> > Niels,
> >
> > Allesandro's configuration does not have shard enabled. So it has
> > definitely not got anything to do with shard not supporting seek fop.
>
> Yes, but if sharding had been enabled, the seek FOP would have been
> handled correctly (detected as not supported at all).
>
> I'm still not sure how arbiter prevents doing shards though. We normally
> advise to use sharding *and* (optional) arbiter for VM workloads,
> arbiter without sharding has not been tested much. In addition, the seek
> functionality is only available in recent kernels, so there has been
> little testing on CentOS or similar enterprise Linux distributions.
>

That is not true. Both are independent. Over the past year or so we have
answered quite a few questions on gluster-users from setups that don't use
sharding+arbiter but a plain old 2+1 configuration.

>
>
> HTH,
> Niels
>
>
> > Copy-pasting volume-info output from the first mail:
> >
> > Volume Name: datastore2
> > Type: Replicate
> > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: srvpve2g:/data/brick2/brick
> > Brick2: srvpve3g:/data/brick2/brick
> > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > Options Reconfigured:
> > nfs.disable: on
> > performance.readdir-ahead: on
> > transport.address-family: inet
> >
> >
> > -Krutika
> >
> >
> > On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:
> >
> > > ...
> > > > > client from
> > > > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > > > (version: 3.8.11)
> > > > > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> > > > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > > > device or address]
> > >
> > > The SEEK procedure translates to lseek() in the posix xlator. This can
> > > > return with "No such device or address" (ENXIO) in only one case:
> > >
> > >     ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is
> > >              beyond the end of the file.
> > >
> > > > This means that an lseek() was executed where the current offset of the
> > > > file descriptor was higher than the size of the file. I'm not sure how
> > > > that could happen... Sharding prevents using SEEK at all at the moment.
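
A minimal sketch of the ENXIO case quoted above, assuming Linux and
Python 3. It only exercises plain lseek() on local temporary files, not
the posix xlator or any Gluster code. The helper names seek_data() and
make_file() are made up for illustration. The empty file stands in for
the 0-size copy that an arbiter brick keeps.

```python
import errno
import os
import tempfile


def seek_data(path, offset):
    """Call lseek(fd, offset, SEEK_DATA) and report what happened.

    Returns ("ok", new_offset) on success or ("error", errno) on failure.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return ("ok", os.lseek(fd, offset, os.SEEK_DATA))
    except OSError as exc:
        return ("error", exc.errno)
    finally:
        os.close(fd)


def make_file(size):
    """Create a temporary file holding `size` bytes of real data."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"x" * size)
    return path


if __name__ == "__main__":
    data_file = make_file(4096)   # stands in for a brick holding file content
    empty_file = make_file(0)     # stands in for a 0-size file (assumption)
    try:
        for label, path, offset in [
            ("inside data", data_file, 0),
            ("beyond EOF", data_file, 8192),
            ("empty file", empty_file, 0),
        ]:
            status, value = seek_data(path, offset)
            if status == "error":
                value = errno.errorcode.get(value, value)
            print(label, status, value)
    finally:
        os.remove(data_file)
        os.remove(empty_file)
```

On a file with content, a SEEK_DATA request inside the data succeeds,
while any offset at or past the end of the file fails with ENXIO. On an
empty file every offset is at or past the end, so the call always fails.
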
> > >
> > > ...
> > > > > The strange part is that I cannot seem to find any other error.
> > > > > If I restart the VM everything works as expected (it stopped at ~9.51
> > > > > UTC and was started at ~10.01 UTC).
> > > > >
> > > > > This is not the first time that this happened, and I do not see any
> > > > > problems with networking or the hosts.
> > > > >
> > > > > Gluster version is 3.8.11
> > > > > This is the affected volume (though it happened on a different one
> > > > > too)
> > > > >
> > > > > Volume Name: datastore2
> > > > > Type: Replicate
> > > > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: srvpve2g:/data/brick2/brick
> > > > > Brick2: srvpve3g:/data/brick2/brick
> > > > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > > > Options Reconfigured:
> > > > > nfs.disable: on
> > > > > performance.readdir-ahead: on
> > > > > transport.address-family: inet
> > > > >
> > > > > Any hint on how to dig more deeply into the reason would be greatly
> > > > > appreciated.
> > >
> > > Probably the problem is with SEEK support in the arbiter functionality.
> > > Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> > > succeed on bricks where the files with content are located. It does not
> > > look like arbiter handles SEEK, so the offset in lseek() will likely be
> > > > higher than the size of the file on the brick (empty, 0 size file). I
> > > > don't know how the replication xlator responds to an error returned by
> > > > SEEK on one of the bricks, but I doubt it likes it.
> > >
> > > We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> > > SEEK for sharding. I suggest you open a bug for getting SEEK in the
> > > arbiter xlator as well.
> > >
> > > HTH,
> > > Niels
> > >
>

-- 
Pranith