<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small;color:#000000"><span style="color:rgb(34,34,34)">On Thu, Aug 22, 2019 at 10:03 AM Niels de Vos <<a href="mailto:ndevos@redhat.com">ndevos@redhat.com</a>> wrote:</span><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Wed, Aug 21, 2019 at 07:04:17PM +0200, Max Reitz wrote:<br>
> On 17.08.19 23:21, Nir Soffer wrote:<br>
> > Implement alignment probing similar to file-posix, by reading from the<br>
> > first 4k of the image.<br>
> > <br>
> > Before this change, provisioning a VM on storage with sector size of<br>
> > 4096 bytes would fail when the installer try to create filesystems. Here<br>
> > is an example command that reproduces this issue:<br>
> > <br>
> > $ qemu-system-x86_64 -accel kvm -m 2048 -smp 2 \<br>
> > -drive file=gluster://gluster1/gv0/fedora29.raw,format=raw,cache=none \<br>
> > -cdrom Fedora-Server-dvd-x86_64-29-1.2.iso<br>
> > <br>
> > The installer fails in few seconds when trying to create filesystem on<br>
> > /dev/mapper/fedora-root. In error report we can see that it failed with<br>
> > EINVAL (I could not extract the error from guest).<br>
> > <br>
> > Copying disk fails with EINVAL:<br>
> > <br>
> > $ qemu-img convert -p -f raw -O raw -t none -T none \<br>
> > gluster://gluster1/gv0/fedora29.raw \<br>
> > gluster://gluster1/gv0/fedora29-clone.raw<br>
> > qemu-img: error while writing sector 4190208: Invalid argument<br>
> > <br>
> > This is a fix to same issue fixed in commit a6b257a08e3d (file-posix:<br>
> > Handle undetectable alignment) for gluster:// images.<br>
> > <br>
> > This fix has the same limit, that the first block of the image should be<br>
> > allocated, otherwise we cannot detect the alignment and fallback to a<br>
> > safe value (4096) even when using storage with sector size of 512 bytes.<br>
> > <br>
> > Signed-off-by: Nir Soffer <<a href="mailto:nsoffer@redhat.com" target="_blank">nsoffer@redhat.com</a>><br>
> > ---<br>
> > block/gluster.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++<br>
> > 1 file changed, 47 insertions(+)<br>
> > <br>
> > diff --git a/block/gluster.c b/block/gluster.c<br>
> > index f64dc5b01e..d936240b72 100644<br>
> > --- a/block/gluster.c<br>
> > +++ b/block/gluster.c<br>
> > @@ -52,6 +52,9 @@<br>
> > <br>
> > #define GERR_INDEX_HINT "hint: check in 'server' array index '%d'\n"<br>
> > <br>
> > +/* The value is known only on the server side. */<br>
> > +#define MAX_ALIGN 4096<br>
> > +<br>
> > typedef struct GlusterAIOCB {<br>
> > int64_t size;<br>
> > int ret;<br>
> > @@ -902,8 +905,52 @@ out:<br>
> > return ret;<br>
> > }<br>
> > <br>
> > +/*<br>
> > + * Check if read is allowed with given memory buffer and length.<br>
> > + *<br>
> > + * This function is used to check O_DIRECT request alignment.<br>
> > + */<br>
> > +static bool gluster_is_io_aligned(struct glfs_fd *fd, void *buf, size_t len)<br>
> > +{<br>
> > + ssize_t ret = glfs_pread(fd, buf, len, 0, 0, NULL);<br>
> > + return ret >= 0 || errno != EINVAL;<br>
> <br>
> Is glfs_pread() guaranteed to return EINVAL on invalid alignment?<br>
> file-posix says this is only the case on Linux (for normal files). Now<br>
> I also don’t know whether the gluster driver works on anything but Linux<br>
> anyway.<br>
<br>
The behaviour depends on the filesystem used by the Gluster backend. XFS<br>
is the recommendation, but in the end it is up to the users. The Gluster<br>
server is known to work on Linux, NetBSD and FreeBSD, the vast majority<br>
of users runs it on Linux.<br>
<br>
I do not think there is a strong guarantee EINVAL is always correct. How<br>
about only checking 'ret > 0'?<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)">Looks like we don't have a choice.</div></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> <br>
> > +}<br>
> > +<br>
> > +static void gluster_probe_alignment(BlockDriverState *bs, struct glfs_fd *fd,<br>
> > + Error **errp)<br>
> > +{<br>
> > + char *buf;<br>
> > + size_t alignments[] = {1, 512, 1024, 2048, 4096};<br>
> > + size_t align;<br>
> > + int i;<br>
> > +<br>
> > + buf = qemu_memalign(MAX_ALIGN, MAX_ALIGN);<br>
> > +<br>
> > + for (i = 0; i < ARRAY_SIZE(alignments); i++) {<br>
> > + align = alignments[i];<br>
> > + if (gluster_is_io_aligned(fd, buf, align)) {<br>
> > + /* Fallback to safe value. */<br>
> > + bs->bl.request_alignment = (align != 1) ? align : MAX_ALIGN;<br>
> > + break;<br>
> > + }<br>
> > + }<br>
> <br>
> I don’t like the fact that the last element of alignments[] should be<br>
> the same as MAX_ALIGN, without that ever having been made explicit anywhere.<br>
> <br>
> It’s a bit worse in the file-posix patch, because if getpagesize() is<br>
> greater than 4k, max_align will be greater than 4k. But MAX_BLOCKSIZE<br>
> is 4k, too, so I suppose we wouldn’t support any block size beyond that<br>
> anyway, which makes the error message appropriate still.<br>
> <br>
> > +<br>
> > + qemu_vfree(buf);<br>
> > +<br>
> > + if (!bs->bl.request_alignment) {<br>
> > + error_setg(errp, "Could not find working O_DIRECT alignment");<br>
> > + error_append_hint(errp, "Try cache.direct=off\n");<br>
> > + }<br>
> > +}<br>
> > +<br>
> > static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)<br>
> > {<br>
> > + BDRVGlusterState *s = bs->opaque;<br>
> > +<br>
> > + gluster_probe_alignment(bs, s->fd, errp);<br>
> > +<br>
> > + bs->bl.min_mem_alignment = bs->bl.request_alignment;<br>
> <br>
> Well, I’ll just trust you that there is no weird system where the memory<br>
> alignment is greater than the request alignment.<br>
> <br>
> > + bs->bl.opt_mem_alignment = MAX(bs->bl.request_alignment, MAX_ALIGN);<br>
> <br>
> Isn’t request_alignment guaranteed to not exceed MAX_ALIGN, i.e. isn’t<br>
> this always MAX_ALIGN?<br>
> <br>
> > bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;<br>
> > }<br>
> <br>
> file-posix has a check in raw_reopen_prepare() whether we can find a<br>
> working alignment for the new FD. Shouldn’t we do the same in<br>
> qemu_gluster_reopen_prepare()?<br>
<br>
Yes, I think that makes sense too.<br></blockquote><div><br></div><div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)">I'l add it in v2.</div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)"><br></div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)">Thanks for reviewing.</div></div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)"><br></div><div class="gmail_default" style="font-size:small;color:rgb(0,0,0)">Nir</div></div></div>