[Gluster-devel] Issues with fallocate, discard and zerofill

M. Mohan Kumar mohan at in.ibm.com
Tue Oct 1 17:28:31 UTC 2013


Anand Avati <anand.avati at gmail.com> writes:

> It is cleaner to implement it as a separate fop. The complexity of
> overloading writev() is unnecessary. There would be a whole bunch of new
> if/else conditions to be introduced in existing code, and modules like
> write-behind, stripe etc., where special action is taken in multiple places
> based on size (and offset into the buffer), would need very delicate,
> error-prone changes.
>
> That being said, I still believe the FOP interface should be similar to
> SCSI write_same, something like this:
>
> int fop_write_same (call_frame_t *frame, xlator_t *this, fd_t *fd,
>                     void *buf, size_t len, off_t offset, int repeat);
>
> and zerofill would be a gfapi wrapper around write_same:
>
> int zerofill (call_frame_t *frame, xlator_t *this, fd_t *fd, off_t offset,
> int len)
> {
>   char zero[1] = {0};
>
>   return fop_write_same (frame, this, fd, zero, 1, offset, len);
> }
>
> Avati

I understand your point about adding a write_same FOP to align with the
SCSI WRITE SAME command, but as of now Linux only provides an ioctl to
zero out SCSI blocks. Would this FOP become useful if the Linux SCSI
subsystem supports the full WRITE SAME command in the future?
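For comparison, here is a minimal sketch of what a zerofill primitive can fall back on under Linux today, assuming a reasonably recent kernel and glibc: the `BLKZEROOUT` ioctl on block devices, `fallocate(FALLOC_FL_ZERO_RANGE)` on regular files (Linux 3.15+, so newer than this thread), and plain zero-filled `pwrite()` when neither is supported. The name `zero_range` is hypothetical; it is not a GlusterFS or gfapi function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_ZERO_RANGE */
#include <linux/fs.h>       /* BLKZEROOUT */
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a GlusterFS API): zero bytes [offset, offset + len). */
int zero_range(int fd, off_t offset, off_t len)
{
    struct stat st;

    if (offset < 0 || len < 0) {
        errno = EINVAL;
        return -1;
    }
    if (len == 0)
        return 0;
    if (fstat(fd, &st) < 0)
        return -1;

    if (S_ISBLK(st.st_mode)) {
        /* BLKZEROOUT takes {start, length} in bytes; the kernel rejects
           values that are not multiples of 512. */
        uint64_t range[2] = { (uint64_t)offset, (uint64_t)len };
        return ioctl(fd, BLKZEROOUT, range);
    }

#ifdef FALLOC_FL_ZERO_RANGE
    /* Regular file: let the filesystem zero the range if it can. */
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;
#endif

    /* Last resort: write zeros in 64 KiB chunks. */
    const size_t chunk_size = 64 * 1024;
    char *zeros = calloc(1, chunk_size);
    if (zeros == NULL)
        return -1;

    while (len > 0) {
        size_t chunk = (len < (off_t)chunk_size) ? (size_t)len : chunk_size;
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(zeros);
            return -1;
        }
        offset += n;
        len -= n;
    }
    free(zeros);
    return 0;
}
```

The zero-only ioctl is the limitation mentioned above: the caller can ask the device to zero a range, but cannot hand it an arbitrary pattern block.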

WRITE SAME always takes one block of data as input, plus a starting LBA
and the number of LBAs to write that data to. This block could be 512
bytes (almost all disks now use a 512-byte LBA size) or 4096 bytes. So
should a user specify something like:

int fop_write_same (call_frame_t *frame, xlator_t *this, fd_t *fd,
    void *outbuf, off_t offset, off_t length);

For example, if outbuf is 4096 bytes and the SCSI device's LBA size is
512 bytes, the SCSI WRITE SAME command will ignore bytes 512 to 4095 of
outbuf.
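To make these semantics concrete, a minimal userspace sketch of WRITE SAME, assuming byte-addressed files and a caller-supplied logical block size: the first `lba_size` bytes of the buffer are replicated over `nblocks` consecutive blocks starting at block `lba`, and any bytes past the first block are ignored, as described above. `write_same_emulated`, `pwrite_all` and `STAGING_BLOCKS` are hypothetical names; this is neither the proposed FOP nor a kernel interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define STAGING_BLOCKS 64   /* blocks per pwrite; an arbitrary choice */

/* Write all len bytes from p at off, retrying on short writes and EINTR. */
static int pwrite_all(int fd, const char *p, size_t len, off_t off)
{
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical helper: replicate the first lba_size bytes of buf over
 * nblocks logical blocks starting at block lba. Bytes of buf past
 * lba_size are ignored, as WRITE SAME consumes only one block.
 */
int write_same_emulated(int fd, const void *buf, size_t buf_len,
                        size_t lba_size, uint64_t lba, uint64_t nblocks)
{
    if (lba_size == 0 || buf_len < lba_size) {
        errno = EINVAL;
        return -1;
    }

    /* Stage several copies of the block so each pwrite covers many blocks. */
    char *staging = malloc(lba_size * STAGING_BLOCKS);
    if (staging == NULL)
        return -1;
    for (size_t i = 0; i < STAGING_BLOCKS; i++)
        memcpy(staging + i * lba_size, buf, lba_size);

    off_t off = (off_t)(lba * lba_size);
    int rc = 0;
    while (nblocks > 0) {
        uint64_t batch = nblocks < STAGING_BLOCKS ? nblocks : STAGING_BLOCKS;
        size_t bytes = (size_t)batch * lba_size;
        if (pwrite_all(fd, staging, bytes, off) < 0) {
            rc = -1;
            break;
        }
        off += (off_t)bytes;
        nblocks -= batch;
    }
    free(staging);
    return rc;
}
```

With a 4096-byte buffer and `lba_size` of 512, only the first 512 bytes ever reach the file, which matches the example above. A zerofill is then just the case of an all-zero block.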

Also, SCSI WRITE SAME supports many parameters, such as NDOB (No Data-Out
Buffer). Should we really mimic the SCSI WRITE SAME command as a GlusterFS
FOP, or will a zerofill FOP do?
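To give a sense of that parameter surface, a sketch that packs a WRITE SAME(16) CDB, following our reading of the SBC-3 layout: opcode 93h in byte 0, UNMAP in byte 1 bit 3, NDOB in byte 1 bit 0, a big-endian 8-byte LBA in bytes 2-9 and a big-endian 4-byte block count in bytes 10-13. The bit positions should be checked against the current T10 SBC draft before relying on them; WRPROTECT, ANCHOR, GROUP NUMBER and CONTROL are simply left at zero here, and `build_write_same16` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WRITE_SAME_16_OPCODE 0x93
#define WS16_UNMAP_BIT       0x08   /* byte 1, bit 3 */
#define WS16_NDOB_BIT        0x01   /* byte 1, bit 0 */

/* Pack a 16-byte WRITE SAME(16) CDB. Multi-byte fields are big-endian. */
void build_write_same16(uint8_t cdb[16], uint64_t lba, uint32_t nblocks,
                        bool unmap, bool ndob)
{
    memset(cdb, 0, 16);
    cdb[0] = WRITE_SAME_16_OPCODE;
    if (unmap)
        cdb[1] |= WS16_UNMAP_BIT;
    if (ndob)
        cdb[1] |= WS16_NDOB_BIT;

    for (int i = 0; i < 8; i++)            /* bytes 2..9: starting LBA */
        cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)            /* bytes 10..13: block count */
        cdb[10 + i] = (uint8_t)(nblocks >> (24 - 8 * i));
    /* byte 14 (group number) and byte 15 (control) stay zero */
}
```

With NDOB set, no data-out buffer is transferred and the device treats the block data as all zeros, which is essentially the case a zerofill FOP covers; the other flags are what a full write_same FOP would also have to expose or reject.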
 

>
>
> On Thu, Sep 5, 2013 at 10:28 PM, M. Mohan Kumar <mohan at in.ibm.com> wrote:
>
>> Anand Avati <anand.avati at gmail.com> writes:
>>
>> Hi Shishir,
>>
>> It's possible to overload the writev FOP to achieve zerofill
>> functionality. Are there any open issues with this zerofill functionality
>> even after overloading writev?
>>
>> > Shishir,
>> > Is this in reference to the dht open file rebalance (of replaying the
>> > operations to the destination server)? I am assuming so, as that is
>> > something which has to be handled.
>> >
>> > The other question is how fallocate/discard should be handled by
>> > self-heal in AFR. I'm not sure how important it is, but it will certainly
>> > be good to bounce some ideas off here. Maybe we should implement a fiemap
>> > fop to query extents/holes and replay them on the other server?
>> >
>> > Avati
>> >
>> >
>> >
>> > On Tue, Aug 13, 2013 at 10:49 PM, Bharata B Rao <bharata.rao at gmail.com> wrote:
>> >
>> >> Hi Avati, Brian,
>> >>
>> >> During the recently held gluster meetup, Shishir mentioned a
>> >> potential problem (related to fd migration etc.) in the zerofill
>> >> implementation (http://review.gluster.org/#/c/5327/) and also
>> >> mentioned that the same or similar issues are present in the fallocate
>> >> and discard implementations. Since zerofill has been modelled on
>> >> fallocate/discard, I was wondering if it would be possible to address
>> >> these issues in fallocate/discard first so that we could potentially
>> >> follow the same in zerofill implementation.
>> >>
>> >> Regards,
>> >> Bharata.
>> >> --
>> >> http://raobharata.wordpress.com/
>> >>
>> >> _______________________________________________
>> >> Gluster-devel mailing list
>> >> Gluster-devel at nongnu.org
>> >> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>> >>
>>
>>
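The extents/holes replay idea from the quoted thread can be sketched locally in C. This version swaps the FIEMAP ioctl for the simpler `lseek()` `SEEK_DATA`/`SEEK_HOLE` interface (Linux 3.1+, `_GNU_SOURCE`) and copies between two local file descriptors. `replay_extents`, `copy_range` and `COPY_BUF_SIZE` are hypothetical, and this is not how AFR self-heal or DHT rebalance actually work. On filesystems without hole reporting, the whole file comes back as one data extent, so the copy stays correct but loses sparseness.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BUF_SIZE (1 << 20)

/* Copy [off, off + len) from src to dst at the same offset, via buf. */
static int copy_range(int src, int dst, char *buf, off_t off, off_t len)
{
    while (len > 0) {
        size_t want = len < COPY_BUF_SIZE ? (size_t)len : COPY_BUF_SIZE;
        ssize_t n = pread(src, buf, want, off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;                    /* read error or unexpected EOF */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = pwrite(dst, buf + done, (size_t)(n - done), off + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* Make dst a sparse replica of src: copy data extents, leave holes as holes. */
int replay_extents(int src, int dst)
{
    struct stat st;
    if (fstat(src, &st) < 0)
        return -1;

    /* Reset dst to the right size with no allocated data (all holes). */
    if (ftruncate(dst, 0) < 0 || ftruncate(dst, st.st_size) < 0)
        return -1;

    char *buf = malloc(COPY_BUF_SIZE);
    if (buf == NULL)
        return -1;

    int rc = 0;
    off_t pos = 0;
    while (pos < st.st_size) {
        off_t data = lseek(src, pos, SEEK_DATA);
        if (data < 0) {
            if (errno != ENXIO)           /* ENXIO: no data at or after pos */
                rc = -1;
            break;
        }
        off_t hole = lseek(src, data, SEEK_HOLE);
        if (hole < 0 || copy_range(src, dst, buf, data, hole - data) < 0) {
            rc = -1;
            break;
        }
        pos = hole;
    }
    free(buf);
    return rc;
}
```

Truncating dst first is the simplest way to reproduce the holes. A heal that must keep an existing copy in place would instead punch holes over the gaps with `fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)`, which is where discard would come into the picture.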
