[Gluster-devel] RADOS translator for GlusterFS
Samuel Just
sam.just at inktank.com
Mon May 5 18:23:50 UTC 2014
We do essentially lock entire objects for many purposes. This isn't
generally a problem (and greatly simplifies many bits of the
implementation) because all existing rados users employ some form of
chunking/striping. That said, it's probably a good thing to punt on
for a prototype.
-Sam
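
For readers unfamiliar with the chunking/striping mentioned above, here is a minimal sketch of how a client might split a byte range of one large logical file into extents on many smaller objects, so that any per-object lock only ever covers one chunk. The 4 MiB chunk size, the "<name>.<index>" object naming, and the chunk_extents helper are all assumptions made for illustration; this is not the layout used by any actual RADOS client.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk size, purely illustrative


def chunk_extents(name, offset, length, chunk_size=CHUNK_SIZE):
    """Split a byte range of a logical file into per-object extents.

    Returns a list of (object_name, offset_in_object, length) tuples.
    The "<name>.<index>" naming scheme is hypothetical.
    """
    extents = []
    end = offset + length
    while offset < end:
        index = offset // chunk_size       # which chunk object holds this byte
        within = offset % chunk_size       # position inside that object
        n = min(chunk_size - within, end - offset)
        extents.append((f"{name}.{index:08x}", within, n))
        offset += n
    return extents
```

A write that crosses a chunk boundary becomes two independent object operations, which is where the coordination problems discussed in the quoted message come from.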
On Mon, May 5, 2014 at 11:07 AM, Jeff Darcy <jdarcy at redhat.com> wrote:
>> It's very important: several kinds of blocking are done at object
>> granularity. Off the top of my head, large objects would cause deep
>> scrub and recovery to stall requests for longer. Elephant objects
>> would also be able to skew data distribution.
>
> There are some definite parallels here to discussions we've had in
> Gluster-land, which we might as well go through because people from
> either "parent" won't have heard the other's discussions. The data distribution
> issue has turned out to be a practical non-issue for GlusterFS
> users. Sure, if you have very few "elephant objects" on very few
> small-ish bricks (our equivalent of OSDs) then you can get skewed
> distribution. On the other hand, that problem *very* quickly
> solves itself for even moderate object and brick counts, to the
> point that almost no users have found it useful to enable striping.
> Has your experience been different, or do you not know because
> striping is mandatory instead of optional?
>
> The "deep scrub and recovery" point brings up a whole different
> set of memories. We used to have a problem in GlusterFS where
> self-heal would lock an entire file while it ran, so other access
> to that file would be blocked for a long time. This would cause
> VMs to hang, for example. In either 3.3 or 3.4 (can't remember)
> we added "granular self-heal" which would only lock the portion
> of the file that was currently under repair, in a sort of rolling
> fashion. From your comment, it sounds like RADOS still locks the
> entire object. Is that correct? If so, I posit that it's
> something we wouldn't need to solve in a prototype. If/when that
> starts turning into something real, then we'd have two options.
> One is to do striping as you suggest, which means solving all of
> the associated coordination problems. Another would be to do
> something like what GlusterFS did, with locking at the sub-object
> level. That does make repair less atomic, which some would
> consider a consistency problem, but we do have some evidence that
> it's a violation users don't seem to care about.
>
>
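
The "rolling" sub-object locking described in the quoted message can be sketched roughly as follows: repair proceeds one window at a time and holds a lock only on the byte range currently being copied, so I/O to the rest of the file is not blocked. This is a single-process toy under stated assumptions, not GlusterFS's self-heal code; RangeLocks, rolling_heal, and the window parameter are hypothetical names, and source and target are assumed to be in-memory buffers of equal length.

```python
import threading


class RangeLocks:
    """Tiny byte-range lock table: overlapping ranges exclude each other."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = []  # half-open (start, end) ranges currently locked

    def _overlaps(self, start, end):
        return any(s < end and start < e for s, e in self._held)

    def acquire(self, start, end):
        """Block until [start, end) overlaps no held range, then lock it."""
        with self._cond:
            while self._overlaps(start, end):
                self._cond.wait()
            self._held.append((start, end))

    def try_acquire(self, start, end):
        """Non-blocking variant: return False if the range is contended."""
        with self._cond:
            if self._overlaps(start, end):
                return False
            self._held.append((start, end))
            return True

    def release(self, start, end):
        with self._cond:
            self._held.remove((start, end))
            self._cond.notify_all()


def rolling_heal(source, target, locks, window):
    """Copy source into target one window at a time, locking only that window."""
    for start in range(0, len(source), window):
        end = min(start + window, len(source))
        locks.acquire(start, end)
        try:
            target[start:end] = source[start:end]
        finally:
            locks.release(start, end)
```

Because each window is locked and released separately, a reader can observe a file that is partly healed and partly stale, which is the reduced atomicity the message refers to.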