[Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

Thu Mar 21 10:55:09 UTC 2019

On Thu, Mar 21, 2019 at 4:16 PM Atin Mukherjee <amukherj at redhat.com> wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there has been lots of changes done on the code
> optimization with an expectation that such changes will have gluster to
> provide better performance. While many of these changes do help, but off
> late we have started seeing some diverse effects of them, one especially
> being the calloc to malloc conversions. While I do understand that malloc
> syscall will eliminate the extra memset bottleneck which calloc bears, but
> with recent kernels having in-built strong compiler optimizations I am not
> sure whether that makes any significant difference, but as I mentioned
> earlier certainly if this isn't done carefully it can potentially introduce
> lot of bugs and I'm writing this email to share one of such experiences.
>
> Sanju & I were having troubles for last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system but it had no problems running the same fix in my gluster
> containers. After spending a significant amount of time, what we now
> figured out is that a malloc call [1] (which was a calloc earlier) is the
> culprit here. As you all can see, in this function we allocate txn_id and
> copy the event->txn_id into it through gf_uuid_copy () . But when we were
> debugging this step wise through gdb, txn_id wasn't exactly copied with the
> exact event->txn_id and it had some junk values which made the
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> resulting the leaks to remain the same which was the original intention of
> the fix.
>
> This was quite painful to debug and we had to spend some time to figure
> this out. Considering we have converted many such calls in past, I'd urge
> that we review all such conversions and see if there're any side effects to
> it. Otherwise we might end up running into many potential memory related
> bugs later on. OTOH, going forward I'd request every patch
> owners/maintainers to pay some special attention to these conversions and
> see they are really beneficial and error free. IMO, general guideline
> should be - for bigger buffers, malloc would make better sense but has to
> be done carefully, for smaller size, we stick to calloc.
>
> What do others think about it?
>

I too am afraid of unknown effects of this change as much of the codebase
relies on the assumption of zero-initialized data structures. I vote for
reverting these patches unless it can be demonstrated that performance
benefits are indeed significant. Otherwise the trade off in stability is
not worth the cost.

>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/maintainers/attachments/20190321/b5f8d0a4/attachment-0001.html>