[Gluster-Maintainers] GF_CALLOC to GF_MALLOC conversion - is it safe?

Nithya Balachandran nbalacha at redhat.com
Thu Mar 21 15:22:45 UTC 2019


On Thu, 21 Mar 2019 at 16:16, Atin Mukherjee <amukherj at redhat.com> wrote:

> All,
>
> In the last few releases of glusterfs, with stability as a primary theme
> of the releases, there has been lots of changes done on the code
> optimization with an expectation that such changes will have gluster to
> provide better performance. While many of these changes do help, but off
> late we have started seeing some diverse effects of them, one especially
> being the calloc to malloc conversions. While I do understand that malloc
> syscall will eliminate the extra memset bottleneck which calloc bears, but
> with recent kernels having in-built strong compiler optimizations I am not
> sure whether that makes any significant difference, but as I mentioned
> earlier certainly if this isn't done carefully it can potentially introduce
> lot of bugs and I'm writing this email to share one of such experiences.
>
> Sanju & I were having troubles for last two days to figure out why
> https://review.gluster.org/#/c/glusterfs/+/22388/ wasn't working in
> Sanju's system but it had no problems running the same fix in my gluster
> containers. After spending a significant amount of time, what we now
> figured out is that a malloc call [1] (which was a calloc earlier) is the
> culprit here. As you all can see, in this function we allocate txn_id and
> copy the event->txn_id into it through gf_uuid_copy () . But when we were
> debugging this step wise through gdb, txn_id wasn't exactly copied with the
> exact event->txn_id and it had some junk values which made the
> glusterd_clear_txn_opinfo to be invoked with a wrong txn_id later on
> resulting the leaks to remain the same which was the original intention of
> the fix.
>
> This was quite painful to debug and we had to spend some time to figure
> this out. Considering we have converted many such calls in past, I'd urge
> that we review all such conversions and see if there're any side effects to
> it. Otherwise we might end up running into many potential memory related
> bugs later on. OTOH, going forward I'd request every patch
> owners/maintainers to pay some special attention to these conversions and
> see they are really beneficial and error free. IMO, general guideline
> should be - for bigger buffers, malloc would make better sense but has to
> be done carefully, for smaller size, we stick to calloc.
>

> What do others think about it?
>

I believe that replacing calloc with malloc everywhere without adequate
testing and review is not safe and am against doing so for the following
reasons:

   1. Most of these patches have not been tested, especially the error
   paths.I have seen some that introduced issues in error scenarios with
   pointers being non-null.
   2. As Raghavendra said, the code assumes that certain elements will be
   initialized to null/zero and changing that can have consequences which are
   not immediately obvious. I think it might be worthwhile to go through the
   already merged calloc->malloc patches to check error paths and so on to see
   if they are safe.
   3. Making such changes to the libglusterfs code while we are currently
   working to stabilize the product is not a good idea. The patches take time
   to review and any errors introduced in the core pieces affect all processes
   and require significant effort to debug.

Yaniv, while the example you provided might make sense to change to malloc,
a lot of the other changes, in my opinion, do not for the effort required.
For performance testing, smallfile might be a useful tool to see if any of
the changes make a difference. That said, I am reluctant to take in patches
that change core code significantly without being tested or providing proof
of benefits.

We need better performance and scalability but that is going to need
changes in our algorithms and fop handling and that is what we need to be
working on. Such changes, when done right, will provide more benefits than
the micro optimizations. I think it unlikely the micro optimizations will
provide much benefit but am willing to be proven wrong if you have numbers
that show otherwise.

Regards,
Nithya


>
> [1]
> https://github.com/gluster/glusterfs/blob/master/xlators/mgmt/glusterd/src/glusterd-op-sm.c#L5681
> _______________________________________________
> maintainers mailing list
> maintainers at gluster.org
> https://lists.gluster.org/mailman/listinfo/maintainers
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.gluster.org/pipermail/maintainers/attachments/20190321/5c299264/attachment.html>


More information about the maintainers mailing list