[Gluster-users] The continuing story ...

Thu Sep 10 15:50:04 UTC 2009

On Thu, Sep 10, 2009 at 5:37 PM, Stephan von Krawczynski
<skraw at ithnet.com> wrote:
>
>> > Only if backed up. Has the trace been shown to the linux developers?
>> > What do they think?
>
> Maybe we should just ask questions about the source before bothering others...
>
> From 2.0.6 /transport/socket/src/socket.c line 867 ff:
>
>                        new_trans = CALLOC (1, sizeof (*new_trans));
>                        new_trans->xl = this->xl;
>                        new_trans->fini = this->fini;
>
>                        memcpy (&new_trans->peerinfo.sockaddr, &new_sockaddr,
>                                addrlen);
>                        new_trans->peerinfo.sockaddr_len = addrlen;
>
>                        new_trans->myinfo.sockaddr_len =
>                                sizeof (new_trans->myinfo.sockaddr);
>
>                        ret = getsockname (new_sock,
>                                           SA (&new_trans->myinfo.sockaddr),
>                                           &new_trans->myinfo.sockaddr_len);
>
> CALLOC from libglusterfs/src/mem-pool.h:
> #define CALLOC(cnt,size) calloc(cnt,size)
>
> man calloc:
> RETURN VALUE
>       For calloc() and malloc(), the value returned is a pointer to the allocated memory, which is suitably aligned for any
>       kind of variable, or NULL if the request fails.
>
>
> Did I understand the source? What about calloc returning NULL?

Now, failing to check for a NULL pointer here is a bug, which we will
fix in a future release (blame it on our laziness for not doing the
check already!). Thanks for pointing it out.
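
A check along the following lines would avoid dereferencing a NULL
pointer when the allocation fails. This is only a minimal sketch, not
the actual glusterfs patch: demo_trans, demo_trans_new, calloc_fn and
failing_calloc are hypothetical names made up for illustration, and the
allocator is passed in as a parameter purely so that the failure path
can be exercised.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the transport structure; NOT the glusterfs type. */
struct demo_trans {
        void   *xl;
        size_t  sockaddr_len;
        char    sockaddr[128];
};

/* Same signature as calloc(), so calloc itself can be passed in directly. */
typedef void *(*calloc_fn) (size_t, size_t);

/*
 * Allocate a new transport and copy the peer address into it.
 * Returns NULL with errno set to ENOMEM if the allocation fails,
 * or to EINVAL if the address does not fit into the buffer.
 */
static struct demo_trans *
demo_trans_new (calloc_fn alloc, void *xl, const void *addr, size_t addrlen)
{
        struct demo_trans *new_trans = NULL;

        /* sizeof does not evaluate its operand, so this is safe on NULL. */
        if (addrlen > sizeof (new_trans->sockaddr)) {
                errno = EINVAL;
                return NULL;
        }

        new_trans = alloc (1, sizeof (*new_trans));
        if (new_trans == NULL) {
                /* The check that was missing: bail out before any dereference. */
                errno = ENOMEM;
                return NULL;
        }

        new_trans->xl = xl;
        memcpy (new_trans->sockaddr, addr, addrlen);
        new_trans->sockaddr_len = addrlen;

        return new_trans;
}

/* Allocator that always fails, to simulate an out-of-memory condition. */
static void *
failing_calloc (size_t cnt, size_t size)
{
        (void) cnt;
        (void) size;
        return NULL;
}
```

The caller would then test the return value and drop the new
connection on NULL instead of carrying on with an invalid pointer.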

As you can see, if a bug is related to glusterfs, we gracefully accept
and fix it! Refusing to acknowledge a problem in glusterfs would be
counterproductive for us. If you report a bug in glusterfs, we thank you.

The server kernel lockup is not a glusterfs-related problem, and we do
not have any control over it :-) Anand and Mark have clearly and
patiently explained why.

As Mark suggested, you can post it on the Linux Kernel Mailing List.
Please get back to us even if one of the kernel developers replies that
the kernel lockup you saw is not a kernel bug.

To use an analogy: in a car, assume that the engine is glusterfs and
the tyres are the kernel. If you get flat tyres and the car doesn't
move, you can't blame the engine!

Thanks
Krishna