[Gluster-devel] Change in glusterfs[master]: libglusterfs: use safer symbol resolution strategy with our ...

Anand Avati avati at redhat.com
Fri Aug 30 21:17:20 UTC 2013


[cc'ing gluster-devel on the final consensus]

On 08/30/2013 01:19 PM, Brian Foster wrote:
> On 08/30/2013 04:01 PM, Anand Avati wrote:
>> On 8/30/13 7:50 AM, Brian Foster wrote:
>>> On 08/30/2013 09:51 AM, Brian Foster wrote:
>>>> On 08/29/2013 09:08 PM, Emmanuel Dreyfus wrote:
>>>>> Anand Avati (Code Review) <review at dev.gluster.org> wrote:
>>>>>
>>> ...
>>>> TBH, I'm not totally sure how much this impacts things that aren't cross
>>>> DSO conflicts, so I ran a quick test. If I define a function in an
>>>> executable and a dso and call the function from both points, the library
>>>> always invokes the local version if the library is loaded via dlopen().
>>>> In fact, if I remove the local version from the library, I hit an
>>>> undefined symbol error when I attempt to invoke the library call.
>>>>
>>>> Note that the library does invoke the executable version of the function
>>>> if a compile time link dependency is made (i.e., even if the library is
>>>> still referenced via dlopen()/dlsym(), but not loaded by that method). I
>>>> suspect there is something going on here at compile/link time that
>>>> determines whether the executable exports the symbol, but that's an
>>>> initial guess based on observed behavior.
>>>>
>>>
>>> FYI, an elf dump of both executables shows that they differ in whether
>>> my duplicate symbol is listed in the executable .dynsym (dynamic symbol
>>> table) section or not.
>>>
>>> A dump of my locally installed qemu-kvm executable shows a couple
>>> "exported" block driver symbols: bdrv_aio_readv and bdrv_aio_writev.
>>>
>>> Brian
>>>
>>>> Given that, perhaps the best thing to do is hold off on the
>>>> RTLD_DEEPBIND bits until we understand the behavior a bit more
>>>> conclusively here and evaluate whether it's really an issue in the weird
>>>> qemu case. Thoughts?
>>>>
>>>> (FWIW, I think the change should at least hold with regard to the
>>>> RTLD_LOCAL bits. A translator is already intended to be a black box with
>>>> a few specially named interfaces. I don't see any reason to pollute the
>>>> global namespace with all kinds of extra symbols from different
>>>> translators).
>>>>
>>>>> NetBSD has different OS-specific flags, but I am not sure wether they
>>>>> overlap or not:
>>>>>
>>>>
>>>> Appears to define a similar behavior, but this apparently applies to an
>>>> explicit search of a symbol in a library as opposed to how the external
>>>> dependencies of the library are resolved.
>>>>
>>>> Brian
>>>>
>>>> P.S., All tests run on Linux.
>>>>
>>>>>> The following special handle values may be used with dlsym():
>>>>>> (...)
>>>>>>            RTLD_SELF       The search for symbol is limited to the
>>>>>> shared
>>>>>>                      object issuing the call to dlsym() and those
>>>>>> shared
>>>>>>                      objects which were loaded after it that are
>>>>>> visible.
>>>>>
>>>>
>>>
>>
>> Using RTLD_LOCAL solves one half of the confusion, and I think there is
>> no disagreement that RTLD_GLOBAL must be changed to RTLD_LOCAL. This
>> solves the problem of qemu-block translator's symbols not getting picked
>> up by qemu when libgfapi for the hypervisor's calls.
>>
>
> Agreed.
>
>> The other half of the problem - functions in qemu getting called instead
>> of same named functions in qemu-block translator is the open concern.
>> Brian, you report above that a DSO always uses the version which is
>> available in the same DSO. If that is the case (which sounds good), I
>> don't understand why RTLD_DEEPBIND is required.
>>
>
> Not always... I suspect there are two conditions here. 1.) the
> executable includes the symbol in its dynamic symbol table. 2.) the dso
> is compiled/linked to look for a symbol through the dynamic symbol
> tables (I suspect via PLT entries).
>
> The experiment I outlined before effectively toggles #1. By linking the
> library directly or not, I'm controlling whether the executable exports
> the dependent symbol. #2 it appears can be controlled by something like
> the -Bsymbolic linker option. For example, objdump a library with and
> without that option and you can see the call sites to a particular
> function either refer to library (relative?) offsets or plt entries.
>
> Using this option, the library always uses the local version regardless
> of whether the symbol is exported from the executable.
>
>> I am concerned about bugs like 994314, where a symbol defined in qemu
>> (uuid_is_null()) got picked up instead of the one in libglusterfs.so,
>> for a call made from libglusterfs.so. It may be the case that the
>> problem occured here because libglusterfs.so was not dlopen()ed, but
>> dynamically linked as -lglusterfs and build time.
>>
>
> It might not have mattered in that particular case. If the symbol was
> exported by the executable (i.e., condition #1), it probably/possibly
> could get bound to the unexpected symbol. What was the ultimate fix for
> that bug?
>
> While it appears that right now qemu does not export many bdrv_*
> symbols, it still seems like it could be a problem if we used those
> symbols or the nature of the executable changed in the future (i.e., #1
> is not under our control). For that reason, I'd suggest we use something
> like -Bsymbolic for qemu-block to address the second half of the problem
> (assuming it doesn't break anything else :P). I mentioned this on
> #gluster-dev btw, and Kaleb pointed out that this appears to also be
> more portable than RTLD_DEEPBIND.
>

OK. This seems good for now. I will resubmit 
http://review.gluster.org/5697 to only change RTLD_LOCAL instead of 
RTLD_GLOBAL (in all places of dlopen()). As a new patch I will send 
changes to makefile of the qemu-block translator to use -Bsymbolic 
linker flag, as that might involve additional investigation and testing. 
In the mean time, can we get review +1 for #5367 and #5366? I think they 
can go in independently as we have no issues to be used in non-gfapi use 
cases.

Avati

> Thoughts?
>
> Brian
>
>> However the text description of RTLD_DEEPBIND makes you think that the
>> observation of Brian is *not* the default behavior.
>>
>> Avati
>>
>





More information about the Gluster-devel mailing list