[Bugs] [Bug 1191437] build: issue with update of upstream build from 3.7dev-0.529 to 3.7dev-0.577

bugzilla at redhat.com bugzilla at redhat.com
Fri Feb 13 10:12:06 UTC 2015


https://bugzilla.redhat.com/show_bug.cgi?id=1191437

Kaushal <kaushal at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CLOSED
         Resolution|---                         |CANTFIX
        Last Closed|                            |2015-02-13 05:12:06



--- Comment #2 from Kaushal <kaushal at redhat.com> ---
This is an issue caused by commit c8a6904 'uss: disable memory accounting for
the snapshot daemon'.

It was first observed during downstream testing a little while back. The
details of why this happens were found by Raghavendra Bhat, and I'm
paraphrasing his findings below.

There are 3 causes leading to the behaviour.
1. The patch brings a command line option to disable memory accounting in a
glusterfs process. This required the addition of a new member into the
cmd_args_t object which stores the command line flags. The cmd_args_t object is
embedded in the global context object, glusterfsd_ctx_t. The change to
cmd_args_t caused the other members of glusterfsd_ctx_t to shift. This detail
is important.

2. The glusterfs version string for nightly builds from upstream master is
currently 3.7dev. This means glusterfs libraries will be installed under
/lib/glusterfs/3.7dev. With the nightly builds, just the release version
changes, but not the glusterfs binary version. This means that upgrades will
again install libraries into /lib/glusterfs/3.7dev.

3. The glusterfs-rdma package is installed after glusterfs-server. This
behaviour is because we don't want users to be forced to install rdma
libraries if they are not interested. This means that when glusterfs-server is
being updated, the glusterfs-rdma libraries present in /lib/glusterfs/3.7dev
still belong to the previous package.


When glusterd starts, it loads the rdma transport library and does some checks
to see if rdma can be supported on the machine. If the library is not present,
glusterd complains and continues (which is the source of a lot of user
confusion).

This check also happens when glusterd is started during the upgrade. Glusterd
starts, loads the rdma library, and passes its global context
(glusterfsd_ctx_t) to the rdma library. But glusterd would have loaded the
older rdma library. As the binary version didn't change, glusterd searches for
the rdma library in /lib/glusterfs/3.7dev itself. It finds the rdma library
installed by the older release, as the glusterfs-rdma package is only updated
after glusterfs-server. The rdma transport initialization requires a lock
present in the global context to be held. But the rdma library receives the
newer, shifted global context object, and not the older object it is
expecting. The rdma library will try to lock using the location of the lock
struct as it knows it, but as the lock struct has shifted, it hangs. This is
the hang observed.

This hang will not happen when upgrading between different package versions, as
the libraries will be installed into versioned locations. This can't happen on
an upgrade from 3.6 to 3.7 when released.

This hang will also not happen when upgrading from nightly build
glusterfs-3.7dev-0.545.git88136b5.autobuild (the first build to have the above
mentioned commit) to any newer versions.

There are workarounds for upgrades from nightly builds older than
3.7dev-0.545.git88136b5 to newer releases.
1. Remove glusterfs-rdma and don't install glusterfs-rdma and don't use
glusterfs-rdma ==> No problems!
2. If you want to have rdma installed,
   a. update glusterfs-rdma before updating other gluster packages. (Not sure
if this will work)
   b. remove glusterfs-rdma, update remaining packages, install new
glusterfs-rdma package.

This issue cannot be fixed from the code as any code solution will require
modification of the glusterfsd_ctx_t object, which will lead to the same
problem again.

Anyway, as this issue cannot happen between proper glusterfs releases, I'm
closing this as CANTFIX.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.

