[Gluster-devel] add-brick

Anand Avati anand.avati at gmail.com
Sun Aug 19 07:13:08 UTC 2012


On Sat, Aug 18, 2012 at 9:13 PM, Emmanuel Dreyfus <manu at netbsd.org> wrote:

> > I understand this code checks that the layout covers the whole space,
> > is that right? Then it must be upset that layout->list[0] does not cover
> > anything. Since the error is transient, I suspect a race condition:
> > the layout would be filled after that check. Is it possible? Where is
> > the layout crafted?
>
> I improved my test by completely deleting and re-creating the
> volume before adding a brick. Here is what happens when I add a brick:
>
> 1-vndfs-client-1: Connected to 192.0.2.103:24027, attached to remote
>   volume '/export/vnd1a'.
> 1-vndfs-client-1: Server and Client lk-version numbers are not same,
>   reopening the fds
> 0-fuse: switched to graph 1
>

Do you see all of the protocol/client connections succeeding on graph 1
before the 'switched to graph 1' log? The excerpt above only shows the
"client-1" success log. What about the other protocol/client translators?

Avati


> 1-vndfs-client-1: Server lk version = 1
> 1-vndfs-dht: missing disk layout on vndfs-client-0. err = -1
> 1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
> 1-dht_layout_merge: ==> layout[1] 0 - 0 err 0
> 1-vndfs-dht: missing disk layout on vndfs-client-1. err = -1
> 1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
> 1-dht_layout_merge: ==> layout[1] 0 - 0 err -1
>
> I am not sure this is expected behavior. The broken layout does not
> raise EINVAL to processes using the filesystem, but a later, similar
> treatment will.
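
For illustration, a minimal sketch of the coverage check being discussed,
using assumed structures rather than the real dht_layout_t: a layout is
complete only when its ranges tile the whole 32-bit hash space, which the
0 - 0 entries in the log above clearly do not.

/* Illustrative only: assumed structures, not the real dht_layout_t.
 * A layout is complete when its entries, sorted by start offset, tile
 * the whole 32-bit hash space [0x00000000, 0xffffffff] without holes. */
#include <stdint.h>
#include <stdio.h>

struct range { uint32_t start; uint32_t stop; int err; };

static int covers_whole_space(const struct range *r, int n)
{
        uint32_t expect = 0;

        for (int i = 0; i < n; i++) {
                if (r[i].err != 0 || r[i].start != expect)
                        return 0;               /* failed subvol or hole */
                if (r[i].stop == 0xffffffffU)
                        return i == n - 1;      /* exact end of the space */
                expect = r[i].stop + 1;
        }
        return 0;                               /* stopped short of the end */
}

int main(void)
{
        /* Mirrors the log above: both entries are 0 - 0 with err set. */
        struct range broken[] = { { 0, 0, -1 }, { 0, 0, -1 } };
        struct range good[]   = { { 0, 0x7fffffffU, 0 },
                                  { 0x80000000U, 0xffffffffU, 0 } };

        printf("broken: %d\n", covers_whole_space(broken, 2));  /* 0 */
        printf("good:   %d\n", covers_whole_space(good, 2));    /* 1 */
        return 0;
}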
>
> After playing a bit, I tested for the race condition with this patch:
>
> --- a/xlators/cluster/dht/src/dht-common.c
> +++ b/xlators/cluster/dht/src/dht-common.c
> @@ -477,6 +477,11 @@ unlock:
>                          ret = dht_layout_normalize (this, &local->loc, layout);
>
>                          if (ret != 0) {
> +                                if (strcmp(local->loc.path, "/") == 0) {
> +                                        gf_log (this->name, GF_LOG_WARNING,
> +                                                "wait 2s for DHT to settle...");
> +                                        sleep(2);
> +                                }
>                                  gf_log (this->name, GF_LOG_DEBUG,
>                                          "fixing assignment on %s",
>                                          local->loc.path);
>
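
The sleep(2) above is a diagnostic rather than a fix: it only widens the
race window. A bounded retry loop would probe the same race while also
measuring how long the layout takes to settle; here is a minimal sketch
with hypothetical helpers, not the glusterfs API.

/* Sketch of a bounded-retry diagnostic (hypothetical helpers, not the
 * glusterfs API): re-check the layout a few times instead of sleeping
 * once, so the log shows how long the layout takes to settle. */
#include <stdio.h>
#include <unistd.h>

static int probes;

/* Stand-in for "dht_layout_normalize() succeeded": here the layout
 * pretends to become valid on the third probe. */
static int check_layout(void)
{
        return ++probes >= 3 ? 0 : -1;
}

static int wait_for_layout(int max_tries, unsigned delay_ms)
{
        for (int i = 0; i < max_tries; i++) {
                if (check_layout() == 0) {
                        fprintf(stderr, "layout settled after %d probe(s)\n",
                                i + 1);
                        return 0;
                }
                usleep(delay_ms * 1000);        /* brief back-off, re-check */
        }
        return -1;                              /* never settled: real failure */
}

int main(void)
{
        return wait_for_layout(10, 200) == 0 ? 0 : 1;
}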
> Here is the kind of log it produces. I do not always see the EINVAL
> in the log, but it is never seen by processes using the filesystem,
> at least during the tests I did.
>
> [2012-08-19 06:04:06.288131] I [fuse-bridge.c:4195:fuse_graph_setup]
>   0-fuse: switched to graph 1
> [2012-08-19 06:04:06.289052] I [client-handshake.c:453:
>   client_set_lk_version_cbk] 1-vndfs-client-1: Server lk version = 1
> [2012-08-19 06:04:06.294234] W [dht-common.c:482:dht_lookup_dir_cbk]
>   1-vndfs-dht: wait 2s for DHT to settle...
> [2012-08-19 06:04:08.306937] I [client.c:2151:notify] 0-vndfs-client-0:
>   current graph is no longer active, destroying rpc_client
> [2012-08-19 06:04:08.308114] I [client.c:2090:client_rpc_notify]
>   0-vndfs-client-0: disconnected
> [2012-08-19 06:04:08.309833] W [fuse-resolve.c:151:fuse_resolve_gfid_cbk]
>   0-fuse: 4e4b4110-a585-4aae-b919-b2416355f5d1: failed to resolve
>   (Invalid argument)
> [2012-08-19 06:04:08.310275] E [fuse-bridge.c:353:fuse_lookup_resume]
>   0-fuse: failed to resolve path (null)
>
> But this probably does not really fix the problem. For instance, I got
> an unreproducible ENOENT for a directory while copying a hierarchy.
>
> --
> Emmanuel Dreyfus
> manu at netbsd.org
>

