[Gluster-devel] non-blocking connect() returned: 111 (Connection refused) [solved]
Jordi Moles Blanco
jordi at cdmon.com
Thu Dec 18 08:54:51 UTC 2008
Basavanagowda Kanur wrote:
> Jordi,
> Do you have a firewall running on the machines?
>
> --
> gowda
>
> On Thu, Dec 18, 2008 at 1:47 PM, Jordi Moles Blanco <jordi at cdmon.com> wrote:
>
> Raghavendra G wrote:
>
> Hi Jordi,
>
> Have you started glusterfsd on each of the newly added nodes?
> If not, please start them.
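> In case they aren't running yet, a minimal server-side spec for one of
> the new nodes would look roughly like the sketch below. The brick path
> /export/espai is an assumption, and the auth option is spelled
> auth.ip.espai.allow on older releases, so copy the real spec from one
> of the existing nodes rather than typing it from scratch.
>
> **************
> # sketch of a glusterfsd export spec -- brick path is assumed
> volume espai
> type storage/posix
> option directory /export/espai
> end-volume
>
> volume server
> type protocol/server
> option transport-type tcp/server
> option auth.addr.espai.allow 10.0.0.*
> subvolumes espai
> end-volume
> **************
>
> glusterfsd is then started with something like:
> glusterfsd -f /etc/glusterfs/glusterfsd.vol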
>
> Some comments have been inlined.
>
> On Wed, Dec 17, 2008 at 3:28 PM, Jordi Moles Blanco <jordi at cdmon.com> wrote:
>
> Hi,
>
> I've got 6 nodes providing a storage unit with GlusterFS 2.5 patch
> 800. They are set up in 2 groups of 3 nodes each.
>
> On top of that, I've got a Xen 3.2 machine storing its virtual
> machines on the GlusterFS mount point.
>
> The thing is that I used to have only 2 nodes per group, that's 4
> nodes in total, and today I'm trying to add 1 extra node to each
> group.
>
> This is the final configuration on the Xen side:
>
>
> **************
>
> volume espai1
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.3
> option remote-subvolume espai
> end-volume
>
> volume espai2
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.4
> option remote-subvolume espai
> end-volume
>
> volume espai3
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.5
> option remote-subvolume espai
> end-volume
>
> volume espai4
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.6
> option remote-subvolume espai
> end-volume
>
> volume espai5
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.7
> option remote-subvolume espai
> end-volume
>
> volume espai6
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.8
> option remote-subvolume espai
> end-volume
>
> volume namespace1
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.4
> option remote-subvolume nm
> end-volume
>
> volume namespace2
> type protocol/client
> option transport-type tcp/client
> option remote-host 10.0.0.5
> option remote-subvolume nm
> end-volume
>
> volume grup1
> type cluster/afr
> subvolumes espai1 espai3 espai5
> end-volume
>
> volume grup2
> type cluster/afr
> subvolumes espai2 espai4 espai6
> end-volume
>
> volume nm
> type cluster/afr
> subvolumes namespace1 namespace2
> end-volume
>
> volume g01
> type cluster/unify
> subvolumes grup1 grup2
> option scheduler rr
> option namespace nm
> end-volume
>
> volume io-cache
> type performance/io-cache
> option cache-size 512MB
> option page-size 1MB
> option force-revalidate-timeout 2
> subvolumes g01
> end-volume
> **************
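> (For completeness, this spec is what the Xen box mounts, with
> something like the command below; the spec file path is an
> assumption, and on newer releases the option is spelled --volfile
> instead of -f.)
>
> **********
> glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs
> **********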
>
> So... I stopped all the virtual machines, unmounted GlusterFS on
> Xen, updated the spec file (the one above) and mounted GlusterFS
> again on Xen.
>
> I've set up different GlusterFS environments before, but I had never
> tried this, and now I'm facing some problems.
>
> From what I had read before, I thought that when adding an extra
> node to a group and "remounting" on the client's side, the
> self-healing feature would copy all the content already present on
> the other nodes in the group over to the new one. That hasn't
> happened, even when I've tried to force it by listing the files or
> by doing what you suggest in your documentation:
>
> **********
>
> find /mnt/glusterfs -type f -print0 | xargs -0 head -c1 >/dev/null
>
> **********
>
> So... my first question would be: does "self-healing" work this
> way? If it doesn't, which is the best way to add a node to a group?
> Do I have to run a "copy" command manually to get the new node
> ready?
> I've also noticed that I necessarily have to unmount GlusterFS from
> Xen. Is there a way to avoid stopping all the virtual machines,
> unmounting and mounting again? Is there a feature like "refresh
> config file"?
>
>
> Hot add ("refresh config file") is on the roadmap.
>
>
>
> And finally... I looked into the logs to see why self-healing
> wasn't working, and I found this on the Xen side:
>
> **********
> 2008-12-17 12:08:30 E [tcp-client.c:190:tcp_connect] espai6: non-blocking connect() returned: 111 (Connection refused)
> **********
>
> and it keeps saying this when I want to access files which were
> created on the "old" nodes.
>
> Is this a bug? How can I work around it?
>
> If I create new stuff, though, it replicates to the 3 nodes, no
> problem with that... the only problem is with the old files that
> were already present before I added the new node.
>
> Thanks in advance for your help, and let me know if you need any
> further information.
>
>
>
>
>
>
>
>
> --
> Raghavendra G
>
>
> Hi, yes.
>
> When GlusterFS behaves like this, all nodes are running. As I said,
> when you create new data, it replicates to all the nodes of each
> group, so that part is working fine.
> However, it keeps logging "connection refused", which I thought was
> reported only when a node wasn't available, but they are all
> available and replicating data fine.
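> One way to narrow that down, independently of GlusterFS, is to try a
> plain TCP connection from the Xen box to every remote-host listed in
> the client spec. A minimal sketch, assuming glusterfsd is on its
> default port 6996 (check the server specs for a listen-port option
> if you changed it):
>
> **********
> # probe the glusterfsd port on every node; port 6996 is an assumption
> for ip in 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6 10.0.0.7 10.0.0.8; do
>     echo -n "$ip: "
>     nc -z -w 2 "$ip" 6996 && echo open || echo "refused/unreachable"
> done
> **********
>
> Whichever address comes back refused is the one the client log is
> complaining about.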
>
> The thing, though, is that old data is not being replicated to the
> new nodes.
>
> Is there any way to "force" replication to the new nodes? Could I
> somehow be getting the "connection refused" because the new nodes
> won't accept the old data?
>
> Thanks for your help.
>
>
>
>
>
>
>
> --
> hard work often pays off after time, but laziness always pays off now
Well... sorry for bothering you about this; I found what the problem
was in the end.
I got a couple of things mixed up:
- On the one hand, there was a mistake in the .vol file on Xen: one
node was declared with a wrong IP address, which is what produced the
"connection refused" errors. I didn't pay enough attention to all 6
nodes; 5 were replicating OK, I missed the one that wasn't, and I
thought they were all working fine.
- On the other hand, old data was not being replicated to the new
nodes because I hadn't set the "trusted glusterfs" extended
attributes when adding the new nodes.
Now all the data appears fine on all nodes.
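For anyone who finds this thread later: what I mean by the attributes
is the extended attributes in the trusted.glusterfs namespace on the
exported directories. A rough sketch of how I checked and set them,
assuming the brick lives at /export/espai and that
trusted.glusterfs.version is the attribute self-heal looks at (read
the real name and value off a node that already holds the data before
setting anything):

**********
# on a node that already has the data: list the trusted.glusterfs.* attributes
getfattr -d -m trusted.glusterfs -e hex /export/espai

# on the new, empty node: set the same attribute with the value read above
# (0x00000001 here is only a placeholder)
setfattr -n trusted.glusterfs.version -v 0x00000001 /export/espai

# then trigger self-heal again from the client, as in the documentation
find /mnt/glusterfs -type f -print0 | xargs -0 head -c1 >/dev/null
**********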
Sorry for that and thanks for your help and patience :)