[Gluster-users] glusterd service fails to start from AWS AMI

Jon Cope jcope at redhat.com
Wed Mar 5 21:40:42 UTC 2014


Hi Carlos,

Thanks for the input.  You guessed right; I'm using the latest RH distro.

As for the IP/hostname resolution, that should have been accounted for by using the elastic IPs.  When an instance inside the private network resolves the public DNS name associated with an elastic IP, it is pointed to the machine's private IP address.  The /etc/hosts file contains these elastic IP entries, and ssh/ping work fine.  I'm also not sure whether a hostname resolution problem alone should cause glusterd to fail on start.  Does anyone have an answer to that?
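
For illustration, the hosts entries and the quick checks I'm relying on look roughly like this (names and addresses below are placeholders, not the real ones):

    # /etc/hosts on each node (placeholder addresses and names)
    203.0.113.11   ec2-203-0-113-11.compute-1.amazonaws.com   node1
    203.0.113.12   ec2-203-0-113-12.compute-1.amazonaws.com   node2

    # quick sanity checks from node1
    getent hosts node2          # resolves via /etc/hosts
    ping -c 3 node2             # works
    ssh node2 'hostname -f'     # passwordless ssh also works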



----- Original Message -----
From: "Carlos Capriotti" <capriotti.carlos at gmail.com>
To: "Jon Cope" <jcope at redhat.com>
Cc: gluster-users at gluster.org
Sent: Tuesday, March 4, 2014 4:29:31 PM
Subject: Re: [Gluster-users] glusterd service fails to start from AWS AMI

I don't want to sound simplistic, but this seems to be name resolution/network
related.

Again, I DO know your email ends with redhat.com, but just to make sure:
what distro is Gluster running on? I have never dealt with Amazon's platform,
so ignorance here is abundant.

The reason I am asking is that I am stress-testing my first (on-premises)
install, and I ran into a problem that I am choosing to ignore for now but
will have to solve in the future: DNS resolution stops working after a
while.

I am using CentOS 6.5 with Gluster 3.4.2. I have a bonded NIC made up of
two physical ones, plus a third NIC for management.

I realized that, despite having manually configured all interfaces,
disabled user control (maybe this is it), disabled NetworkManager access to
them, and even tried updating resolv.conf, name resolution does not work
after a reboot.

While the NICs were working with NM and/or DHCP, all went fine, but after
tailoring my ifcfg-* files, DNS went south.
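
In case it makes the description clearer, this is roughly the kind of ifcfg
tailoring I mean; device names, addresses and bonding options below are
examples, not my actual files:

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- example values only
    DEVICE=bond0
    BONDING_OPTS="mode=1 miimon=100"   # active-backup, just as an example
    BOOTPROTO=none
    ONBOOT=yes
    IPADDR=192.168.10.21
    NETMASK=255.255.255.0
    GATEWAY=192.168.10.1
    NM_CONTROLLED=no     # keep NetworkManager off the bond
    USERCTL=no           # the "user control" I disabled
    DNS1=192.168.10.2    # with PEERDNS=yes (the default) this lands in resolv.conf
    # PEERDNS=no would mean maintaining resolv.conf by hand instead

    # /etc/sysconfig/network-scripts/ifcfg-eth0 -- one of the two bond slaves
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes
    NM_CONTROLLED=no
    USERCTL=no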

You said your name resolution does work. Maybe add an entry to your hosts
file, just to test?

Another thought would be using 3.4.2, instead of 3.4.0.

Just wanted to share.

KR,

Carlos


On Tue, Mar 4, 2014 at 10:45 PM, Jon Cope <jcope at redhat.com> wrote:

> Hello all.
>
> I have a working replica 2 cluster (4 nodes) up and running happily over
> Amazon EC2.  My end goal is to create AMIs of each machine and then quickly
> reproduce the same, but new, cluster from those AMIs.  Essentially, I'd
> like a cluster "template".
>
> -Assigned original instances' Elastic IPs to new machines to reduce
> resolution issues.
> -Passwordless SSH works on initial boot across all machines
> -Node1: has no evident issue; starts with glusterd running.
> -Node1: 'gluster peer status' returns the correct public DNS / hostnames
> for peer nodes.  Status: (Disconnected) -- since the service is off on them.
>
> Since my goal is to create a cluster template, reinstalling gluster for
> each node, though it'll probably work, isn't an optimal solution.
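>
> For reference, this is roughly how I've been inspecting the state the cloned
> nodes inherit from the AMI before grabbing the log pasted below (paths are
> the glusterd defaults; I'm only reading here, not changing anything):
>
>     # on a node where glusterd refuses to start
>     service glusterd start; service glusterd status
>     cat /var/lib/glusterd/glusterd.info              # node UUID carried over
>     grep hostname /var/lib/glusterd/peers/*          # peer hostnames on disk
>     grep hostname /var/lib/glusterd/vols/*/bricks/*  # brick hostnames on disk
>     less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log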
>
> Thank You
>
> #  Node2: etc-glusterfs-glusterd.vol.log
> #  Begins at 'service glusterd start' command entry
>
> [2014-03-04 21:20:30.532138] I [glusterfsd.c:2024:main]
> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version
> 3.4.0.44rhs (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
> [2014-03-04 21:20:30.539331] I [glusterd.c:1020:init] 0-management: Using
> /var/lib/glusterd as working directory
> [2014-03-04 21:20:30.542578] I [socket.c:3485:socket_init]
> 0-socket.management: SSL support is NOT enabled
> [2014-03-04 21:20:30.542603] I [socket.c:3500:socket_init]
> 0-socket.management: using system polling thread
> [2014-03-04 21:20:30.543203] C [rdma.c:4099:gf_rdma_init]
> 0-rpc-transport/rdma: Failed to get IB devices
> [2014-03-04 21:20:30.543342] E [rdma.c:4990:init] 0-rdma.management:
> Failed to initialize IB Device
> [2014-03-04 21:20:30.543375] E [rpc-transport.c:320:rpc_transport_load]
> 0-rpc-transport: 'rdma' initialization failed
> [2014-03-04 21:20:30.543471] W [rpcsvc.c:1387:rpcsvc_transport_create]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2014-03-04 21:20:37.116571] I
> [glusterd-store.c:1388:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 2
> [2014-03-04 21:20:37.120082] E
> [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key:
> brick-0
> [2014-03-04 21:20:37.120118] E
> [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key:
> brick-1
> [2014-03-04 21:20:37.120137] E
> [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key:
> brick-2
> [2014-03-04 21:20:37.120154] E
> [glusterd-store.c:1905:glusterd_store_retrieve_volume] 0-: Unknown key:
> brick-3
> [2014-03-04 21:20:37.761785] I
> [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect
> returned 0
> [2014-03-04 21:20:37.765059] I
> [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect
> returned 0
> [2014-03-04 21:20:37.767677] I
> [glusterd-handler.c:2886:glusterd_friend_add] 0-management: connect
> returned 0
> [2014-03-04 21:20:37.767783] I [rpc-clnt.c:974:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-03-04 21:20:37.767850] I [socket.c:3485:socket_init] 0-management:
> SSL support is NOT enabled
> [2014-03-04 21:20:37.767866] I [socket.c:3500:socket_init] 0-management:
> using system polling thread
> [2014-03-04 21:20:37.772356] I [rpc-clnt.c:974:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-03-04 21:20:37.772441] I [socket.c:3485:socket_init] 0-management:
> SSL support is NOT enabled
> [2014-03-04 21:20:37.772459] I [socket.c:3500:socket_init] 0-management:
> using system polling thread
> [2014-03-04 21:20:37.776131] I [rpc-clnt.c:974:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-03-04 21:20:37.776185] I [socket.c:3485:socket_init] 0-management:
> SSL support is NOT enabled
> [2014-03-04 21:20:37.776201] I [socket.c:3500:socket_init] 0-management:
> using system polling thread
> [2014-03-04 21:20:37.780363] E
> [glusterd-store.c:2548:glusterd_resolve_all_bricks] 0-glusterd: resolve
> brick failed in restore
> [2014-03-04 21:20:37.780395] E [xlator.c:423:xlator_init] 0-management:
> Initialization of volume 'management' failed, review your volfile again
> [2014-03-04 21:20:37.780410] E [graph.c:292:glusterfs_graph_init]
> 0-management: initializing translator failed
> [2014-03-04 21:20:37.780422] E [graph.c:479:glusterfs_graph_activate]
> 0-graph: init failed
> [2014-03-04 21:20:37.780723] W [glusterfsd.c:1097:cleanup_and_exit]
> (-->/usr/sbin/glusterd(main+0x6b1) [0x406a91]
> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x405247]
> (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x106) [0x405156]))) 0-:
> received signum (0), shutting down
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>


