[Gluster-users] Command "/etc/init.d/glusterd start" failed

Chalcogen chalcogen_eg_oxygen at yahoo.com
Sat Apr 19 16:49:58 UTC 2014


I have run into errors of this kind every so often, mainly because we 
are in a development phase and reboot our servers frequently. If you 
start glusterd in debug mode:

sh$ glusterd --debug

you can easily pinpoint exactly which volume/peer data is causing the 
initialization failure for mgmt/glusterd.
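
Since debug mode logs to the console, you can also filter for 
error-level lines (tagged "E" in the "[timestamp] E [...]" format, as 
in the messages quoted below). A quick filter I use, nothing official:

sh$ glusterd --debug 2>&1 | grep -F '] E ['

The first complaint about a file under /var/lib/glusterd/vols (volume 
data) or /var/lib/glusterd/peers (peer data) usually points at the 
offending file.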

In addition, from my own experience, two of the leading causes of 
failure are:
a) stale peer data, left behind if glusterd is somehow killed during an 
active peer probe operation, and
b) a half-written volume file. I have noticed that when glusterd needs 
to update metadata for a volume or brick (say the "info" file for 
volume testvol) under /var/lib/glusterd, it first renames 
/var/lib/glusterd/vols/testvol/info to info.tmp, and then creates a new 
info file, which is probably written into _freshly_. If glusterd 
crashes at that point, glusterd startup keeps failing until this is 
resolved manually. Usually, moving info.tmp back to info works for me; 
a sketch of that recovery follows below.
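
For reference, the manual fix looks something like this on my setup 
(assuming the volume the debug log complains about is testvol, as 
above; back up /var/lib/glusterd first, to be safe):

sh$ /etc/init.d/glusterd stop
sh$ ls -l /var/lib/glusterd/vols/testvol/
sh$ # if info is missing or truncated but info.tmp exists:
sh$ mv /var/lib/glusterd/vols/testvol/info.tmp \
       /var/lib/glusterd/vols/testvol/info
sh$ /etc/init.d/glusterd start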

Thanks,
Anirban

On Saturday 12 April 2014 08:45 AM, 吴保川 wrote:
> It is tcp.
>
> [root at server1 wbc]# gluster volume info
>
> Volume Name: gv_replica
> Type: Replicate
> Volume ID: 81014863-ee59-409b-8897-6485d411d14d
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.3:/home/wbc/vdir/gv_replica
> Brick2: 192.168.1.4:/home/wbc/vdir/gv_replica
>
> Volume Name: gv1
> Type: Distribute
> Volume ID: cfe2b8a0-284b-489d-a153-21182933f266
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: 192.168.1.4:/home/wbc/vdir/gv1
> Brick2: 192.168.1.3:/home/wbc/vdir/gv1
>
> Thanks,
> Baochuan Wu
>
>
>
> 2014-04-12 10:11 GMT+08:00 Nagaprasad Sathyanarayana 
> <nsathyan at redhat.com>:
>
>     If you run
>
>     # gluster volume info
>
>     What is the value set for transport-type?
>
>     Thanks
>     Naga
>
>
>     On 12-Apr-2014, at 7:33 am, 吴保川 <wildpointercs at gmail.com> wrote:
>
>>     Thanks, Joe. I found that one of my machines had been assigned the
>>     wrong IP address, which led to the error.
>>     Originally, I thought the following error was the critical one:
>>     [2014-04-11 18:12:03.433371] E
>>     [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport:
>>     /usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open
>>     shared object file: No such file or directory
>>
>>
>>     2014-04-12 5:34 GMT+08:00 Joe Julian <joe at julianfamily.org>:
>>
>>         On 04/11/2014 11:18 AM, 吴保川 wrote:
>>
>>             [2014-04-11 18:12:05.165989] E
>>             [glusterd-store.c:2663:glusterd_resolve_all_bricks]
>>             0-glusterd: resolve brick failed in restore
>>
>>         I'm pretty sure that means that one of the bricks isn't
>>         resolved in your list of peers.
>>
>>
