[Gluster-users] Glusterd dont start
Paul Boven
boven at jive.nl
Tue Jan 28 17:03:51 UTC 2014
Hi Jefferson,
I've seen such differences in df, too. They are not necessarily a cause
for alarm, as sometimes sparse files can be identical (verified through
md5sum) on both bricks, but not use the same number of disk blocks.
You should instead try an ls -l of the files on both bricks and see if
they are different. If they're exactly the same, you could still run an
md5sum; I did that on my bricks (without gluster running) to make
100% sure that all the interesting events of the past few days didn't
corrupt my storage.
The difference in disk usage can also be down to the content of the
hidden .glusterfs directory in your bricks. That's where the main
difference is on my machines.
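
For example, a minimal way to compare the two bricks (just a sketch; /gv/html
is my guess at your brick path from the df output below, and the find commands
skip the .glusterfs tree so you only compare the real files):

  # list every file with its size and allocated 512-byte blocks
  find /gv/html -path '*/.glusterfs' -prune -o -type f \
       -printf '%P %s %b\n' | sort > /tmp/brick.list

  # optional and slower: checksum the contents as well
  find /gv/html -path '*/.glusterfs' -prune -o -type f -print0 \
       | sort -z | xargs -0 md5sum > /tmp/brick.md5

  # and see how much of the difference is just the .glusterfs directory
  du -sh /gv/html/.glusterfs

Generate the lists on both nodes and diff them: identical sizes and checksums
but different block counts would point at sparse files rather than corruption.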
Regards, Paul Boven.
On 01/28/2014 05:54 PM, Jefferson Carlos Machado wrote:
> Hi,
>
> Thank you so much.
> After this, everything seems good, but I am not sure because the df output
> is different on the nodes.
>
> [root at srvhttp0 results]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/fedora-root 2587248 2128160 307948 88% /
> devtmpfs 493056 0 493056 0% /dev
> tmpfs 506240 50648 455592 11% /dev/shm
> tmpfs 506240 236 506004 1% /run
> tmpfs 506240 0 506240 0% /sys/fs/cgroup
> tmpfs 506240 12 506228 1% /tmp
> /dev/xvda1 487652 106846 351110 24% /boot
> /dev/xvdb1 2085888 551292 1534596 27% /gv
> localhost:/gv_html 2085888 587776 1498112 29% /var/www/html
> [root at srvhttp0 results]# cd /gv
> [root at srvhttp0 gv]# ls -la
> total 8
> drwxr-xr-x 3 root root 17 Jan 28 14:43 .
> dr-xr-xr-x. 19 root root 4096 Jan 26 10:10 ..
> drwxr-xr-x 4 root root 37 Jan 28 14:43 html
> [root at srvhttp0 gv]#
>
>
> [root at srvhttp1 html]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/fedora-root 2587248 2355180 80928 97% /
> devtmpfs 126416 0 126416 0% /dev
> tmpfs 139600 35252 104348 26% /dev/shm
> tmpfs 139600 208 139392 1% /run
> tmpfs 139600 0 139600 0% /sys/fs/cgroup
> tmpfs 139600 8 139592 1% /tmp
> /dev/xvda1 487652 106846 351110 24% /boot
> /dev/xvdb1 2085888 587752 1498136 29% /gv
> localhost:/gv_html 2085888 587776 1498112 29% /var/www/html
> [root at srvhttp1 html]#
> [root at srvhttp1 html]# cd /gv
> [root at srvhttp1 gv]# ll -a
> total 12
> drwxr-xr-x 3 root root 17 Jan 28 14:42 .
> dr-xr-xr-x. 19 root root 4096 Oct 18 11:16 ..
> drwxr-xr-x 4 root root 37 Jan 28 14:42 html
> [root at srvhttp1 gv]#
>
> On 28-01-2014 12:01, Franco Broi wrote:
>>
>> Every peer has a copy of the config files, but I'm not sure it's 100% safe
>> to remove them entirely. I've never really got a definitive answer from
>> the Gluster devs, but if your files were trashed anyway you don't have
>> anything to lose.
>>
>> This is what I did.
>>
>> On the bad node:
>>
>> 1. Stop glusterd.
>>
>> 2. Make a copy of the /var/lib/glusterd dir, then remove it.
>>
>> 3. Start glusterd.
>>
>> 4. Peer probe the good node.
>>
>> 5. Restart glusterd.
>>
>> And that should be it (commands sketched below). Check the files are there.
>>
>> If it doesn't work you can restore the files from the backup copy.
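>>
>> A rough transcription of those steps into commands (just a sketch, not
>> verified on your setup; srvhttp0 stands in for whichever node is the good
>> peer, and the paths assume a default install):
>>
>>   systemctl stop glusterd
>>   cp -a /var/lib/glusterd /var/lib/glusterd.bak
>>   rm -rf /var/lib/glusterd
>>   systemctl start glusterd
>>   gluster peer probe srvhttp0        # the good node
>>   systemctl restart glusterd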
>>
>> On 28 Jan 2014 21:48, Jefferson Carlos Machado
>> <lista.linux at results.com.br> wrote:
>> Hi,
>>
>> I have only 2 nodes in this cluster.
>> So can I remove the config files?
>>
>> Regards,
>> On 28-01-2014 04:17, Franco Broi wrote:
>> > I think Jefferson's problem might have been due to corrupted config
>> > files, maybe because the /var partition was full as suggested by Paul
>> > Boven, but as has been pointed out before, the error messages don't make
>> > it obvious what's wrong.
>> >
>> > He got glusterd started but now the peers can't communicate, probably
>> > because a UUID is wrong. This is a weird problem to debug because the
>> > clients can see the data but df may not show the full size, so you
>> > wouldn't know anything was wrong until, like Jefferson, you looked in the
>> > gluster log file.
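>> >
>> > A quick way to compare what each side thinks the UUIDs are (assuming the
>> > default /var/lib/glusterd layout) is to run, on each node:
>> >
>> >   cat /var/lib/glusterd/glusterd.info   # this node's own UUID
>> >   ls /var/lib/glusterd/peers/           # peer files, named by peer UUID
>> >   gluster peer status
>> >
>> > One node's own UUID should show up as a peer file name and in the peer
>> > status output on the other node.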
>> >
>> > [2014-01-27 15:48:19.580353] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:19.583374] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:22.584029] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:22.607477] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:25.608186] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:25.612032] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:28.612638] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:28.615509] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> >
>> > I think the advice should be: if you have a working peer, use a peer
>> > probe and a glusterd restart to restore the files, but in order for this
>> > to work you have to remove all the config files first so that glusterd
>> > will start in the first place.
>> >
>> >
>> > On Tue, 2014-01-28 at 08:32 +0530, shwetha wrote:
>> >> Hi Jefferson,
>> >>
>> >> glusterd doesn't start because it is not able to find the brick path for
>> >> the volume, or because the brick path doesn't exist any more.
>> >>
>> >> Please refer to the bug
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1036551
>> >>
>> >> Check if the brick path is available.
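>> >>
>> >> For example (gv_html is the volume name from this thread, and /gv/html is
>> >> only a guess at the brick path from the df output elsewhere in the thread):
>> >>
>> >>   grep path /var/lib/glusterd/vols/gv_html/bricks/*   # paths glusterd expects
>> >>   ls -ld /gv/html                                      # does it still exist?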
>> >>
>> >> -Shwetha
>> >>
>> >> On 01/27/2014 05:23 PM, Jefferson Carlos Machado wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Please, help me!!
>> >>>
>> >>> After rebooting my system, the glusterd service doesn't start.
>> >>>
>> >>> The /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows:
>> >>>
>> >>> [2014-01-27 09:27:02.898807] I [glusterfsd.c:1910:main]
>> >>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version
>> >>> 3.4.2 (/usr/sbin/glusterd -p /run/glusterd.pid)
>> >>> [2014-01-27 09:27:02.909147] I [glusterd.c:961:init] 0-management:
>> >>> Using /var/lib/glusterd as working directory
>> >>> [2014-01-27 09:27:02.913247] I [socket.c:3480:socket_init]
>> >>> 0-socket.management: SSL support is NOT enabled
>> >>> [2014-01-27 09:27:02.913273] I [socket.c:3495:socket_init]
>> >>> 0-socket.management: using system polling thread
>> >>> [2014-01-27 09:27:02.914337] W [rdma.c:4197:__gf_rdma_ctx_create]
>> >>> 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such
>> >>> device)
>> >>> [2014-01-27 09:27:02.914359] E [rdma.c:4485:init] 0-rdma.management:
>> >>> Failed to initialize IB Device
>> >>> [2014-01-27 09:27:02.914375] E
>> >>> [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma'
>> >>> initialization failed
>> >>> [2014-01-27 09:27:02.914535] W
>> >>> [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create
>> >>> listener, initing the transport failed
>> >>> [2014-01-27 09:27:05.337557] I
>> >>> [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd:
>> >>> retrieved op-version: 2
>> >>> [2014-01-27 09:27:05.373853] E
>> >>> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
>> >>> key: brick-0
>> >>> [2014-01-27 09:27:05.373927] E
>> >>> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
>> >>> key: brick-1
>> >>> [2014-01-27 09:27:06.166721] I [glusterd.c:125:glusterd_uuid_init]
>> >>> 0-management: retrieved UUID: 28f232e9-564f-4866-8014-32bb020766f2
>> >>> [2014-01-27 09:27:06.169422] E
>> >>> [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd:
>> >>> resolve brick failed in restore
>> >>> [2014-01-27 09:27:06.169491] E [xlator.c:390:xlator_init]
>> >>> 0-management: Initialization of volume 'management' failed, review
>> >>> your volfile again
>> >>> [2014-01-27 09:27:06.169516] E [graph.c:292:glusterfs_graph_init]
>> >>> 0-management: initializing translator failed
>> >>> [2014-01-27 09:27:06.169532] E
>> >>> [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
>> >>> [2014-01-27 09:27:06.169769] W [glusterfsd.c:1002:cleanup_and_exit]
>> >>> (-->/usr/sbin/glusterd(main+0x3df) [0x7f23c76588ef]
>> >>> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7f23c765b6e0]
>> >>> (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103)
>> >>> [0x7f23c765b5f3]))) 0-: received signum (0), shutting down
>> >>>
>> >
>>
>>
>
>
>
--
Paul Boven <boven at jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science