[Gluster-users] Glusterd dont start
Paul Boven
boven at jive.nl
Tue Jan 28 17:03:51 UTC 2014
Hi Jefferson,
I've seen such differences in df, too. They are not necessarily a cause
for alarm, as sometimes sparse files can be identical (verified through
md5sum) on both bricks, but not use the same number of disk blocks.
You should instead try an ls -l of the files on both bricks and see if
they are different. If they're exactly the same, you could still run an
md5sum; I did that on my bricks (without gluster running) to make
100% sure that all the interesting events of the past few days didn't
corrupt my storage.
The difference in disk usage can also be down to the content of the
hidden .glusterfs directory in your bricks. That's where the main
difference is on my machines.
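
For example, a minimal way to compare the two bricks (just a sketch; /gv/html
is my guess at your brick path from the df output below, and the find commands
skip the .glusterfs tree so you only compare the real files):

  # list every file with its size and allocated 512-byte blocks
  find /gv/html -path '*/.glusterfs' -prune -o -type f \
       -printf '%P %s %b\n' | sort > /tmp/brick.list

  # optional and slower: checksum the contents as well
  find /gv/html -path '*/.glusterfs' -prune -o -type f -print0 \
       | sort -z | xargs -0 md5sum > /tmp/brick.md5

  # and see how much of the difference is just the .glusterfs directory
  du -sh /gv/html/.glusterfs

Generate the lists on both nodes and diff them: identical sizes and checksums
but different block counts would point at sparse files rather than corruption.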
Regards, Paul Boven.
On 01/28/2014 05:54 PM, Jefferson Carlos Machado wrote:
> Hi,
>
> Thank you so much.
> After this, everything seems good, but I am not sure because the df output
> is different on the nodes.
>
> [root at srvhttp0 results]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/fedora-root 2587248 2128160 307948 88% /
> devtmpfs 493056 0 493056 0% /dev
> tmpfs 506240 50648 455592 11% /dev/shm
> tmpfs 506240 236 506004 1% /run
> tmpfs 506240 0 506240 0% /sys/fs/cgroup
> tmpfs 506240 12 506228 1% /tmp
> /dev/xvda1 487652 106846 351110 24% /boot
> /dev/xvdb1 2085888 551292 1534596 27% /gv
> localhost:/gv_html 2085888 587776 1498112 29% /var/www/html
> [root at srvhttp0 results]# cd /gv
> [root at srvhttp0 gv]# ls -la
> total 8
> drwxr-xr-x 3 root root 17 Jan 28 14:43 .
> dr-xr-xr-x. 19 root root 4096 Jan 26 10:10 ..
> drwxr-xr-x 4 root root 37 Jan 28 14:43 html
> [root at srvhttp0 gv]#
>
>
> [root at srvhttp1 html]# df
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/fedora-root 2587248 2355180 80928 97% /
> devtmpfs 126416 0 126416 0% /dev
> tmpfs 139600 35252 104348 26% /dev/shm
> tmpfs 139600 208 139392 1% /run
> tmpfs 139600 0 139600 0% /sys/fs/cgroup
> tmpfs 139600 8 139592 1% /tmp
> /dev/xvda1 487652 106846 351110 24% /boot
> /dev/xvdb1 2085888 587752 1498136 29% /gv
> localhost:/gv_html 2085888 587776 1498112 29% /var/www/html
> [root at srvhttp1 html]#
> [root at srvhttp1 html]# cd /gv
> [root at srvhttp1 gv]# ll -a
> total 12
> drwxr-xr-x 3 root root 17 Jan 28 14:42 .
> dr-xr-xr-x. 19 root root 4096 Oct 18 11:16 ..
> drwxr-xr-x 4 root root 37 Jan 28 14:42 html
> [root at srvhttp1 gv]#
>
> On 28-01-2014 12:01, Franco Broi wrote:
>>
>> Every peer has a copy of the config files, but I'm not sure it's 100% safe
>> to remove them entirely. I've never really got a definitive answer from
>> the Gluster devs, but if your files were trashed anyway you don't have
>> anything to lose.
>>
>> This is what I did.
>>
>> On the bad node:
>>
>> 1. Stop glusterd.
>>
>> 2. Make a copy of the /var/lib/glusterd dir, then remove it.
>>
>> 3. Start glusterd.
>>
>> 4. Peer probe the good node.
>>
>> 5. Restart glusterd.
>>
>> And that should be it (commands sketched below). Check the files are there.
>>
>> If it doesn't work you can restore the files from the backup copy.
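>>
>> A rough transcription of those steps into commands (just a sketch, not
>> verified on your setup; srvhttp0 stands in for whichever node is the good
>> peer, and the paths assume a default install):
>>
>>   systemctl stop glusterd
>>   cp -a /var/lib/glusterd /var/lib/glusterd.bak
>>   rm -rf /var/lib/glusterd
>>   systemctl start glusterd
>>   gluster peer probe srvhttp0        # the good node
>>   systemctl restart glusterd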
>>
>> On 28 Jan 2014 21:48, Jefferson Carlos Machado
>> <lista.linux at results.com.br> wrote:
>> Hi,
>>
>> I have only 2 nodes in this cluster.
>> So can I remove the config files?
>>
>> Regards,
>> On 28-01-2014 04:17, Franco Broi wrote:
>> > I think Jefferson's problem might have been due to corrupted config
>> > files, maybe because the /var partition was full as suggested by Paul
>> > Boven, but as has been pointed out before, the error messages don't make
>> > it obvious what's wrong.
>> >
>> > He got glusterd started but now the peers can't communicate, probably
>> > because a UUID is wrong. This is a weird problem to debug because the
>> > clients can see the data but df may not show the full size, so you
>> > wouldn't know anything was wrong until, like Jefferson, you looked in the
>> > gluster log file.
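>> >
>> > A quick way to compare what each side thinks the UUIDs are (assuming the
>> > default /var/lib/glusterd layout) is to run, on each node:
>> >
>> >   cat /var/lib/glusterd/glusterd.info   # this node's own UUID
>> >   ls /var/lib/glusterd/peers/           # peer files, named by peer UUID
>> >   gluster peer status
>> >
>> > One node's own UUID should show up as a peer file name and in the peer
>> > status output on the other node.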
>> >
>> > [2014-01-27 15:48:19.580353] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:19.583374] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:22.584029] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:22.607477] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:25.608186] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:25.612032] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> > [2014-01-27 15:48:28.612638] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
>> > [2014-01-27 15:48:28.615509] I [glusterd-utils.c:1079:glusterd_volume_brickinfo_get] 0-management: Found brick
>> >
>> > I think the advice should be: if you have a working peer, use a peer
>> > probe and a glusterd restart to restore the files, but in order for this
>> > to work you have to remove all the config files first so that glusterd
>> > will start in the first place.
>> >
>> >
>> > On Tue, 2014-01-28 at 08:32 +0530, shwetha wrote:
>> >> Hi Jefferson,
>> >>
>> >> glusterd doesn't start because it is not able to find the brick path for
>> >> the volume, or because the brick path doesn't exist any more.
>> >>
>> >> Please refer to the bug
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1036551
>> >>
>> >> Check if the brick path is available.
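>> >>
>> >> For example (gv_html is the volume name from this thread, and /gv/html is
>> >> only a guess at the brick path from the df output elsewhere in the thread):
>> >>
>> >>   grep path /var/lib/glusterd/vols/gv_html/bricks/*   # paths glusterd expects
>> >>   ls -ld /gv/html                                      # does it still exist?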
>> >>
>> >> -Shwetha
>> >>
>> >> On 01/27/2014 05:23 PM, Jefferson Carlos Machado wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Please, help me!!
>> >>>
>> >>> After rebooting my system, the glusterd service doesn't start.
>> >>>
>> >>> The /var/log/glusterfs/etc-glusterfs-glusterd.vol.log shows:
>> >>>
>> >>> [2014-01-27 09:27:02.898807] I [glusterfsd.c:1910:main]
>> >>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version
>> >>> 3.4.2 (/usr/sbin/glusterd -p /run/glusterd.pid)
>> >>> [2014-01-27 09:27:02.909147] I [glusterd.c:961:init] 0-management:
>> >>> Using /var/lib/glusterd as working directory
>> >>> [2014-01-27 09:27:02.913247] I [socket.c:3480:socket_init]
>> >>> 0-socket.management: SSL support is NOT enabled
>> >>> [2014-01-27 09:27:02.913273] I [socket.c:3495:socket_init]
>> >>> 0-socket.management: using system polling thread
>> >>> [2014-01-27 09:27:02.914337] W [rdma.c:4197:__gf_rdma_ctx_create]
>> >>> 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such
>> >>> device)
>> >>> [2014-01-27 09:27:02.914359] E [rdma.c:4485:init] 0-rdma.management:
>> >>> Failed to initialize IB Device
>> >>> [2014-01-27 09:27:02.914375] E
>> >>> [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma'
>> >>> initialization failed
>> >>> [2014-01-27 09:27:02.914535] W
>> >>> [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create
>> >>> listener, initing the transport failed
>> >>> [2014-01-27 09:27:05.337557] I
>> >>> [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd:
>> >>> retrieved op-version: 2
>> >>> [2014-01-27 09:27:05.373853] E
>> >>> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
>> >>> key: brick-0
>> >>> [2014-01-27 09:27:05.373927] E
>> >>> [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown
>> >>> key: brick-1
>> >>> [2014-01-27 09:27:06.166721] I [glusterd.c:125:glusterd_uuid_init]
>> >>> 0-management: retrieved UUID: 28f232e9-564f-4866-8014-32bb020766f2
>> >>> [2014-01-27 09:27:06.169422] E
>> >>> [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd:
>> >>> resolve brick failed in restore
>> >>> [2014-01-27 09:27:06.169491] E [xlator.c:390:xlator_init]
>> >>> 0-management: Initialization of volume 'management' failed, review
>> >>> your volfile again
>> >>> [2014-01-27 09:27:06.169516] E [graph.c:292:glusterfs_graph_init]
>> >>> 0-management: initializing translator failed
>> >>> [2014-01-27 09:27:06.169532] E
>> >>> [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
>> >>> [2014-01-27 09:27:06.169769] W [glusterfsd.c:1002:cleanup_and_exit]
>> >>> (-->/usr/sbin/glusterd(main+0x3df) [0x7f23c76588ef]
>> >>> (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7f23c765b6e0]
>> >>> (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103)
>> >>> [0x7f23c765b5f3]))) 0-: received signum (0), shutting down
>> >>>
>> >
>>
>>
>
>
>
--
Paul Boven <boven at jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science