[Gluster-devel] patch 651, it keeps hanging

Jordi Moles Blanco jordi at cdmon.com
Sun Feb 10 18:44:15 UTC 2008


Hi everyone,

I'm sorry, guys, I don't mean to put any pressure on you... but you
know, in the business world everything has to go smoothly; otherwise,
we get a lot of customers calling us, really pissed off, because the
service is not working.

I actually think this is an excellent project, I really do, and my
bosses still think so too, but... we have been testing this for weeks
and there isn't a single patch that makes the system stable, at least
in our case.

I'm happy to keep testing this as long as needed, but, again, it's a
business, and they are not going to keep paying me indefinitely to
test something that we can't use in our company. :( Unfortunately, at
the end of the day, it's all about money.

On the other hand, one of my bosses himself said that if this works
and we end up using it in our company, it should be rewarded: some
kind of donation to the project or something like that. They are
usually generous in this sense with all the free software we use. We
are a hosting provider and also register domain names; that could
also be an option.

Anyway... the thing is that for some days, maybe even a couple of
weeks, we won't be able to test any of this because we've got a
different project to work on which has higher priority. I hope that
in this time you will manage to work on the issues. :)

So, for the greater good, hahaha, I wish you the best. Work hard. :)

Bye!!


On Sun, 10 Feb 2008 at 08:43 +0530, Amar S. Tumballi wrote:
> Hi Jordi,
>  I understand your need for a stable system before you go live.
> Every time we face a bug, it's a step towards stability for us. In
> particular, the recent changes to unify/afr self-heal were key for
> us; they came with some bugs that were tricky and thus took time to
> solve. Now (from patch set 643 onwards) we think it's getting
> stable. I have done two fixes (in patch-653) for issues that can
> cause the kind of situation you described, but they are _very_ much
> corner cases. If your boss is still OK with the testing, and if you
> still want to go with GlusterFS (I hope you do), patch-653 should
> work for you.
> 
>  For your production deployment, we suggest you wait till 1.3.8
> (which should be done in a week's time... only testing is going on).
> 
> Regards,
> Amar
> 
> On Feb 9, 2008 7:18 AM, Jordi Moles Blanco <jordi at cdmon.com> wrote:
>         Hi again,
>         
>         Well... it is worse than I first thought. GlusterFS is not
>         working anymore. I've already tried rebooting all the
>         servers, nodes and clients, and the filesystem is not
>         available anymore :(
>         The clients keep saying that there are errors while
>         communicating with the nodes and there's no way they can
>         mount the filesystem :(
>         I see that I'll have to reinstall all the servers, because
>         something is really badly broken and GlusterFS won't work
>         anymore. The servers haven't hung at all, and all the
>         reboots have been "clean"; only GlusterFS is broken.
>         
>         
>         On Sat, 9 Feb 2008 at 01:10 +0100, Jordi Moles Blanco
>         wrote:
>         
>         > Hi everyone,
>         >
>         > I'm afraid I bring bad news.
>         >
>         > After applying patch 650, the system seemed to work
>         > smoothly for a while under a lot of load. I moved to
>         > patch 651 as you suggested, and it didn't last more than
>         > 3 hours :(
>         >
>         > The thing is... the servers didn't hang, but the
>         > filesystem isn't accessible at all; not even an "ls" is
>         > possible from any of the clients.
>         >
>         > These are the node logs:
>         >
>         > *************
>         > nothing logged about this matter!!!
>         > *************
>         >
>         > And these are the client logs:
>         >
>         > *************
>         > 2008-02-08 14:56:59 E [unify.c:260:unify_lookup_cbk] ultim: Revalidate failed for /dummy/maildirsize
>         > 2008-02-08 14:56:59 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 202928: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [unify.c:802:unify_open_cbk] ultim: Open success on namespace, failed on child node
>         > 2008-02-08 14:56:59 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 203092: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [unify.c:802:unify_open_cbk] ultim: Open success on namespace, failed on child node
>         > 2008-02-08 14:56:59 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 203093: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 14:56:59 E [unify.c:802:unify_open_cbk] ultim: Open success on namespace, failed on child node
>         > 2008-02-08 14:56:59 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 203094: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 14:57:00 E [unify.c:260:unify_lookup_cbk] ultim: Revalidate failed for /dummy/maildirsize
>         > 2008-02-08 14:57:00 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 203247: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 17:33:05 E [afr.c:2398:afr_selfheal_getxattr_cbk] nm: (path=/dummy/maildirsize child=namespace2) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:2398:afr_selfheal_getxattr_cbk] nm: (path=/dummy/maildirsize child=namespace1) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:2398:afr_selfheal_getxattr_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace2) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace1) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 17:33:05 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 398639: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 17:49:46 E [unify.c:260:unify_lookup_cbk] ultim: Revalidate failed for /dummy/maildirsize
>         > 2008-02-08 17:49:46 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 517247: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 18:03:38 E [afr.c:1564:afr_open_cbk] grup3: (path=/dummy/maildirsize child=espai6) op_ret=-1 op_errno=2
>         > 2008-02-08 18:03:38 E [afr.c:1564:afr_open_cbk] grup3: (path=/dummy/maildirsize child=espai5) op_ret=-1 op_errno=2
>         > 2008-02-08 18:03:38 E [unify.c:802:unify_open_cbk] ultim: Open success on namespace, failed on child node
>         > 2008-02-08 18:03:38 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 641123: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 18:30:33 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai3) op_ret=-1 op_errno=2
>         > 2008-02-08 18:30:33 E [afr.c:1564:afr_open_cbk] grup2: (path=/dummy/maildirsize child=espai4) op_ret=-1 op_errno=2
>         > 2008-02-08 18:30:33 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace2) op_ret=-1 op_errno=2
>         > 2008-02-08 18:30:33 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace1) op_ret=-1 op_errno=2
>         > 2008-02-08 18:30:33 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 894293: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 18:38:09 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace2) op_ret=-1 op_errno=2
>         > 2008-02-08 18:38:09 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace1) op_ret=-1 op_errno=2
>         > 2008-02-08 18:38:09 E [unify.c:794:unify_open_cbk] ultim: Open success on child node, failed on namespace
>         > 2008-02-08 18:38:09 E [fuse-bridge.c:670:fuse_fd_cbk] glusterfs-fuse: 969868: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 19:26:58 E [unify.c:260:unify_lookup_cbk] ultim: Revalidate failed for /dummy/maildirsize
>         > 2008-02-08 19:26:58 E [fuse-bridge.c:431:fuse_entry_cbk] glusterfs-fuse: 1302224: /dummy/maildirsize => -1 (2)
>         > 2008-02-08 19:40:03 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace2) op_ret=-1 op_errno=2
>         > 2008-02-08 19:40:03 E [afr.c:1564:afr_open_cbk] nm: (path=/dummy/maildirsize child=namespace1) op_ret=-1 op_errno=2
>         > 2008-02-08 19:40:03 E [unify.c:794:unify_open_cbk] ultim: Open success on child node, failed on namespace
>         > *************
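>         >
>         > For reference, op_errno=2 in these logs is ENOENT ("No
>         > such file or directory"): the open succeeds on one side
>         > (namespace or child node) while the file is missing on
>         > the other. A minimal C snippet, just to decode the value:
>         >
>         > #include <errno.h>
>         > #include <stdio.h>
>         > #include <string.h>
>         >
>         > int main(void)
>         > {
>         >     /* strerror() maps an errno value to its message */
>         >     printf("%d -> %s\n", ENOENT, strerror(ENOENT));
>         >     /* prints: 2 -> No such file or directory */
>         >     return 0;
>         > }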
>         >
>         > I don't know if that makes any sense, but here are some
>         > cat outputs.
>         >
>         > On the dovecot servers:
>         >
>         > cat /mnt/fusectl/1/waiting >>  6
>         >
>         > On the postfix servers:
>         >
>         > cat /mnt/fusectl/1/waiting >>  14
>         >
>         > These were the values at the very moment everything hung.
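>         >
>         > A small sketch of how one might watch that counter as a
>         > hang develops (assuming, as in the cats above, that
>         > fusectl is mounted at /mnt/fusectl and connection 1 is
>         > the GlusterFS mount):
>         >
>         > #include <stdio.h>
>         > #include <unistd.h>
>         >
>         > int main(void)
>         > {
>         >     /* Poll the number of FUSE requests still waiting
>         >      * for a reply; a value that keeps growing means
>         >      * the mount is hung. */
>         >     for (;;) {
>         >         FILE *f = fopen("/mnt/fusectl/1/waiting", "r");
>         >         int waiting = -1;
>         >         if (f) {
>         >             if (fscanf(f, "%d", &waiting) != 1)
>         >                 waiting = -1;
>         >             fclose(f);
>         >         }
>         >         printf("waiting = %d\n", waiting);
>         >         fflush(stdout);
>         >         sleep(5);
>         >     }
>         > }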
>         >
>         >
>         >
>         
> 
> 
> 
> -- 
> Amar Tumballi
> Gluster/GlusterFS Hacker
> [bulde on #gluster/irc.gnu.org]
> http://www.zresearch.com - Commoditizing Supercomputing and
> Superstorage!





