[Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t

Pranith Kumar Karampuri pkarampu at redhat.com
Mon Jun 15 11:08:54 UTC 2015


Emmanuel,
        I am not sure whether this is feasible, but I wanted to ask: do 
you think operations on the mount could be made to fail with an error 
when the mount crashes, instead of hanging? That would save a lot of 
manual intervention in the future.

Pranith.
On 06/15/2015 01:35 PM, Niels de Vos wrote:
> Hi,
>
> sometimes the NetBSD regression tests hang with messages like this:
>
>      [12:29:07] ./tests/basic/mgmt_v3-locks.t
>      ........................................... ok    79867 ms
>      No volumes present
>      mount_nfs: can't access /patchy: Permission denied
>      mount_nfs: can't access /patchy: Permission denied
>      mount_nfs: can't access /patchy: Permission denied
>
> Most (if not all) of these hangs are caused by a crashing Gluster/NFS
> process. Once the Gluster/NFS server is not reachable anymore,
> unmounting fails.
>
> The only way to recover is to reboot the VM and retrigger the test. For
> rebooting, the http://build.gluster.org/job/reboot-vm job can be used,
> and retriggering works by clicking the "retrigger" link in the left menu
> once the test has been marked as failed/aborted.
>
> After logging in to the NetBSD system that hangs, you can verify the
> cause with these steps:
>
> 1. check if there is a /glusterfsd.core file
> 2. run gdb on the core:
>
>      # cd /build/install
>      # gdb --core=/glusterfsd.core sbin/glusterfs
>      ...
>      Program terminated with signal SIGSEGV, Segmentation fault.
>      #0  0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8,
>      host_addr=0xb900e400 "104.130.205.187", timestamp=0xbf7fd900,
>      can_write=0xbf7fd8fc)
>          at
>      /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
>      164             *can_write = lookup_res->item->opts->rw;
>
> 3. verify the lookup_res structure:
>
>      (gdb) p *lookup_res
>      $1 = {timestamp = 1434284981, item = 0xb901e3b0}
>      (gdb) p *lookup_res->item
>      $2 = {name = 0xffffff00 <error: Cannot access memory at address
>      0xffffff00>, opts = 0xffffffff}
>
>
> A fix for this has been sent. It is currently waiting for an update to
> the proposed reference counting:
>
>    - http://review.gluster.org/11022
>      core: add "gf_ref_t" for common refcounting structures
>    - http://review.gluster.org/11023
>      nfs: refcount each auth_cache_entry and related data_t
>
> Thanks,
> Niels
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
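
For readers unfamiliar with the idea behind the two patches above, here is a
minimal sketch of reference counting in C. The gdb output (item->name and
item->opts holding garbage) looks consistent with the item having been freed
while the cache entry still pointed at it, though that is an inference and not
something stated above. All names below (export_item, cache_entry, item_new,
item_ref, item_unref, refcount_demo) are hypothetical and made up for
illustration; this is not the gf_ref_t API nor the real auth-cache code, and
the plain int counter assumes a single thread, where real code would need
atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the data a cache entry points at. */
struct export_item {
    int refcount;   /* plain int: this sketch is single-threaded only */
    int rw;         /* 1 if the export is writable */
};

/* Hypothetical stand-in for a cache entry such as lookup_res. */
struct cache_entry {
    long timestamp;
    struct export_item *item;
};

/* Allocate an item that starts with one reference, owned by the caller. */
static struct export_item *item_new(int rw)
{
    struct export_item *it = malloc(sizeof(*it));
    if (it == NULL)
        return NULL;
    it->refcount = 1;
    it->rw = rw;
    return it;
}

/* Take an additional reference and return the same pointer. */
static struct export_item *item_ref(struct export_item *it)
{
    it->refcount++;
    return it;
}

/* Drop one reference; free the item when the last one goes away.
 * Returns 1 if the item was freed, 0 otherwise. */
static int item_unref(struct export_item *it)
{
    if (--it->refcount > 0)
        return 0;
    free(it);
    return 1;
}

/* Returns the rw flag read through the cache entry after the original
 * owner has dropped its reference, or -1 if allocation failed. */
static int refcount_demo(void)
{
    struct export_item *owner = item_new(1);
    if (owner == NULL)
        return -1;

    /* The cache entry holds its own reference to the item. */
    struct cache_entry entry = { 1434284981L, item_ref(owner) };

    /* The original owner lets go; the item must survive this. */
    int freed = item_unref(owner);
    assert(freed == 0);

    /* Still safe to dereference: the cache's reference keeps it alive. */
    int can_write = entry.item->rw;

    /* The cache entry drops the last reference, which frees the item. */
    freed = item_unref(entry.item);
    assert(freed == 1);
    entry.item = NULL;

    return can_write;
}
```

Calling refcount_demo() from a small main() is enough to try it. The point of
the design is that whoever stores a pointer also takes a reference, so the
memory is only released once the last holder has dropped it.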


