[Bugs] [Bug 1226253] New: gluster volume heal info crashes

Fri May 29 10:03:32 UTC 2015

https://bugzilla.redhat.com/show_bug.cgi?id=1226253

            Bug ID: 1226253
           Summary: gluster volume heal info crashes
           Product: GlusterFS
           Version: mainline
         Component: replicate
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: bugs at gluster.org, gluster-bugs at redhat.com

Description of problem:

Mail thread from Alessandro who reported the problem on gluster-users:

On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:
> Hi Pranith,
> I’m definitely sure the log is correct, but you are also correct when you say there is no sign of crash (even checking with grep!).
> However, I see core dumps (e.g. core.19430) in /var/log/gluster, created every time I issue the heal info command.
> From gdb I see this:
Thanks for providing the information, Alessandro. We will fix this issue. I am
wondering how we can unblock you in the interim. I think there is a plan to
release 3.7.1 in 2-3 days, and I can try to get this fix into that release. Let
me know if you can wait that long. Another possibility is to compile just the
glfsheal binary with the fix, which "gluster volume heal <volname> info" uses
internally. Let me know.

Pranith.
>
>
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
> done.
> [New LWP 19430]
> [New LWP 19431]
> [New LWP 19434]
> [New LWP 19436]
> [New LWP 19433]
> [New LWP 19437]
> [New LWP 19432]
> [New LWP 19435]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
> Program terminated with signal 11, Segmentation fault.
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> 499             table = inode->table;
> (gdb) bt
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
> #3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
> #4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
> #5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
> #6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
> #7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
> #8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
> #9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
> #10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
> #11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
> #12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
> #13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
> #14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
>
>
> Thanks,
>
> Alessandro
>
>> On 29 May 2015, at 11:12, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>
>>
>>
>> On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:
>>> Hi Pranith,
>>> many thanks for the help!
>>> The volume info of the problematic volume is the following:
>>>
>>> # gluster volume info adsnet-vm-01
>>>  
>>> Volume Name: adsnet-vm-01
>>> Type: Replicate
>>> Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
>>> Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
>>> Options Reconfigured:
>>> nfs.disable: true
>>> features.barrier: disable
>>> features.file-snapshot: on
>>> server.allow-insecure: on
>> Are you sure the attached log is correct? I do not see any backtrace in the log file to indicate there is a crash :-(. Could you do "grep -i crash /var/log/glusterfs/*" to see if there is some other file with the crash. If that also fails, will it be possible for you to provide the backtrace of the core by opening it using gdb?
>>
>> Pranith
>>>
>>> The log is attached.
>>> I just wanted to add that the heal info command works fine on other volumes hosted by the same machines, so it’s only this volume that is causing problems.
>>> Thanks,
>>>
>>> Alessandro
>>>
>>>
>>>
>>>
>>>> On 29 May 2015, at 10:50, Pranith Kumar Karampuri <pkarampu at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
>>>>>> Hi,
>>>>>> I'm facing a strange issue with split brain reporting.
>>>>>> I have upgraded to 3.7.0, after stopping all gluster processes as described in the twiki, on all servers hosting the volumes. The upgrade and the restart were fine, and the volumes are accessible.
>>>>>> However, I had two files in split brain that I did not heal before upgrading, so I tried a full heal with 3.7.0. The heal was launched correctly, but when I now run a heal info there is no output, while the heal statistics say there are actually 2 files in split brain. In the logs I see something like this:
>>>>>>
>>>>>> glustershd.log:
>>>>>> [2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
>>>>>> [2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
>>>>> Hey, could you share the "gluster volume info" output? Please also share the backtrace printed in /var/log/glusterfs/glfsheal-<volname>.log.
>>>> Please attach /var/log/glusterfs/glfsheal-<volname>.log file to this thread so that I can take a look.
>>>>
>>>> Pranith
>>>>>
>>>>> Pranith
>>>>>>
>>>>>>
>>>>>> So, it seems like the files to be healed are not correctly identified, or at least their path is null.
>>>>>> Also, every time I issue a "gluster volume heal <volname> info" a core dump is generated in the log area.
>>>>>> All servers are using the latest CentOS 7.
>>>>>> Any idea why this might be happening and how to solve it?
>>>>>> Thanks,
>>>>>>
>>>>>>    Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>>
>

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
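
As a rough illustration, the gdb backtrace quoted in the description can be
reduced to frame number, function, and source location with a small Python
script. The regular expression and the helper name parse_backtrace are
assumptions made for this sketch; they are not part of GlusterFS or gdb. The
sample text is copied from the first three frames of the backtrace above.

```python
import re

# Matches gdb frames in either of these shapes (mail quoting "> " is ignored):
#   #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
#   #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>\w+) \(.*\)"
    r" at (?P<file>[\w.\-]+):(?P<line>\d+)"
)


def parse_backtrace(text):
    """Return a list of (frame_number, function, source_file, line) tuples."""
    frames = []
    for raw in text.splitlines():
        m = FRAME_RE.search(raw)
        if m:
            frames.append(
                (int(m["num"]), m["func"], m["file"], int(m["line"]))
            )
    return frames


# First three frames, copied from the backtrace in the description.
SAMPLE = """\
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
"""

frames = parse_backtrace(SAMPLE)
for num, func, src, line in frames:
    print(f"#{num} {func} ({src}:{line})")
```

Read this way, the crash is in inode_unref (inode.c:499), called from fini in
qemu-block.c:1092 while the translator graph is being torn down by
pub_glfs_fini.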
