[Gluster-users] Volume heal info not reporting files in split brain and core dumping, after upgrading to 3.7.0

Pranith Kumar Karampuri pkarampu at redhat.com
Fri May 29 09:54:14 UTC 2015



On 05/29/2015 03:16 PM, Alessandro De Salvo wrote:
> Hi Pranith,
> I’m sure the log is correct, but you are also right that there is no
> sign of a crash in it (I checked with grep too).
> However, I see core dumps (e.g. core.19430) in /var/log/gluster,
> created every time I issue the heal info command.
> From gdb I see this:
Thanks for providing the information, Alessandro. We will fix this issue.
I am wondering how we can unblock you in the interim. I think the plan is to
release 3.7.1 in 2-3 days, and I can try to get this fix into that
release. Let me know if you can wait that long. Another possibility is
to compile just the glfsheal binary, which "gluster volume heal
<volname> info" uses internally, with the fix. Let me know.

Pranith.
>
>
> GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
> Copyright (C) 2013 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/glfsheal...Reading symbols from /usr/lib/debug/usr/sbin/glfsheal.debug...done.
> done.
> [New LWP 19430]
> [New LWP 19431]
> [New LWP 19434]
> [New LWP 19436]
> [New LWP 19433]
> [New LWP 19437]
> [New LWP 19432]
> [New LWP 19435]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `/usr/sbin/glfsheal adsnet-vm-01'.
> Program terminated with signal 11, Segmentation fault.
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> 499             table = inode->table;
> (gdb) bt
> #0  inode_unref (inode=0x7f7a1e27806c) at inode.c:499
> #1  0x00007f7a265e8a61 in fini (this=<optimized out>) at qemu-block.c:1092
> #2  0x00007f7a39a53791 in xlator_fini_rec (xl=0x7f7a2000b9a0) at xlator.c:463
> #3  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000d450) at xlator.c:453
> #4  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000e800) at xlator.c:453
> #5  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a2000fbb0) at xlator.c:453
> #6  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20010f80) at xlator.c:453
> #7  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20012330) at xlator.c:453
> #8  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a200136e0) at xlator.c:453
> #9  0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20014b30) at xlator.c:453
> #10 0x00007f7a39a53725 in xlator_fini_rec (xl=0x7f7a20015fc0) at xlator.c:453
> #11 0x00007f7a39a54eea in xlator_tree_fini (xl=<optimized out>) at xlator.c:545
> #12 0x00007f7a39a90b25 in glusterfs_graph_deactivate (graph=<optimized out>) at graph.c:340
> #13 0x00007f7a38d50e3c in pub_glfs_fini (fs=fs@entry=0x7f7a3a6b6010) at glfs.c:1155
> #14 0x00007f7a39f18ed4 in main (argc=<optimized out>, argv=<optimized out>) at glfs-heal.c:821
>
>
> Thanks,
>
> Alessandro
>
>> On 29 May 2015, at 11:12, Pranith Kumar Karampuri
>> <pkarampu at redhat.com> wrote:
>>
>>
>>
>> On 05/29/2015 02:37 PM, Alessandro De Salvo wrote:
>>> Hi Pranith,
>>> many thanks for the help!
>>> The volume info of the problematic volume is the following:
>>>
>>> # gluster volume info adsnet-vm-01
>>> Volume Name: adsnet-vm-01
>>> Type: Replicate
>>> Volume ID: f8f615df-3dde-4ea6-9bdb-29a1706e864c
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gwads02.sta.adsnet.it:/gluster/vm01/data
>>> Brick2: gwads03.sta.adsnet.it:/gluster/vm01/data
>>> Options Reconfigured:
>>> nfs.disable: true
>>> features.barrier: disable
>>> features.file-snapshot: on
>>> server.allow-insecure: on
>> Are you sure the attached log is correct? I do not see any backtrace
>> in the log file indicating a crash :-(. Could you run
>> "grep -i crash /var/log/glusterfs/*" to check whether some other file
>> records the crash? If that also fails, could you open the core in gdb
>> and provide its backtrace?
>>
>> Pranith
>>>
>>> The log is attached.
>>> I just wanted to add that the heal info command works fine on other
>>> volumes hosted by the same machines, so only this volume is
>>> causing problems.
>>> Thanks,
>>>
>>> Alessandro
>>>
>>>
>>>
>>>
>>>> On 29 May 2015, at 10:50, Pranith Kumar Karampuri
>>>> <pkarampu at redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 05/29/2015 02:18 PM, Pranith Kumar Karampuri wrote:
>>>>>
>>>>>
>>>>> On 05/29/2015 02:13 PM, Alessandro De Salvo wrote:
>>>>>> Hi,
>>>>>> I'm facing a strange issue with split-brain reporting.
>>>>>> I upgraded to 3.7.0 after stopping all gluster processes, as
>>>>>> described in the twiki, on all servers hosting the volumes. The
>>>>>> upgrade and the restart went fine, and the volumes are accessible.
>>>>>> However, I had two files in split brain that I did not heal before
>>>>>> upgrading, so I tried a full heal with 3.7.0. The heal was
>>>>>> launched correctly, but when I now run heal info there is
>>>>>> no output, while the heal statistics say there are actually 2
>>>>>> files in split brain. In the logs I see something like this:
>>>>>>
>>>>>> glustershd.log:
>>>>>> [2015-05-29 08:28:43.008373] I [afr-self-heal-entry.c:558:afr_selfheal_entry_do] 0-adsnet-gluster-01-replicate-0: performing entry selfheal on 7fd1262d-949b-402e-96c2-ae487c8d4e27
>>>>>> [2015-05-29 08:28:43.012690] W [client-rpc-fops.c:241:client3_3_mknod_cbk] 0-adsnet-gluster-01-client-1: remote operation failed: Invalid argument. Path: (null)
>>>>> Hey, could you share the "gluster volume info" output? Please also
>>>>> send the backtrace printed in
>>>>> /var/log/glusterfs/glfsheal-<volname>.log.
>>>> Please attach /var/log/glusterfs/glfsheal-<volname>.log file to 
>>>> this thread so that I can take a look.
>>>>
>>>> Pranith
>>>>>
>>>>> Pranith
>>>>>>
>>>>>>
>>>>>> So it seems that the files to be healed are not correctly
>>>>>> identified, or at least their path is null.
>>>>>> Also, every time I issue a "gluster volume heal <volname> info", a
>>>>>> core dump is generated in the log directory.
>>>>>> All servers are using the latest CentOS 7.
>>>>>> Any idea why this might be happening and how to solve it?
>>>>>> Thanks,
>>>>>>
>>>>>>    Alessandro
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users at gluster.org
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>
>>
>


