[Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

Nithya Balachandran nbalacha at redhat.com
Thu Jun 14 04:37:50 UTC 2018


This is not the same issue as the one you are referring to - that was in the
RPC layer and caused the bricks to crash. This one is different, as it seems
to be in the dht and rda layers. It does look like a stack overflow, though.

@Mohammad,

Please send the following information:

1. gluster volume info
2. The number of entries in the directory being listed
3. System memory

Does this still happen if you turn off parallel-readdir?
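
For reference, something along these lines should gather the above and turn
off parallel-readdir (the directory path below is just a placeholder for one
of the directories whose listing triggers the crash):

$ gluster volume info atlasglust
$ ls /path/to/listed/directory | wc -l
$ free -g
$ gluster volume set atlasglust performance.parallel-readdir off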

Regards,
Nithya




On 13 June 2018 at 16:40, Milind Changire <mchangir at redhat.com> wrote:

> +Nithya
>
> Nithya,
> Do these logs [1] look similar to the recursive readdir() issue that you
> encountered a while back?
> i.e. the recursive readdir() response definition in the XDR
>
> [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>
>
> On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <kashif.alig at gmail.com>
> wrote:
>
>> Hi Milind
>>
>> Thanks a lot, I managed to run gdb and produced a backtrace as well. It's here:
>>
>> http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
>>
>>
>> I am trying to understand it, but I am still not able to make sense of it.
>>
>> Thanks
>>
>> Kashif
>>
>> On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <mchangir at redhat.com>
>> wrote:
>>
>>> Kashif,
>>> FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
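>>>
>>> If yum can't resolve it from the repos, you should be able to install
>>> the rpm straight from that URL; for example (the exact filename below is
>>> a guess; pick whichever glusterfs-debuginfo rpm listed there matches
>>> your installed 3.12.9-1 packages):
>>>
>>> $ yum install http://debuginfo.centos.org/centos/6/storage/x86_64/glusterfs-debuginfo-3.12.9-1.el6.x86_64.rpm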
>>>
>>>
>>> On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <kashif.alig at gmail.com>
>>> wrote:
>>>
>>>> Hi Milind
>>>>
>>>> There is no glusterfs-debuginfo available for gluster-3.12 from
>>>> http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.12/ repo.
>>>> Do you know where I can get it?
>>>> Also, when I run gdb, it says:
>>>>
>>>> Missing separate debuginfos, use: debuginfo-install
>>>> glusterfs-fuse-3.12.9-1.el6.x86_64
>>>>
>>>> I can't find a debug package for glusterfs-fuse either.
>>>>
>>>> Thanks from the pit of despair ;)
>>>>
>>>> Kashif
>>>>
>>>>
>>>> On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <kashif.alig at gmail.com
>>>> > wrote:
>>>>
>>>>> Hi Milind
>>>>>
>>>>> I will send you links to the logs.
>>>>>
>>>>> I collected these core dumps on the client; there is no glusterd
>>>>> process running on the client.
>>>>>
>>>>> Kashif
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <mchangir at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Kashif,
>>>>>> Could you also send over the client/mount log file as Vijay suggested?
>>>>>> Or maybe just the lines around the crash backtrace.
>>>>>>
>>>>>> Also, you've mentioned that you straced glusterd, but when you ran
>>>>>> gdb, you ran it over /usr/sbin/glusterfs.
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <vbellur at redhat.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <
>>>>>>> kashif.alig at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Milind
>>>>>>>>
>>>>>>>> The operating system is Scientific Linux 6, which is based on RHEL 6.
>>>>>>>> The CPU arch is Intel x86_64.
>>>>>>>>
>>>>>>>> I will send you a separate email with a link to the core dump.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You could also grep for 'crash' in the client log file; the lines
>>>>>>> following it will contain a backtrace in most cases.
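>>>>>>>
>>>>>>> For example, assuming the default log location (the client log file
>>>>>>> is named after the mount point):
>>>>>>>
>>>>>>> $ grep -A 20 crash /var/log/glusterfs/<mount-point>.log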
>>>>>>>
>>>>>>> HTH,
>>>>>>> Vijay
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Kashif
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <
>>>>>>>> mchangir at redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Kashif,
>>>>>>>>> Could you share the core dump via Google Drive or something similar?
>>>>>>>>>
>>>>>>>>> Also, let me know the CPU arch and OS Distribution on which you
>>>>>>>>> are running gluster.
>>>>>>>>>
>>>>>>>>> If you've installed the glusterfs-debuginfo package, you'll also
>>>>>>>>> get the source lines in the backtrace via gdb.
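>>>>>>>>>
>>>>>>>>> Roughly, to pull a full backtrace out of the core:
>>>>>>>>>
>>>>>>>>> $ gdb /usr/sbin/glusterfs /path/to/core
>>>>>>>>> (gdb) bt
>>>>>>>>> (gdb) thread apply all bt full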
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <
>>>>>>>>> kashif.alig at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Milind, Vijay
>>>>>>>>>>
>>>>>>>>>> Thanks, I have some more information now, as I straced glusterd on
>>>>>>>>>> the client:
>>>>>>>>>>
>>>>>>>>>> 138544      0.000131 mprotect(0x7f2f70785000, 4096,
>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000026>
>>>>>>>>>> 138544      0.000128 mprotect(0x7f2f70786000, 4096,
>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>>>>>>>> 138544      0.000126 mprotect(0x7f2f70787000, 4096,
>>>>>>>>>> PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>>>>>>>> 138544      0.000124 --- SIGSEGV {si_signo=SIGSEGV,
>>>>>>>>>> si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>>>>>>>>>> 138544      0.000051 --- SIGSEGV {si_signo=SIGSEGV,
>>>>>>>>>> si_code=SI_KERNEL, si_addr=0} ---
>>>>>>>>>> 138551      0.105048 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138550      0.000041 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138547      0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138546      0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138545      0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138544      0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>> 138543      0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>>>>>>>
>>>>>>>>>> As far as I understand, gluster is somehow trying to access
>>>>>>>>>> memory in an inappropriate manner and the kernel sends SIGSEGV.
>>>>>>>>>>
>>>>>>>>>> I also got the core dump. I am trying gdb for the first time, so I
>>>>>>>>>> am not sure whether I am using it correctly:
>>>>>>>>>>
>>>>>>>>>> gdb /usr/sbin/glusterfs core.138536
>>>>>>>>>>
>>>>>>>>>> It just tells me that the program terminated with signal 11,
>>>>>>>>>> segmentation fault.
>>>>>>>>>>
>>>>>>>>>> The problem is not limited to one client but is happening on many
>>>>>>>>>> clients.
>>>>>>>>>>
>>>>>>>>>> I would really appreciate any help, as the whole file system has
>>>>>>>>>> become unusable.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Kashif
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <
>>>>>>>>>> mchangir at redhat.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Kashif,
>>>>>>>>>>> You can change the log level by:
>>>>>>>>>>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE
>>>>>>>>>>> $ gluster volume set <vol> diagnostics.client-log-level TRACE
>>>>>>>>>>>
>>>>>>>>>>> and see how things fare
>>>>>>>>>>>
>>>>>>>>>>> If you want fewer logs you can change the log-level to DEBUG
>>>>>>>>>>> instead of TRACE.
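>>>>>>>>>>>
>>>>>>>>>>> TRACE is extremely verbose, so once you have captured the problem
>>>>>>>>>>> you can reset the options back to their defaults:
>>>>>>>>>>>
>>>>>>>>>>> $ gluster volume reset <vol> diagnostics.brick-log-level
>>>>>>>>>>> $ gluster volume reset <vol> diagnostics.client-log-level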
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <
>>>>>>>>>>> kashif.alig at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Vijay
>>>>>>>>>>>>
>>>>>>>>>>>> Now it is unmounting every 30 minutes!
>>>>>>>>>>>>
>>>>>>>>>>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>>>>>>>>>>> has only these lines:
>>>>>>>>>>>>
>>>>>>>>>>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013] [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>>>>>>>>>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down connection <server-name>-2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>>>>>>>>>>>>
>>>>>>>>>>>> There is no other information. Is there any way to increase log
>>>>>>>>>>>> verbosity?
>>>>>>>>>>>>
>>>>>>>>>>>> on the client
>>>>>>>>>>>>
>>>>>>>>>>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057]
>>>>>>>>>>>> [client-handshake.c:1478:select_server_supported_programs]
>>>>>>>>>>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
>>>>>>>>>>>> (330)
>>>>>>>>>>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>>>>>>>>>>>> [client-handshake.c:1231:client_setvolume_cbk]
>>>>>>>>>>>> 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to remote
>>>>>>>>>>>> volume '/glusteratlas/brick006/gv0'.
>>>>>>>>>>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>>>>>>>>>>>> [client-handshake.c:1242:client_setvolume_cbk]
>>>>>>>>>>>> 0-atlasglust-client-5: Server and Client lk-version numbers are not same,
>>>>>>>>>>>> reopening the fds
>>>>>>>>>>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>>>>>>>>>>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>>>>>>>>>>> 0-atlasglust-client-5: Server lk version = 1
>>>>>>>>>>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
>>>>>>>>>>>> [client-handshake.c:1478:select_server_supported_programs]
>>>>>>>>>>>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
>>>>>>>>>>>> (330)
>>>>>>>>>>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
>>>>>>>>>>>> [client-handshake.c:1231:client_setvolume_cbk]
>>>>>>>>>>>> 0-atlasglust-client-6: Connected to atlasglust-client-6, attached to remote
>>>>>>>>>>>> volume '/glusteratlas/brick007/gv0'.
>>>>>>>>>>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
>>>>>>>>>>>> [client-handshake.c:1242:client_setvolume_cbk]
>>>>>>>>>>>> 0-atlasglust-client-6: Server and Client lk-version numbers are not same,
>>>>>>>>>>>> reopening the fds
>>>>>>>>>>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035]
>>>>>>>>>>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>>>>>>>>>>> 0-atlasglust-client-6: Server lk version = 1
>>>>>>>>>>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
>>>>>>>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>>>>>>>>>>>> 7.14
>>>>>>>>>>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
>>>>>>>>>>>> 0-fuse: switched to graph 0
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a problem with the server and client lk-version?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help.
>>>>>>>>>>>>
>>>>>>>>>>>> Kashif
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <
>>>>>>>>>>>> vbellur at redhat.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif <
>>>>>>>>>>>>> kashif.alig at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since I updated our gluster servers and clients to the latest
>>>>>>>>>>>>>> version, 3.12.9-1, I have been having this issue of gluster getting
>>>>>>>>>>>>>> unmounted from clients very regularly. It was not a problem before
>>>>>>>>>>>>>> the update.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's a distributed file system with no replication. We have
>>>>>>>>>>>>>> seven servers totaling around 480 TB of data. It's 97% full.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using the following config on the server:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> gluster volume set atlasglust features.cache-invalidation on
>>>>>>>>>>>>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>>>>>>>> gluster volume set atlasglust performance.cache-invalidation on
>>>>>>>>>>>>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>>>>>>>>>>>>> gluster volume set atlasglust performance.parallel-readdir on
>>>>>>>>>>>>>> gluster volume set atlasglust performance.cache-size 1GB
>>>>>>>>>>>>>> gluster volume set atlasglust performance.client-io-threads on
>>>>>>>>>>>>>> gluster volume set atlasglust cluster.lookup-optimize on
>>>>>>>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>>>>>>>> gluster volume set atlasglust client.event-threads 4
>>>>>>>>>>>>>> gluster volume set atlasglust server.event-threads 4
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Clients are mounted with these options:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
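>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (For illustration, the fstab entry looks roughly like this;
>>>>>>>>>>>>>> the server name and mount point are placeholders:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <server>:/atlasglust  /mnt/atlas  glusterfs  defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev  0 0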
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I can't see anything in the log file. Can someone suggest how
>>>>>>>>>>>>>> to troubleshoot this issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please share the log file? Checking for messages
>>>>>>>>>>>>> related to disconnections/crashes in the log file would be a good way to
>>>>>>>>>>>>> start troubleshooting the problem.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Vijay
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>> Gluster-users at gluster.org
>>>>>>>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Milind
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Milind
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Milind
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Milind
>>>
>>>
>>
>
>
> --
> Milind
>
>