[Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

mohammad kashif kashif.alig at gmail.com
Tue Jun 12 14:40:01 UTC 2018


Hi Milind

The operating system is Scientific Linux 6, which is based on RHEL6. The CPU
arch is Intel x86_64.

I will send you a separate email with a link to the core dump.

Thanks for your help.

Kashif


On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <mchangir at redhat.com>
wrote:

> Kashif,
> Could you share the core dump via Google Drive or something similar?
>
> Also, let me know the CPU arch and OS Distribution on which you are
> running gluster.
>
> If you've installed the glusterfs-debuginfo package, you'll also get the
> source lines in the backtrace via gdb.
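>
> For example, on a RHEL-family system (assuming the matching debuginfo
> repo is enabled; the core file path is a placeholder):
>
> $ debuginfo-install glusterfs
> $ gdb /usr/sbin/glusterfs /path/to/core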
>
>
>
> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <kashif.alig at gmail.com>
> wrote:
>
>> Hi Milind, Vijay
>>
>> Thanks. I have some more information now, as I straced the glusterfs
>> client process:
>>
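>> (Roughly: strace -f -r -T -p <glusterfs-pid> -o /tmp/glusterfs.strace;
>> the flags are inferred from the output format below: -f follows threads,
>> -r prints relative timestamps, -T prints time spent in each call.)
>>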
>> 138544      0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.000026>
>> 138544      0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.000027>
>> 138544      0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE)
>> = 0 <0.000027>
>> 138544      0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
>> si_addr=0x7f2f7c60ef88} ---
>> 138544      0.000051 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
>> si_addr=0} ---
>> 138551      0.105048 +++ killed by SIGSEGV (core dumped) +++
>> 138550      0.000041 +++ killed by SIGSEGV (core dumped) +++
>> 138547      0.000008 +++ killed by SIGSEGV (core dumped) +++
>> 138546      0.000007 +++ killed by SIGSEGV (core dumped) +++
>> 138545      0.000007 +++ killed by SIGSEGV (core dumped) +++
>> 138544      0.000008 +++ killed by SIGSEGV (core dumped) +++
>> 138543      0.000007 +++ killed by SIGSEGV (core dumped) +++
>>
>> As far as I understand, gluster is somehow trying to access memory in an
>> inappropriate manner (SEGV_ACCERR means the mapping exists but the access
>> permissions deny it) and the kernel sends SIGSEGV.
>>
>> I also got the core dump. I am trying gdb for the first time, so I am not
>> sure whether I am using it correctly:
>>
>> gdb /usr/sbin/glusterfs core.138536
>>
>> It just tells me that the program terminated with signal 11 (segmentation
>> fault).
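>>
>> I guess the next step is to ask for a backtrace at the (gdb) prompt,
>> something like:
>>
>> (gdb) bt
>> (gdb) thread apply all bt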
>>
>> The problem is not limited to one client; it is happening on many clients.
>>
>> I would really appreciate any help, as the whole file system has become
>> unusable.
>>
>> Thanks
>>
>> Kashif
>>
>>
>>
>>
>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <mchangir at redhat.com>
>> wrote:
>>
>>> Kashif,
>>> You can change the log level by:
>>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE
>>> $ gluster volume set <vol> diagnostics.client-log-level TRACE
>>>
>>> and see how things fare.
>>>
>>> If you want fewer logs you can change the log-level to DEBUG instead of
>>> TRACE.
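>>>
>>> And once you're done debugging, you can revert to the defaults with, for
>>> example:
>>> $ gluster volume reset <vol> diagnostics.brick-log-level
>>> $ gluster volume reset <vol> diagnostics.client-log-level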
>>>
>>>
>>>
>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <kashif.alig at gmail.com>
>>> wrote:
>>>
>>>> Hi Vijay
>>>>
>>>> Now it is unmounting every 30 minutes!
>>>>
>>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>>> has only these lines:
>>>>
>>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013]
>>>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup
>>>> on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>>>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>>>> connection <server-name>-2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>>>>
>>>> There is no other information. Is there any way to increase log
>>>> verbosity?
>>>>
>>>> On the client:
>>>>
>>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057]
>>>> [client-handshake.c:1478:select_server_supported_programs]
>>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
>>>> (330)
>>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
>>>> Connected to atlasglust-client-5, attached to remote volume
>>>> '/glusteratlas/brick006/gv0'.
>>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
>>>> Server and Client lk-version numbers are not same, reopening the fds
>>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>>> 0-atlasglust-client-5: Server lk version = 1
>>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
>>>> [client-handshake.c:1478:select_server_supported_programs]
>>>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
>>>> (330)
>>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
>>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
>>>> Connected to atlasglust-client-6, attached to remote volume
>>>> '/glusteratlas/brick007/gv0'.
>>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
>>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
>>>> Server and Client lk-version numbers are not same, reopening the fds
>>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035]
>>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>>> 0-atlasglust-client-6: Server lk version = 1
>>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>>>> 7.14
>>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
>>>> 0-fuse: switched to graph 0
>>>>
>>>>
>>>> Is there a problem with the server and client lk-version?
>>>>
>>>> Thanks for your help.
>>>>
>>>> Kashif
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <vbellur at redhat.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif <
>>>>> kashif.alig at gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Since I updated our gluster servers and clients to the latest version,
>>>>>> 3.12.9-1, I have had this issue of gluster getting unmounted from the
>>>>>> clients very regularly. It was not a problem before the update.
>>>>>>
>>>>>> It's a distributed file system with no replication. We have seven
>>>>>> servers totaling around 480TB of data. It's 97% full.
>>>>>>
>>>>>> I am using the following config on the servers:
>>>>>>
>>>>>>
>>>>>> gluster volume set atlasglust features.cache-invalidation on
>>>>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>> gluster volume set atlasglust performance.cache-invalidation on
>>>>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>>>>> gluster volume set atlasglust performance.parallel-readdir on
>>>>>> gluster volume set atlasglust performance.cache-size 1GB
>>>>>> gluster volume set atlasglust performance.client-io-threads on
>>>>>> gluster volume set atlasglust cluster.lookup-optimize on
>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>> gluster volume set atlasglust client.event-threads 4
>>>>>> gluster volume set atlasglust server.event-threads 4
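>>>>>>
>>>>>> (The applied settings can be double-checked with, e.g.,
>>>>>> "gluster volume get atlasglust all".)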
>>>>>>
>>>>>> Clients are mounted with these options:
>>>>>>
>>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
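>>>>>>
>>>>>> e.g. as an /etc/fstab entry (the server name and mount point here are
>>>>>> placeholders):
>>>>>>
>>>>>> server1:/atlasglust /mnt/atlas glusterfs defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev 0 0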
>>>>>>
>>>>>> I can't see anything in the log file. Can someone suggest how to
>>>>>> troubleshoot this issue?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Can you please share the log file? Checking for messages related to
>>>>> disconnections/crashes in the log file would be a good way to start
>>>>> troubleshooting the problem.
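>>>>>
>>>>> For example (the client log file is named after the mount point, so
>>>>> adjust the path):
>>>>>
>>>>> $ grep -iE "disconnect|crash|sigsegv" /var/log/glusterfs/<mountpoint>.log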
>>>>>
>>>>> Thanks,
>>>>> Vijay
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>
>>>
>>>
>>>
>>> --
>>> Milind
>>>
>>>
>>
>
>
> --
> Milind
>
>