[Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version

Milind Changire mchangir at redhat.com
Tue Jun 12 14:16:56 UTC 2018


Kashif,
Could you share the core dump via Google Drive or something similar?

Also, let me know the CPU arch and OS Distribution on which you are running
gluster.

If you've installed the glusterfs-debuginfo package, you'll also get the
source lines in the backtrace via gdb.
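
For example, on an RPM-based distro something like this should work
(package installation may differ on your distribution):

$ yum install glusterfs-debuginfo
$ gdb /usr/sbin/glusterfs core.<pid>
(gdb) bt
(gdb) thread apply all bt full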



On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <kashif.alig at gmail.com>
wrote:

> Hi Milind, Vijay
>
> Thanks, I have some more information now as I straced glusterd on client
>
> 138544      0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.000026>
> 138544      0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.000027>
> 138544      0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE)
> = 0 <0.000027>
> 138544      0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
> si_addr=0x7f2f7c60ef88} ---
> 138544      0.000051 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
> si_addr=0} ---
> 138551      0.105048 +++ killed by SIGSEGV (core dumped) +++
> 138550      0.000041 +++ killed by SIGSEGV (core dumped) +++
> 138547      0.000008 +++ killed by SIGSEGV (core dumped) +++
> 138546      0.000007 +++ killed by SIGSEGV (core dumped) +++
> 138545      0.000007 +++ killed by SIGSEGV (core dumped) +++
> 138544      0.000008 +++ killed by SIGSEGV (core dumped) +++
> 138543      0.000007 +++ killed by SIGSEGV (core dumped) +++
>
> As far as I understand, gluster is somehow trying to access memory in an
> inappropriate manner (note the SEGV_ACCERR above) and the kernel sends
> SIGSEGV.
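>
> For reference, output in this format can be captured with something like
>
> strace -f -r -o /tmp/glusterfs.strace -p <glusterfs-pid>
>
> where -f follows the spawned threads, -r prints relative timestamps and
> <glusterfs-pid> is the PID of the client process.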
>
> I also got the core dump. I am trying gdb for the first time, so I am
> not sure whether I am using it correctly:
>
> gdb /usr/sbin/glusterfs core.138536
>
> It just tells me that the program terminated with signal 11 (segmentation
> fault).
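>
> From what I've read, something like this at the gdb prompt should print
> full backtraces (without the debuginfo package the frames may only show
> raw addresses):
>
> (gdb) bt
> (gdb) thread apply all bt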
>
> The problem is not limited to one client; it is happening on many clients.
>
> I would really appreciate any help, as the whole file system has become
> unusable.
>
> Thanks
>
> Kashif
>
>
>
>
> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <mchangir at redhat.com>
> wrote:
>
>> Kashif,
>> You can change the log level by:
>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE
>> $ gluster volume set <vol> diagnostics.client-log-level TRACE
>>
>> and see how things fare
>>
>> If you want fewer logs you can change the log-level to DEBUG instead of
>> TRACE.
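>>
>> Once you're done, something like this should revert to the default log
>> levels:
>> $ gluster volume reset <vol> diagnostics.brick-log-level
>> $ gluster volume reset <vol> diagnostics.client-log-level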
>>
>>
>>
>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <kashif.alig at gmail.com>
>> wrote:
>>
>>> Hi Vijay
>>>
>>> Now it is unmounting every 30 minutes!
>>>
>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>> has only these lines:
>>>
>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013]
>>> [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on
>>> /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055]
>>> [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down
>>> connection <server-name> -2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>>>
>>> There is no other information. Is there any way to increase log
>>> verbosity?
>>>
>>> on the client
>>>
>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057]
>>> [client-handshake.c:1478:select_server_supported_programs]
>>> 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version
>>> (330)
>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046]
>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5:
>>> Connected to atlasglust-client-5, attached to remote volume
>>> '/glusteratlas/brick006/gv0'.
>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047]
>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5:
>>> Server and Client lk-version numbers are not same, reopening the fds
>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 0-atlasglust-client-5: Server lk version = 1
>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057]
>>> [client-handshake.c:1478:select_server_supported_programs]
>>> 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version
>>> (330)
>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046]
>>> [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6:
>>> Connected to atlasglust-client-6, attached to remote volume
>>> '/glusteratlas/brick007/gv0'.
>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047]
>>> [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6:
>>> Server and Client lk-version numbers are not same, reopening the fds
>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035]
>>> [client-handshake.c:202:client_set_lk_version_cbk]
>>> 0-atlasglust-client-6: Server lk version = 1
>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init]
>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel
>>> 7.14
>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync]
>>> 0-fuse: switched to graph 0
>>>
>>>
>>> Is there a problem with the server and client lk-version?
>>>
>>> Thanks for your help.
>>>
>>> Kashif
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <vbellur at redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif
>>>> <kashif.alig at gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> Since I updated our gluster servers and clients to the latest version,
>>>>> 3.12.9-1, I have been having this issue of gluster getting unmounted
>>>>> from clients very regularly. It was not a problem before the update.
>>>>>
>>>>> It's a distributed file system with no replication. We have seven
>>>>> servers totaling around 480TB of data. It's 97% full.
>>>>>
>>>>> I am using the following config on the server (verification commands
>>>>> follow the list):
>>>>>
>>>>>
>>>>> gluster volume set atlasglust features.cache-invalidation on
>>>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>> gluster volume set atlasglust performance.cache-invalidation on
>>>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>>>> gluster volume set atlasglust performance.parallel-readdir on
>>>>> gluster volume set atlasglust performance.cache-size 1GB
>>>>> gluster volume set atlasglust performance.client-io-threads on
>>>>> gluster volume set atlasglust cluster.lookup-optimize on
>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>> gluster volume set atlasglust client.event-threads 4
>>>>> gluster volume set atlasglust server.event-threads 4
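>>>>>
>>>>> To double-check which options actually took effect, something like
>>>>> this should work:
>>>>>
>>>>> gluster volume info atlasglust
>>>>> gluster volume get atlasglust all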
>>>>>
>>>>> Clients are mounted with these options (an example fstab entry follows):
>>>>>
>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,
>>>>> negative-timeout=600,fopen-keep-cache,rw,_netdev
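>>>>>
>>>>> For context, the corresponding fstab entry looks roughly like this
>>>>> (host and mount point below are placeholders):
>>>>>
>>>>> gluster-host:/atlasglust /atlas glusterfs defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev 0 0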
>>>>>
>>>>> I can't see anything in the log file. Can someone suggest how to
>>>>> troubleshoot this issue?
>>>>>
>>>>>
>>>>>
>>>>
>>>> Can you please share the log file? Checking for messages related to
>>>> disconnections/crashes in the log file would be a good way to start
>>>> troubleshooting the problem.
>>>>
>>>> Thanks,
>>>> Vijay
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Milind
>>
>>
>


-- 
Milind