[Gluster-users] Understanding client logs

Marcus Pedersén marcus.pedersen at slu.se
Tue Jan 23 19:49:53 UTC 2018


Hi,
Yes, of course... I should have included it from the start.
Yes, I know it is an old version, but I will rebuild a new cluster
later on; that is another story.

Client side:
Archlinux
glusterfs 1:3.10.1-1

Server side:
Replicated cluster on two physical machines.
Both running:
CentOS 7, kernel 3.10.0-514.16.1.el7.x86_64
GlusterFS 3.8.11 from centos-gluster38
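
For reference, I pulled these versions from the package managers;
assuming the standard package names, the same information can be
re-checked with:

  # Client (Arch Linux)
  pacman -Q glusterfs
  glusterfs --version | head -n 1

  # Servers (CentOS 7)
  rpm -q glusterfs glusterfs-server
  uname -r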

Typical use case (the one we are having problems with now):
Our users run genomic evaluations, in which a lot of calculation
is done; intermediate results are saved to files (MB-GB in size,
up to a hundred files) and used in the next calculation step,
where they are read from file, processed, written to file again,
and so on, a couple of times over.
These processes typically run for about 8-12 hours, with some
running for up to about 72-96 hours.
For this run we had 12 clients (all connected to Gluster, with all
file reads/writes done against Gluster). On each client we had
assigned 3 cores to run the processes, and most of the time all
3 cores were in use on all 12 clients.
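
If it helps, a very rough sketch of one such process's I/O pattern
could look like the following (paths, sizes, and step count are made
up for illustration; this is not our real pipeline, /interbull is the
gluster mount point):

  #!/bin/bash
  # Rough sketch of one worker: write an intermediate result to the
  # gluster mount, read it back for the next step, and repeat.
  WORKDIR=/interbull/scratch/$$    # hypothetical scratch directory
  mkdir -p "$WORKDIR"
  for step in 1 2 3 4 5; do
      # write an intermediate file (~1 GB of dummy data)
      dd if=/dev/urandom of="$WORKDIR/step-$step.dat" bs=1M count=1024
      # the next step reads the previous intermediate file back
      dd if="$WORKDIR/step-$step.dat" of=/dev/null bs=1M
  done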

Regards
Marcus



________________________________
From: Milind Changire <mchangir at redhat.com>
Sent: 23 January 2018 15:46
To: Marcus Pedersén
Cc: Gluster Users
Subject: Re: [Gluster-users] Understanding client logs

Marcus,
Please paste the name-version-release of the primary glusterfs package on your system.

If possible, also describe the typical workload that the user application generates at the mount point.



On Tue, Jan 23, 2018 at 7:43 PM, Marcus Pedersén <marcus.pedersen at slu.se> wrote:
Hi all,
I have a problem pinpointing an error: users of
my system experience processes that crash.
The thing that has changed since the crashes started
is that I added a Gluster cluster.
Of course, the users have started to attack my Gluster cluster.

I started looking at the logs, beginning on the client side.
I just need help understanding how to read them correctly.
I can see that every ten minutes the client changes port and
attaches to the remote volume. About five minutes later
the client unmounts the volume.
I guess that this is the "old" mount and that the "new" mount
is already responding to user interaction?

As this repeats every ten minutes, I see it as normal behavior
and just want to get a better understanding of how the client
interacts with the cluster.
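
One way to check my guess could be to watch the client process itself:
if the glusterfs PID changes with every cycle, the volume really is
being unmounted and remounted (the log below shows a SIGTERM and a
fresh "Started running" line, which points that way) rather than just
reconnecting in place. A minimal sketch, using the /interbull mount
point from the log:

  # Print the glusterfs client process for /interbull every 30 s;
  # a changing PID means a real unmount/remount cycle.
  while sleep 30; do
      date '+%H:%M:%S'
      pgrep -af 'glusterfs.*interbull'
  done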

Have you ever experienced this switch malfunctioning, so that the
mount becomes unreachable for a while?

Many thanks in advance!

Best regards
Marcus Pedersén

An example of the output:
[2017-11-09 10:10:39.776403] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-interbull-interbull-client-1: changing port to 49160 (from 0)
[2017-11-09 10:10:39.776830] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-interbull-interbull-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-11-09 10:10:39.777642] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-0: Connected to interbull-interbull-client-0, attached to remote volume '/interbullfs/interbull'.
[2017-11-09 10:10:39.777663] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.777724] I [MSGID: 108005] [afr-common.c:4756:afr_notify] 0-interbull-interbull-replicate-0: Subvolume 'interbull-interbull-client-0' came back up; going online.
[2017-11-09 10:10:39.777954] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-interbull-interbull-client-0: Server lk version = 1
[2017-11-09 10:10:39.779909] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-interbull-interbull-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-11-09 10:10:39.780481] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-interbull-interbull-client-1: Connected to interbull-interbull-client-1, attached to remote volume '/interbullfs/interbull'.
[2017-11-09 10:10:39.780509] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-interbull-interbull-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2017-11-09 10:10:39.781544] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-interbull-interbull-client-1: Server lk version = 1
[2017-11-09 10:10:39.781608] I [fuse-bridge.c:4146:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26
[2017-11-09 10:10:39.781632] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse: switched to graph 0
[2017-11-09 10:16:10.609922] I [fuse-bridge.c:5089:fuse_thread_proc] 0-fuse: unmounting /interbull
[2017-11-09 10:16:10.610258] W [glusterfsd.c:1329:cleanup_and_exit] (-->/usr/lib/libpthread.so.0(+0x72e7) [0x7f98c02282e7] -->/usr/bin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40890d] -->/usr/bin/glusterfs(cleanup_and_exit+0x4b) [0x40878b] ) 0-: received signum (15), shutting down
[2017-11-09 10:16:10.610290] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting '/interbull'.
[2017-11-09 10:20:39.752079] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/bin/glusterfs: Started running /usr/bin/glusterfs version 3.10.1 (args: /usr/bin/glusterfs --negative-timeout=60 --volfile-server=192.168.67.31 --volfile-id=/interbull-interbull /interbull)
[2017-11-09 10:20:39.763902] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-11-09 10:20:39.768738] I [afr.c:94:fix_quorum_options] 0-interbull-interbull-replicate-0: reindeer: incoming qtype = none
[2017-11-09 10:20:39.768756] I [afr.c:116:fix_quorum_options] 0-interbull-interbull-replicate-0: reindeer: quorum_count = 0
[2017-11-09 10:20:39.768856] W [MSGID: 108040] [afr.c:315:afr_pending_xattrs_init] 0-interbull-interbull-replicate-0: Unable to fetch afr-pending-xattr option from volfile. Falling back to using client translator names.
[2017-11-09 10:20:39.769832] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-11-09 10:20:39.770193] I [MSGID: 114020] [client.c:2352:notify] 0-interbull-interbull-client-0: parent translators are ready, attempting connect on transport
[2017-11-09 10:20:39.773109] I [MSGID: 114020] [client.c:2352:notify] 0-interbull-interbull-client-1: parent translators are ready, attempting connect on transport
[2017-11-09 10:20:39.773712] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-interbull-interbull-client-0: changing port to 49177 (from 0)
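
The ten-minute cadence is easy to see by pulling just the mount and
unmount events out of the client log, e.g. with (assuming the default
client log location, which is derived from the mount point; adjust the
path if your logs live elsewhere):

  grep -E 'switched to graph|unmounting' /var/log/glusterfs/interbull.log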


--
**************************************************
* Marcus Pedersén                                *
* System administrator                           *
**************************************************
* Interbull Centre                               *
* ================                               *
* Department of Animal Breeding & Genetics - SLU *
* Box 7023, SE-750 07                            *
* Uppsala, Sweden                                *
**************************************************
* Visiting address:                              *
* Room 55614, Ulls väg 26, Ultuna                *
* Uppsala                                        *
* Sweden                                         *
*                                                *
* Tel: +46-(0)18-67 1962                         *
*                                                *
**************************************************
*     ISO 9001 Bureau Veritas No SE004561-1      *
**************************************************
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



--
Milind
