[Gluster-users] Client un-mounting since upgrade to 3.12.9-1 version
Jim Kinney
jim.kinney at gmail.com
Thu Jun 14 14:30:51 UTC 2018
Hmm. I have several 3.12.9 volumes with 3.12.9 clients that are
dropping the mount even though parallel-readdir is off. This only
happens on the RDMA interface; the TCP transport mounts are fine.
Option  Value
------  -----
cluster.lookup-unhashed  on
cluster.lookup-optimize  off
cluster.min-free-disk  10%
cluster.min-free-inodes  5%
cluster.rebalance-stats  off
cluster.subvols-per-directory  (null)
cluster.readdir-optimize  off
cluster.rsync-hash-regex  (null)
cluster.extra-hash-regex  (null)
cluster.dht-xattr-name  trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid  off
cluster.rebal-throttle  normal
cluster.lock-migration  off
cluster.local-volume-name  (null)
cluster.weighted-rebalance  on
cluster.switch-pattern  (null)
cluster.entry-change-log  on
cluster.read-subvolume  (null)
cluster.read-subvolume-index  -1
cluster.read-hash-mode  1
cluster.background-self-heal-count  8
cluster.metadata-self-heal  on
cluster.data-self-heal  on
cluster.entry-self-heal  on
cluster.self-heal-daemon  enable
cluster.heal-timeout  600
cluster.self-heal-window-size  1
cluster.data-change-log  on
cluster.metadata-change-log  on
cluster.data-self-heal-algorithm  (null)
cluster.eager-lock  on
disperse.eager-lock  on
cluster.quorum-type  none
cluster.quorum-count  (null)
cluster.choose-local  true
cluster.self-heal-readdir-size  1KB
cluster.post-op-delay-secs  1
cluster.ensure-durability  on
cluster.consistent-metadata  no
cluster.heal-wait-queue-length  128
cluster.favorite-child-policy  none
cluster.stripe-block-size  128KB
cluster.stripe-coalesce  true
diagnostics.latency-measurement  off
diagnostics.dump-fd-stats  off
diagnostics.count-fop-hits  off
diagnostics.brick-log-level  INFO
diagnostics.client-log-level  INFO
diagnostics.brick-sys-log-level  CRITICAL
diagnostics.client-sys-log-level  CRITICAL
diagnostics.brick-logger  (null)
diagnostics.client-logger  (null)
diagnostics.brick-log-format  (null)
diagnostics.client-log-format  (null)
diagnostics.brick-log-buf-size  5
diagnostics.client-log-buf-size  5
diagnostics.brick-log-flush-timeout  120
diagnostics.client-log-flush-timeout  120
diagnostics.stats-dump-interval  0
diagnostics.fop-sample-interval  0
diagnostics.stats-dump-format  json
diagnostics.fop-sample-buf-size  65535
diagnostics.stats-dnscache-ttl-sec  86400
performance.cache-max-file-size  0
performance.cache-min-file-size  0
performance.cache-refresh-timeout  1
performance.cache-priority
performance.cache-size  32MB
performance.io-thread-count  16
performance.high-prio-threads  16
performance.normal-prio-threads  16
performance.low-prio-threads  16
performance.least-prio-threads  1
performance.enable-least-priority  on
performance.cache-size  128MB
performance.flush-behind  on
performance.nfs.flush-behind  on
performance.write-behind-window-size  1MB
performance.resync-failed-syncs-after-fsync  off
performance.nfs.write-behind-window-size  1MB
performance.strict-o-direct  off
performance.nfs.strict-o-direct  off
performance.strict-write-ordering  off
performance.nfs.strict-write-ordering  off
performance.lazy-open  yes
performance.read-after-open  no
performance.read-ahead-page-count  4
performance.md-cache-timeout  1
performance.cache-swift-metadata  true
performance.cache-samba-metadata  false
performance.cache-capability-xattrs  true
performance.cache-ima-xattrs  true
features.encryption  off
encryption.master-key  (null)
encryption.data-key-size  256
encryption.block-size  4096
network.frame-timeout  1800
network.ping-timeout  42
network.tcp-window-size  (null)
features.lock-heal  off
features.grace-timeout  10
network.remote-dio  disable
client.event-threads  2
client.tcp-user-timeout  0
client.keepalive-time  20
client.keepalive-interval  2
client.keepalive-count  9
network.tcp-window-size  (null)
network.inode-lru-limit  16384
auth.allow  *
auth.reject  (null)
transport.keepalive  1
server.allow-insecure  (null)
server.root-squash  off
server.anonuid  65534
server.anongid  65534
server.statedump-path  /var/run/gluster
server.outstanding-rpc-limit  64
features.lock-heal  off
features.grace-timeout  10
server.ssl  (null)
auth.ssl-allow  *
server.manage-gids  off
server.dynamic-auth  on
client.send-gids  on
server.gid-timeout  300
server.own-thread  (null)
server.event-threads  1
server.tcp-user-timeout  0
server.keepalive-time  20
server.keepalive-interval  2
server.keepalive-count  9
transport.listen-backlog  10
ssl.own-cert  (null)
ssl.private-key  (null)
ssl.ca-list  (null)
ssl.crl-path  (null)
ssl.certificate-depth  (null)
ssl.cipher-list  (null)
ssl.dh-param  (null)
ssl.ec-curve  (null)
performance.write-behind  on
performance.read-ahead  on
performance.readdir-ahead  on
performance.io-cache  on
performance.quick-read  on
performance.open-behind  on
performance.nl-cache  off
performance.stat-prefetch  on
performance.client-io-threads  on
performance.nfs.write-behind  on
performance.nfs.read-ahead  off
performance.nfs.io-cache  off
performance.nfs.quick-read  off
performance.nfs.stat-prefetch  off
performance.nfs.io-threads  off
performance.force-readdirp  true
performance.cache-invalidation  false
features.uss  off
features.snapshot-directory  .snaps
features.show-snapshot-directory  off
network.compression  off
network.compression.window-size  -15
network.compression.mem-level  8
network.compression.min-size  0
network.compression.compression-level  -1
network.compression.debug  false
features.limit-usage  (null)
features.default-soft-limit  80%
features.soft-timeout  60
features.hard-timeout  5
features.alert-time  86400
features.quota-deem-statfs  off
geo-replication.indexing  off
geo-replication.indexing  off
geo-replication.ignore-pid-check  off
geo-replication.ignore-pid-check  off
features.quota  off
features.inode-quota  off
features.bitrot  disable
debug.trace  off
debug.log-history  no
debug.log-file  no
debug.exclude-ops  (null)
debug.include-ops  (null)
debug.error-gen  off
debug.error-failure  (null)
debug.error-number  (null)
debug.random-failure  off
debug.error-fops  (null)
nfs.disable  off
features.read-only  off
features.worm  off
features.worm-file-level  off
features.default-retention-period  120
features.retention-mode  relax
features.auto-commit-period  180
storage.linux-aio  off
storage.batch-fsync-mode  reverse-fsync
storage.batch-fsync-delay-usec  0
storage.owner-uid  -1
storage.owner-gid  -1
storage.node-uuid-pathinfo  off
storage.health-check-interval  30
storage.build-pgfid  off
storage.gfid2path  on
storage.gfid2path-separator  :
storage.bd-aio  off
cluster.server-quorum-type  off
cluster.server-quorum-ratio  0
changelog.changelog  off
changelog.changelog-dir  (null)
changelog.encoding  ascii
changelog.rollover-time  15
changelog.fsync-interval  5
changelog.changelog-barrier-timeout  120
changelog.capture-del-path  off
features.barrier  disable
features.barrier-timeout  120
features.trash  off
features.trash-dir  .trashcan
features.trash-eliminate-path  (null)
features.trash-max-filesize  5MB
features.trash-internal-op  off
cluster.enable-shared-storage  disable
cluster.write-freq-threshold  0
cluster.read-freq-threshold  0
cluster.tier-pause  off
cluster.tier-promote-frequency  120
cluster.tier-demote-frequency  3600
cluster.watermark-hi  90
cluster.watermark-low  75
cluster.tier-mode  cache
cluster.tier-max-promote-file-size  0
cluster.tier-max-mb  4000
cluster.tier-max-files  10000
cluster.tier-query-limit  100
cluster.tier-compact  on
cluster.tier-hot-compact-frequency  604800
cluster.tier-cold-compact-frequency  604800
features.ctr-enabled  off
features.record-counters  off
features.ctr-record-metadata-heat  off
features.ctr_link_consistency  off
features.ctr_lookupheal_link_timeout  300
features.ctr_lookupheal_inode_timeout  300
features.ctr-sql-db-cachesize  12500
features.ctr-sql-db-wal-autocheckpoint  25000
features.selinux  on
locks.trace  off
locks.mandatory-locking  off
cluster.disperse-self-heal-daemon  enable
cluster.quorum-reads  no
client.bind-insecure  (null)
features.shard  off
features.shard-block-size  64MB
features.scrub-throttle  lazy
features.scrub-freq  biweekly
features.scrub  false
features.expiry-time  120
features.cache-invalidation  off
features.cache-invalidation-timeout  60
features.leases  off
features.lease-lock-recall-timeout  60
disperse.background-heals  8
disperse.heal-wait-qlength  128
cluster.heal-timeout  600
dht.force-readdirp  on
disperse.read-policy  gfid-hash
cluster.shd-max-threads  1
cluster.shd-wait-qlength  1024
cluster.locking-scheme  full
cluster.granular-entry-heal  no
features.locks-revocation-secs  0
features.locks-revocation-clear-all  false
features.locks-revocation-max-blocked  0
features.locks-monkey-unlocking  false
disperse.shd-max-threads  1
disperse.shd-wait-qlength  1024
disperse.cpu-extensions  auto
disperse.self-heal-window-size  1
cluster.use-compound-fops  off
performance.parallel-readdir  off
performance.rda-request-size  131072
performance.rda-low-wmark  4096
performance.rda-high-wmark  128KB
performance.rda-cache-limit  10MB
performance.nl-cache-positive-entry  false
performance.nl-cache-limit  10MB
performance.nl-cache-timeout  60
cluster.brick-multiplex  off
cluster.max-bricks-per-process  0
disperse.optimistic-change-log  on
cluster.halo-enabled  False
cluster.halo-shd-max-latency  99999
cluster.halo-nfsd-max-latency  5
cluster.halo-max-latency  5
cluster.halo-max-replicas  99999
cluster.halo-min-replicas  2
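
Rather than scanning the full dump above, the handful of settings relevant
to this thread can be queried directly. A minimal sketch (replace <volname>
with the actual volume name; gluster volume get accepts a single option
name or "all"):

    # check the readdir-related options on one volume
    gluster volume get <volname> performance.parallel-readdir
    gluster volume get <volname> performance.readdir-ahead
    # or filter the full listing
    gluster volume get <volname> all | grep -E 'parallel-readdir|readdir-ahead|rda-'
    # confirm which transport the volume uses (the drops here only show up on RDMA mounts)
    gluster volume info <volname> | grep -i transport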
On Thu, 2018-06-14 at 12:12 +0100, mohammad kashif wrote:
> Hi Nithya
>
> It seems the problem can be solved either by turning parallel-readdir
> off or by downgrading the client to 3.10.12-1. Yesterday I downgraded
> some clients to 3.10.12-1 and that seems to have fixed the problem.
> Today, after seeing your email, I disabled parallel-readdir and the
> current 3.12.9-1 client started to work. I upgraded the servers and
> clients to 3.12.9-1 last month, and since then clients had been
> unmounting intermittently about once a week. But during the last
> three days it started unmounting every few minutes. I don't know what
> triggered this sudden panic except that the file system was quite
> full, around 98%. It is a 480 TB file system with almost 80 million
> files.
>
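
A minimal sketch of the two workarounds described above (the volume name
atlasglust is taken from the volume info below; the exact el6 package set
for the downgrade is an assumption and should be checked against the
local repo):

    # workaround 1: disable parallel-readdir on the volume (run on any server node)
    gluster volume set atlasglust performance.parallel-readdir off

    # workaround 2: pin the client packages back to 3.10.12-1 (package names are a guess)
    yum downgrade glusterfs-3.10.12-1.el6 glusterfs-libs-3.10.12-1.el6 \
        glusterfs-fuse-3.10.12-1.el6 glusterfs-client-xlators-3.10.12-1.el6
    # remount so the downgraded client code is actually used (mount point is a placeholder)
    umount /gluster/atlas && mount /gluster/atlas
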
> The servers have 64GB RAM and the clients have 64GB to 192GB RAM. I
> tested with a 192GB RAM client and it still had the same issue.
>
>
> Volume Name: atlasglust
> Type: Distribute
> Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7
> Transport-type: tcp
> Bricks:
> Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
> Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
> Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
> Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
> Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
> Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
> Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
> Options Reconfigured:
> diagnostics.client-log-level: ERROR
> diagnostics.brick-log-level: ERROR
> performance.cache-invalidation: on
> server.event-threads: 4
> client.event-threads: 4
> cluster.lookup-optimize: on
> performance.client-io-threads: on
> performance.cache-size: 1GB
> performance.parallel-readdir: off
> performance.md-cache-timeout: 600
> performance.stat-prefetch: on
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: on
> auth.allow: X.Y.Z.*
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
>
>
> Thanks
>
> Kashif
>
> On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran <nbalacha at redhat
> .com> wrote:
> > +Poornima who works on parallel-readdir.
> > @Poornima, Have you seen anything like this before?
> >
> > On 14 June 2018 at 10:07, Nithya Balachandran <nbalacha at redhat.com>
> > wrote:
> > > This is not the same issue as the one you are referring to - that
> > > was in the RPC layer and caused the bricks to crash. This one is
> > > different as it seems to be in the dht and rda layers. It does
> > > look like a stack overflow though.
> > > @Mohammad,
> > >
> > > Please send the following information:
> > >
> > > 1. gluster volume info
> > > 2. The number of entries in the directory being listed
> > > 3. System memory
> > >
> > > Does this still happen if you turn off parallel-readdir?
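
A quick sketch of gathering the three items Nithya asks for (the directory
path below is a placeholder for whichever directory was being listed when
the mount dropped):

    # on any server node
    gluster volume info atlasglust
    # on the client: entry count of the directory being listed (-f skips sorting)
    ls -f /mounted/path/to/directory | wc -l
    # system memory on the client
    free -g && grep MemTotal /proc/meminfo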
> > >
> > > Regards,
> > > Nithya
> > >
> > >
> > >
> > >
> > > On 13 June 2018 at 16:40, Milind Changire <mchangir at redhat.com>
> > > wrote:
> > > > +Nithya
> > > >
> > > > Nithya,
> > > > Do these logs [1] look similar to the recursive readdir()
> > > > issue that you encountered just a while back?
> > > > i.e. recursive readdir() response definition in the XDR
> > > >
> > > > [1] http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
> > > >
> > > >
> > > > On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <kashif.alig at g
> > > > mail.com> wrote:
> > > > > Hi Milind
> > > > >
> > > > > Thanks a lot, I managed to run gdb and produced a backtrace
> > > > > as well. It's here:
> > > > >
> > > > > http://www-pnp.physics.ox.ac.uk/~mohammad/backtrace.log
> > > > >
> > > > >
> > > > > I am trying to understand it but am still not able to make
> > > > > sense of it.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Kashif
> > > > >
> > > > >
> > > > > On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <mchangir at r
> > > > > edhat.com> wrote:
> > > > > > Kashif,
> > > > > > FYI: http://debuginfo.centos.org/centos/6/storage/x86_64/
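
One way to consume that location on an el6 client is a throwaway repo
file; the stanza below is an assumption about the directory layout at
that URL (verify it in a browser first), not something shipped by CentOS:

    # /etc/yum.repos.d/centos-storage-debuginfo.repo  (hypothetical)
    [centos-storage-debuginfo]
    name=CentOS Storage SIG debuginfo
    baseurl=http://debuginfo.centos.org/centos/6/storage/x86_64/
    enabled=0
    gpgcheck=0

    # then install the debuginfo package, matching the installed 3.12.9-1 build
    yum --enablerepo=centos-storage-debuginfo install glusterfs-debuginfo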
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <kashif.al
> > > > > > ig at gmail.com> wrote:
> > > > > > > Hi Milind
> > > > > > >
> > > > > > > There is no
> > > > > > > glusterfs-debuginfo available for gluster-3.12 from
> > > > > > > http://mirror.centos.org/centos/6/storage/x86_64/gluster-
> > > > > > > 3.12/ repo. Do
> > > > > > > you know from where I can get it?
> > > > > > > Also when I run gdb, it says
> > > > > > >
> > > > > > > Missing separate debuginfos, use: debuginfo-install
> > > > > > > glusterfs-fuse-3.12.9-1.el6.x86_64
> > > > > > >
> > > > > > > I can't find debug package for glusterfs-fuse either
> > > > > > >
> > > > > > > Thanks from the pit of despair ;)
> > > > > > >
> > > > > > > Kashif
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <kashif.
> > > > > > > alig at gmail.com> wrote:
> > > > > > > > Hi Milind
> > > > > > > >
> > > > > > > > I will send you links for logs.
> > > > > > > >
> > > > > > > > I collected these core dumps at client and there is no
> > > > > > > > glusterd process running on client.
> > > > > > > >
> > > > > > > > Kashif
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <mchan
> > > > > > > > gir at redhat.com> wrote:
> > > > > > > > > Kashif,
> > > > > > > > > Could you also send over the client/mount log file as
> > > > > > > > > Vijay suggested? Or maybe just the lines with the
> > > > > > > > > crash backtrace.
> > > > > > > > >
> > > > > > > > > Also, you've mentioned that you straced glusterd, but
> > > > > > > > > when you ran gdb, you ran it over /usr/sbin/glusterfs
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <vbellu
> > > > > > > > > r at redhat.com> wrote:
> > > > > > > > > > On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <k
> > > > > > > > > > ashif.alig at gmail.com> wrote:
> > > > > > > > > > > Hi Milind
> > > > > > > > > > >
> > > > > > > > > > > The operating system is Scientific Linux 6 which
> > > > > > > > > > > is based on RHEL6. The cpu arch is Intel x86_64.
> > > > > > > > > > >
> > > > > > > > > > > I will send you a separate email with link to
> > > > > > > > > > > core dump.
> > > > > > > > > >
> > > > > > > > > > You could also grep for crash in the client log
> > > > > > > > > > file and the lines following crash would have a
> > > > > > > > > > backtrace in most cases.
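
A minimal sketch of that grep (the fuse client log lives under
/var/log/glusterfs/ and is named after the mount point, so the file name
below is only an example):

    grep -n -A 30 -i 'crash' /var/log/glusterfs/mnt-atlas.log
    # the lines after the crash marker usually contain the signal received,
    # the "pending frames" list and a frame-by-frame backtrace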
> > > > > > > > > >
> > > > > > > > > > HTH,
> > > > > > > > > > Vijay
> > > > > > > > > >
> > > > > > > > > > > Thanks for your help.
> > > > > > > > > > >
> > > > > > > > > > > Kashif
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire
> > > > > > > > > > > <mchangir at redhat.com> wrote:
> > > > > > > > > > > > Kashif,
> > > > > > > > > > > > Could you share the core dump via Google Drive
> > > > > > > > > > > > or something similar
> > > > > > > > > > > >
> > > > > > > > > > > > Also, let me know the CPU arch and OS
> > > > > > > > > > > > Distribution on which you are running gluster.
> > > > > > > > > > > >
> > > > > > > > > > > > If you've installed the glusterfs-debuginfo
> > > > > > > > > > > > package, you'll also get the source lines in
> > > > > > > > > > > > the backtrace via gdb
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jun 12, 2018 at 5:59 PM, mohammad
> > > > > > > > > > > > kashif <kashif.alig at gmail.com> wrote:
> > > > > > > > > > > > > Hi Milind, Vijay
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks, I have some more information now, as I
> > > > > > > > > > > > > straced glusterd on the client:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 138544 0.000131 mprotect(0x7f2f70785000,
> > > > > > > > > > > > > 4096, PROT_READ|PROT_WRITE) = 0 <0.000026>
> > > > > > > > > > > > > 138544 0.000128 mprotect(0x7f2f70786000,
> > > > > > > > > > > > > 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
> > > > > > > > > > > > > 138544 0.000126 mprotect(0x7f2f70787000,
> > > > > > > > > > > > > 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
> > > > > > > > > > > > > 138544 0.000124 --- SIGSEGV
> > > > > > > > > > > > > {si_signo=SIGSEGV, si_code=SEGV_ACCERR,
> > > > > > > > > > > > > si_addr=0x7f2f7c60ef88} ---
> > > > > > > > > > > > > 138544 0.000051 --- SIGSEGV
> > > > > > > > > > > > > {si_signo=SIGSEGV, si_code=SI_KERNEL,
> > > > > > > > > > > > > si_addr=0} ---
> > > > > > > > > > > > > 138551 0.105048 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138550 0.000041 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138547 0.000008 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138546 0.000007 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138545 0.000007 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138544 0.000008 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > > 138543 0.000007 +++ killed by SIGSEGV
> > > > > > > > > > > > > (core dumped) +++
> > > > > > > > > > > > >
> > > > > > > > > > > > > As far as I understand, gluster is somehow
> > > > > > > > > > > > > trying to access memory in an inappropriate
> > > > > > > > > > > > > manner and the kernel sends SIGSEGV.
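
For reference, a sketch of attaching strace to the right process next
time (as Milind points out above, there is no glusterd on the client; the
process behind the mount is /usr/sbin/glusterfs, and the pgrep pattern
and output path below are only examples):

    strace -f -ttt -T -o /tmp/glusterfs-client.strace \
        -p "$(pgrep -f 'glusterfs.*atlasglust')"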
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also got the core dump. I am trying gdb for
> > > > > > > > > > > > > the first time, so I am not sure whether I am
> > > > > > > > > > > > > using it correctly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > gdb /usr/sbin/glusterfs core.138536
> > > > > > > > > > > > >
> > > > > > > > > > > > > It just tells me that the program terminated
> > > > > > > > > > > > > with signal 11, segmentation fault.
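
A minimal sketch of pulling a backtrace out of that core once gdb is open
(standard gdb commands; with glusterfs-debuginfo installed the frames
also get source file and line information):

    gdb /usr/sbin/glusterfs core.138536
    (gdb) set pagination off
    (gdb) bt                      # backtrace of the thread that crashed
    (gdb) thread apply all bt     # backtraces of every thread
    (gdb) info registers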
> > > > > > > > > > > > >
> > > > > > > > > > > > > The problem is not limited to one client; it
> > > > > > > > > > > > > is happening on many clients.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I would really appreciate any help, as the
> > > > > > > > > > > > > whole file system has become unusable.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > > > Kashif
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jun 12, 2018 at 12:26 PM, Milind
> > > > > > > > > > > > > Changire <mchangir at redhat.com> wrote:
> > > > > > > > > > > > > > Kashif,
> > > > > > > > > > > > > > You can change the log level by:
> > > > > > > > > > > > > > $ gluster volume set <vol> diagnostics.brick-log-level TRACE
> > > > > > > > > > > > > > $ gluster volume set <vol> diagnostics.client-log-level TRACE
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and see how things fare
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you want fewer logs you can change the
> > > > > > > > > > > > > > log-level to DEBUG instead of TRACE.
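
Once enough has been captured, the levels can be put back the same way,
e.g. (a sketch; gluster volume reset returns an option to its default):

    gluster volume reset <vol> diagnostics.brick-log-level
    gluster volume reset <vol> diagnostics.client-log-level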
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jun 12, 2018 at 3:37 PM, mohammad
> > > > > > > > > > > > > > kashif <kashif.alig at gmail.com> wrote:
> > > > > > > > > > > > > > > Hi Vijay
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Now it is unmounting every 30 minutes!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The server log at
> > > > > > > > > > > > > > > /var/log/glusterfs/bricks/glusteratlas-
> > > > > > > > > > > > > > > brics001-gv0.log have this line only
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [2018-06-12 09:53:19.303102] I [MSGID:
> > > > > > > > > > > > > > > 115013] [server-
> > > > > > > > > > > > > > > helpers.c:289:do_fd_cleanup] 0-
> > > > > > > > > > > > > > > atlasglust-server: fd cleanup on
> > > > > > > > > > > > > > > /atlas/atlasdata/zgubic/hmumu/histograms/
> > > > > > > > > > > > > > > v14.3/Signal
> > > > > > > > > > > > > > > [2018-06-12 09:53:19.306190] I [MSGID:
> > > > > > > > > > > > > > > 101055] [client_t.c:443:gf_client_unref]
> > > > > > > > > > > > > > > 0-atlasglust-server: Shutting down
> > > > > > > > > > > > > > > connection <server-name> -2224879-
> > > > > > > > > > > > > > > 2018/06/12-09:51:01:460889-atlasglust-
> > > > > > > > > > > > > > > client-0-0-0
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > There is no other information. Is there
> > > > > > > > > > > > > > > any way to increase log verbosity?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > on the client
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.744980] I [MSGID:
> > > > > > > > > > > > > > > 114057] [client-
> > > > > > > > > > > > > > > handshake.c:1478:select_server_supported_
> > > > > > > > > > > > > > > programs] 0-atlasglust-client-5: Using
> > > > > > > > > > > > > > > Program GlusterFS 3.3, Num (1298437),
> > > > > > > > > > > > > > > Version (330)
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.746508] I [MSGID:
> > > > > > > > > > > > > > > 114046] [client-
> > > > > > > > > > > > > > > handshake.c:1231:client_setvolume_cbk] 0-
> > > > > > > > > > > > > > > atlasglust-client-5: Connected to
> > > > > > > > > > > > > > > atlasglust-client-5, attached to remote
> > > > > > > > > > > > > > > volume '/glusteratlas/brick006/gv0'.
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.746543] I [MSGID:
> > > > > > > > > > > > > > > 114047] [client-
> > > > > > > > > > > > > > > handshake.c:1242:client_setvolume_cbk] 0-
> > > > > > > > > > > > > > > atlasglust-client-5: Server and Client
> > > > > > > > > > > > > > > lk-version numbers are not same,
> > > > > > > > > > > > > > > reopening the fds
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.746814] I [MSGID:
> > > > > > > > > > > > > > > 114035] [client-
> > > > > > > > > > > > > > > handshake.c:202:client_set_lk_version_cbk
> > > > > > > > > > > > > > > ] 0-atlasglust-client-5: Server lk
> > > > > > > > > > > > > > > version = 1
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.748449] I [MSGID:
> > > > > > > > > > > > > > > 114057] [client-
> > > > > > > > > > > > > > > handshake.c:1478:select_server_supported_
> > > > > > > > > > > > > > > programs] 0-atlasglust-client-6: Using
> > > > > > > > > > > > > > > Program GlusterFS 3.3, Num (1298437),
> > > > > > > > > > > > > > > Version (330)
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.750219] I [MSGID:
> > > > > > > > > > > > > > > 114046] [client-
> > > > > > > > > > > > > > > handshake.c:1231:client_setvolume_cbk] 0-
> > > > > > > > > > > > > > > atlasglust-client-6: Connected to
> > > > > > > > > > > > > > > atlasglust-client-6, attached to remote
> > > > > > > > > > > > > > > volume '/glusteratlas/brick007/gv0'.
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.750261] I [MSGID:
> > > > > > > > > > > > > > > 114047] [client-
> > > > > > > > > > > > > > > handshake.c:1242:client_setvolume_cbk] 0-
> > > > > > > > > > > > > > > atlasglust-client-6: Server and Client
> > > > > > > > > > > > > > > lk-version numbers are not same,
> > > > > > > > > > > > > > > reopening the fds
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.750503] I [MSGID:
> > > > > > > > > > > > > > > 114035] [client-
> > > > > > > > > > > > > > > handshake.c:202:client_set_lk_version_cbk
> > > > > > > > > > > > > > > ] 0-atlasglust-client-6: Server lk
> > > > > > > > > > > > > > > version = 1
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.752207] I [fuse-
> > > > > > > > > > > > > > > bridge.c:4205:fuse_init] 0-glusterfs-
> > > > > > > > > > > > > > > fuse: FUSE inited with protocol versions:
> > > > > > > > > > > > > > > glusterfs 7.24 kernel 7.14
> > > > > > > > > > > > > > > [2018-06-12 09:51:01.752261] I [fuse-
> > > > > > > > > > > > > > > bridge.c:4835:fuse_graph_sync] 0-fuse:
> > > > > > > > > > > > > > > switched to graph 0
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is there a problem with the server and
> > > > > > > > > > > > > > > client lk-version?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your help.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Kashif
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Jun 11, 2018 at 11:52 PM, Vijay
> > > > > > > > > > > > > > > Bellur <vbellur at redhat.com> wrote:
> > > > > > > > > > > > > > > > On Mon, Jun 11, 2018 at 8:50 AM,
> > > > > > > > > > > > > > > > mohammad kashif <kashif.alig at gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Since I updated our gluster servers
> > > > > > > > > > > > > > > > > and clients to the latest version,
> > > > > > > > > > > > > > > > > 3.12.9-1, I have been having this
> > > > > > > > > > > > > > > > > issue of gluster getting unmounted
> > > > > > > > > > > > > > > > > from the client very regularly. It was
> > > > > > > > > > > > > > > > > not a problem before the update.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > It's a distributed file system with
> > > > > > > > > > > > > > > > > no replication. We have seven servers
> > > > > > > > > > > > > > > > > totaling around 480TB of data. It's
> > > > > > > > > > > > > > > > > 97% full.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I am using the following config on the servers:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > gluster volume set atlasglust features.cache-invalidation on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust features.cache-invalidation-timeout 600
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.stat-prefetch on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.cache-invalidation on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.md-cache-timeout 600
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.parallel-readdir on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.cache-size 1GB
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.client-io-threads on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust cluster.lookup-optimize on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust performance.stat-prefetch on
> > > > > > > > > > > > > > > > > gluster volume set atlasglust client.event-threads 4
> > > > > > > > > > > > > > > > > gluster volume set atlasglust server.event-threads 4
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The clients are mounted with these options:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
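
For context, a sketch of how those options typically appear in /etc/fstab
for a glusterfs fuse mount (the mount point is a placeholder; the server
name is just one of the brick hosts from the volume info above):

    pplxgluster01.X.Y.Z:/atlasglust  /data/atlas  glusterfs  defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev  0 0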
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can't see anything in the log file.
> > > > > > > > > > > > > > > > > Can someone suggest how to
> > > > > > > > > > > > > > > > > troubleshoot this issue?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you please share the log file?
> > > > > > > > > > > > > > > > Checking for messages related to
> > > > > > > > > > > > > > > > disconnections/crashes in the log file
> > > > > > > > > > > > > > > > would be a good way to start
> > > > > > > > > > > > > > > > troubleshooting the problem.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Vijay
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
--
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/