[Gluster-devel] Help needed in understanding GlusterFS logs and debugging elasticsearch failures
Sachidananda URS
surs at redhat.com
Mon Dec 14 10:43:53 UTC 2015
Hi,
On Sat, Dec 12, 2015 at 2:35 AM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>
> ----- Original Message -----
> > From: "Sachidananda URS" <surs at redhat.com>
> > To: "Gluster Devel" <gluster-devel at gluster.org>
> > Sent: Friday, December 11, 2015 10:26:04 AM
> > Subject: [Gluster-devel] Help needed in understanding GlusterFS logs and
> >   debugging elasticsearch failures
> >
> > Hi,
> >
> > I was trying to use GlusterFS as the backend filesystem for
> > Elasticsearch, storing the indices on a GlusterFS mount.
> >
> > The filesystem operations, as far as I can understand, are these: the
> > Lucene engine does a lot of renames on the index files, and multiple
> > threads read from the same file concurrently (a minimal sketch of this
> > pattern follows below).
> >
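> > For illustration, the pattern I have in mind is roughly the sketch
> > below, run on the gluster mount. The file names here are made up;
> > Lucene manages its real segment files internally:
> >
> > cd /mnt/gluster2
> > dd if=/dev/zero of=_seg.tmp bs=1M count=4
> > mv _seg.tmp _seg.cfs                           # rename into place
> > for i in 1 2 3 4; do md5sum _seg.cfs & done    # concurrent readers
> > wait
> >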
> > While writing the index, Elasticsearch/Lucene complains of index
> > corruption, the health of the cluster goes red, and all operations on
> > the index fail thereafter.
> >
> > ===================
> >
> > [2015-12-10 02:43:45,614][WARN ][index.engine ] [client-2]
> > [logstash-2015.12.09][3] failed engine [merge failed]
> > org.apache.lucene.index.MergePolicy$MergeException:
> > org.apache.lucene.index.CorruptIndexException: checksum failed (hardware
> > problem?) : expected=0 actual=6d811d06
> > (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs") [slice=_a7_Lucene50_0.doc]))
> >         at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1233)
> >         at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
> > (hardware problem?) : expected=0 actual=6d811d06
> > (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs") [slice=_a7_Lucene50_0.doc]))
> >
> > =====================
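> >
> > For reference, the file named in the trace could be compared between
> > the mount and each brick to check for a plain data mismatch (brick
> > paths assumed from the vol info below):
> >
> > # on the client
> > md5sum /mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs
> > # on each brick host
> > md5sum /gluster/brick1/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs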
> >
> >
> > The server logs do not have anything. The client logs are full of
> > messages like:
> >
> >
> >
> > [2015-12-03 18:44:17.882032] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-61881676454442626.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-311.ckp (hash=esearch-replicate-1/cache=<nul>)
> > [2015-12-03 18:45:31.276316] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2384654015514619399.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-312.ckp (hash=esearch-replicate-0/cache=<nul>)
> > [2015-12-03 18:45:31.587660] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-4957943728738197940.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-312.ckp (hash=esearch-replicate-0/cache=<nul>)
> > [2015-12-03 18:46:48.424605] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-1731620600607498012.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-313.ckp (hash=esearch-replicate-1/cache=<nul>)
> > [2015-12-03 18:46:48.466558] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-5214949393126318982.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-313.ckp (hash=esearch-replicate-1/cache=<nul>)
> > [2015-12-03 18:48:06.314138] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-9110755229226773921.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-314.ckp (hash=esearch-replicate-1/cache=<nul>)
> > [2015-12-03 18:48:06.332919] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-5193443717817038271.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-314.ckp (hash=esearch-replicate-1/cache=<nul>)
> > [2015-12-03 18:49:24.694263] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2750483795035758522.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-315.ckp (hash=esearch-replicate-0/cache=<nul>)
> >
> > ==============================================================
> >
> > The same setup works well on any of the local disk filesystems.
> > This is a 2 x 2 distributed-replicate setup:
> >
> > # gluster vol info
> >
> > Volume Name: esearch
> > Type: Distributed-Replicate
> > Volume ID: 4e4b205e-28ed-4f9e-9fa4-0d020428dede
> > Status: Started
> > Number of Bricks: 2 x 2 = 4
> > Transport-type: tcp,rdma
> > Bricks:
> > Brick1: 10.70.47.171:/gluster/brick1
> > Brick2: 10.70.47.187:/gluster/brick1
> > Brick3: 10.70.47.121:/gluster/brick1
> > Brick4: 10.70.47.172:/gluster/brick1
> > Options Reconfigured:
> > performance.read-ahead: off
> > performance.write-behind: off
> >
> >
> > I need a little help in understanding the failures. Let me know if you
> > need further information on the setup, or access to the system to debug
> > further. I've attached the debug logs for further investigation.
> >
>
>
> Would it be possible to turn off all the performance translators
> (md-cache, quick-read, io-cache, etc.) and check if the same problem
> persists? Collecting an strace of the elasticsearch process that does
> I/O on gluster can also help.
>
I turned off all the performance xlators.
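For reference, the xlators were disabled with gluster volume set, along
these lines (invocations reconstructed; the resulting options appear in
the vol info below):

# gluster volume set esearch performance.stat-prefetch off
# gluster volume set esearch performance.md-cache-timeout 0
# gluster volume set esearch performance.quick-read off
# gluster volume set esearch performance.io-cache off
# gluster volume set esearch performance.read-ahead off
# gluster volume set esearch performance.write-behind off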
# gluster vol info
Volume Name: esearch
Type: Distributed-Replicate
Volume ID: 4e4b205e-28ed-4f9e-9fa4-0d020428dede
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp,rdma
Bricks:
Brick1: 10.70.47.171:/gluster/brick1
Brick2: 10.70.47.187:/gluster/brick1
Brick3: 10.70.47.121:/gluster/brick1
Brick4: 10.70.47.172:/gluster/brick1
Options Reconfigured:
performance.stat-prefetch: off
performance.md-cache-timeout: 0
performance.quick-read: off
performance.io-cache: off
performance.read-ahead: off
performance.write-behind: off
The problem still persists. Attaching strace logs.
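The strace was collected by attaching to the running elasticsearch
process, roughly as follows (invocation reconstructed; <pid> stands for
the elasticsearch JVM's process id):

# strace -f -tt -o elastic_strace.log -p <pid>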
-sac
-------------- next part --------------
A non-text attachment was scrubbed...
Name: elastic_strace.log.bz2
Type: application/x-bzip2
Size: 4411830 bytes
Desc: not available
URL: <http://www.gluster.org/pipermail/gluster-devel/attachments/20151214/e77c2d08/attachment-0001.bz2>