[Gluster-users] Issue with proactive self-healing for Erasure coding
Mohamed Pakkeer
mdfakkeer at gmail.com
Fri Jun 26 06:58:47 UTC 2015
Hi Xavier,
We are facing the same I/O error after upgrading to gluster 3.7.2.
Description of problem:
=======================
In a 3 x (4 + 2) = 18 distributed-disperse volume, some files return input/output
errors on the FUSE mount after simulating the following scenario:
1. Simulate a disk failure by killing the brick pid and adding the same disk back
after formatting the drive
2. Try to read the recovered (healed) file after 2 bricks/nodes have been
brought down
Version-Release number of selected component (if applicable):
==============================================================
admin@node001:~$ sudo gluster --version
glusterfs 3.7.2 built on Jun 19 2015 16:33:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
Steps to Reproduce:
1. Create a 3 x (4 + 2) disperse volume across the nodes.
2. FUSE mount it on a client and start creating files/directories with
mkdir and rsync/dd.
3. Simulate a disk failure by killing the brick pid of any disk on one node,
then add the same disk back after formatting the drive.
4. Start the volume with force.
5. Self-healing creates the file names with 0 bytes on the newly formatted drive.
6. Wait for self-healing to finish, but it never completes; the files stay
at 0 bytes.
7. Try to read the same file from the client; the 0-byte file then starts
recovering and the recovery completes. Getting the md5sum of the file
with all nodes live gives the correct result.
8. Now bring down 2 of the nodes.
9. Try to get the md5sum of the same recovered file; the client throws an I/O error.
(A command-level sketch of these steps is given below.)
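For reference, a minimal shell sketch of steps 1-6 above, assuming the volume name
vaulttest21 and the brick layout shown in the volume info below; the device name
/dev/sdX, the <brick-pid> and the dd file size are placeholders, not values taken
from this report:

# 1. Create and start the 3 x (4 + 2) disperse volume (brick list abbreviated)
sudo gluster volume create vaulttest21 disperse 6 redundancy 2 \
     10.1.2.1:/media/disk1 10.1.2.2:/media/disk1 ... 10.1.2.6:/media/disk3
sudo gluster volume start vaulttest21

# 2. FUSE mount on the client and create some data
sudo mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster
dd if=/dev/urandom of=/mnt/gluster/up1 bs=1M count=800

# 3. On one node: find the brick pid, kill it, reformat and remount the disk
sudo gluster volume status vaulttest21 | grep /media/disk2   # note the brick pid
sudo kill -9 <brick-pid>
sudo umount /media/disk2
sudo mkfs.xfs -f /dev/sdX
sudo mount /dev/sdX /media/disk2

# 4. Bring the now-empty brick back online
sudo gluster volume start vaulttest21 force

# 5-6. Watch the self-heal progress
sudo gluster volume heal vaulttest21 info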
Screenshots
admin@node001:~$ sudo gluster volume info
Volume Name: vaulttest21
Type: Distributed-Disperse
Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.1:/media/disk2
Brick8: 10.1.2.2:/media/disk2
Brick9: 10.1.2.3:/media/disk2
Brick10: 10.1.2.4:/media/disk2
Brick11: 10.1.2.5:/media/disk2
Brick12: 10.1.2.6:/media/disk2
Brick13: 10.1.2.1:/media/disk3
Brick14: 10.1.2.2:/media/disk3
Brick15: 10.1.2.3:/media/disk3
Brick16: 10.1.2.4:/media/disk3
Brick17: 10.1.2.5:/media/disk3
Brick18: 10.1.2.6:/media/disk3
Options Reconfigured:
performance.readdir-ahead: on
*After simulating the disk failure (node003, disk2) and adding the disk again
after formatting the drive*
admin@node003:~$ date
Thu Jun 25 *16:21:58* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
--
admin@node003:~$ date
Thu Jun 25 *16:25:09* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
admin@node003:~$ date
Thu Jun 25 *16:41:25* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
*After waiting nearly 20 minutes, self-healing has still not recovered the full
data chunk. Then try to read the file using md5sum*
root@mas03:/mnt/gluster# time md5sum up1
4650543ade404ed5a1171726e76f8b7c up1
real 1m58.010s
user 0m6.243s
sys 0m0.778s
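In addition to listing the brick directory directly, the heal state during this
waiting period could also have been checked from any node with the heal info
command; a small sketch (the exact output format varies by version):

admin@node001:~$ sudo gluster volume heal vaulttest21 info
# lists, per brick, the entries (paths/gfids) still pending heal;
# up1 and up2 would be expected to remain listed while they stay at 0 bytes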
*The corrupted chunk starts growing*
admin@node003:~$ ls -l -h /media/disk2
total 2.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
-rw-r--r-- 2 root root 797M Jun 25 15:57 up1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
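The report does not show how nodes 5 and 6 were taken offline; one common way to
simulate a node failure (an assumption, not taken from this thread) is to kill the
brick processes or stop the gluster services on those nodes:

# on node005 and node006 (hypothetical; any method that makes the bricks unreachable will do)
sudo pkill -f glusterfsd                # kill all brick processes on the node
sudo service glusterfs-server stop      # or stop the gluster daemons entirely (Debian/Ubuntu service name)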
*Verifying the healed file after two nodes (5 & 6) are taken offline*
root@mas03:/mnt/gluster# time md5sum up1
md5sum: up1:* Input/output error*
The I/O error is still not resolved. Could you suggest whether anything is wrong
with our testing?
admin@node001:~$ sudo gluster volume get vaulttest21 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 16
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.least-rate-limit 0
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 2
network.ping-timeout 42
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive (null)
server.allow-insecure (null)
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout (null)
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead on
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
features.file-snapshot off
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.quota-timeout 0
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable false
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.bd-aio off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
features.ctr-enabled off
features.record-counters off
features.ctr_link_consistency off
locks.trace (null)
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
ganesha.enable off
features.shard off
features.shard-block-size 4MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
Thanks & regards
Backer
On Mon, Jun 15, 2015 at 1:26 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:
> On 06/15/2015 09:25 AM, Mohamed Pakkeer wrote:
>
>> Hi Xavier,
>>
>> When can we expect the 3.7.2 release for fixing the I/O error which we
>> discussed on this mail thread?.
>>
>
> As per the latest meeting held last wednesday [1] it will be released this
> week.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-06-10/gluster-meeting.2015-06-10-12.01.html
>
>
>> Thanks
>> Backer
>>
>> On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi again,
>>
>> in today's gluster meeting [1] it has been decided that 3.7.1 will
>> be released urgently to solve a bug in glusterd. All fixes planned
>> for 3.7.1 will be moved to 3.7.2 which will be released soon after.
>>
>> Xavi
>>
>> [1]
>>
>> http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html
>>
>>
>> On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
>>
>> On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
>>
>> Hi Xavier,
>>
>> Thanks for your reply. When can we expect the 3.7.1 release?
>>
>>
>> AFAIK a beta of 3.7.1 will be released very soon.
>>
>>
>> cheers
>> Backer
>>
>> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi,
>>
>> some Input/Output error issues have been identified and
>> fixed. These
>> fixes will be available on 3.7.1.
>>
>> Xavi
>>
>>
>> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>>
>> Hi Glusterfs Experts,
>>
>> We are testing glusterfs 3.7.0 tarball on our 10
>> Node glusterfs
>> cluster.
>> Each node has 36 dirves and please find the volume
>> info below
>>
>> Volume Name: vaulttest5
>> Type: Distributed-Disperse
>> Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
>> Status: Started
>> Number of Bricks: 36 x (8 + 2) = 360
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.1.2.1:/media/disk1
>> Brick2: 10.1.2.2:/media/disk1
>> Brick3: 10.1.2.3:/media/disk1
>> Brick4: 10.1.2.4:/media/disk1
>> Brick5: 10.1.2.5:/media/disk1
>> Brick6: 10.1.2.6:/media/disk1
>> Brick7: 10.1.2.7:/media/disk1
>> Brick8: 10.1.2.8:/media/disk1
>> Brick9: 10.1.2.9:/media/disk1
>> Brick10: 10.1.2.10:/media/disk1
>> Brick11: 10.1.2.1:/media/disk2
>> Brick12: 10.1.2.2:/media/disk2
>> Brick13: 10.1.2.3:/media/disk2
>> Brick14: 10.1.2.4:/media/disk2
>> Brick15: 10.1.2.5:/media/disk2
>> Brick16: 10.1.2.6:/media/disk2
>> Brick17: 10.1.2.7:/media/disk2
>> Brick18: 10.1.2.8:/media/disk2
>> Brick19: 10.1.2.9:/media/disk2
>> Brick20: 10.1.2.10:/media/disk2
>> ...
>> ....
>> Brick351: 10.1.2.1:/media/disk36
>> Brick352: 10.1.2.2:/media/disk36
>> Brick353: 10.1.2.3:/media/disk36
>> Brick354: 10.1.2.4:/media/disk36
>> Brick355: 10.1.2.5:/media/disk36
>> Brick356: 10.1.2.6:/media/disk36
>> Brick357: 10.1.2.7:/media/disk36
>> Brick358: 10.1.2.8:/media/disk36
>> Brick359: 10.1.2.9:/media/disk36
>> Brick360: 10.1.2.10:/media/disk36
>> Options Reconfigured:
>> performance.readdir-ahead: on
>>
>> We did some performance testing and simulated the
>> proactive self
>> healing
>> for Erasure coding. Disperse volume has been
>> created across
>> nodes.
>>
>> _*Description of problem*_
>>
>> I disconnected the *network of two nodes* and tried
>> to write
>> some video
>> files and *glusterfs* *wrote the video files on
>> balance 8 nodes
>> perfectly*. I tried to download the uploaded file
>> and it was
>> downloaded
>> perfectly. Then i enabled the network of two nodes,
>> the pro
>> active self
>> healing mechanism worked perfectly and wrote the
>> unavailable
>> junk of
>> data to the recently enabled node from the other 8
>> nodes. But
>> when i
>> tried to download the same file node, it showed
>> Input/Output
>> error. I
>> couldn't download the file. I think there is an
>> issue in pro
>> active self
>> healing.
>>
>> Also we tried the simulation with one node network
>> failure. We
>> faced
>> same I/O error issue while downloading the file
>>
>>
>> _Error while downloading file _
>> _
>> _
>>
>> root at master02:/home/admin# rsync -r --progress
>> /mnt/gluster/file13_AN
>> ./1/file13_AN-2
>>
>> sending incremental file list
>>
>> file13_AN
>>
>> 3,342,355,597 100% 4.87MB/s 0:10:54 (xfr#1,
>> to-chk=0/1)
>>
>> rsync: read errors mapping "/mnt/gluster/file13_AN":
>> Input/output error (5)
>>
>> WARNING: file13_AN failed verification -- update
>> discarded (will
>> try again).
>>
>> root at master02:/home/admin# cp
>> /mnt/gluster/file13_AN
>> ./1/file13_AN-3
>>
>> cp: error reading ‘/mnt/gluster/file13_AN’:
>> Input/output error
>>
>> cp: failed to extend ‘./1/file13_AN-3’:
>> Input/output error_
>> _
>>
>>
>> We can't conclude the issue with glusterfs 3.7.0 or
>> our glusterfs
>> configuration.
>>
>> Any help would be greatly appreciated
>>
>> --
>> Cheers
>> Backer
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
--
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114