[Gluster-users] Issue with Pro active self healing for Erasure coding
Xavier Hernandez
xhernandez at datalab.es
Fri Jun 26 07:41:30 UTC 2015
Could you file a bug for this?
I'll investigate the problem.
Xavi
On 06/26/2015 08:58 AM, Mohamed Pakkeer wrote:
> Hi Xavier
>
> We are facing the same I/O error after upgrading to gluster 3.7.2.
>
> Description of problem:
> =======================
> In a 3 x (4 + 2) = 18 distributed-disperse volume, some files return
> input/output errors on the FUSE mount after simulating the following
> scenario:
>
> 1. Simulate a disk failure by killing the brick PID and adding the
> same disk back after formatting the drive
> 2. Try to read the recovered (healed) file after 2 bricks/nodes were
> brought down
>
> Version-Release number of selected component (if applicable):
> ==============================================================
>
> admin at node001:~$ sudo gluster --version
> glusterfs 3.7.2 built on Jun 19 2015 16:33:27
> Repository revision: git://git.gluster.com/glusterfs.git
> <http://git.gluster.com/glusterfs.git>
> Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU
> General Public License.
>
> Steps to Reproduce (a shell sketch of these steps follows the list):
>
> 1. Create a 3 x (4 + 2) disperse volume across the nodes
> 2. FUSE mount the volume on the client and start creating files/directories with mkdir and rsync/dd
> 3. Simulate a disk failure by killing the brick PID of one disk on one node, then add the same disk back after formatting the drive
> 4. Start the volume with force
> 5. Self-heal creates the file with 0 bytes on the newly formatted drive
> 6. Wait for self-heal to finish, but it never does; the file stays at 0 bytes
> 7. Read the same file from the client; the 0-byte copy now recovers and recovery completes. With all bricks up, the md5sum of the file is correct
> 8. Now bring down 2 of the nodes
> 9. Try to get the md5sum of the same recovered file; the client throws an I/O error
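>
> A minimal shell sketch of the above, assuming our volume name and brick
> layout; the brick PID, device name and remount details are placeholders:
>
> # create and FUSE mount the 3 x (4 + 2) disperse volume
> gluster volume create vaulttest21 disperse 6 redundancy 2 \
>     10.1.2.{1..6}:/media/disk1 10.1.2.{1..6}:/media/disk2 10.1.2.{1..6}:/media/disk3 force
> gluster volume start vaulttest21
> mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster
>
> # on node3: simulate the failure of the /media/disk2 brick
> gluster volume status vaulttest21        # note the brick PID
> kill -9 <brick-pid>                      # placeholder PID
> mkfs.xfs -f /dev/<device>                # placeholder device; reformat, then remount on /media/disk2
> gluster volume start vaulttest21 force   # bring the empty brick back
>
> # from the client: read the file to trigger recovery, then verify
> md5sum /mnt/gluster/up1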
>
> Screen shots
>
> admin at node001:~$ sudo gluster volume info
>
> Volume Name: vaulttest21
> Type: Distributed-Disperse
> Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
> Status: Started
> Number of Bricks: 3 x (4 + 2) = 18
> Transport-type: tcp
> Bricks:
> Brick1: 10.1.2.1:/media/disk1
> Brick2: 10.1.2.2:/media/disk1
> Brick3: 10.1.2.3:/media/disk1
> Brick4: 10.1.2.4:/media/disk1
> Brick5: 10.1.2.5:/media/disk1
> Brick6: 10.1.2.6:/media/disk1
> Brick7: 10.1.2.1:/media/disk2
> Brick8: 10.1.2.2:/media/disk2
> Brick9: 10.1.2.3:/media/disk2
> Brick10: 10.1.2.4:/media/disk2
> Brick11: 10.1.2.5:/media/disk2
> Brick12: 10.1.2.6:/media/disk2
> Brick13: 10.1.2.1:/media/disk3
> Brick14: 10.1.2.2:/media/disk3
> Brick15: 10.1.2.3:/media/disk3
> Brick16: 10.1.2.4:/media/disk3
> Brick17: 10.1.2.5:/media/disk3
> Brick18: 10.1.2.6:/media/disk3
> Options Reconfigured:
> performance.readdir-ahead: on
>
> *_After simulating the disk failure (node3, disk2) and adding the drive
> again after formatting it_*
>
> admin at node003:~$ date
> Thu Jun 25 *16:21:58* IST 2015
>
> admin at node003:~$ ls -l -h /media/disk2
> total 1.6G
> drwxr-xr-x 3 root root 22 Jun 25 16:18 1
> *-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
> *-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
> -rw-r--r-- 2 root root 797M Jun 25 16:03 up3
> -rw-r--r-- 2 root root 797M Jun 25 16:04 up4
>
> --
>
> admin at node003:~$ date
> Thu Jun 25 *16:25:09* IST 2015
>
> admin at node003:~$ ls -l -h /media/disk2
> total 1.6G
> drwxr-xr-x 3 root root 22 Jun 25 16:18 1
> *-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
> *-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
> -rw-r--r-- 2 root root 797M Jun 25 16:03 up3
> -rw-r--r-- 2 root root 797M Jun 25 16:04 up4
>
>
> admin at node003:~$ date
> Thu Jun 25 *16:41:25* IST 2015
>
> admin at node003:~$ ls -l -h /media/disk2
> total 1.6G
> drwxr-xr-x 3 root root 22 Jun 25 16:18 1
> -rw-r--r-- 2 root root 0 Jun 25 16:17 up1
> -rw-r--r-- 2 root root 0 Jun 25 16:17 up2
> -rw-r--r-- 2 root root 797M Jun 25 16:03 up3
> -rw-r--r-- 2 root root 797M Jun 25 16:04 up4
>
>
> *After waiting nearly 20 minutes, self-heal still has not recovered the
> full data chunk. Then we try to read the file using md5sum:*
> root at mas03:/mnt/gluster# time md5sum up1
> 4650543ade404ed5a1171726e76f8b7c up1
>
> real 1m58.010s
> user 0m6.243s
> sys 0m0.778s
>
> *The corrupted chunk starts growing after the read:*
>
> admin at node003:~$ ls -l -h /media/disk2
> total 2.6G
> drwxr-xr-x 3 root root 22 Jun 25 16:18 1
> -rw-r--r-- 2 root root 797M Jun 25 15:57 up1
> -rw-r--r-- 2 root root 0 Jun 25 16:17 up2
> -rw-r--r-- 2 root root 797M Jun 25 16:03 up3
> -rw-r--r-- 2 root root 797M Jun 25 16:04 up4
>
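> Before taking the two nodes offline, a quick way to confirm from the
> cluster side that the heal really finished (a minimal sketch, assuming
> the heal-info output is reliable for disperse volumes on 3.7.2):
>
> gluster volume heal vaulttest21 info     # should show no pending entries once healing is done
> gluster volume heal vaulttest21 full     # optionally trigger a full heal if entries remain
>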
> *_To verify the healed file after nodes 5 & 6 were taken offline_*
>
> root at mas03:/mnt/gluster# time md5sum up1
> md5sum: up1: *Input/output error*
>
> The I/O error is still not resolved. Could you suggest whether anything
> is wrong with our testing?
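>
> In case it helps the bug report, a sketch of the brick-side metadata we
> can collect for the affected file (standard getfattr from the attr
> package; as far as we understand, the disperse translator keeps its
> heal-related state in trusted.ec.* xattrs):
>
> # run on each brick that holds a fragment of up1, e.g. node3
> getfattr -d -m . -e hex /media/disk2/up1
> ls -l /media/disk2/up1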
>
>
> admin at node001:~$ sudo gluster volume get vaulttest21 all
> Option Value
> ------ -----
> cluster.lookup-unhashed on
> cluster.lookup-optimize off
> cluster.min-free-disk 10%
> cluster.min-free-inodes 5%
> cluster.rebalance-stats off
> cluster.subvols-per-directory (null)
> cluster.readdir-optimize off
> cluster.rsync-hash-regex (null)
> cluster.extra-hash-regex (null)
> cluster.dht-xattr-name trusted.glusterfs.dht
> cluster.randomize-hash-range-by-gfid off
> cluster.rebal-throttle normal
> cluster.local-volume-name (null)
> cluster.weighted-rebalance on
> cluster.entry-change-log on
> cluster.read-subvolume (null)
> cluster.read-subvolume-index -1
> cluster.read-hash-mode 1
> cluster.background-self-heal-count 16
> cluster.metadata-self-heal on
> cluster.data-self-heal on
> cluster.entry-self-heal on
> cluster.self-heal-daemon on
> cluster.heal-timeout 600
> cluster.self-heal-window-size 1
> cluster.data-change-log on
> cluster.metadata-change-log on
> cluster.data-self-heal-algorithm (null)
> cluster.eager-lock on
> cluster.quorum-type none
> cluster.quorum-count (null)
> cluster.choose-local true
> cluster.self-heal-readdir-size 1KB
> cluster.post-op-delay-secs 1
> cluster.ensure-durability on
> cluster.consistent-metadata no
> cluster.stripe-block-size 128KB
> cluster.stripe-coalesce true
> diagnostics.latency-measurement off
> diagnostics.dump-fd-stats off
> diagnostics.count-fop-hits off
> diagnostics.brick-log-level INFO
> diagnostics.client-log-level INFO
> diagnostics.brick-sys-log-level CRITICAL
> diagnostics.client-sys-log-level CRITICAL
> diagnostics.brick-logger (null)
> diagnostics.client-logger (null)
> diagnostics.brick-log-format (null)
> diagnostics.client-log-format (null)
> diagnostics.brick-log-buf-size 5
> diagnostics.client-log-buf-size 5
> diagnostics.brick-log-flush-timeout 120
> diagnostics.client-log-flush-timeout 120
> performance.cache-max-file-size 0
> performance.cache-min-file-size 0
> performance.cache-refresh-timeout 1
> performance.cache-priority
> performance.cache-size 32MB
> performance.io-thread-count 16
> performance.high-prio-threads 16
> performance.normal-prio-threads 16
> performance.low-prio-threads 16
> performance.least-prio-threads 1
> performance.enable-least-priority on
> performance.least-rate-limit 0
> performance.cache-size 128MB
> performance.flush-behind on
> performance.nfs.flush-behind on
> performance.write-behind-window-size 1MB
> performance.nfs.write-behind-window-size 1MB
> performance.strict-o-direct off
> performance.nfs.strict-o-direct off
> performance.strict-write-ordering off
> performance.nfs.strict-write-ordering off
> performance.lazy-open yes
> performance.read-after-open no
> performance.read-ahead-page-count 4
> performance.md-cache-timeout 1
> features.encryption off
> encryption.master-key (null)
> encryption.data-key-size 256
> encryption.block-size 4096
> network.frame-timeout 1800
> network.ping-timeout 42
> network.tcp-window-size (null)
> features.lock-heal off
> features.grace-timeout 10
> network.remote-dio disable
> client.event-threads 2
> network.ping-timeout 42
> network.tcp-window-size (null)
> network.inode-lru-limit 16384
> auth.allow *
> auth.reject (null)
> transport.keepalive (null)
> server.allow-insecure (null)
> server.root-squash off
> server.anonuid 65534
> server.anongid 65534
> server.statedump-path /var/run/gluster
> server.outstanding-rpc-limit 64
> features.lock-heal off
> features.grace-timeout (null)
> server.ssl (null)
> auth.ssl-allow *
> server.manage-gids off
> client.send-gids on
> server.gid-timeout 300
> server.own-thread (null)
> server.event-threads 2
> performance.write-behind on
> performance.read-ahead on
> performance.readdir-ahead on
> performance.io-cache on
> performance.quick-read on
> performance.open-behind on
> performance.stat-prefetch on
> performance.client-io-threads off
> performance.nfs.write-behind on
> performance.nfs.read-ahead off
> performance.nfs.io-cache off
> performance.nfs.quick-read off
> performance.nfs.stat-prefetch off
> performance.nfs.io-threads off
> performance.force-readdirp true
> features.file-snapshot off
> features.uss off
> features.snapshot-directory .snaps
> features.show-snapshot-directory off
> network.compression off
> network.compression.window-size -15
> network.compression.mem-level 8
> network.compression.min-size 0
> network.compression.compression-level -1
> network.compression.debug false
> features.limit-usage (null)
> features.quota-timeout 0
> features.default-soft-limit 80%
> features.soft-timeout 60
> features.hard-timeout 5
> features.alert-time 86400
> features.quota-deem-statfs off
> geo-replication.indexing off
> geo-replication.indexing off
> geo-replication.ignore-pid-check off
> geo-replication.ignore-pid-check off
> features.quota off
> features.inode-quota off
> features.bitrot disable
> debug.trace off
> debug.log-history no
> debug.log-file no
> debug.exclude-ops (null)
> debug.include-ops (null)
> debug.error-gen off
> debug.error-failure (null)
> debug.error-number (null)
> debug.random-failure off
> debug.error-fops (null)
> nfs.enable-ino32 no
> nfs.mem-factor 15
> nfs.export-dirs on
> nfs.export-volumes on
> nfs.addr-namelookup off
> nfs.dynamic-volumes off
> nfs.register-with-portmap on
> nfs.outstanding-rpc-limit 16
> nfs.port 2049
> nfs.rpc-auth-unix on
> nfs.rpc-auth-null on
> nfs.rpc-auth-allow all
> nfs.rpc-auth-reject none
> nfs.ports-insecure off
> nfs.trusted-sync off
> nfs.trusted-write off
> nfs.volume-access read-write
> nfs.export-dir
> nfs.disable false
> nfs.nlm on
> nfs.acl on
> nfs.mount-udp off
> nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
> nfs.rpc-statd /sbin/rpc.statd
> nfs.server-aux-gids off
> nfs.drc off
> nfs.drc-size 0x20000
> nfs.read-size (1 * 1048576ULL)
> nfs.write-size (1 * 1048576ULL)
> nfs.readdir-size (1 * 1048576ULL)
> nfs.exports-auth-enable (null)
> nfs.auth-refresh-interval-sec (null)
> nfs.auth-cache-ttl-sec (null)
> features.read-only off
> features.worm off
> storage.linux-aio off
> storage.batch-fsync-mode reverse-fsync
> storage.batch-fsync-delay-usec 0
> storage.owner-uid -1
> storage.owner-gid -1
> storage.node-uuid-pathinfo off
> storage.health-check-interval 30
> storage.build-pgfid off
> storage.bd-aio off
> cluster.server-quorum-type off
> cluster.server-quorum-ratio 0
> changelog.changelog off
> changelog.changelog-dir (null)
> changelog.encoding ascii
> changelog.rollover-time 15
> changelog.fsync-interval 5
> changelog.changelog-barrier-timeout 120
> changelog.capture-del-path off
> features.barrier disable
> features.barrier-timeout 120
> features.trash off
> features.trash-dir .trashcan
> features.trash-eliminate-path (null)
> features.trash-max-filesize 5MB
> features.trash-internal-op off
> cluster.enable-shared-storage disable
> features.ctr-enabled off
> features.record-counters off
> features.ctr_link_consistency off
> locks.trace (null)
> cluster.disperse-self-heal-daemon enable
> cluster.quorum-reads no
> client.bind-insecure (null)
> ganesha.enable off
> features.shard off
> features.shard-block-size 4MB
> features.scrub-throttle lazy
> features.scrub-freq biweekly
> features.expiry-time 120
> features.cache-invalidation off
> features.cache-invalidation-timeout 60
>
>
> Thanks & regards
> Backer
>
>
>
>
>
> On Mon, Jun 15, 2015 at 1:26 PM, Xavier Hernandez
> <xhernandez at datalab.es> wrote:
>
> On 06/15/2015 09:25 AM, Mohamed Pakkeer wrote:
>
> Hi Xavier,
>
> When can we expect the 3.7.2 release that fixes the I/O error
> which we discussed on this mail thread?
>
>
> As per the latest meeting held last Wednesday [1], it will be
> released this week.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-06-10/gluster-meeting.2015-06-10-12.01.html
>
>
> Thanks
> Backer
>
> On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez
> <xhernandez at datalab.es> wrote:
>
> Hi again,
>
> in today's gluster meeting [1] it has been decided that 3.7.1 will
> be released urgently to solve a bug in glusterd. All fixes planned
> for 3.7.1 will be moved to 3.7.2, which will be released soon after.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html
>
>
> On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
>
> On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
>
> Hi Xavier,
>
> Thanks for your reply. When can we expect the 3.7.1 release?
>
>
> AFAIK a beta of 3.7.1 will be released very soon.
>
>
> cheers
> Backer
>
> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez
> <xhernandez at datalab.es> wrote:
>
> Hi,
>
> some Input/Output error issues have been identified and fixed.
> These fixes will be available in 3.7.1.
>
> Xavi
>
>
> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>
> Hi Glusterfs Experts,
>
> We are testing the glusterfs 3.7.0 tarball on our 10-node
> glusterfs cluster. Each node has 36 drives; please find the
> volume info below:
>
> Volume Name: vaulttest5
> Type: Distributed-Disperse
> Volume ID:
> 68e082a6-9819-4885-856c-1510cd201bd9
> Status: Started
> Number of Bricks: 36 x (8 + 2) = 360
> Transport-type: tcp
> Bricks:
> Brick1: 10.1.2.1:/media/disk1
> Brick2: 10.1.2.2:/media/disk1
> Brick3: 10.1.2.3:/media/disk1
> Brick4: 10.1.2.4:/media/disk1
> Brick5: 10.1.2.5:/media/disk1
> Brick6: 10.1.2.6:/media/disk1
> Brick7: 10.1.2.7:/media/disk1
> Brick8: 10.1.2.8:/media/disk1
> Brick9: 10.1.2.9:/media/disk1
> Brick10: 10.1.2.10:/media/disk1
> Brick11: 10.1.2.1:/media/disk2
> Brick12: 10.1.2.2:/media/disk2
> Brick13: 10.1.2.3:/media/disk2
> Brick14: 10.1.2.4:/media/disk2
> Brick15: 10.1.2.5:/media/disk2
> Brick16: 10.1.2.6:/media/disk2
> Brick17: 10.1.2.7:/media/disk2
> Brick18: 10.1.2.8:/media/disk2
> Brick19: 10.1.2.9:/media/disk2
> Brick20: 10.1.2.10:/media/disk2
> ...
> ....
> Brick351: 10.1.2.1:/media/disk36
> Brick352: 10.1.2.2:/media/disk36
> Brick353: 10.1.2.3:/media/disk36
> Brick354: 10.1.2.4:/media/disk36
> Brick355: 10.1.2.5:/media/disk36
> Brick356: 10.1.2.6:/media/disk36
> Brick357: 10.1.2.7:/media/disk36
> Brick358: 10.1.2.8:/media/disk36
> Brick359: 10.1.2.9:/media/disk36
> Brick360: 10.1.2.10:/media/disk36
> Options Reconfigured:
> performance.readdir-ahead: on
>
> We did some performance testing and simulated proactive
> self-healing for erasure coding. The disperse volume has been
> created across nodes.
>
> _*Description of problem*_
>
> I disconnected the *network of two nodes* and tried to write some
> video files, and *glusterfs wrote the video files on the remaining
> 8 nodes perfectly*. I tried to download the uploaded file and it
> was downloaded perfectly. Then I re-enabled the network of the two
> nodes; the proactive self-healing mechanism worked perfectly and
> wrote the unavailable chunk of data to the recently re-enabled
> nodes from the other 8 nodes. But when I tried to download the
> same file, it showed an Input/Output error. I couldn't download
> the file. I think there is an issue in proactive self-healing.
>
> Also, we tried the simulation with a one-node network failure. We
> faced the same I/O error issue while downloading the file.
>
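> (One way to simulate such a node network failure, as a rough
> sketch; the interface name is a placeholder:)
>
> ip link set eth0 down     # disconnect the node from the storage network
> # ... write/read test files from the client ...
> ip link set eth0 up       # reconnect and let proactive self-heal run
>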
>
> _Error while downloading the file_
>
> root at master02:/home/admin# rsync -r --progress /mnt/gluster/file13_AN ./1/file13_AN-2
>
> sending incremental file list
> file13_AN
> 3,342,355,597 100% 4.87MB/s 0:10:54 (xfr#1, to-chk=0/1)
>
> rsync: read errors mapping "/mnt/gluster/file13_AN": Input/output error (5)
> WARNING: file13_AN failed verification -- update discarded (will try again).
>
> root at master02:/home/admin# cp /mnt/gluster/file13_AN ./1/file13_AN-3
>
> cp: error reading ‘/mnt/gluster/file13_AN’: Input/output error
> cp: failed to extend ‘./1/file13_AN-3’: Input/output error
>
>
> We can't tell whether the issue lies with glusterfs 3.7.0 or with
> our glusterfs configuration.
>
> Any help would be greatly appreciated
>
> --
> Cheers
> Backer
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
>
>
>
> --
> Thanks & Regards
> K.Mohamed Pakkeer
> Mobile- 0091-8754410114
>