[Gluster-users] Issue with proactive self-healing for Erasure coding
Mohamed Pakkeer
mdfakkeer at gmail.com
Fri Jun 26 06:58:47 UTC 2015
Hi Xavier,
We are facing the same I/O error after upgrading to gluster 3.7.2.
Description of problem:
=======================
In a 3 x (4 + 2) = 18 distributed-disperse volume, some files return input/output
errors on the FUSE mount after simulating the following scenario:
1. Simulate a disk failure by killing the brick pid and adding the same disk back
after formatting the drive
2. Try to read the recovered (healed) file after 2 bricks/nodes have been
brought down
Version-Release number of selected component (if applicable):
==============================================================
admin@node001:~$ sudo gluster --version
glusterfs 3.7.2 built on Jun 19 2015 16:33:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General
Public License.
Steps to Reproduce:
1. Create a 3 x (4 + 2) disperse volume across the nodes.
2. FUSE mount it on a client and start creating files/directories with
mkdir and rsync/dd.
3. Simulate a disk failure by killing the brick pid of any disk on one node,
then add the same disk back after formatting the drive.
4. Start the volume with force.
5. Self-healing creates the file names with 0 bytes on the newly formatted drive.
6. Wait for self-healing to finish, but it never completes; the files stay
at 0 bytes.
7. Try to read the same file from the client; the 0-byte file then starts
recovering and the recovery completes. Getting the md5sum of the file
with all nodes live gives the correct result.
8. Now bring down 2 of the nodes.
9. Try to get the md5sum of the same recovered file; the client throws an I/O error.
(A command-level sketch of these steps is given below.)
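For reference, a minimal shell sketch of steps 1-6 above, assuming the volume name
vaulttest21 and the brick layout shown in the volume info below; the device name
/dev/sdX, the <brick-pid> and the dd file size are placeholders, not values taken
from this report:

# 1. Create and start the 3 x (4 + 2) disperse volume (brick list abbreviated)
sudo gluster volume create vaulttest21 disperse 6 redundancy 2 \
     10.1.2.1:/media/disk1 10.1.2.2:/media/disk1 ... 10.1.2.6:/media/disk3
sudo gluster volume start vaulttest21

# 2. FUSE mount on the client and create some data
sudo mount -t glusterfs 10.1.2.1:/vaulttest21 /mnt/gluster
dd if=/dev/urandom of=/mnt/gluster/up1 bs=1M count=800

# 3. On one node: find the brick pid, kill it, reformat and remount the disk
sudo gluster volume status vaulttest21 | grep /media/disk2   # note the brick pid
sudo kill -9 <brick-pid>
sudo umount /media/disk2
sudo mkfs.xfs -f /dev/sdX
sudo mount /dev/sdX /media/disk2

# 4. Bring the now-empty brick back online
sudo gluster volume start vaulttest21 force

# 5-6. Watch the self-heal progress
sudo gluster volume heal vaulttest21 info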
Screenshots
admin@node001:~$ sudo gluster volume info
Volume Name: vaulttest21
Type: Distributed-Disperse
Volume ID: ac6a374d-a0a2-405c-823d-0672fd92f0af
Status: Started
Number of Bricks: 3 x (4 + 2) = 18
Transport-type: tcp
Bricks:
Brick1: 10.1.2.1:/media/disk1
Brick2: 10.1.2.2:/media/disk1
Brick3: 10.1.2.3:/media/disk1
Brick4: 10.1.2.4:/media/disk1
Brick5: 10.1.2.5:/media/disk1
Brick6: 10.1.2.6:/media/disk1
Brick7: 10.1.2.1:/media/disk2
Brick8: 10.1.2.2:/media/disk2
Brick9: 10.1.2.3:/media/disk2
Brick10: 10.1.2.4:/media/disk2
Brick11: 10.1.2.5:/media/disk2
Brick12: 10.1.2.6:/media/disk2
Brick13: 10.1.2.1:/media/disk3
Brick14: 10.1.2.2:/media/disk3
Brick15: 10.1.2.3:/media/disk3
Brick16: 10.1.2.4:/media/disk3
Brick17: 10.1.2.5:/media/disk3
Brick18: 10.1.2.6:/media/disk3
Options Reconfigured:
performance.readdir-ahead: on
*After simulating the disk failure (node003, disk2) and adding the disk again
after formatting the drive*
admin@node003:~$ date
Thu Jun 25 *16:21:58* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
--
admin@node003:~$ date
Thu Jun 25 *16:25:09* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up1*
*-rw-r--r-- 2 root root 0 Jun 25 16:17 up2*
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
admin@node003:~$ date
Thu Jun 25 *16:41:25* IST 2015
admin@node003:~$ ls -l -h /media/disk2
total 1.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
*After waiting nearly 20 minutes, self-healing has still not recovered the full
data chunk. Then try to read the file using md5sum*
root@mas03:/mnt/gluster# time md5sum up1
4650543ade404ed5a1171726e76f8b7c up1
real 1m58.010s
user 0m6.243s
sys 0m0.778s
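In addition to listing the brick directory directly, the heal state during this
waiting period could also have been checked from any node with the heal info
command; a small sketch (the exact output format varies by version):

admin@node001:~$ sudo gluster volume heal vaulttest21 info
# lists, per brick, the entries (paths/gfids) still pending heal;
# up1 and up2 would be expected to remain listed while they stay at 0 bytes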
*The corrupted chunk starts growing*
admin@node003:~$ ls -l -h /media/disk2
total 2.6G
drwxr-xr-x 3 root root 22 Jun 25 16:18 1
-rw-r--r-- 2 root root 797M Jun 25 15:57 up1
-rw-r--r-- 2 root root 0 Jun 25 16:17 up2
-rw-r--r-- 2 root root 797M Jun 25 16:03 up3
-rw-r--r-- 2 root root 797M Jun 25 16:04 up4
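The report does not show how nodes 5 and 6 were taken offline; one common way to
simulate a node failure (an assumption, not taken from this thread) is to kill the
brick processes or stop the gluster services on those nodes:

# on node005 and node006 (hypothetical; any method that makes the bricks unreachable will do)
sudo pkill -f glusterfsd                # kill all brick processes on the node
sudo service glusterfs-server stop      # or stop the gluster daemons entirely (Debian/Ubuntu service name)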
*Verifying the healed file after two nodes (5 & 6) are taken offline*
root@mas03:/mnt/gluster# time md5sum up1
md5sum: up1:* Input/output error*
The I/O error is still not resolved. Could you suggest whether anything is wrong
with our testing?
admin@node001:~$ sudo gluster volume get vaulttest21 all
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 16
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon on
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.least-rate-limit 0
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 2
network.ping-timeout 42
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive (null)
server.allow-insecure (null)
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout (null)
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 2
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead on
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.stat-prefetch on
performance.client-io-threads off
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
features.file-snapshot off
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.quota-timeout 0
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.enable-ino32 no
nfs.mem-factor 15
nfs.export-dirs on
nfs.export-volumes on
nfs.addr-namelookup off
nfs.dynamic-volumes off
nfs.register-with-portmap on
nfs.outstanding-rpc-limit 16
nfs.port 2049
nfs.rpc-auth-unix on
nfs.rpc-auth-null on
nfs.rpc-auth-allow all
nfs.rpc-auth-reject none
nfs.ports-insecure off
nfs.trusted-sync off
nfs.trusted-write off
nfs.volume-access read-write
nfs.export-dir
nfs.disable false
nfs.nlm on
nfs.acl on
nfs.mount-udp off
nfs.mount-rmtab /var/lib/glusterd/nfs/rmtab
nfs.rpc-statd /sbin/rpc.statd
nfs.server-aux-gids off
nfs.drc off
nfs.drc-size 0x20000
nfs.read-size (1 * 1048576ULL)
nfs.write-size (1 * 1048576ULL)
nfs.readdir-size (1 * 1048576ULL)
nfs.exports-auth-enable (null)
nfs.auth-refresh-interval-sec (null)
nfs.auth-cache-ttl-sec (null)
features.read-only off
features.worm off
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.bd-aio off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
features.ctr-enabled off
features.record-counters off
features.ctr_link_consistency off
locks.trace (null)
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
ganesha.enable off
features.shard off
features.shard-block-size 4MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
Thanks & regards
Backer
On Mon, Jun 15, 2015 at 1:26 PM, Xavier Hernandez <xhernandez at datalab.es>
wrote:
> On 06/15/2015 09:25 AM, Mohamed Pakkeer wrote:
>
>> Hi Xavier,
>>
>> When can we expect the 3.7.2 release for fixing the I/O error which we
>> discussed on this mail thread?.
>>
>
> As per the latest meeting held last wednesday [1] it will be released this
> week.
>
> Xavi
>
> [1]
> http://meetbot.fedoraproject.org/gluster-meeting/2015-06-10/gluster-meeting.2015-06-10-12.01.html
>
>
>> Thanks
>> Backer
>>
>> On Wed, May 27, 2015 at 8:02 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi again,
>>
>> in today's gluster meeting [1] it has been decided that 3.7.1 will
>> be released urgently to solve a bug in glusterd. All fixes planned
>> for 3.7.1 will be moved to 3.7.2 which will be released soon after.
>>
>> Xavi
>>
>> [1]
>>
>> http://meetbot.fedoraproject.org/gluster-meeting/2015-05-27/gluster-meeting.2015-05-27-12.01.html
>>
>>
>> On 05/27/2015 12:01 PM, Xavier Hernandez wrote:
>>
>> On 05/27/2015 11:26 AM, Mohamed Pakkeer wrote:
>>
>> Hi Xavier,
>>
>> Thanks for your reply. When can we expect the 3.7.1 release?
>>
>>
>> AFAIK a beta of 3.7.1 will be released very soon.
>>
>>
>> cheers
>> Backer
>>
>> On Wed, May 27, 2015 at 1:22 PM, Xavier Hernandez <xhernandez at datalab.es> wrote:
>>
>> Hi,
>>
>> some Input/Output error issues have been identified and
>> fixed. These
>> fixes will be available on 3.7.1.
>>
>> Xavi
>>
>>
>> On 05/26/2015 10:15 AM, Mohamed Pakkeer wrote:
>>
>> Hi Glusterfs Experts,
>>
>> We are testing glusterfs 3.7.0 tarball on our 10
>> Node glusterfs
>> cluster.
>> Each node has 36 dirves and please find the volume
>> info below
>>
>> Volume Name: vaulttest5
>> Type: Distributed-Disperse
>> Volume ID: 68e082a6-9819-4885-856c-1510cd201bd9
>> Status: Started
>> Number of Bricks: 36 x (8 + 2) = 360
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.1.2.1:/media/disk1
>> Brick2: 10.1.2.2:/media/disk1
>> Brick3: 10.1.2.3:/media/disk1
>> Brick4: 10.1.2.4:/media/disk1
>> Brick5: 10.1.2.5:/media/disk1
>> Brick6: 10.1.2.6:/media/disk1
>> Brick7: 10.1.2.7:/media/disk1
>> Brick8: 10.1.2.8:/media/disk1
>> Brick9: 10.1.2.9:/media/disk1
>> Brick10: 10.1.2.10:/media/disk1
>> Brick11: 10.1.2.1:/media/disk2
>> Brick12: 10.1.2.2:/media/disk2
>> Brick13: 10.1.2.3:/media/disk2
>> Brick14: 10.1.2.4:/media/disk2
>> Brick15: 10.1.2.5:/media/disk2
>> Brick16: 10.1.2.6:/media/disk2
>> Brick17: 10.1.2.7:/media/disk2
>> Brick18: 10.1.2.8:/media/disk2
>> Brick19: 10.1.2.9:/media/disk2
>> Brick20: 10.1.2.10:/media/disk2
>> ...
>> ....
>> Brick351: 10.1.2.1:/media/disk36
>> Brick352: 10.1.2.2:/media/disk36
>> Brick353: 10.1.2.3:/media/disk36
>> Brick354: 10.1.2.4:/media/disk36
>> Brick355: 10.1.2.5:/media/disk36
>> Brick356: 10.1.2.6:/media/disk36
>> Brick357: 10.1.2.7:/media/disk36
>> Brick358: 10.1.2.8:/media/disk36
>> Brick359: 10.1.2.9:/media/disk36
>> Brick360: 10.1.2.10:/media/disk36
>> Options Reconfigured:
>> performance.readdir-ahead: on
>>
>> We did some performance testing and simulated the
>> proactive self
>> healing
>> for Erasure coding. Disperse volume has been
>> created across
>> nodes.
>>
>> _*Description of problem*_
>>
>> I disconnected the *network of two nodes* and tried
>> to write
>> some video
>> files and *glusterfs* *wrote the video files on
>> balance 8 nodes
>> perfectly*. I tried to download the uploaded file
>> and it was
>> downloaded
>> perfectly. Then i enabled the network of two nodes,
>> the pro
>> active self
>> healing mechanism worked perfectly and wrote the
>> unavailable
>> junk of
>> data to the recently enabled node from the other 8
>> nodes. But
>> when i
>> tried to download the same file node, it showed
>> Input/Output
>> error. I
>> couldn't download the file. I think there is an
>> issue in pro
>> active self
>> healing.
>>
>> Also we tried the simulation with one node network
>> failure. We
>> faced
>> same I/O error issue while downloading the file
>>
>>
>> _Error while downloading file _
>> _
>> _
>>
>> root at master02:/home/admin# rsync -r --progress
>> /mnt/gluster/file13_AN
>> ./1/file13_AN-2
>>
>> sending incremental file list
>>
>> file13_AN
>>
>> 3,342,355,597 100% 4.87MB/s 0:10:54 (xfr#1,
>> to-chk=0/1)
>>
>> rsync: read errors mapping "/mnt/gluster/file13_AN":
>> Input/output error (5)
>>
>> WARNING: file13_AN failed verification -- update
>> discarded (will
>> try again).
>>
>> root at master02:/home/admin# cp
>> /mnt/gluster/file13_AN
>> ./1/file13_AN-3
>>
>> cp: error reading ‘/mnt/gluster/file13_AN’:
>> Input/output error
>>
>> cp: failed to extend ‘./1/file13_AN-3’:
>> Input/output error_
>> _
>>
>>
>> We can't conclude the issue with glusterfs 3.7.0 or
>> our glusterfs
>> configuration.
>>
>> Any help would be greatly appreciated
>>
>> --
>> Cheers
>> Backer
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>>
>>
--
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114