[Bugs] [Bug 1329503] [tiering]: during detach tier operation, Input/output error is seen with new file writes on NFS mount

Fri Apr 22 06:31:39 UTC 2016

https://bugzilla.redhat.com/show_bug.cgi?id=1329503

Mohammed Rafi KC <rkavunga at redhat.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|bugs at gluster.org            |rkavunga at redhat.com

--- Comment #1 from Mohammed Rafi KC <rkavunga at redhat.com> ---
Copy-pasting the description and RCA for public use.

Description of problem:
On an NFS mount, when large files are being written and a detach tier
operation is started, an Input/output error is seen.

[root@dhcp46-9 mnt]# while true; do for i in {1..5}; do dd if=/dev/urandom of=file$i bs=1024 count=700000; echo $?; done; echo 'end of cycle'; done
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.3324 s, 9.8 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 71.0725 s, 10.1 MB/s
0
dd: error writing ‘file3’: Input/output error
600027+0 records in
600026+0 records out
614426624 bytes (614 MB) copied, 70.7233 s, 8.7 MB/s
1
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 75.3172 s, 9.5 MB/s
0
700000+0 records in
700000+0 records out
716800000 bytes (717 MB) copied, 73.2562 s, 9.8 MB/s
0
end of cycle

[2016-04-12 01:43:39.423991] E [MSGID: 108008]
[afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE
on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed.
[Input/output error]
[2016-04-12 01:43:39.424838] E [MSGID: 108008]
[afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE
on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed.
[Input/output error]
[2016-04-12 01:43:39.425705] E [MSGID: 108008]
[afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE
on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed.
[Input/output error]
[2016-04-12 01:43:39.429049] E [MSGID: 108008]
[afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE
on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed.
[Input/output error]
[2016-04-12 01:43:39.430226] E [MSGID: 108008]
[afr-transaction.c:1981:afr_transaction] 0-testvol-replicate-4: Failing WRITE
on gfid 250d586b-3591-470b-a3ce-99fe52bb453d: split-brain observed.
[Input/output error]

[root@dhcp47-105 ~]# gluster v info

Volume Name: testvol
Type: Tier
Volume ID: 02427025-adcf-48a2-ac58-ae494839e9f8
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.94:/bricks/brick3/leg1
Brick2: 10.70.47.9:/bricks/brick3/leg1
Brick3: 10.70.47.105:/bricks/brick3/leg1
Brick4: 10.70.47.90:/bricks/brick3/leg1
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 4 x 2 = 8
Brick5: 10.70.47.90:/bricks/brick0/ct
Brick6: 10.70.47.105:/bricks/brick0/ct
Brick7: 10.70.47.9:/bricks/brick0/ct
Brick8: 10.70.46.94:/bricks/brick0/ct
Brick9: 10.70.47.90:/bricks/brick1/ct
Brick10: 10.70.47.105:/bricks/brick1/ct
Brick11: 10.70.47.9:/bricks/brick1/ct
Brick12: 10.70.46.94:/bricks/brick1/ct
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs-server-3.7.9-1.el7rhgs.x86_64

How reproducible:
2/3 

Steps to Reproduce:
1) Create a distributed-replicate volume, start it, and then enable quota.
2) NFS-mount the volume and use dd to create, say, 5 files of at least
   700 MB each:
   for i in {1..5}; do dd if=/dev/urandom of=file$i bs=1024 count=700000; echo $?; done
3) While dd is in progress, perform an attach tier operation.
4) After attach tier succeeds, run detach tier start. This is when dd fails
   with an I/O error.
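The dd workload in step 2 can be scripted so that it stops at the first failed write and names the file, which makes it easier to see exactly when the I/O error appears relative to the attach and detach commands run from another shell. This is a minimal POSIX shell sketch; the function name `write_files` and its arguments are hypothetical, and only the dd parameters come from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): write NFILES files with dd,
# as in step 2 of this report, and stop at the first dd that fails.
# Usage: write_files DIR [NFILES] [COUNT]
#   DIR    - directory to write into (e.g. the NFS mount point)
#   NFILES - number of files, default 5
#   COUNT  - number of 1024-byte blocks per file, default 700000 (~717 MB)
write_files() {
    wf_dir=$1
    wf_nfiles=${2:-5}
    wf_count=${3:-700000}
    wf_i=1
    while [ "$wf_i" -le "$wf_nfiles" ]; do
        # dd prints its transfer statistics (and any write error) on stderr.
        if ! dd if=/dev/urandom of="$wf_dir/file$wf_i" bs=1024 count="$wf_count"; then
            echo "dd failed while writing $wf_dir/file$wf_i" >&2
            return 1
        fi
        wf_i=$((wf_i + 1))
    done
    return 0
}
```

On the NFS client, a loop such as `while write_files /mnt; do echo 'end of cycle'; done` mirrors the original reproducer but exits as soon as dd reports an error; `/mnt` stands in for the NFS mount point.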

Actual results:
An I/O error is seen.

Expected results:
No I/O error should be seen during the detach tier operation.

Additional info:

--- Additional comment from Mohammed Rafi KC on 2016-04-21 10:40:23 EDT ---

RCA:

NFS uses an anonymous fd when writing to a file. If the file has moved away
from its cached subvolume, a write or lock from afr fails with ENOENT. When the
write fails, the dht layer first runs its migration complete check, which does
a lookup on the previous source subvolume. Since the file has moved from there,
this lookup fails, and as a result the readable flag is set to 0 for all
subvolumes in afr. At this point, the tier layer still has the old source as
its cached_subvol, so any subsequent request is sent to the same subvolume
again, which causes afr to fail it with EIO.

The tier layer updates cached_subvol only after it completes its "migration
complete check". So the race window lies between the migration complete check
in the dht layer and the one in the tier layer.
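The ordering described in the RCA can be made concrete with a toy state model. The sketch below is a hypothetical POSIX shell simulation, not GlusterFS code: the variables and functions (`FILE_ON`, `CACHED`, `HOT_READABLE`, `afr_write`, `dht_migration_check`, `tier_update_cached`, `simulate`) are invented names standing for the cached_subvol, the afr readable flag, and the checks described in the RCA, and detach tier is assumed to move the file from the hot tier to the cold tier.

```shell
#!/bin/sh
# Toy model of the race described in the RCA. All names are hypothetical;
# this is not GlusterFS code.
#   FILE_ON      - subvolume that actually holds the file ("hot" or "cold")
#   CACHED       - the tier layer's cached_subvol for the file
#   HOT_READABLE - afr readable flag on the old source (hot) subvolume

reset_state() {
    FILE_ON=hot
    CACHED=hot
    HOT_READABLE=1
}

# Detach tier migrates the file from the hot tier to the cold tier.
migrate_file() {
    FILE_ON=cold
}

# afr on subvolume $1: EIO if nothing is readable there, ENOENT if the
# file is no longer there, otherwise the write succeeds.
afr_write() {
    if [ "$1" = hot ] && [ "$HOT_READABLE" -eq 0 ]; then
        echo EIO
    elif [ "$1" != "$FILE_ON" ]; then
        echo ENOENT
    else
        echo OK
    fi
}

# dht migration complete check: lookup on the previous source subvolume.
# The file has moved, so the lookup fails and afr is left with no
# readable subvolume.
dht_migration_check() {
    if [ "$FILE_ON" != "$CACHED" ]; then
        HOT_READABLE=0
    fi
}

# The tier layer updates cached_subvol only after its own check.
tier_update_cached() {
    CACHED=$FILE_ON
}

# $1 = "race":    the next write arrives before tier_update_cached runs.
# $1 = "updated": tier_update_cached has already run (hypothetical ordering).
simulate() {
    reset_state
    migrate_file
    first=$(afr_write "$CACHED")    # anonymous-fd write to the old subvolume
    dht_migration_check
    if [ "$1" = updated ]; then
        tier_update_cached
    fi
    second=$(afr_write "$CACHED")   # next request, routed via cached_subvol
    echo "$first $second"
}
```

Running `simulate race` prints `ENOENT EIO`, while `simulate updated` prints `ENOENT OK`. In this model, EIO appears only when a request is routed through the stale cached_subvol after the dht check has already cleared the readable flag.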
