[Bugs] [Bug 1509810] New: [Disperse] Implement open fd heal for disperse volume
bugzilla at redhat.com
Mon Nov 6 07:02:38 UTC 2017
https://bugzilla.redhat.com/show_bug.cgi?id=1509810
Bug ID: 1509810
Summary: [Disperse] Implement open fd heal for disperse volume
Product: Red Hat Gluster Storage
Version: 3.4
Component: disperse
Assignee: aspandey at redhat.com
Reporter: sheggodu at redhat.com
QA Contact: nchilaka at redhat.com
CC: aspandey at redhat.com, bugs at gluster.org,
pkarampu at redhat.com, rhs-bugs at redhat.com,
storage-qa-internal at redhat.com
Depends On: 1431955
+++ This bug was initially created as a clone of Bug #1431955 +++
Description of problem:
When EC opens a file, it gets an fd from each of the bricks; if a brick is down
at open time, there will be no fd for that subvolume.
If the brick comes back UP before a write is sent on that fd, we should reopen
the fd on that brick as well, to avoid an unnecessary heal later.
--- Additional comment from Pranith Kumar K on 2017-03-27 01:28:29 EDT ---
Sunil,
When you do a dd on a file, then as long as the file is open you will see
something like the following in the statedump of the client:
[xlator.protocol.client.ec2-client-0.priv]
fd.0.remote_fd=0
connecting=0
connected=1
total_bytes_read=7288220
ping_timeout=42
total_bytes_written=11045016
ping_msgs_sent=3
msgs_sent=19812
This entry should be present for each fd that is open, once per
client xlator.
So with a 3 = 2 + 1 configuration we will have one for each of the client
xlators. But if a brick was down at the time the file was opened, the entry
won't be present for that brick. After bringing the brick back up and operating
on the file, the file should be opened on that brick again. At the moment the
operation gets converted to an anonymous-fd based operation, so it may not
fail, but it is important to open the file again so that all operations, such
as lk, function properly.
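A quick way to check this from a statedump is to list the client-xlator sections that contain an open remote fd. The sketch below runs against an inlined, made-up two-brick excerpt (the first section name is taken from the dump above; `ec2-client-1` is a hypothetical brick that was down at open time and so has no fd entry):

```shell
# Sketch: print the client-xlator sections of a statedump that hold an
# open remote fd. The inlined dump is a fabricated two-brick example.
cat > /tmp/dump.txt <<'EOF'
[xlator.protocol.client.ec2-client-0.priv]
fd.0.remote_fd=0
connected=1
[xlator.protocol.client.ec2-client-1.priv]
connected=1
EOF
# Remember the last client-xlator section header seen; print it whenever a
# remote_fd line appears inside that section.
awk '/^\[xlator\.protocol\.client/ {sec=$0} /remote_fd/ {print sec}' /tmp/dump.txt
# -> prints only [xlator.protocol.client.ec2-client-0.priv]
```

A brick missing from this output while the volume is healthy is exactly the symptom described above.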
--- Additional comment from Sunil Kumar Acharya on 2017-03-27 07:56:27 EDT ---
Steps to re-create/test:
1. Created and mounted an EC (2+1) volume. Heal disabled.
[root at server3 ~]# gluster volume info
Volume Name: ec-vol
Type: Disperse
Volume ID: b676891f-392d-49a6-891c-8e7e3790658d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: server1:/LAB/store/ec-vol
Brick2: server2:/LAB/store/ec-vol
Brick3: server3:/LAB/store/ec-vol
Options Reconfigured:
cluster.disperse-self-heal-daemon: disable <<<<<
transport.address-family: inet
nfs.disable: on
disperse.background-heals: 0 <<<<<
[root at server3 ~]#
2. Touched a file on the mountpoint.
# touch file
3. Brought down one of the brick processes.
4. Opened a file descriptor for the file.
# exec 30<> file
5. Brought up the brick process which was down.
6. Wrote to the FD.
# echo "abc" >&30
7. File status on client and bricks after the write completes.
Client:
[root at varada mount]# ls -lh file
-rw-r--r--. 1 root root 4 Mar 27 17:11 file
[root at varada mount]# du -kh file
1.0K file
[root at varada mount]#
Bricks:
[root at server1 ~]# du -kh /LAB/store/ec-vol/file
4.0K /LAB/store/ec-vol/file
[root at server1 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 0 Mar 27 17:08 /LAB/store/ec-vol/file
[root at server1 ~]# cat /LAB/store/ec-vol/file
[root at server1 ~]#
[root at server2 ~]# du -kh /LAB/store/ec-vol/file
8.0K /LAB/store/ec-vol/file
[root at server2 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 512 Mar 27 17:11 /LAB/store/ec-vol/file
[root at server2 ~]# cat /LAB/store/ec-vol/file
abc
[root at server2 ~]#
[root at server3 ~]# du -kh /LAB/store/ec-vol/file
8.0K /LAB/store/ec-vol/file
[root at server3 ~]# ls -lh /LAB/store/ec-vol/file
-rw-r--r-- 2 root root 512 Mar 27 17:11 /LAB/store/ec-vol/file
[root at server3 ~]# cat /LAB/store/ec-vol/file
abc
abc
[root at server3 ~]#
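The fd handling in steps 4 and 6 uses bash's `exec` redirection. A minimal local sketch of the same mechanics (no gluster involved, just a temp file):

```shell
# Sketch of the fd usage from steps 4 and 6, against a local temp file.
f=$(mktemp)
exec 30<> "$f"    # step 4: open a read-write fd (fd 30) on the file
echo "abc" >&30   # step 6: write through the already-open fd
exec 30>&-        # close the fd
cat "$f"          # prints: abc
```

On a gluster mount the point is that the write in step 6 goes through an fd that was opened while one brick was down, which is what should trigger the fd reopen on the returned brick.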
--- Additional comment from Ashish Pandey on 2017-04-09 07:08:05 EDT ---
We also need to adjust some performance options (both belong to the
open-behind translator) so that an FD is actually opened in step 4 on the
bricks which are UP:
1 - gluster v set vol performance.lazy-open no
2 - gluster v set vol performance.read-after-open yes
[root at apandey /]# gluster v info
Volume Name: vol
Type: Disperse
Volume ID: d007c6c2-98da-4cd9-8d5e-99e0e3f37012
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: apandey:/home/apandey/bricks/gluster/vol-1
Brick2: apandey:/home/apandey/bricks/gluster/vol-2
Brick3: apandey:/home/apandey/bricks/gluster/vol-3
Options Reconfigured:
disperse.background-heals: 0
cluster.disperse-self-heal-daemon: disable
performance.read-after-open: yes
performance.lazy-open: no
transport.address-family: inet
nfs.disable: on
[root at apandey glusterfs]# gluster v status
Status of volume: vol
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick apandey:/home/apandey/bricks/gluster/
vol-1 49152 0 Y 6297
Brick apandey:/home/apandey/bricks/gluster/
vol-2 49153 0 Y 5865
Brick apandey:/home/apandey/bricks/gluster/
vol-3 49154 0 Y 5884
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
After bringing brick vol-1 UP and writing data to the FD:
[root at apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-1/dir/file
[root at apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-2/dir/file
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
[root at apandey glusterfs]# cat /home/apandey/bricks/gluster/vol-3/dir/file
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
abc
[root at apandey glusterfs]#
[root at apandey glusterfs]# getfattr -m. -d -e hex
/home/apandey/bricks/gluster/vol-*/dir/file
getfattr: Removing leading '/' from absolute path names
# file: home/apandey/bricks/gluster/vol-1/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000b000000000000000b
trusted.ec.size=0x0000000000000000
trusted.ec.version=0x00000000000000000000000000000001
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74f
# file: home/apandey/bricks/gluster/vol-2/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.bit-rot.version=0x020000000000000058ea10fd0005a7ee
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x000000000000002c
trusted.ec.version=0x000000000000000c000000000000000d
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74f
# file: home/apandey/bricks/gluster/vol-3/dir/file
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a757365725f686f6d655f743a733000
trusted.bit-rot.version=0x020000000000000058ea110100063c59
trusted.ec.config=0x0000080301000200
trusted.ec.dirty=0x000000000000000c000000000000000c
trusted.ec.size=0x000000000000002c
trusted.ec.version=0x000000000000000c000000000000000d
trusted.gfid=0xf8cf475afa5e4873bf2274f45278f74f
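For reference, the trusted.ec values above can be decoded by hand. Assuming trusted.ec.size is the file size as a 64-bit hex integer and trusted.ec.version packs two 64-bit counters (data version, then metadata version), the healthy bricks vol-2 and vol-3 read:

```shell
# Sketch: decode the trusted.ec xattrs of the healthy bricks above.
# Field interpretation (size = file size in bytes; version = data counter
# followed by metadata counter) is an assumption about EC's on-disk format.
printf 'size = %d bytes\n' 0x000000000000002c      # 44 = 11 * len("abc\n")
ver=000000000000000c000000000000000d
printf 'data version = %d\n' "0x${ver:0:16}"       # 12
printf 'metadata version = %d\n' "0x${ver:16:16}"  # 13
```

On vol-1, which was down at open time, trusted.ec.size is still 0 and trusted.ec.dirty has advanced to 0xb on both counters, which is the pending heal left behind by the missed writes.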
--- Additional comment from Worker Ant on 2017-04-18 11:40:54 EDT ---
REVIEW: https://review.gluster.org/17077 (cluster/ec: OpenFD heal
implementation for EC) posted (#1) for review on master by Sunil Kumar Acharya
(sheggodu at redhat.com)
--- Additional comments from Worker Ant (2017-04-18 through 2017-10-11 EDT) ---
Patchsets #2 through #17 of https://review.gluster.org/17077 (cluster/ec:
OpenFD heal implementation for EC) were subsequently posted for review on
master by Sunil Kumar Acharya (sheggodu at redhat.com):
#2 2017-04-18, #3 2017-04-20, #4 2017-04-27, #5 2017-05-03, #6 2017-05-16,
#7 2017-05-16, #8 2017-05-30, #9 2017-05-31, #10 2017-06-05, #11 2017-06-06,
#12 2017-06-08, #13 2017-07-20, #14 2017-08-24, #15 2017-09-12,
#16 2017-09-22, #17 2017-10-11.
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1431955
[Bug 1431955] [Disperse] Implement open fd heal for disperse volume