From bugzilla at redhat.com Thu Aug 1 01:04:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 01:04:52 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 Xiubo Li changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(spalai at redhat.com | |) --- Comment #20 from Xiubo Li --- Checking errno directly when ret == -1 works for me now. But I can get both -EAGAIN and -EBUSY, while only -EBUSY is expected. So the question is why there is always an -EAGAIN every time before the lock is acquired? Thanks, BRs -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 01:15:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 01:15:46 +0000 Subject: [Bugs] [Bug 1730948] [Glusterfs4.1.9] memory leak in fuse mount process. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730948 guolei changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(guol-fnst at cn.fuji | |tsu.com) | --- Comment #5 from guolei --- The other bug is seen with creation/renaming of files/directories at root of the share. Just for the sake of verifying this bug you may try using vfs_glusterfs module avoiding operations at root if you can't get hold of required Samba version. -> I tried the vfs_glusterfs module (Samba 4.8.3) and found that the FUSE mount process consumes little memory, but the smbd process consumes much more memory than usual. If you need more info, let me know. Thanks very much. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 01:24:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 01:24:36 +0000 Subject: [Bugs] [Bug 1730948] [Glusterfs4.1.9] memory leak in fuse mount process. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730948 --- Comment #6 from guolei --- Here is the output of the "top" command when I accessed the volume via SMB using the vfs_glusterfs module:
Tasks: 721 total, 2 running, 352 sleeping, 0 stopped, 0 zombie
%Cpu(s): 9.8 us, 9.2 sy, 0.0 ni, 80.4 id, 0.1 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem : 98674000 total, 558548 free, 33681528 used, 64433928 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used.
62794580 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8843 tom 20 0 16.802g 0.014t 265376 R 91.8 15.0 2380:11 smbd
1721 root 20 0 31.983g 3.419g 16616 S 0.0 3.6 12:45.74 java
4176 root 20 0 4554944 1.149g 7972 S 78.6 1.2 1898:52 glusterfsd
4156 root 20 0 4357068 1.077g 8120 S 61.8 1.1 1858:34 glusterfsd
4182 root 20 0 4223136 1.065g 8104 S 76.6 1.1 1881:06 glusterfsd
4115 root 20 0 4231652 1.032g 8148 S 51.6 1.1 1807:30 glusterfsd
4188 root 20 0 4281684 1.030g 8032 S 61.8 1.1 1802:19 glusterfsd
4122 root 20 0 4223168 1.025g 8268 S 56.6 1.1 1872:42 glusterfsd
4155 root 20 0 4224728 1.017g 8400 S 78.6 1.1 1789:53 glusterfsd
4146 root 20 0 4223168 1.017g 8140 S 56.6 1.1 1868:54 glusterfsd
4131 root 20 0 4228336 1.015g 8196 S 54.3 1.1 1884:54 glusterfsd
4132 root 20 0 4158672 1.015g 8216 S 73.4 1.1 1795:58 glusterfsd
-- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 02:59:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 02:59:49 +0000 Subject: [Bugs] [Bug 1734299] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734299 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-01 02:59:49 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23131 (posix/ctime: Fix race during lookup ctime xattr heal) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 02:59:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 02:59:49 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Bug 1734305 depends on bug 1734299, which changed state. Bug 1734299 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1734299 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug.
From bugzilla at redhat.com Thu Aug 1 03:15:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:15:52 +0000 Subject: [Bugs] [Bug 1735514] New: Open fd heal should filter O_APPEND/O_EXCL Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Bug ID: 1735514 Summary: Open fd heal should filter O_APPEND/O_EXCL Product: Red Hat Gluster Storage Version: rhgs-3.5 Status: ASSIGNED Component: disperse Keywords: ZStream Severity: medium Priority: medium Assignee: aspandey at redhat.com Reporter: sheggodu at redhat.com QA Contact: nchilaka at redhat.com CC: aspandey at redhat.com, atumball at redhat.com, bugs at gluster.org, nchilaka at redhat.com, pkarampu at redhat.com, rcyriac at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com, storage-qa-internal at redhat.com, vdas at redhat.com Depends On: 1734303, 1733935 Target Milestone: --- Classification: Red Hat Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1734303 [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:15:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:15:52 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1735514 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:15:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:15:52 +0000 Subject: [Bugs] [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1735514 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:15:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:15:57 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: set proposed | |release flag for new BZs at | |RHGS -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 1 03:18:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:18:46 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST CC| |amukherj at redhat.com --- Comment #2 from Atin Mukherjee --- Upstream patch : https://review.gluster.org/#/c/glusterfs/+/23121/ -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:20:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:20:48 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|medium |high Severity|medium |high -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:27:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:27:46 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Rejy M Cyriac changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:27:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:27:50 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |Gluster: Approve release | |flag for RHGS 3.5.0 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 Rule Engine Rule| |327 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 03:36:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 03:36:21 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Version|unspecified |mainline CC| |bugs at gluster.org Component|glusterd |glusterd Assignee|amukherj at redhat.com |bugs at gluster.org Resolution|--- |WONTFIX Product|Red Hat Gluster Storage |GlusterFS QA Contact|bmekala at redhat.com | Last Closed| |2019-08-01 03:36:21 --- Comment #3 from Atin Mukherjee --- 3.12 version is EOLed, we have made several fixes related to memory leak, if this issue persists in the latest releases (release-5 or release-6) kindly reopen. Since we don't have an active 3.12 version to change the bug from RHGS to GlusterFS I have to choose the mainline version but actually this isn't applicable in mainline though. 
-- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 04:12:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:12:54 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23083 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 04:12:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:12:56 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #737 from Worker Ant --- REVIEW: https://review.gluster.org/23083 (Multiple files: get trivial stuff done before lock) merged (#12) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 04:39:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:39:02 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 04:39:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:39:03 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 04:39:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:39:07 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 04:39:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:39:08 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 1 04:43:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 04:43:10 +0000 Subject: [Bugs] [Bug 1732790] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732790 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 05:32:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 05:32:24 +0000 Subject: [Bugs] [Bug 1730948] [Glusterfs4.1.9] memory leak in fuse mount process. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730948 Anoop C S changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(guol-fnst at cn.fuji | |tsu.com) --- Comment #7 from Anoop C S --- (In reply to guolei from comment #5) > I tried to use vfs_glusterfs module (smb4.8.3) and I found the fuse mount > process consume litte memory. Mostly because you don't have an active connection to the share via FUSE mount since you switched to using vfs_glusterfs. > But the smb process consume much more memory than usual. It will consume more than in the previous situation where FUSE mount was used. Because the entire glusterfs client stack is getting loaded into smbd and acts as a client to glusterfs. You will have to figure out by what quantity memory footprint increased in smbd(when vfs_glusterfs is used) and compare it with memory of "glusterfs" process recorded from when FUSE mount was shared. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 07:42:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 07:42:06 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23139 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 07:42:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 07:42:07 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 --- Comment #70 from Worker Ant --- REVIEW: https://review.gluster.org/23139 (lcov: check for zerofill/discard fops on arbiter) posted (#1) for review on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 1 10:40:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 10:40:57 +0000 Subject: [Bugs] [Bug 1716848] DHT: directory permissions are wiped out In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716848 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-01 10:40:57 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/22814 (cluster/dht: Fix directory perms during selfheal) merged (#4) on release-6 by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 10:42:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 10:42:57 +0000 Subject: [Bugs] [Bug 1733881] [geo-rep]: gluster command not found while setting up a non-root session In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733881 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-01 10:42:57 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23117 (geo-rep: Fix mount broker setup issue) merged (#2) on release-5 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 10:42:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 10:42:58 +0000 Subject: [Bugs] [Bug 1733880] [geo-rep]: gluster command not found while setting up a non-root session In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733880 Bug 1733880 depends on bug 1733881, which changed state. Bug 1733881 Summary: [geo-rep]: gluster command not found while setting up a non-root session https://bugzilla.redhat.com/show_bug.cgi?id=1733881 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 10:45:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 10:45:47 +0000 Subject: [Bugs] [Bug 1731509] snapd crashes sometimes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731509 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-01 10:45:47 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23081 (features/snapview-server: obtain the list of snapshots inside the lock) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 1 11:39:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 11:39:45 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Alex changed: What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |NEW Resolution|WONTFIX |--- Keywords| |Reopened --- Comment #4 from Alex --- GLUSTERD version affected: 6.4 Hi, I've only mentioned 3.12 for the background, but if you read further you'll see this is a bug on 6.4. Thanks for reopening this. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 11:55:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 11:55:11 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 --- Comment #71 from Worker Ant --- REVIEW: https://review.gluster.org/23139 (lcov: check for zerofill/discard fops on arbiter) merged (#1) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 13:11:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 13:11:31 +0000 Subject: [Bugs] [Bug 1554286] Xattr not updated if increasing the retention of a WORM/Retained file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1554286 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 13:35:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 13:35:31 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23141 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 13:35:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 13:35:32 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 --- Comment #72 from Worker Ant --- REVIEW: https://review.gluster.org/23141 (xdr: add code so we have more xdr functions covered) posted (#1) for review on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 1 13:53:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 13:53:43 +0000 Subject: [Bugs] [Bug 1708603] [geo-rep]: Note section in document is required for ignore_deletes true config option where it might delete a file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1708603 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23142 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 13:53:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 13:53:44 +0000 Subject: [Bugs] [Bug 1708603] [geo-rep]: Note section in document is required for ignore_deletes true config option where it might delete a file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1708603 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23142 (geo-rep: Note section is required for ignore_deletes) posted (#1) for review on master by Shwetha K Acharya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:00:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:00:51 +0000 Subject: [Bugs] [Bug 1736341] New: potential deadlock while processing callbacks in gfapi Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Bug ID: 1736341 Summary: potential deadlock while processing callbacks in gfapi Product: GlusterFS Version: 6 Hardware: All OS: All Status: NEW Component: libgfapi Severity: high Assignee: bugs at gluster.org Reporter: skoduri at redhat.com QA Contact: bugs at gluster.org CC: atumball at redhat.com, bugs at gluster.org, pasik at iki.fi Depends On: 1733166 Blocks: 1733520 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1733166 +++ Description of problem: While running parallel I/Os involving many files on nfs-ganesha mount, have hit below deadlock in the nfs-ganesha process. epoll thread: ....glfs_cbk_upcall_data->upcall_syncop_args_init->glfs_h_poll_cache_invalidation->glfs_h_find_handle->priv_glfs_active_subvol->glfs_lock (waiting on lock) I/O thread: ...glfs_h_stat->glfs_resolve_inode->__glfs_resolve_inode (at this point we acquired glfs_lock) -> ...->glfs_refresh_inode_safe->syncop_lookup To summarize- I/O thread which acquired glfs_lock are waiting for epoll threads to receive response where as epoll threads are waiting for I/O threads to release lock. Similar issue was identified earlier (bug1693575). There could be other issues at different layers depending on how client xlators choose to process these callbacks. The correct way of avoiding or fixing these issues is to re-design upcall model which is to use different sockets for callback communication instead of using same epoll threads. Raised github issue for that - https://github.com/gluster/glusterfs/issues/697 Since it may take a while, raising this BZ to provide a workaround fix in gfapi layer for now Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. 
Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-07-25 10:09:58 UTC --- REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-6 by soumya k --- Additional comment from Worker Ant on 2019-07-25 10:16:57 UTC --- REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on master by soumya k Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 [Bug 1733166] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733520 [Bug 1733520] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 17:00:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:00:51 +0000 Subject: [Bugs] [Bug 1733166] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1736341 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 [Bug 1736341] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:00:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:00:51 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736341 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 [Bug 1736341] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:13 +0000 Subject: [Bugs] [Bug 1736342] New: potential deadlock while processing callbacks in gfapi Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Bug ID: 1736342 Summary: potential deadlock while processing callbacks in gfapi Product: GlusterFS Version: 5 Hardware: All OS: All Status: NEW Component: libgfapi Severity: high Assignee: bugs at gluster.org Reporter: skoduri at redhat.com QA Contact: bugs at gluster.org CC: atumball at redhat.com, bugs at gluster.org, pasik at iki.fi Depends On: 1733166 Blocks: 1733520, 1736341 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1733166 +++ Description of problem: While running parallel I/Os involving many files on nfs-ganesha mount, have hit below deadlock in the nfs-ganesha process. 
epoll thread: ....glfs_cbk_upcall_data->upcall_syncop_args_init->glfs_h_poll_cache_invalidation->glfs_h_find_handle->priv_glfs_active_subvol->glfs_lock (waiting on lock) I/O thread: ...glfs_h_stat->glfs_resolve_inode->__glfs_resolve_inode (at this point we acquired glfs_lock) -> ...->glfs_refresh_inode_safe->syncop_lookup To summarize- I/O thread which acquired glfs_lock are waiting for epoll threads to receive response where as epoll threads are waiting for I/O threads to release lock. Similar issue was identified earlier (bug1693575). There could be other issues at different layers depending on how client xlators choose to process these callbacks. The correct way of avoiding or fixing these issues is to re-design upcall model which is to use different sockets for callback communication instead of using same epoll threads. Raised github issue for that - https://github.com/gluster/glusterfs/issues/697 Since it may take a while, raising this BZ to provide a workaround fix in gfapi layer for now Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-07-25 10:09:58 UTC --- REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-6 by soumya k --- Additional comment from Worker Ant on 2019-07-25 10:16:57 UTC --- REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on master by soumya k Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 [Bug 1733166] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733520 [Bug 1733520] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736341 [Bug 1736341] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:13 +0000 Subject: [Bugs] [Bug 1733166] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1736342 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 [Bug 1736342] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:13 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736342 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 [Bug 1736342] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. 
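The deadlock reports above all describe the same cycle: an I/O thread acquires glfs_lock inside __glfs_resolve_inode and then waits for a network reply, while the epoll thread that would deliver that reply is itself blocked trying to take glfs_lock in glfs_h_poll_cache_invalidation. The following is only a minimal pthread sketch of that cycle, not GlusterFS code; the names glfs_lock_sketch, io_thread and epoll_thread are invented for the example, and the program hangs by design when run (build with: gcc -pthread deadlock_sketch.c -o deadlock_sketch).

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t glfs_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static sem_t reply_sem; /* stands in for the "network reply" the I/O thread waits on */

/* Models the I/O path quoted above (glfs_h_stat -> __glfs_resolve_inode ->
 * syncop_lookup): take the lock, then block for a reply without ever
 * releasing the lock. */
static void *io_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&glfs_lock_sketch);
    printf("io: holding the lock, waiting for the reply\n");
    sem_wait(&reply_sem);                   /* never returns in this sketch */
    pthread_mutex_unlock(&glfs_lock_sketch);
    return NULL;
}

/* Models the epoll/callback path quoted above (glfs_cbk_upcall_data ->
 * glfs_h_find_handle -> priv_glfs_active_subvol): it must take the same
 * lock before it can deliver the reply. */
static void *epoll_thread(void *arg)
{
    (void)arg;
    sleep(1);                               /* let io_thread grab the lock first */
    pthread_mutex_lock(&glfs_lock_sketch);  /* blocks forever */
    sem_post(&reply_sem);
    pthread_mutex_unlock(&glfs_lock_sketch);
    return NULL;
}

int main(void)
{
    pthread_t io, ep;
    sem_init(&reply_sem, 0, 0);
    pthread_create(&io, NULL, io_thread, NULL);
    pthread_create(&ep, NULL, epoll_thread, NULL);
    pthread_join(io, NULL);                 /* hangs: each thread waits on the other */
    pthread_join(ep, NULL);
    return 0;
}

As the reports say, both the interim gfapi workaround (https://review.gluster.org/23107, https://review.gluster.org/23108) and the longer-term redesign tracked in https://github.com/gluster/glusterfs/issues/697 aim to break this cycle by keeping the epoll thread from having to contend for glfs_lock while replies are still pending.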
From bugzilla at redhat.com Thu Aug 1 17:01:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:13 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736342 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 [Bug 1736342] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:34 +0000 Subject: [Bugs] [Bug 1736345] New: potential deadlock while processing callbacks in gfapi Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 Bug ID: 1736345 Summary: potential deadlock while processing callbacks in gfapi Product: GlusterFS Version: 7 Hardware: All OS: All Status: NEW Component: libgfapi Severity: high Assignee: bugs at gluster.org Reporter: skoduri at redhat.com QA Contact: bugs at gluster.org CC: atumball at redhat.com, bugs at gluster.org, pasik at iki.fi Depends On: 1733166 Blocks: 1733520, 1736341, 1736342 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1733166 +++ Description of problem: While running parallel I/Os involving many files on nfs-ganesha mount, have hit below deadlock in the nfs-ganesha process. epoll thread: ....glfs_cbk_upcall_data->upcall_syncop_args_init->glfs_h_poll_cache_invalidation->glfs_h_find_handle->priv_glfs_active_subvol->glfs_lock (waiting on lock) I/O thread: ...glfs_h_stat->glfs_resolve_inode->__glfs_resolve_inode (at this point we acquired glfs_lock) -> ...->glfs_refresh_inode_safe->syncop_lookup To summarize- I/O thread which acquired glfs_lock are waiting for epoll threads to receive response where as epoll threads are waiting for I/O threads to release lock. Similar issue was identified earlier (bug1693575). There could be other issues at different layers depending on how client xlators choose to process these callbacks. The correct way of avoiding or fixing these issues is to re-design upcall model which is to use different sockets for callback communication instead of using same epoll threads. Raised github issue for that - https://github.com/gluster/glusterfs/issues/697 Since it may take a while, raising this BZ to provide a workaround fix in gfapi layer for now Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. 
Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-07-25 10:09:58 UTC --- REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-6 by soumya k --- Additional comment from Worker Ant on 2019-07-25 10:16:57 UTC --- REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on master by soumya k Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 [Bug 1733166] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733520 [Bug 1733520] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736341 [Bug 1736341] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736342 [Bug 1736342] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:34 +0000 Subject: [Bugs] [Bug 1733166] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1736345 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 [Bug 1736345] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:34 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736345 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 [Bug 1736345] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 1 17:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:34 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736345 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 [Bug 1736345] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 1 17:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 17:01:34 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Soumya Koduri changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1736345 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 [Bug 1736345] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 18:06:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 18:06:30 +0000 Subject: [Bugs] [Bug 1736481] New: capture stat failure error while setting the gfid Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Bug ID: 1736481 Summary: capture stat failure error while setting the gfid Product: GlusterFS Version: 7 Status: NEW Component: posix Assignee: bugs at gluster.org Reporter: rabhat at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: For create operation, after the entry is created, posix xlator tries to set the gfid for that entry. While doing that, there are several places where setting gfid can fail. While the failure is handled in all the cases, for one of the failure cases, the errno is not captured. Capturing this might help in debugging. int posix_gfid_set(xlator_t *this, const char *path, loc_t *loc, dict_t *xattr_req, pid_t pid, int *op_errno) { uuid_t uuid_req; uuid_t uuid_curr; int ret = 0; ssize_t size = 0; struct stat stat = { 0, }; *op_errno = 0; if (!xattr_req) { if (pid != GF_SERVER_PID_TRASH) { gf_msg(this->name, GF_LOG_ERROR, EINVAL, P_MSG_INVALID_ARGUMENT, "xattr_req is null"); *op_errno = EINVAL; ret = -1; } goto out; } if (sys_lstat(path, &stat) != 0) { ret = -1; gf_msg(this->name, GF_LOG_ERROR, errno, P_MSG_LSTAT_FAILED, "lstat on %s failed", path); goto out; } HERE, errno is not captured. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 18:07:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 18:07:46 +0000 Subject: [Bugs] [Bug 1736482] New: capture stat failure error while setting the gfid Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Bug ID: 1736482 Summary: capture stat failure error while setting the gfid Product: GlusterFS Version: mainline Status: NEW Component: posix Assignee: bugs at gluster.org Reporter: rabhat at redhat.com CC: bugs at gluster.org Depends On: 1736481 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1736481 +++ Description of problem: For create operation, after the entry is created, posix xlator tries to set the gfid for that entry. While doing that, there are several places where setting gfid can fail. While the failure is handled in all the cases, for one of the failure cases, the errno is not captured. Capturing this might help in debugging. 
int posix_gfid_set(xlator_t *this, const char *path, loc_t *loc, dict_t *xattr_req, pid_t pid, int *op_errno) { uuid_t uuid_req; uuid_t uuid_curr; int ret = 0; ssize_t size = 0; struct stat stat = { 0, }; *op_errno = 0; if (!xattr_req) { if (pid != GF_SERVER_PID_TRASH) { gf_msg(this->name, GF_LOG_ERROR, EINVAL, P_MSG_INVALID_ARGUMENT, "xattr_req is null"); *op_errno = EINVAL; ret = -1; } goto out; } if (sys_lstat(path, &stat) != 0) { ret = -1; gf_msg(this->name, GF_LOG_ERROR, errno, P_MSG_LSTAT_FAILED, "lstat on %s failed", path); goto out; } HERE, errno is not captured. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 [Bug 1736481] capture stat failure error while setting the gfid -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 18:07:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 18:07:46 +0000 Subject: [Bugs] [Bug 1736481] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Raghavendra Bhat changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1736482 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 [Bug 1736482] capture stat failure error while setting the gfid -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 18:56:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 18:56:05 +0000 Subject: [Bugs] [Bug 1736564] New: GlusterFS files missing randomly. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736564 Bug ID: 1736564 Summary: GlusterFS files missing randomly. Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: core Severity: high Assignee: bugs at gluster.org Reporter: yexue2015 at u.northwestern.edu CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Some files were suddenly missing. Then after a couple of days, the missing files appeared again (not been damaged). Under one of my folders, there were two sub-folders and four files. At a point, two files were missing and more files under those two sub-folders were missing randomly. There should be 210 files under each of the sub-folders, but after the missing occurs, there were 125 and 138 files left. I can no longer read the files that are missing. However, after a few days. I found the missing files were back. Version-Release number of selected component (if applicable): rpm -qa | grep glusterfs glusterfs-6.1-1.el7.x86_64 glusterfs-client-xlators-6.1-1.el7.x86_64 glusterfs-libs-6.1-1.el7.x86_64 glusterfs-fuse-6.1-1.el7.x86_64 How reproducible: The problem occurs quite randomly. It is not clear what triggers the missing and how to reproduce the problem. Steps to Reproduce: 1. NA Actual results: NA Expected results: NA Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
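Referring back to the posix_gfid_set() snippet quoted in bugs 1736481 and 1736482 above: the lstat failure branch sets ret = -1 and logs the error, but never copies errno into *op_errno, so the caller sees op_errno == 0. A minimal sketch of that branch with the missing capture added is shown here as an illustration of the reported gap only; it is not necessarily the exact change posted for review in the next message (https://review.gluster.org/23144).

    if (sys_lstat(path, &stat) != 0) {
        ret = -1;
        /* save the cause for the caller before anything else can clobber errno */
        *op_errno = errno;
        gf_msg(this->name, GF_LOG_ERROR, errno, P_MSG_LSTAT_FAILED,
               "lstat on %s failed", path);
        goto out;
    }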
From bugzilla at redhat.com Thu Aug 1 19:47:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 19:47:08 +0000 Subject: [Bugs] [Bug 1736482] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23144 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 1 19:47:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 01 Aug 2019 19:47:09 +0000 Subject: [Bugs] [Bug 1736482] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23144 (storage/posix: set the op_errno to proper errno during gfid set) posted (#1) for review on master by Raghavendra Bhat -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 06:49:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 06:49:26 +0000 Subject: [Bugs] [Bug 1428103] Generate UUID on installation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428103 Vijay Bellur changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |CLOSED Resolution|--- |NOTABUG Flags|needinfo?(vbellur at redhat.co | |m) | Last Closed| |2019-08-02 06:49:26 --- Comment #5 from Vijay Bellur --- Haven't heard back from Shreyas. Closing this bug. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 06:50:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 06:50:35 +0000 Subject: [Bugs] [Bug 1597798] 'mv' of directory on encrypted volume fails In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1597798 Vijay Bellur changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(vbellur at redhat.co | |m) | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 06:51:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 06:51:06 +0000 Subject: [Bugs] [Bug 1648169] Fuse mount would crash if features.encryption is on in the version from 3.13.0 to 4.1.5 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1648169 Vijay Bellur changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(vbellur at redhat.co | |m) | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 2 06:51:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 06:51:12 +0000 Subject: [Bugs] [Bug 1428081] cluster/dht: Bug fixes to cluster.min-free-disk In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428081 Vijay Bellur changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(vbellur at redhat.co | |m) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 06:51:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 06:51:29 +0000 Subject: [Bugs] [Bug 1428075] debug/io-stats: Add errors to FOP samples In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428075 Vijay Bellur changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(vbellur at redhat.co | |m) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:35:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:35:34 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |ASSIGNED Resolution|NEXTRELEASE |--- --- Comment #15 from Pranith Kumar K --- Found one case which needs to be fixed. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:35:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:35:36 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Bug 1732772 depends on bug 1727081, which changed state. Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |ASSIGNED Resolution|NEXTRELEASE |--- -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:35:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:35:37 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Bug 1732774 depends on bug 1727081, which changed state. Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |ASSIGNED Resolution|NEXTRELEASE |--- -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 2 07:35:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:35:39 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Bug 1732792 depends on bug 1727081, which changed state. Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |ASSIGNED Resolution|NEXTRELEASE |--- -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:38:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:38:11 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23147 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:38:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:38:12 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #16 from Worker Ant --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 07:56:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 07:56:17 +0000 Subject: [Bugs] [Bug 1736848] New: Execute the "gluster peer probe invalid_hostname" thread deadlock or the glusterd process crashes Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736848 Bug ID: 1736848 Summary: Execute the "gluster peer probe invalid_hostname" thread deadlock or the glusterd process crashes Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Severity: urgent Assignee: bugs at gluster.org Reporter: xlfy555 at 163.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: When glusterd starts, typing the command "gluster peer probe invalid_hostname" produces different results on different machines, with some machines glusterd crashing and producing core files, and some machines glusterd processes with many more child threads. 
Version-Release number of selected component (if applicable): release-6 How reproducible: Steps to Reproduce: Case 1 1.glusterd 2.gluster peer probe invalid_hostname Case 2 1.glusterd 2.gluster peer probe invalid_hostname 3.gluster peer probe invalid_hostname 4.gluster peer probe invalid_hostname(Do it a few more times) 5.ps -aux|grep glusterd 6.gdb attach glusterd-pid 7.info thr (You'll see a lot of "__lll_lock_wait()" child threads) Actual results: Case 1 [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib64/libthread_db.so.1". Core was generated by `glusterd'. Program terminated with signal 11, Segmentation fault. #0 0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890, clnt=0x7fef34007860) at rpc-clnt.c:832 832 if (!conn->rpc_clnt->disabled && (conn->reconnect == NULL)) { Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.166-2.el7.x86_64 elfutils-libs-0.166-2.el7.x86_64 glibc-2.17-157.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 systemd-libs-219-30.el7.x86_64 userspace-rcu-0.7.16-1.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64 (gdb) bt #0 0x00007fef4bd208ff in rpc_clnt_handle_disconnect (conn=0x7fef34007890, clnt=0x7fef34007860) at rpc-clnt.c:832 #1 rpc_clnt_notify (trans=0x7fef34007be0, mydata=0x7fef34007890, event=, data=) at rpc-clnt.c:878 #2 0x00007fef4bd1d4e3 in rpc_transport_notify (this=, event=event at entry=RPC_TRANSPORT_DISCONNECT, data=) at rpc-transport.c:542 #3 0x00007fef3f3634d7 in socket_connect_error_cbk (opaque=0x7fef34007190) at socket.c:3239 #4 0x00007fef4adb6dc5 in start_thread () from /usr/lib64/libpthread.so.0 #5 0x00007fef4a6fb73d in clone () from /usr/lib64/libc.so.6 (gdb) p conn->rpc_clnt $1 = (struct rpc_clnt *) 0x14860 (gdb) p conn->rpc_clnt->disabled Cannot access memory at address 0x149a0 Case 2 (gdb) info thr Id Target Id Frame 16 Thread 0x7ff384f45700 (LWP 18259) "glfs_timer" 0x00007ff38c728bdd in nanosleep () from /usr/lib64/libpthread.so.0 15 Thread 0x7ff384744700 (LWP 18260) "glfs_sigwait" 0x00007ff38c729101 in sigwait () from /usr/lib64/libpthread.so.0 14 Thread 0x7ff383f43700 (LWP 18261) "glfs_memsweep" 0x00007ff38c02d66d in nanosleep () from /usr/lib64/libc.so.6 13 Thread 0x7ff383742700 (LWP 18262) "glfs_sproc0" 0x00007ff38c725a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 12 Thread 0x7ff382f41700 (LWP 18263) "glfs_sproc1" 0x00007ff38c725a82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 11 Thread 0x7ff382740700 (LWP 18264) "glusterd" 0x00007ff38c05dba3 in select () from /usr/lib64/libc.so.6 10 Thread 0x7ff37f2c1700 (LWP 18290) "glfs_gdhooks" 0x00007ff38c7256d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 9 Thread 0x7ff37eac0700 (LWP 18291) "glfs_epoll000" 0x00007ff38c066d13 in epoll_wait () from /usr/lib64/libc.so.6 8 Thread 0x7ff37d216700 (LWP 18306) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0 7 Thread 0x7ff37ca15700 (LWP 18307) "glfs_scleanup" 0x00007ff38c060bf9 in syscall () from /usr/lib64/libc.so.6 6 Thread 0x7ff367fff700 (LWP 18315) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from 
/usr/lib64/libpthread.so.0 5 Thread 0x7ff3677fe700 (LWP 18323) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0 4 Thread 0x7ff366ffd700 (LWP 18331) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0 3 Thread 0x7ff3667fc700 (LWP 18339) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0 2 Thread 0x7ff365ffb700 (LWP 18347) "glfs_scleanup" 0x00007ff38c7281bd in __lll_lock_wait () from /usr/lib64/libpthread.so.0 * 1 Thread 0x7ff38de22480 (LWP 18258) "glusterd" 0x00007ff38c722ef7 in pthread_join () from /usr/lib64/libpthread.so.0 Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:06:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:06:10 +0000 Subject: [Bugs] [Bug 1736345] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23150 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:06:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:06:12 +0000 Subject: [Bugs] [Bug 1736345] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23150 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-7 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:07:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:07:12 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23151 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:07:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:07:13 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23151 (gfapi: Fix deadlock while processing upcall) posted (#1) for review on release-5 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:09:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:09:08 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23107 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 10:09:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 10:09:09 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) posted (#4) for review on release-6 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 13:04:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 13:04:32 +0000 Subject: [Bugs] [Bug 1543996] truncates read-only files on copy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1543996 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Version|mainline |6 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 13:08:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 13:08:28 +0000 Subject: [Bugs] [Bug 1543996] truncates read-only files on copy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1543996 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1735480 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1735480 [Bug 1735480] git clone fails on gluster volumes exported via nfs-ganesha -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 14:13:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:13:43 +0000 Subject: [Bugs] [Bug 1733166] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733166 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-02 14:13:43 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23108 (gfapi: Fix deadlock while processing upcall) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 2 14:13:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:13:44 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Bug 1733520 depends on bug 1733166, which changed state. Bug 1733166 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733166 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 14:13:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:13:45 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Bug 1736341 depends on bug 1733166, which changed state. Bug 1733166 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733166 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 14:13:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:13:45 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Bug 1736342 depends on bug 1733166, which changed state. Bug 1733166 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733166 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 14:13:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:13:46 +0000 Subject: [Bugs] [Bug 1736345] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 Bug 1736345 depends on bug 1733166, which changed state. Bug 1733166 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1733166 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 14:26:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:26:16 +0000 Subject: [Bugs] [Bug 1734738] Unable to create geo-rep session on a non-root setup. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734738 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-02 14:26:16 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23136 (geo-rep: Fix mount broker setup issue) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 14:27:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 14:27:15 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 --- Comment #21 from Worker Ant --- REVIEW: https://review.gluster.org/23088 (locks/fencing: Address hang while lock preemption) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 2 19:19:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 19:19:55 +0000 Subject: [Bugs] [Bug 1737141] New: read() returns more than file size when using direct I/O Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Bug ID: 1737141 Summary: read() returns more than file size when using direct I/O Product: GlusterFS Version: 6 Status: NEW Component: fuse Severity: high Assignee: bugs at gluster.org Reporter: nsoffer at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: When using direct I/O, reading from a file returns more data, padding the file data with zeroes. Here is an example. ## On a host mounting gluster using fuse $ pwd /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ mount | grep glusterfs voodoo4.tlv.redhat.com:/gv0 on /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) $ stat metadata File: metadata Size: 501 Blocks: 1 IO Block: 131072 regular file Device: 31h/49d Inode: 13313776956941938127 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-01 22:21:49.186381528 +0300 Modify: 2019-08-01 22:21:49.427404135 +0300 Change: 2019-08-01 22:21:49.969739575 +0300 Birth: - $ cat metadata ALIGNMENT=1048576 BLOCK_SIZE=4096 CLASS=Data DESCRIPTION=gv0 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=4k-gluster POOL_DOMAINS=de566475-5b67-4987-abf3-3dc98083b44c:Active POOL_SPM_ID=-1 POOL_SPM_LVER=-1 POOL_UUID=44cfb532-3144-48bd-a08c-83065a5a1032 REMOTE_PATH=voodoo4.tlv.redhat.com:/gv0 ROLE=Master SDUUID=de566475-5b67-4987-abf3-3dc98083b44c TYPE=GLUSTERFS VERSION=5 _SHA_CKSUM=3d1cb836f4c93679fc5a4e7218425afe473e3cfa $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000340298 s, 1.5 MB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00398529 s, 1.0 MB/s Checking the copied data, the actual content of the file is padded with zeros to 4096 bytes. 
## On the one of the gluster nodes $ pwd /export/vdo0/brick/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ stat metadata File: metadata Size: 501 Blocks: 16 IO Block: 4096 regular file Device: fd02h/64770d Inode: 149 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 36/ UNKNOWN) Gid: ( 36/ kvm) Context: system_u:object_r:usr_t:s0 Access: 2019-08-01 22:21:50.380425478 +0300 Modify: 2019-08-01 22:21:49.427397589 +0300 Change: 2019-08-01 22:21:50.374425302 +0300 Birth: - $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000991636 s, 505 kB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 0+1 records in 0+1 records out 501 bytes copied, 0.0011381 s, 440 kB/s This proves that the issue is in gluster. # gluster volume info gv0 Volume Name: gv0 Type: Replicate Volume ID: cbc5a2ad-7246-42fc-a78f-70175fb7bf22 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: voodoo4.tlv.redhat.com:/export/vdo0/brick Brick2: voodoo5.tlv.redhat.com:/export/vdo0/brick Brick3: voodoo8.tlv.redhat.com:/export/vdo0/brick (arbiter) Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: disable performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: on $ xfs_info /export/vdo0 meta-data=/dev/mapper/vdo0 isize=512 agcount=4, agsize=6553600 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=26214400, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=12800, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 Version-Release number of selected component (if applicable): Server: $ rpm -qa | grep glusterfs glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-server-6.4-1.fc29.x86_64 Client: $ rpm -qa | grep glusterfs glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-rdma-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 How reproducible: Always. Steps to Reproduce: 1. Provision gluster volume over vdo (did not check without vdo) 2. Create a file of 501 bytes 3. Read the file using direct I/O Actual results: read() returns 4096 bytes, padding the file data with zeroes Expected results: read() returns actual file data (501 bytes) -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
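As a footnote to the report above, the dd check can also be expressed as a tiny C program, which makes it easy to drop into a regression test. This is an illustrative sketch rather than anything from the bug report: it assumes a file smaller than 4096 bytes (like the 501-byte metadata file above), opens it with O_DIRECT, reads one aligned block, and flags the case where read() returns more bytes than fstat() reports.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* "metadata" is only an example default; pass your own file path */
    const char *path = argc > 1 ? argv[1] : "metadata";
    struct stat st;
    void *buf = NULL;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(path);
        return 1;
    }
    /* O_DIRECT requires an aligned buffer; 4096 matches the block size in the report */
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        close(fd);
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);
    printf("st_size=%lld read()=%zd\n", (long long)st.st_size, n);
    if (n > st.st_size)
        printf("unexpected: read() returned %zd bytes beyond EOF\n",
               n - (ssize_t)st.st_size);

    free(buf);
    close(fd);
    return 0;
}

Run against the 501-byte metadata file on the fuse mount, this would print a read() count of 4096, reproducing the same padding that the dd invocation above shows.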
From bugzilla at redhat.com Fri Aug 2 19:20:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 19:20:19 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Nir Soffer changed: What |Removed |Added ---------------------------------------------------------------------------- Dependent Products| |Red Hat Enterprise | |Virtualization Manager -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 19:21:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 19:21:20 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Nir Soffer changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |teigland at redhat.com Flags| |needinfo?(teigland at redhat.c | |om) --- Comment #1 from Nir Soffer --- David, do you think this can affect sanlock? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 2 19:25:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 02 Aug 2019 19:25:02 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Nir Soffer changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kwolf at redhat.com Flags| |needinfo?(kwolf at redhat.com) --- Comment #2 from Nir Soffer --- Kevin, do you think this can affect qemu/qemu-img? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 4 03:39:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 03:39:50 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #738 from Worker Ant --- REVIEW: https://review.gluster.org/23118 (tests: introduce BRICK_MUX_BAD_TESTS variable) merged (#4) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 4 07:09:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 07:09:48 +0000 Subject: [Bugs] [Bug 1736482] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-04 07:09:48 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23144 (storage/posix: set the op_errno to proper errno during gfid set) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Sun Aug 4 07:11:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 07:11:26 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 --- Comment #73 from Worker Ant --- REVIEW: https://review.gluster.org/23141 (xdr: add code so we have more xdr functions covered) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 4 13:03:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 13:03:30 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 Xiubo Li changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(spalai at redhat.com | |) --- Comment #22 from Xiubo Li --- @Susant, Since the Fencing patch has already gone into release 6, this follow-up fix should be backported as well, right? Thanks. BRs -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 4 14:01:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 14:01:11 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23153 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 4 14:01:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 14:01:13 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #739 from Worker Ant --- REVIEW: https://review.gluster.org/23153 ([WIP]options.h: format OPTION_INIT similar to RECONF_INIT) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 4 16:55:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 16:55:27 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Yaniv Kaul changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |urgent Severity|unspecified |urgent -- You are receiving this mail because: You are on the CC list for the bug.
From bugzilla at redhat.com Sun Aug 4 16:55:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 16:55:37 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Yaniv Kaul changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |urgent Severity|unspecified |urgent -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 4 17:00:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 04 Aug 2019 17:00:31 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Yaniv Kaul changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |urgent Severity|unspecified |urgent -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 03:07:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:07:25 +0000 Subject: [Bugs] [Bug 1737288] New: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Bug ID: 1737288 Summary: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Product: GlusterFS Version: mainline Status: NEW Component: ctime Assignee: bugs at gluster.org Reporter: kinglongmee at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: I have a 4+2 disperse volume with ctime on, and export a dir from nfs-ganesha, storage.ctime: on features.utime: on When I copy a local file to nfs client, stat shows bad ctime for the file. # stat /mnt/nfs/test* File: '/mnt/nfs/test1.sh' Size: 166 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 10744358902712050257 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - File: '/mnt/nfs/test2.sh'
Size: 214 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 12073556847735387788 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - # ps a 342188 pts/0 D+ 0:00 cp -i test1.sh test2.sh /mnt/nfs/ # gdb glusterfsd (gdb) p *stbuf $1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 174138658, ia_mtime = 2889352448, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' , ia_type = IA_INVAL, ia_prot = { suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}} This is caused by the NFS client creating the copied file in EXCLUSIVE mode, which sets a verifier; the verifier is stored in the file's atime and mtime. The NFS client sets the verifier like this:

if (flags & O_EXCL) {
    data->arg.create.createmode = NFS3_CREATE_EXCLUSIVE;
    data->arg.create.verifier[0] = cpu_to_be32(jiffies);
    data->arg.create.verifier[1] = cpu_to_be32(current->pid);
}

so verifier[0] is stored in the file's atime, and verifier[1] in its mtime. But utime at storage/posix also applies the mtime to the ctime during setattr, and setting the ctime back to an earlier date is not allowed:

/* Earlier, mdata was updated only if the existing time is less
 * than the time to be updated. This would fail the scenarios
 * where mtime can be set to any time using the syscall. Hence
 * just updating without comparison. But the ctime is not
 * allowed to changed to older date.
 */

The following code can be used to find the PIDs that may cause a bad ctime for a copied file.
==========================================================================
#include <stdio.h>

/* byte-swap a 32-bit value; matches what cpu_to_be32() does on a
 * little-endian client */
int swap_endian(int val){
    val = ((val << 8)&0xFF00FF00) | ((val >> 8)&0x00FF00FF);
    return (val << 16)|(val >> 16);
}

// time of 2020/01/01 0:0:0
#define TO2020 1577808000

int main(int argc, char **argv)
{
    unsigned int i = 0, val = 0;

    /* a PID i is problematic if its byte-swapped value is a timestamp
     * later than 2020 */
    for (i = 0; i < 500000; i++) {
        val = swap_endian(i);
        if (val > TO2020)
            printf("%u %u\n", i, val);
    }
    return 0;
}
-- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 03:09:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:09:23 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #740 from Worker Ant --- REVIEW: https://review.gluster.org/23130 (multiple files: reduce minor work under RCU_READ_LOCK) merged (#4) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug.
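Returning to the ctime report for bug 1737288 above, the numbers it quotes fit together: ps shows the cp process as PID 342188, the gdb dump shows ia_mtime = 2889352448, and stat shows a Change time on 2061-07-23. The sketch below is not kernel or GlusterFS code; it only demonstrates, assuming a little-endian client on which cpu_to_be32() is a plain byte swap and a 64-bit time_t, that the swapped PID is exactly that bogus timestamp.

#include <stdio.h>
#include <stdint.h>
#include <time.h>

/* same transformation as cpu_to_be32() on a little-endian host */
static uint32_t swap32(uint32_t v)
{
    v = ((v << 8) & 0xFF00FF00u) | ((v >> 8) & 0x00FF00FFu);
    return (v << 16) | (v >> 16);
}

int main(void)
{
    uint32_t pid = 342188;          /* cp's PID from "ps a" in the report */
    uint32_t stamp = swap32(pid);   /* the value that ends up in ia_mtime */
    time_t t = (time_t)stamp;
    char buf[64];

    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    /* prints: swap32(342188) = 2889352448 -> 2061-07-23 13:54:08 UTC,
     * i.e. the bogus Change time seen via NFS (21:54:08 +0800) */
    printf("swap32(%u) = %u -> %s\n", pid, stamp, buf);
    return 0;
}

In other words, verifier[1] (the PID) is sent in big-endian byte order, lands in the mtime field, is copied into the ctime during setattr, and from then on the ctime cannot be moved back to a sane value.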
From bugzilla at redhat.com Mon Aug 5 03:12:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:12:34 +0000 Subject: [Bugs] [Bug 1708929] Add more test coverage for shd mux In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1708929 --- Comment #8 from Worker Ant --- REVIEW: https://review.gluster.org/23135 (tests/shd: Break down shd mux tests into multiple .t file) merged (#2) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 03:17:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:17:59 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23154 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 03:18:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:18:00 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) posted (#1) for review on master by Kinglong Mee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 03:20:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:20:44 +0000 Subject: [Bugs] [Bug 1737291] New: features/locks: avoid use after freed of frame for blocked lock Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737291 Bug ID: 1737291 Summary: features/locks: avoid use after freed of frame for blocked lock Product: GlusterFS Version: mainline Status: NEW Component: locks Assignee: bugs at gluster.org Reporter: kinglongmee at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: A fop whose lock request is blocked may end up using freed frame info after another fop's unlock has unwound that blocked lock. Because the blocked lock is added to the blocked list while the inode lock (or another lock) is held, the fop that queued it must not use it again once it has dropped that lock. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug.
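Since the description of bug 1737291 above is terse, here is a minimal illustration of the general hazard it points at. This is not the GlusterFS features/locks code; every name in it is invented for the example. The point is simply that once a request has been published on a shared blocked list and the protecting lock is released, the granting path may free it at any moment, so the queueing path must not touch the request (or its frame) afterwards.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* stands in for the fop's frame/state attached to a blocked lock */
struct blocked_req {
    struct blocked_req *next;
    int id;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct blocked_req *blocked_list;

/* unlock path: grants every queued request and frees it */
static void grant_all(void)
{
    pthread_mutex_lock(&list_lock);
    struct blocked_req *r = blocked_list;
    blocked_list = NULL;
    pthread_mutex_unlock(&list_lock);

    while (r) {
        struct blocked_req *next = r->next;
        printf("granting request %d\n", r->id);
        free(r);                /* the request no longer exists after this */
        r = next;
    }
}

/* lock path: queues a request that could not be granted immediately */
static void block_request(int id)
{
    struct blocked_req *r = malloc(sizeof(*r));
    r->id = id;
    pthread_mutex_lock(&list_lock);
    r->next = blocked_list;
    blocked_list = r;
    pthread_mutex_unlock(&list_lock);

    /* In the real scenario grant_all() runs concurrently on another
     * thread, so r may already be freed at this point; only values
     * captured before publishing (such as the local "id") are safe. */
    printf("queued request %d\n", id);
}

int main(void)
{
    /* sequential driver just to keep the sketch self-contained */
    block_request(1);
    block_request(2);
    grant_all();
    return 0;
}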
From bugzilla at redhat.com Mon Aug 5 03:22:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:22:58 +0000 Subject: [Bugs] [Bug 1737291] features/locks: avoid use after freed of frame for blocked lock In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737291 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23155 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 03:23:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 03:23:00 +0000 Subject: [Bugs] [Bug 1737291] features/locks: avoid use after freed of frame for blocked lock In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737291 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23155 (features/locks: avoid use after freed of frame for blocked lock) posted (#1) for review on master by Kinglong Mee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:06:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:06:51 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-05 05:06:51 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23151 (gfapi: Fix deadlock while processing upcall) merged (#1) on release-5 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:06:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:06:51 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Bug 1733520 depends on bug 1736342, which changed state. Bug 1736342 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736342 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 05:06:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:06:52 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Bug 1736341 depends on bug 1736342, which changed state. 
Bug 1736342 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736342 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:07:30 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-05 05:07:30 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23107 (gfapi: Fix deadlock while processing upcall) merged (#4) on release-6 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:07:30 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Bug 1733520 depends on bug 1736341, which changed state. Bug 1736341 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736341 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 05:29:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:29:39 +0000 Subject: [Bugs] [Bug 1736564] GlusterFS files missing randomly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736564 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |atumball at redhat.com --- Comment #1 from Amar Tumballi --- Can you try below command see if that helps? 'gluster volume set parallel-readdir disable' -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:30:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:30:18 +0000 Subject: [Bugs] [Bug 1736848] Execute the "gluster peer probe invalid_hostname" thread deadlock or the glusterd process crashes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736848 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |high CC| |amukherj at redhat.com, | |atumball at redhat.com, | |moagrawa at redhat.com Assignee|bugs at gluster.org |srakonde at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 5 05:33:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:33:57 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |high CC| |atumball at redhat.com, | |csaba at redhat.com, | |kdhananj at redhat.com, | |khiremat at redhat.com, | |nbalacha at redhat.com, | |pkarampu at redhat.com, | |rabhat at redhat.com, | |rgowdapp at redhat.com, | |rkavunga at redhat.com --- Comment #3 from Amar Tumballi --- @Nir, thanks for the report. We will look into this. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:50:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:50:10 +0000 Subject: [Bugs] [Bug 1737311] New: (glusterfs-6.5) - GlusterFS 6.5 tracker Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 Bug ID: 1737311 Summary: (glusterfs-6.5) - GlusterFS 6.5 tracker Product: GlusterFS Version: 6 Status: NEW Component: core Assignee: bugs at gluster.org Reporter: hgowtham at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Tracker bug for 6.5 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:52:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:52:33 +0000 Subject: [Bugs] [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |Tracking Depends On| |1736341, 1731509, 1730545, | |1716848 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1716848 [Bug 1716848] DHT: directory permissions are wiped out https://bugzilla.redhat.com/show_bug.cgi?id=1730545 [Bug 1730545] gluster v geo-rep status command timing out https://bugzilla.redhat.com/show_bug.cgi?id=1731509 [Bug 1731509] snapd crashes sometimes https://bugzilla.redhat.com/show_bug.cgi?id=1736341 [Bug 1736341] potential deadlock while processing callbacks in gfapi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:52:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:52:33 +0000 Subject: [Bugs] [Bug 1716848] DHT: directory permissions are wiped out In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716848 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737311 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 5 05:52:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:52:33 +0000 Subject: [Bugs] [Bug 1730545] gluster v geo-rep status command timing out In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730545 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737311 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:52:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:52:33 +0000 Subject: [Bugs] [Bug 1731509] snapd crashes sometimes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731509 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737311 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:52:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:52:33 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737311 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 05:56:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 05:56:37 +0000 Subject: [Bugs] [Bug 1737313] New: (glusterfs-5.9) - GlusterFS 5.9 tracker Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 Bug ID: 1737313 Summary: (glusterfs-5.9) - GlusterFS 5.9 tracker Product: GlusterFS Version: 5 Status: NEW Component: core Assignee: bugs at gluster.org Reporter: hgowtham at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Tracker bug for 5.9 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 06:35:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:35:21 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 5 06:43:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:43:14 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 06:49:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:49:01 -0000 Subject: [Bugs] [Bug 1728766] Volume start failed when shd is down in one of the node in cluster In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728766 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-05 06:48:55 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23007 (glusterd/shd: Return null proc if process is not running.) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 06:49:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:49:19 +0000 Subject: [Bugs] [Bug 1727256] Directory pending heal in heal info output In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727256 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-05 06:49:19 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23005 (graph/shd: attach volfile even if ctx->active is NULL) merged (#10) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 06:50:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:50:43 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Component|fuse |sharding QA Contact| |bugs at gluster.org -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 06:51:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 06:51:11 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |kdhananj at redhat.com -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 5 07:08:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 07:08:37 +0000 Subject: [Bugs] [Bug 1529842] Read-only listxattr syscalls seem to translate to non-read-only FOPs In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1529842 Aravinda VK changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |CURRENTRELEASE Last Closed| |2019-08-05 07:08:37 --- Comment #3 from Aravinda VK --- (In reply to nh2 from comment #2) > Did you use the same version as I was using, 3.12.3? > > Unfortunately I won't be able to put time into re-reproducing this, as we > switched to Ceph a year ago. Thanks for the update. Closing this bug since the issue is not reproducible in the latest version, as mentioned in Comment 1. Please reopen if found again. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 07:08:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 07:08:39 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23156 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 07:08:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 07:08:39 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #741 from Worker Ant --- REVIEW: https://review.gluster.org/23156 (index.{c|h}: minor changes) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 08:30:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 08:30:06 +0000 Subject: [Bugs] [Bug 1443027] Accessing file from aux mount is not triggering afr selfheals. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1443027 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |WONTFIX Last Closed| |2019-08-05 08:30:06 --- Comment #3 from Ravishankar N --- I'm not planning to work on this bug any time soon. In the interest of reducing bug backlog count, I am closing it. Please feel free to re-open as needed. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 08:32:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 08:32:55 +0000 Subject: [Bugs] [Bug 1682925] Gluster volumes never heal during oVirt 4.2->4.3 upgrade In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1682925 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |INSUFFICIENT_DATA Last Closed| |2019-08-05 08:32:55 --- Comment #9 from Ravishankar N --- I'm closing this bug as there is not much information on what the problem is. 
Please feel free to re-open with the relevant details/ reproducer steps if issue occurs again. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 09:16:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 09:16:16 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Kevin Wolf changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(teigland at redhat.c | |om) | |needinfo?(kwolf at redhat.com) | --- Comment #4 from Kevin Wolf --- (In reply to Nir Soffer from comment #2) > Kevin, do you think this can affect qemu/qemu-img? This is not a problem for QEMU as long as the file size is correct. If gluster didn't do the zero padding, QEMU would do it internally. In fact, fixing this in gluster may break the case of unaligned image sizes with QEMU because the image size is rounded up to sector (512 byte) granularity and the gluster driver turns short reads into errors. This would actually affect non-O_DIRECT, too, which already seems to behave this way, so can you just give this a quick test? -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 10:00:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 10:00:38 +0000 Subject: [Bugs] [Bug 1693184] A brick process(glusterfsd) died with 'memory violation' In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693184 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |INSUFFICIENT_DATA Flags|needinfo?(knjeong at growthsof | |t.co.kr) | Last Closed| |2019-08-05 10:00:38 --- Comment #2 from Ravishankar N --- Hi Jeong, I'm closing this bug as gluster 3.6 was EOL'd long back. Please feel free to re-open the bug if issue persists in any of the current supported releases. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 10:04:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 10:04:43 +0000 Subject: [Bugs] [Bug 1727430] CPU Spike casue files unavailable In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727430 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |CLOSED Resolution|--- |INSUFFICIENT_DATA Last Closed| |2019-08-05 10:04:43 --- Comment #3 from Ravishankar N --- Hi, I'm closing this bug since I haven't heard from you. Please feel free to re-open with the information I requested/ steps to reproduce. -- You are receiving this mail because: You are on the CC list for the bug. 
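Relating to Kevin Wolf's comment on bug 1737141 above: a consumer that rounds image sizes up to 512-byte sectors has to accept a short read at end-of-file and pad the tail itself. The sketch below is a generic illustration of that caller-side pattern; it is not taken from QEMU or GlusterFS, and the function name and SECTOR constant are invented for the example.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SECTOR 512

/* Read one sector from the current offset of fd into buf.  A short read
 * at EOF is expected: the unread tail is zero-filled so the caller
 * always sees a full, padded sector.  Returns 0 on success, -1 on error. */
static int read_sector_padded(int fd, unsigned char *buf)
{
    size_t got = 0;

    while (got < SECTOR) {
        ssize_t n = read(fd, buf + got, SECTOR - got);
        if (n < 0)
            return -1;          /* genuine I/O error */
        if (n == 0)
            break;              /* end of file */
        got += (size_t)n;
    }
    memset(buf + got, 0, SECTOR - got);
    return 0;
}

int main(int argc, char **argv)
{
    unsigned char buf[SECTOR];

    if (argc < 2)
        return 1;
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0)
        return 1;
    if (read_sector_padded(fd, buf) == 0)
        printf("read one padded sector, first byte 0x%02x\n", buf[0]);
    close(fd);
    return 0;
}

As the comment notes, a driver that instead treats the short read as an error is the one that would be affected if the padding behaviour on the gluster side changed.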
From bugzilla at redhat.com Mon Aug 5 11:11:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 11:11:11 +0000 Subject: [Bugs] [Bug 1414608] Weird directory appear when rmdir the directory in disk full condition In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1414608 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |CURRENTRELEASE Last Closed| |2019-08-05 11:11:11 --- Comment #6 from Ravishankar N --- Disk full scenarios can cause problems ranging from ENOENT during creates to ENOTEMPTY during rmdirs to heal not progressing due to lack of gluster xattrs. Recent versions of gluster have 'storage.reserve' volume option in posix xlator to reserve space for rebalance, heals etc. That should mitigate this to some extent. But even that is not entirely race free as it checks and updates free space only once in 5 seconds. I'm going ahead and closing this bug as CURRENTRELEASE. George, please feel free to re-open the bug if storage.reserve doesn't solve your use case or if you have other ideas to solve this in a more robust way. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 11:31:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 11:31:49 +0000 Subject: [Bugs] [Bug 1733880] [geo-rep]: gluster command not found while setting up a non-root session In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733880 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-05 11:31:49 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23116 (geo-rep: Fix mount broker setup issue) merged (#3) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 12:55:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 12:55:26 +0000 Subject: [Bugs] [Bug 1730433] Gluster release 6 build errors on ppc64le In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730433 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED CC| |kkeithle at redhat.com Resolution|--- |WORKSFORME Last Closed| |2019-08-05 12:55:26 --- Comment #3 from Kaleb KEITHLEY --- openssl-devel is in RHEL base (rhel-7-server-rpms repo) No need to build from source. You can get userspace-rcu(-devel) from EPEL or the CentOS Storage SIG. (yes, even for ppc64le, see http://mirror.centos.org/altarch/7.6.1810/storage/ppc64le/gluster-6/) But build from source if you wish. Closing as WORKSFORME. Reopen if necessary. -- You are receiving this mail because: You are on the CC list for the bug. 
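To make the build-dependency pointer in bug 1730433 above concrete, here is a minimal sketch for a RHEL/CentOS 7 ppc64le build host. Only the package names and the mirror URL come from the comment itself; the repo id, repo file name and gpgcheck setting are illustrative assumptions, not an authoritative recipe.

    # openssl-devel comes from the RHEL 7 base channel (rhel-7-server-rpms)
    yum install -y openssl-devel

    # userspace-rcu(-devel) can be pulled from the CentOS Storage SIG mirror noted above
    cat > /etc/yum.repos.d/centos-gluster6.repo <<'EOF'
    [centos-gluster6]
    name=CentOS Storage SIG - Gluster 6 (ppc64le)
    baseurl=http://mirror.centos.org/altarch/7.6.1810/storage/ppc64le/gluster-6/
    gpgcheck=0
    enabled=1
    EOF
    yum install -y userspace-rcu userspace-rcu-devel
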
From bugzilla at redhat.com Mon Aug 5 12:59:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 12:59:58 +0000 Subject: [Bugs] [Bug 1663337] Gluster documentation on quorum-reads option is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1663337 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST Version|4.1 |mainline --- Comment #1 from Ravishankar N --- I have sent PR https://github.com/gluster/glusterdocs/pull/493 to update the documentation. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 13:35:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 13:35:45 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 Aravinda VK changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |avishwan at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 13:44:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 13:44:11 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23158 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 13:44:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 13:44:12 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23158 (geo-rep: Fix Config Get Race) posted (#1) for review on master by Aravinda VK -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 5 15:08:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 15:08:32 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 --- Comment #5 from David Teigland --- (In reply to Nir Soffer from comment #1) > David, do you think this can affect sanlock? I don't think so. sanlock doesn't use any space that it didn't first write to initialize. -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 5 17:34:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 17:34:09 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23159 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 17:34:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 17:34:10 +0000 Subject: [Bugs] [Bug 1693692] Increase code coverage from regression tests In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693692 --- Comment #74 from Worker Ant --- REVIEW: https://review.gluster.org/23159 (tests/line-coverage: more commands added to cover xdrs) posted (#1) for review on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 5 18:00:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 18:00:45 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 01:52:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 01:52:10 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-11 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 01:52:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 01:52:11 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-11 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 01:55:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 01:55:20 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 6 01:55:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 01:55:23 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 03:17:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 03:17:43 +0000 Subject: [Bugs] [Bug 1737676] New: Upgrading a Gluster node fails when user edited glusterd.vol file exists Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737676 Bug ID: 1737676 Summary: Upgrading a Gluster node fails when user edited glusterd.vol file exists Product: GlusterFS Version: mainline Status: NEW Component: glusterd Severity: high Assignee: bugs at gluster.org Reporter: amukherj at redhat.com CC: amukherj at redhat.com, bmekala at redhat.com, bugs at gluster.org, rhs-bugs at redhat.com, rtalur at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com, vbellur at redhat.com Blocks: 1734534 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1734534 +++ Description of problem: When a user had edited the glusterd.vol file in /etc/glusterfs and updates the glusterfs packages, bricks cannot contact glusterd. Version-Release number of selected component (if applicable): glusterfs-6 and above (including mainline) How reproducible: Always Steps to Reproduce: 1. install glusterfs-5 or lower 2. create and start a volume 3. edit glusterd.vol and modify options like base port / max port 4. yum update gluster packages to glusterfs-6 5. restart volumes Actual result: bricks can't talk to glusterd Expected result: bricks should be able to talk to glusterd on 24007. --- Additional comment from Raghavendra Talur on 2019-08-05 14:33:01 UTC --- I think the change happened in c96778b354ea82943442aab158adbb854ca43a52 commit upstream and I propose that we fix this problem by keeping the default in code for glusterd and letting glusterd.vol override ride instead of having the value only in glusterd.vol. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734534 [Bug 1734534] Upgrading a RHGS node fails when user edited glusterd.vol file exists -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 03:22:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 03:22:26 +0000 Subject: [Bugs] [Bug 1737676] Upgrading a Gluster node fails when user edited glusterd.vol file exists In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737676 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23160 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
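As a concrete illustration of step 3 in the bug 1737676 report above, a user-edited /etc/glusterfs/glusterd.vol might look like the sketch below. The base-port and max-port option names are the ones referenced in the report; the values are invented for illustration, and the surrounding options are only an approximation of the shipped file, which differs between releases.

    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        # user-edited values (normally shipped commented out with the defaults)
        option base-port 50152
        option max-port  51000
    end-volume

The fix referenced above (review 23160, "rpc/transport: have default listen-port") follows the approach proposed in the report's comment: keep the default in code and let glusterd.vol merely override it, so an old user-edited file left in place by the package update no longer leaves the bricks unable to reach glusterd on 24007.
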
From bugzilla at redhat.com Tue Aug 6 03:22:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 03:22:27 +0000 Subject: [Bugs] [Bug 1737676] Upgrading a Gluster node fails when user edited glusterd.vol file exists In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737676 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23160 (rpc/transport: have default listen-port) posted (#1) for review on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 05:11:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:11:56 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Vivek Das changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |vdas at redhat.com Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 05:11:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:11:59 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: Auto pm_ack at Eng | |In-Flight RHGS3.5 Blocker | |BZs Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |665 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 Rule Engine Rule| |327 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 05:04:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:04:04 +0000 Subject: [Bugs] [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |Tracking CC| |atumball at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 6 05:45:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:45:19 +0000 Subject: [Bugs] [Bug 1736345] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736345 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 05:45:19 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23150 (gfapi: Fix deadlock while processing upcall) merged (#1) on release-7 by soumya k -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 05:45:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:45:19 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Bug 1733520 depends on bug 1736345, which changed state. Bug 1736345 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736345 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 05:45:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:45:20 +0000 Subject: [Bugs] [Bug 1736341] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736341 Bug 1736341 depends on bug 1736345, which changed state. Bug 1736345 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736345 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 05:45:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:45:21 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 Bug 1736342 depends on bug 1736345, which changed state. Bug 1736345 Summary: potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1736345 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 6 06:06:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:06:15 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 06:06:15 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) merged (#2) on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:17:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:17:19 +0000 Subject: [Bugs] [Bug 1737705] New: ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Bug ID: 1737705 Summary: ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Product: Red Hat Gluster Storage Version: rhgs-3.5 Status: NEW Component: core Severity: high Priority: medium Assignee: atumball at redhat.com Reporter: khiremat at redhat.com QA Contact: rhinduja at redhat.com CC: atumball at redhat.com, bugs at gluster.org, khiremat at redhat.com, kinglongmee at gmail.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1737288 Target Milestone: --- Classification: Red Hat +++ This bug was initially created as a clone of Bug #1737288 +++ Description of problem: I have a 4+2 disperse volume with ctime on, and export a dir from nfs-ganesha, storage.ctime: on features.utime: on When I copy a local file to nfs client, stat shows bad ctime for the file. # stat /mnt/nfs/test* File: ?/mnt/nfs/test1.sh? Size: 166 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 10744358902712050257 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - File: ?/mnt/nfs/test2.sh? Size: 214 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 12073556847735387788 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - # ps a 342188 pts/0 D+ 0:00 cp -i test1.sh test2.sh /mnt/nfs/ # gdb glusterfsd (gdb) p *stbuf $1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 174138658, ia_mtime = 2889352448, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' , ia_type = IA_INVAL, ia_prot = { suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}} It is caused by nfs client create the copied file as EXCLUSIVE mode which set a verifier, the verifier is set to file's atime and mtime. 
nfs client sets the verifier as,

    if (flags & O_EXCL) {
        data->arg.create.createmode = NFS3_CREATE_EXCLUSIVE;
        data->arg.create.verifier[0] = cpu_to_be32(jiffies);
        data->arg.create.verifier[1] = cpu_to_be32(current->pid);
    }

the verifier[0] is set to the file's atime, and verifier[1] is set to its mtime. But utime at storage/posix sets the mtime into ctime too at setattr, and setting ctime to an earlier time is not allowed.

    /* Earlier, mdata was updated only if the existing time is less
     * than the time to be updated. This would fail the scenarios
     * where mtime can be set to any time using the syscall. Hence
     * just updating without comparison. But the ctime is not
     * allowed to changed to older date. */

The following code is used to find those PIDs which may cause a bad ctime for a copied file.
==========================================================================

    #include <stdio.h>
    #include <stdlib.h>

    /* swap the byte order of a 32-bit value */
    int swap_endian(int val)
    {
        val = ((val << 8) & 0xFF00FF00) | ((val >> 8) & 0x00FF00FF);
        return (val << 16) | (val >> 16);
    }

    // time of 2020/01/01 0:0:0
    #define TO2020 1577808000

    int main(int argc, char **argv)
    {
        unsigned int i = 0, val = 0;

        for (i = 0; i < 500000; i++) {
            val = swap_endian(i);
            if (val > TO2020)
                printf("%u %u\n", i, val);
        }
        return 0;
    }

--- Additional comment from Worker Ant on 2019-08-05 03:18:00 UTC --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) posted (#1) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-08-06 06:06:15 UTC --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) merged (#2) on master by Kotresh HR Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:17:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:17:19 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737705 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:17:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:17:20 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: set proposed | |release flag for new BZs at | |RHGS -- You are receiving this mail because: You are on the CC list for the bug.
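For anyone trying to reproduce the ctime behaviour described in bug 1737705 above, the commands below are a minimal sketch of the setup and the check. The volume name, NFS mount point and brick path are placeholders; only the ctime feature, the stat check and the mdata xattr inspection are taken from the reports themselves.

    # enable the ctime feature on an existing volume (shows up as
    # storage.ctime / features.utime in volume info, as quoted in the report)
    gluster volume set <volname> ctime on

    # copy a file in over the NFS-Ganesha export, then compare the client's view ...
    stat /mnt/nfs/test1.sh

    # ... with the trusted.glusterfs.mdata xattr stored on a brick (placeholder path)
    getfattr -d -m . -e hex /bricks/brick1/test1.sh | grep mdata
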
From bugzilla at redhat.com Tue Aug 6 06:17:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:17:58 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:18:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:18:55 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:39:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:39:19 +0000 Subject: [Bugs] [Bug 1697293] DHT: print hash and layout values in hexadecimal format in the logs In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1697293 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 06:39:19 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23124 (cluster/dht: Log hashes in hex) merged (#2) on master by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:43:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:43:59 +0000 Subject: [Bugs] [Bug 1737712] New: Unable to create geo-rep session on a non-root setup. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Bug ID: 1737712 Summary: Unable to create geo-rep session on a non-root setup. Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: geo-replication Keywords: Regression Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: avishwan at redhat.com, bugs at gluster.org, csaba at redhat.com, khiremat at redhat.com, kiyer at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1734734, 1734738 Target Milestone: --- Classification: Community Description of problem: Unable to create a non-root geo-rep session on a geo-rep setup. Version-Release number of selected component (if applicable): gluster-6.0 How reproducible: Always Steps to Reproduce: 1.Create a non-root geo-rep setup. 2.Try to create a non-root geo-rep session. Actual results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep create push-pem gluster command not found on 10.70.43.185 for user geoaccount. 
geo-replication command failed Expected results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep Creating geo-replication session between master-rep & geoaccount at 10.70.43.185::slave-rep has been successful Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734734 [Bug 1734734] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1734738 [Bug 1734738] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 06:43:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:43:59 +0000 Subject: [Bugs] [Bug 1734738] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734738 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737712 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 [Bug 1737712] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:44:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:44:17 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 06:47:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:47:24 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23161 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:47:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:47:25 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23161 (geo-rep: Fix mount broker setup issue) posted (#1) for review on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:48:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:48:34 +0000 Subject: [Bugs] [Bug 1737716] New: Unable to create geo-rep session on a non-root setup. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Bug ID: 1737716 Summary: Unable to create geo-rep session on a non-root setup. 
Product: GlusterFS Version: 5 Hardware: x86_64 OS: Linux Status: NEW Component: geo-replication Keywords: Regression Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: avishwan at redhat.com, bugs at gluster.org, csaba at redhat.com, khiremat at redhat.com, kiyer at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1734734, 1734738 Blocks: 1737712 Target Milestone: --- Classification: Community Description of problem: Unable to create a non-root geo-rep session on a geo-rep setup. Version-Release number of selected component (if applicable): gluster-5.0 How reproducible: Always Steps to Reproduce: 1.Create a non-root geo-rep setup. 2.Try to create a non-root geo-rep session. Actual results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep create push-pem gluster command not found on 10.70.43.185 for user geoaccount. geo-replication command failed Expected results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep Creating geo-replication session between master-rep & geoaccount at 10.70.43.185::slave-rep has been successful Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734734 [Bug 1734734] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1734738 [Bug 1734738] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1737712 [Bug 1737712] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 06:48:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:48:34 +0000 Subject: [Bugs] [Bug 1734738] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734738 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737716 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 [Bug 1737716] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 06:48:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 06:48:34 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1737716 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 [Bug 1737716] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:02:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:02:29 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 07:03:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:03:43 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23162 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:03:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:03:44 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23162 (geo-rep: Fix mount broker setup issue) posted (#1) for review on release-5 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:08:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:08:06 +0000 Subject: [Bugs] [Bug 1737676] Upgrading a Gluster node fails when user edited glusterd.vol file exists In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737676 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 07:08:06 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23160 (rpc/transport: have default listen-port) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:11:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:11:57 +0000 Subject: [Bugs] [Bug 1737745] New: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Bug ID: 1737745 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: ctime Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org Depends On: 1734299 Blocks: 1734305 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1734299 +++ Description of problem: Ctime heals the ctime xattr ("trusted.glusterfs.mdata") in lookup if it's not present. In a multi client scenario, there is a race which results in updating the ctime xattr to older value. e.g. 
Let c1 and c2 be two clients and file1 be the file which doesn't have the ctime xattr. Let the ctime of file1 be t1. (from backend, ctime heals time attributes from backend when not present). Now following operations are done on mount c1 -> ls -l /mnt1/file1 | c2 -> ls -l /mnt2/file1;echo "append" >> /mnt2/file1; The race is that the both c1 and c2 didn't fetch the ctime xattr in lookup, so both of them tries to heal ctime to time 't1'. If c2 wins the race and appends the file before c1 heals it, it sets the time to 't1' and updates it to 't2' (because of append). Now c1 proceeds to heal and sets it to 't1' which is incorrect. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: 1. Create single brick gluster volume and start it 2. Mount at /mnt1 and /mnt2 3. Disable ctime gluster volume set ctime off 4. Create a file touch /mnt/file1 5. Enable ctime gluster volume set ctime on 6. Put a breakpoint at gf_utime_set_mdata_lookup_cbk on '/mnt1' 7. ls -l /mnt1/file1 This hits the break point, allow for root gfid and don't continue on stbuf->ia_gfid equals to file1's gfid 8. ls -l /mnt2/file1 9. The ctime xattr is healed from /mnt2. Capture it. getfattr -d -m . -e hex //file1 | grep mdata 10. echo "append" >> /mnt2/file1 and capture mdata getfattr -d -m . -e hex //file1 | grep mdata 11. Continue the break point at step 7 and capture the mdata Actual results: mdata xattr at step 11 is equal to step 9 (Went back in time) Expected results: mdata xattr at step 11 should be equal to step 10 Additional info: --- Additional comment from Worker Ant on 2019-07-30 08:14:18 UTC --- REVIEW: https://review.gluster.org/23131 (posix/ctime: Fix race during lookup ctime xattr heal) posted (#1) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-08-01 02:59:49 UTC --- REVIEW: https://review.gluster.org/23131 (posix/ctime: Fix race during lookup ctime xattr heal) merged (#2) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734299 [Bug 1734299] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1734305 [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 07:11:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:11:57 +0000 Subject: [Bugs] [Bug 1734299] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734299 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737745 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. 
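To make steps 9-11 of the bug 1737745 reproducer above easier to follow, here is a hedged sketch of the xattr checks; the brick path is a placeholder and the mount points match the steps.

    # step 9: after the lookup from /mnt2 heals the xattr, capture it on the brick
    getfattr -d -m . -e hex /bricks/brick1/file1 | grep mdata

    # step 10: modify the file from the second client and capture the xattr again
    echo "append" >> /mnt2/file1
    getfattr -d -m . -e hex /bricks/brick1/file1 | grep mdata

    # step 11: resume the first client's lookup at the breakpoint; the xattr should
    # still match the step 10 value, not roll back to the step 9 value
    getfattr -d -m . -e hex /bricks/brick1/file1 | grep mdata
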
From bugzilla at redhat.com Tue Aug 6 07:11:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:11:57 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1737745 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:12:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:12:11 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 07:15:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:15:17 +0000 Subject: [Bugs] [Bug 1737746] New: ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Bug ID: 1737746 Summary: ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Product: GlusterFS Version: 6 Status: NEW Component: ctime Severity: high Priority: medium Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: atumball at redhat.com, bugs at gluster.org, khiremat at redhat.com, kinglongmee at gmail.com Depends On: 1737288 Blocks: 1737705 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1737288 +++ Description of problem: I have a 4+2 disperse volume with ctime on, and export a dir from nfs-ganesha, storage.ctime: on features.utime: on When I copy a local file to nfs client, stat shows bad ctime for the file. # stat /mnt/nfs/test* File: ?/mnt/nfs/test1.sh? Size: 166 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 10744358902712050257 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - File: ?/mnt/nfs/test2.sh? 
Size: 214 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 12073556847735387788 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-08-05 09:49:00.000000000 +0800
Modify: 2019-08-05 09:49:00.000000000 +0800
Change: 2061-07-23 21:54:08.000000000 +0800
Birth: -

# ps a
342188 pts/0 D+ 0:00 cp -i test1.sh test2.sh /mnt/nfs/

# gdb glusterfsd
(gdb) p *stbuf
$1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 174138658, ia_mtime = 2889352448, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = '\000' , ia_type = IA_INVAL, ia_prot = { suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}

It is caused by the nfs client creating the copied file in EXCLUSIVE mode, which sets a verifier; the verifier is set to the file's atime and mtime. nfs client sets the verifier as,

    if (flags & O_EXCL) {
        data->arg.create.createmode = NFS3_CREATE_EXCLUSIVE;
        data->arg.create.verifier[0] = cpu_to_be32(jiffies);
        data->arg.create.verifier[1] = cpu_to_be32(current->pid);
    }

the verifier[0] is set to the file's atime, and verifier[1] is set to its mtime. But utime at storage/posix sets the mtime into ctime too at setattr, and setting ctime to an earlier time is not allowed.

    /* Earlier, mdata was updated only if the existing time is less
     * than the time to be updated. This would fail the scenarios
     * where mtime can be set to any time using the syscall. Hence
     * just updating without comparison. But the ctime is not
     * allowed to changed to older date. */

The following code is used to find those PIDs which may cause a bad ctime for a copied file.
==========================================================================

    #include <stdio.h>
    #include <stdlib.h>

    /* swap the byte order of a 32-bit value */
    int swap_endian(int val)
    {
        val = ((val << 8) & 0xFF00FF00) | ((val >> 8) & 0x00FF00FF);
        return (val << 16) | (val >> 16);
    }

    // time of 2020/01/01 0:0:0
    #define TO2020 1577808000

    int main(int argc, char **argv)
    {
        unsigned int i = 0, val = 0;

        for (i = 0; i < 500000; i++) {
            val = swap_endian(i);
            if (val > TO2020)
                printf("%u %u\n", i, val);
        }
        return 0;
    }

--- Additional comment from Worker Ant on 2019-08-05 03:18:00 UTC --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) posted (#1) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-08-06 06:06:15 UTC --- REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) merged (#2) on master by Kotresh HR Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on https://bugzilla.redhat.com/show_bug.cgi?id=1737705 [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug.
From bugzilla at redhat.com Tue Aug 6 07:15:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:15:17 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737746 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:15:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:15:17 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1737746 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:15:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:15:33 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 07:27:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:27:32 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23163 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:27:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:27:33 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23163 (posix/ctime: Fix race during lookup ctime xattr heal) posted (#1) for review on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:28:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:28:38 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23164 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:28:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:28:39 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23164 (features/utime: always update ctime at setattr) posted (#1) for review on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:30:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:30:53 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sheggodu at redhat.com Flags| |needinfo?(khiremat at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 07:42:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 07:42:53 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com Flags|needinfo?(khiremat at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 08:22:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:22:15 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 08:34:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:34:27 +0000 Subject: [Bugs] [Bug 1737778] New: ocf resource agent for volumes don't work in non-standard environment Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737778 Bug ID: 1737778 Summary: ocf resource agent for volumes don't work in non-standard environment Product: GlusterFS Version: 4.1 Status: NEW Component: scripts Assignee: bugs at gluster.org Reporter: jiri.lunacek at hosting90.cz CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: ocf resource agent for volumes don't work when short hostnames don't match gluster peer names and when volume is not defined across all peers -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 08:39:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:39:14 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Rejy M Cyriac changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |blocker? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 08:40:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:40:14 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Rejy M Cyriac changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |blocker? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 08:40:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:40:27 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Rejy M Cyriac changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |blocker? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 08:46:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:46:54 +0000 Subject: [Bugs] [Bug 1737778] ocf resource agent for volumes don't work in non-standard environment In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737778 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23165 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 6 08:46:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 08:46:55 +0000 Subject: [Bugs] [Bug 1737778] ocf resource agent for volumes don't work in non-standard environment In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737778 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23165 (peer_map parameter and fix in state detection when no brick is running on peer) posted (#1) for review on master by None -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 09:03:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:03:42 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Ashish Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 09:12:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:12:59 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Ashish Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|If docs needed, set a value |No Doc Update Red Hat Bugzilla changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|No Doc Update |No Doc Update -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 09:13:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:13:58 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 Ashish Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|If docs needed, set a value |No Doc Update Red Hat Bugzilla changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|No Doc Update |No Doc Update -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 09:19:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:19:23 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Red Hat Bugzilla changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|If docs needed, set a value |No Doc Update -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 6 09:34:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:34:48 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23158 (geo-rep: Fix Config Get Race) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 09:39:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:39:25 +0000 Subject: [Bugs] [Bug 1733520] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733520 Red Hat Bugzilla changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|If docs needed, set a value |No Doc Update -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 09:52:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 09:52:02 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nchilaka at redhat.com QA Contact|rhinduja at redhat.com |nchilaka at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 10:38:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 10:38:09 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(mscherer at redhat.c | |om) --- Comment #9 from hari gowtham --- Hi Misc, Can you please create the machines as mentioned above, so we can set them up? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 10:58:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 10:58:43 +0000 Subject: [Bugs] [Bug 1620580] Deleted a volume and created a new volume with similar but not the same name. The kubernetes pod still keeps on running and doesn't crash. Still possible to write to gluster mount In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1620580 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23166 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 10:58:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 10:58:44 +0000 Subject: [Bugs] [Bug 1620580] Deleted a volume and created a new volume with similar but not the same name. The kubernetes pod still keeps on running and doesn't crash.
Still possible to write to gluster mount In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1620580 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23166 (protocol/handshake: pass volume-id for extra check) posted (#1) for review on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 10:59:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 10:59:23 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 10:59:23 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23162 (geo-rep: Fix mount broker setup issue) merged (#1) on release-5 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 10:59:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 10:59:24 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Bug 1737712 depends on bug 1737716, which changed state. Bug 1737716 Summary: Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1737716 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 05:04:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 05:04:04 +0000 Subject: [Bugs] [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1737716, 1736342, 1733881 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733881 [Bug 1733881] [geo-rep]: gluster command not found while setting up a non-root session https://bugzilla.redhat.com/show_bug.cgi?id=1736342 [Bug 1736342] potential deadlock while processing callbacks in gfapi https://bugzilla.redhat.com/show_bug.cgi?id=1737716 [Bug 1737716] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 6 11:07:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:07:50 +0000 Subject: [Bugs] [Bug 1733881] [geo-rep]: gluster command not found while setting up a non-root session In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733881 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737313 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 11:07:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:07:50 +0000 Subject: [Bugs] [Bug 1736342] potential deadlock while processing callbacks in gfapi In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736342 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737313 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 11:07:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:07:50 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1737313 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 11:08:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:08:41 +0000 Subject: [Bugs] [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23167 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 11:08:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:08:42 +0000 Subject: [Bugs] [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23167 (doc: Added release 5.9 notes) posted (#1) for review on release-5 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 6 11:12:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:12:00 +0000 Subject: [Bugs] [Bug 1726175] CentOS 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Component|fuse |ctime Assignee|khiremat at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 11:28:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:28:27 +0000 Subject: [Bugs] [Bug 1641969] Mounted Dir Gets Error in GlusterFS Storage Cluster with SSL/TLS Encryption as Doing add-brick and remove-brick Repeatedly In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1641969 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |DEFERRED Last Closed| |2019-08-06 11:28:27 --- Comment #3 from Amar Tumballi --- The use case of 'repeated' add-brick/remove-brick is not something we are focusing on right now. Marking it DEFERRED until it gets focus. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 11:51:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:51:08 +0000 Subject: [Bugs] [Bug 1733885] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733885 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23119 (ctime: Set mdata xattr on legacy files) merged (#2) on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 11:51:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:51:31 +0000 Subject: [Bugs] [Bug 1733885] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733885 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 11:51:31 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23120 (features/utime: Fix mem_put crash) merged (#2) on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 11:57:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 11:57:45 +0000 Subject: [Bugs] [Bug 1686568] [geo-rep]: Checksum mismatch when 2x2 vols are converted to arbiter In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1686568 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hgowtham at redhat.com Assignee|ksubrahm at redhat.com |hgowtham at redhat.com -- You are receiving this mail because: You are on the CC list for the bug.
From bugzilla at redhat.com Tue Aug 6 12:56:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 12:56:44 +0000 Subject: [Bugs] [Bug 1737313] (glusterfs-5.9) - GlusterFS 5.9 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737313 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-06 12:56:44 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23167 (doc: Added release 5.9 notes) merged (#2) on release-5 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 13:00:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 13:00:45 +0000 Subject: [Bugs] [Bug 1410439] glusterfind pre output file is empty In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1410439 Shwetha K Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 6 20:23:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 20:23:42 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23169 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 6 20:23:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 06 Aug 2019 20:23:44 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #742 from Worker Ant --- REVIEW: https://review.gluster.org/23169 ([WIP]client-handshake.c: minor changes and removal of dead code.) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 7 04:53:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 04:53:15 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-07 04:53:15 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23161 (geo-rep: Fix mount broker setup issue) merged (#2) on release-6 by Sunny Kumar -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 05:08:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:08:27 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-07 05:08:27 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23163 (posix/ctime: Fix race during lookup ctime xattr heal) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 05:08:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:08:27 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Bug 1734305 depends on bug 1737745, which changed state. Bug 1737745 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1737745 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 05:09:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:09:46 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-07 05:09:46 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23164 (features/utime: always update ctime at setattr) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 05:09:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:09:47 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Bug 1737705 depends on bug 1737746, which changed state. Bug 1737746 Summary: ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on https://bugzilla.redhat.com/show_bug.cgi?id=1737746 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 7 05:53:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:53:53 +0000 Subject: [Bugs] [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23170 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 7 05:53:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 05:53:54 +0000 Subject: [Bugs] [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23170 (doc: Added release 6.5 notes) posted (#1) for review on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 7 06:15:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:15:15 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed|2019-07-26 07:11:59 |2019-08-07 06:15:15 --- Comment #17 from Worker Ant --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#2) on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:15:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:15:16 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Bug 1732772 depends on bug 1727081, which changed state. Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:15:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:15:17 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Bug 1732774 depends on bug 1727081, which changed state. 
Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:15:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:15:19 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Bug 1732792 depends on bug 1727081, which changed state. Bug 1727081 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1727081 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:51:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:51:34 +0000 Subject: [Bugs] [Bug 1738419] New: read() returns more than file size when using direct I/O Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 Bug ID: 1738419 Summary: read() returns more than file size when using direct I/O Product: GlusterFS Version: mainline Status: NEW Component: sharding Keywords: Triaged Severity: high Priority: high Assignee: kdhananj at redhat.com Reporter: kdhananj at redhat.com QA Contact: bugs at gluster.org CC: atumball at redhat.com, bugs at gluster.org, csaba at redhat.com, kdhananj at redhat.com, khiremat at redhat.com, kwolf at redhat.com, nbalacha at redhat.com, nsoffer at redhat.com, pkarampu at redhat.com, rabhat at redhat.com, rgowdapp at redhat.com, rkavunga at redhat.com, sabose at redhat.com, teigland at redhat.com, tnisan at redhat.com, vjuranek at redhat.com Blocks: 1737141 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1737141 +++ Description of problem: When using direct I/O, reading from a file returns more data, padding the file data with zeroes. Here is an example. 
## On a host mounting gluster using fuse $ pwd /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ mount | grep glusterfs voodoo4.tlv.redhat.com:/gv0 on /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) $ stat metadata File: metadata Size: 501 Blocks: 1 IO Block: 131072 regular file Device: 31h/49d Inode: 13313776956941938127 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-01 22:21:49.186381528 +0300 Modify: 2019-08-01 22:21:49.427404135 +0300 Change: 2019-08-01 22:21:49.969739575 +0300 Birth: - $ cat metadata ALIGNMENT=1048576 BLOCK_SIZE=4096 CLASS=Data DESCRIPTION=gv0 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=4k-gluster POOL_DOMAINS=de566475-5b67-4987-abf3-3dc98083b44c:Active POOL_SPM_ID=-1 POOL_SPM_LVER=-1 POOL_UUID=44cfb532-3144-48bd-a08c-83065a5a1032 REMOTE_PATH=voodoo4.tlv.redhat.com:/gv0 ROLE=Master SDUUID=de566475-5b67-4987-abf3-3dc98083b44c TYPE=GLUSTERFS VERSION=5 _SHA_CKSUM=3d1cb836f4c93679fc5a4e7218425afe473e3cfa $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000340298 s, 1.5 MB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00398529 s, 1.0 MB/s Checking the copied data, the actual content of the file is padded with zeros to 4096 bytes. ## On the one of the gluster nodes $ pwd /export/vdo0/brick/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ stat metadata File: metadata Size: 501 Blocks: 16 IO Block: 4096 regular file Device: fd02h/64770d Inode: 149 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 36/ UNKNOWN) Gid: ( 36/ kvm) Context: system_u:object_r:usr_t:s0 Access: 2019-08-01 22:21:50.380425478 +0300 Modify: 2019-08-01 22:21:49.427397589 +0300 Change: 2019-08-01 22:21:50.374425302 +0300 Birth: - $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000991636 s, 505 kB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 0+1 records in 0+1 records out 501 bytes copied, 0.0011381 s, 440 kB/s This proves that the issue is in gluster. 
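For reference, a minimal C reproducer equivalent to the dd test above might look like the sketch below. This is only an illustration: the file path defaults to the metadata file used above, and the 4096-byte alignment and read size are assumptions chosen to match the 4k block size of these bricks.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Path on the FUSE mount; defaults to the "metadata" file from the dd test. */
    const char *path = argc > 1 ? argv[1] : "metadata";
    struct stat st;
    void *buf = NULL;

    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror("open/fstat");
        return 1;
    }

    /* O_DIRECT needs an aligned buffer; 4096 is assumed to match the brick block size. */
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);
    printf("st_size=%lld read=%zd\n", (long long)st.st_size, n);

    /* For the 501-byte file above, read() should return 501; the reported bug
     * is that it returns 4096 with the tail zero-padded. */
    free(buf);
    close(fd);
    return (st.st_size < 4096 && n > st.st_size) ? 2 : 0;
}

Run against the 501-byte metadata file on the fuse mount, this would be expected to print read=501; with the behaviour reported here it prints read=4096 and exits with status 2.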
# gluster volume info gv0 Volume Name: gv0 Type: Replicate Volume ID: cbc5a2ad-7246-42fc-a78f-70175fb7bf22 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: voodoo4.tlv.redhat.com:/export/vdo0/brick Brick2: voodoo5.tlv.redhat.com:/export/vdo0/brick Brick3: voodoo8.tlv.redhat.com:/export/vdo0/brick (arbiter) Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: disable performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: on $ xfs_info /export/vdo0 meta-data=/dev/mapper/vdo0 isize=512 agcount=4, agsize=6553600 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=26214400, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=12800, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 Version-Release number of selected component (if applicable): Server: $ rpm -qa | grep glusterfs glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-server-6.4-1.fc29.x86_64 Client: $ rpm -qa | grep glusterfs glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-rdma-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 How reproducible: Always. Steps to Reproduce: 1. Provision gluster volume over vdo (did not check without vdo) 2. Create a file of 501 bytes 3. Read the file using direct I/O Actual results: read() returns 4096 bytes, padding the file data with zeroes Expected results: read() returns actual file data (501 bytes) --- Additional comment from Nir Soffer on 2019-08-02 19:21:20 UTC --- David, do you think this can affect sanlock? --- Additional comment from Nir Soffer on 2019-08-02 19:25:02 UTC --- Kevin, do you think this can affect qemu/qemu-img? --- Additional comment from Amar Tumballi on 2019-08-05 05:33:57 UTC --- @Nir, thanks for the report. We will look into this. --- Additional comment from Kevin Wolf on 2019-08-05 09:16:16 UTC --- (In reply to Nir Soffer from comment #2) > Kevin, do you think this can affect qemu/qemu-img? This is not a problem for QEMU as long as the file size is correct. If gluster didn't do the zero padding, QEMU would do it internally. In fact, fixing this in gluster may break the case of unaligned image sizes with QEMU because the image size is rounded up to sector (512 byte) granularity and the gluster driver turns short reads into errors. This would actually affect non-O_DIRECT, too, which already seems to behave this way, so can you just give this a quick test? --- Additional comment from David Teigland on 2019-08-05 15:08:32 UTC --- (In reply to Nir Soffer from comment #1) > David, do you think this can affect sanlock? I don't think so. 
sanlock doesn't use any space that it didn't first write to initialize. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 [Bug 1737141] read() returns more than file size when using direct I/O -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:51:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:51:34 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1738419 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 [Bug 1738419] read() returns more than file size when using direct I/O -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 06:58:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 06:58:22 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:09:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:09:23 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 --- Comment #7 from Sunil Kumar Acharya --- Downstream Patch: https://code.engineering.redhat.com/gerrit/#/c/177958/ -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:09:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:09:41 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:19:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:19:12 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: set | |qe_test_coverage flag at QE | |approved BZs -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 7 07:21:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:21:54 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:21:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:21:57 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: Auto pm_ack at Eng | |In-Flight RHGS3.5 Blocker | |BZs Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |665 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:32:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:32:27 +0000 Subject: [Bugs] [Bug 1514683] Removal of bricks in volume isn't prevented if remaining brick doesn't contain all the files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1514683 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23171 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 07:32:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 07:32:28 +0000 Subject: [Bugs] [Bug 1514683] Removal of bricks in volume isn't prevented if remaining brick doesn't contain all the files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1514683 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23171 (cli: Add warning for user before remove-brick commit) posted (#1) for review on master by Vishal Pandey -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 08:32:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 08:32:10 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 7 08:35:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 08:35:46 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Ashish Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|If docs needed, set a value |No Doc Update Red Hat Bugzilla changed: What |Removed |Added ---------------------------------------------------------------------------- Doc Type|No Doc Update |No Doc Update -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 08:36:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 08:36:06 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 08:39:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 08:39:40 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 09:03:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:03:00 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 09:20:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:20:09 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 7 09:20:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:20:11 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: Auto pm_ack at Eng | |In-Flight RHGS3.5 Blocker | |BZs Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |665 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 Rule Engine Rule| |327 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 09:20:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:20:49 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 09:20:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:20:52 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: Auto pm_ack at Eng | |In-Flight RHGS3.5 Blocker | |BZs Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |665 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 Rule Engine Rule| |327 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 09:22:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 09:22:13 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 11:43:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 11:43:54 +0000 Subject: [Bugs] [Bug 1737311] (glusterfs-6.5) - GlusterFS 6.5 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737311 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-07 11:43:54 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23170 (doc: Added release 6.5 notes) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Wed Aug 7 13:56:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 13:56:52 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 13:57:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 13:57:17 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |amukherj at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 15:47:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 15:47:49 +0000 Subject: [Bugs] [Bug 1732875] GlusterFS 7.0 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732875 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23174 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 7 15:47:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 15:47:50 +0000 Subject: [Bugs] [Bug 1732875] GlusterFS 7.0 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732875 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23174 (doc: Added initial release notes for release-7) posted (#1) for review on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 7 17:43:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 17:43:34 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-3.12.2-47.4 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 7 17:43:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 07 Aug 2019 17:43:51 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 8 05:56:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 05:56:02 +0000 Subject: [Bugs] [Bug 1738419] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23175 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 05:56:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 05:56:04 +0000 Subject: [Bugs] [Bug 1738419] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size when reads are sent beyond file size) posted (#1) for review on master by Krutika Dhananjay -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 06:00:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:00:55 +0000 Subject: [Bugs] [Bug 1665880] After the shard feature is enabled, the glfs_read will always return the length of the read buffer, no the actual length readed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1665880 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kdhananj at redhat.com Flags| |needinfo?(xiubli at redhat.com | |) --- Comment #2 from Krutika Dhananjay --- Hi Xiubo, There's a similar bug raised by ovirt team for which I sent a patch at https://review.gluster.org/c/glusterfs/+/23175 to fix it. Could you check if this patch fixes the issue seen in gluster-block as well? -Krutika -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 06:06:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:06:24 +0000 Subject: [Bugs] [Bug 1738763] New: [EC] : fix coverity issue Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738763 Bug ID: 1738763 Summary: [EC] : fix coverity issue Product: GlusterFS Version: mainline Status: NEW Component: disperse Assignee: bugs at gluster.org Reporter: aspandey at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Fix a minor coverity issue -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 06:06:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:06:44 +0000 Subject: [Bugs] [Bug 1738763] [EC] : fix coverity issue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738763 Ashish Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 8 06:10:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:10:44 +0000 Subject: [Bugs] [Bug 1738763] [EC] : fix coverity issue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738763 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23176 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 06:10:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:10:45 +0000 Subject: [Bugs] [Bug 1738763] [EC] : fix coverity issue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738763 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23176 (cluster/ec: Fix coverity issue.) posted (#1) for review on master by Ashish Pandey -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 06:11:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:11:15 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #743 from Worker Ant --- REVIEW: https://review.gluster.org/23105 (build: stop suppressing \"Entering/Leaving direcory...\" messages) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 06:11:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 06:11:45 +0000 Subject: [Bugs] [Bug 1644322] flooding log with "glusterfs-fuse: read from /dev/fuse returned -1 (Operation not permitted)" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1644322 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-08 06:11:45 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/22494 (fuse: rate limit reading from fuse device upon receiving EPERM) merged (#7) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 07:13:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 07:13:45 +0000 Subject: [Bugs] [Bug 1738778] New: Unable to setup softserve VM Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738778 Bug ID: 1738778 Summary: Unable to setup softserve VM Product: GlusterFS Version: mainline Status: NEW Component: project-infrastructure Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org, gluster-infra at gluster.org Target Milestone: --- Classification: Community Description of problem: After creating a VM from https://softserve.gluster.org/dashboard, when I try to use https://github.com/gluster/softserve/wiki/Running-Regressions-on-loaned-Softserve-instances, it doesn't connect to the VM. 
This is not just me, I believe even Sac tried it out on his setup and saw the same issue today. When I run `ansible-playbook -v -i inventory regressions-final.yml --become -u centos`, I get: TASK [Gathering Facts] ********************************************************************************************************************************************************************************************************************** fatal: [builder555.cloud.gluster.org]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: kex_exchange_identification: Connection closed by remote host", "unreachable": true} PLAY RECAP ********************************************************************************************************************************************************************************************************************************** builder555.cloud.gluster.org : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 I believe this is an infra issue. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 07:38:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 07:38:54 +0000 Subject: [Bugs] [Bug 1738786] New: ctime: If atime is updated via utimensat syscall ctime is not getting updated Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Bug ID: 1738786 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated Product: GlusterFS Version: mainline Status: NEW Component: ctime Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: When atime|mtime is updated via utime family of syscalls, ctime is not updated. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 07:39:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 07:39:08 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 07:43:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 07:43:18 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- External Bug ID| |Gluster.org Gerrit 23177 -- You are receiving this mail because: You are on the CC list for the bug. 
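The reproducer for bug 1738786 above drives the update through touch -m; the same check can be expressed directly against the utimensat() syscall named in the bug title. A minimal sketch, assuming a glusterfs fuse mount with ctime enabled; the mount path and timestamp are placeholders:

/* Explicitly setting mtime via utimensat() must also update ctime (POSIX).
 * The path is a placeholder for a file on a glusterfs mount. */
#include <fcntl.h>      /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>   /* stat, utimensat, UTIME_OMIT */
#include <time.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/file1";
    struct stat before, after;
    struct timespec times[2] = {
        { .tv_nsec = UTIME_OMIT },                /* leave atime untouched */
        { .tv_sec = 1577880000, .tv_nsec = 0 },   /* 2020-01-01 12:00:00 UTC */
    };

    if (stat(path, &before) != 0) { perror("stat"); return 1; }
    sleep(1);                                     /* st_ctime is second-granular */
    if (utimensat(AT_FDCWD, path, times, 0) != 0) { perror("utimensat"); return 1; }
    if (stat(path, &after) != 0) { perror("stat"); return 1; }

    printf("ctime %s\n",
           after.st_ctime > before.st_ctime ? "updated" : "NOT updated (bug)");
    return 0;
}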
From bugzilla at redhat.com Thu Aug 8 07:43:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 07:43:19 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 10:03:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:03:04 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: set | |qe_test_coverage flag at QE | |approved BZs -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 10:06:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:06:30 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1696802 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 10:07:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:07:33 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks|1696802 |1696809 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 8 10:07:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:07:36 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Rule Engine Rule| |Gluster: Auto pm_ack at Eng | |In-Flight RHGS3.5 Blocker | |BZs Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Rule Engine Rule| |665 Target Release|--- |RHGS 3.5.0 Rule Engine Rule| |666 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 8 10:38:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:38:26 +0000 Subject: [Bugs] [Bug 1738878] New: FUSE client's memory leak Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 Bug ID: 1738878 Summary: FUSE client's memory leak Product: GlusterFS Version: 5 OS: Linux Status: NEW Component: core Severity: high Assignee: bugs at gluster.org Reporter: s.pleshkov at hostco.ru CC: bugs at gluster.org Target Milestone: --- External Bug ID: Red Hat Bugzilla 1623107,Red Hat Bugzilla 1659432 Classification: Community Description of problem: Single FUSE client consume a lot of memory. In our clients production environment, single FUSE client slowly continiously eat memory until killed by OOM case Version-Release number of selected component (if applicable): Servers # gluster --version glusterfs 5.5 rpm -qa | grep glu glusterfs-libs-5.5-1.el7.x86_64 glusterfs-fuse-5.5-1.el7.x86_64 glusterfs-client-xlators-5.5-1.el7.x86_64 centos-release-gluster5-1.0-1.el7.centos.noarch glusterfs-api-5.5-1.el7.x86_64 glusterfs-cli-5.5-1.el7.x86_64 nfs-ganesha-gluster-2.7.1-1.el7.x86_64 glusterfs-5.5-1.el7.x86_64 glusterfs-server-5.5-1.el7.x86_64 Client # gluster --version glusterfs 5.6 # rpm -qa | grep glus glusterfs-api-5.6-1.el7.x86_64 glusterfs-libs-5.6-1.el7.x86_64 glusterfs-cli-5.6-1.el7.x86_64 glusterfs-client-xlators-5.6-1.el7.x86_64 glusterfs-fuse-5.6-1.el7.x86_64 glusterfs-5.6-1.el7.x86_64 libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.4.x86_64 How reproducible: Setup glusterfs replication cluster (3 node, replicate) with many of small files. Mount storage with FUSE client, set some process to work with gluster folder Read files metadata and writes files content. This problem rises with one client that have read executetable files and write logs processes (java|c++ programs) from this gluster volume, other clients same gluster volume have not this problem when work with read|write processes. Actual results: RSS memory of FUSE client grows infinitely. Expected results: RSS memory doesn't grow infinitely :) Additional info: Get statedumps from problem client, find this results: pool-name=data_t active-count=40897046 sizeof-type=72 padded-sizeof=128 size=5234821888 shared-pool=0x7f6bf222aca0 pool-name=dict_t active-count=40890978 sizeof-type=160 padded-sizeof=256 size=10468090368 shared-pool=0x7f6bf222acc8 Found similar bug - https://bugzilla.redhat.com/show_bug.cgi?id=1623107 Disabled "readdir-ahead" option to volume, but didn't helped -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 10:43:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:43:21 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #1 from Sergey Pleshkov --- Share two statedump https://cloud.hostco.ru/s/w9MY6jj5Hpj2qoa -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
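For context on the statedump excerpt in the report above: the pool-name=data_t and pool-name=dict_t active-count values climb when dictionaries are allocated per operation but never released. The snippet below is only a generic, hypothetical illustration of that pattern, not the actual leak site in this bug, and the include path is assumed from the glusterfs development headers.

/* Hypothetical pattern that produces ever-growing dict_t/data_t
 * active-counts in a client statedump: a dict allocated per fop and never
 * unref'd. Not the actual code path behind this report. */
#include <glusterfs/dict.h>   /* include path assumed from glusterfs-devel */

static int leaky_fop(void)
{
    dict_t *xdata = dict_new();              /* returns a dict holding one ref */
    if (!xdata)
        return -1;

    dict_set_str(xdata, "key", "value");     /* each entry also pins a data_t */

    /* Illustrative bug: the matching dict_unref(xdata) is missing, so the
     * dict_t and its data_t members stay in the active pool forever. */
    return 0;
}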
From bugzilla at redhat.com Thu Aug 8 10:43:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:43:44 +0000 Subject: [Bugs] [Bug 1738778] Unable to setup softserve VM In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738778 Deepshikha khandelwal changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dkhandel at redhat.com --- Comment #1 from Deepshikha khandelwal --- It's working fine for me. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 10:52:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 10:52:36 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 Sergey Pleshkov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |s.pleshkov at hostco.ru --- Comment #2 from Sergey Pleshkov --- Server and client OS: Red Hat Enterprise Linux Server release 7.6 (Maipo) / Red Hat Enterprise Linux Server release 7.5 (Maipo). When the client used the gluster client from the RH repo (version 3.12), the situation was the same. If it isn't a version-specific bug, would you have suggestions about what it could be, and which gluster volume options to check, and so on? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 11:08:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 11:08:14 +0000 Subject: [Bugs] [Bug 1738778] Unable to setup softserve VM In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738778 --- Comment #2 from Ravishankar N --- I'm trying this on Fedora 30. Here is the verbose output if it helps. I can ssh into the VM as the centos user just fine. --------------------------------------------------------------------------------------------------------------------------------- fatal: [builder500.cloud.gluster.org]: UNREACHABLE!
=> { "changed": false, "msg": "Failed to connect to the host via ssh: OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS 28 May 2019\r\ndebug1: Reading configuration data /home/ravi/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 51: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host 18.219.69.93 originally 18.219.69.93\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-gex-sha1-,gss-group14-sha1-,gss-group1-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256 at libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug2: resolve_canonicalize: hostname 18.219.69.93 is address\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/ravi/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 51: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host 18.219.69.93 originally 18.219.69.93\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-gex-sha1-,gss-group14-sha1-,gss-group1-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256 at libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug1: Control socket \"/home/ravi/.ansible/cp/d35c8610b6\" does not exist\r\ndebug1: Executing proxy command: exec ssh -q -W 18.219.69.93:22 root at logs.aws.gluster.org\r\ndebug3: timeout: 10000 ms remain after connect\r\ndebug1: identity file /home/ravi/.ssh/id_rsa type 0\r\ndebug1: identity file /home/ravi/.ssh/id_rsa-cert type -1\r\ndebug1: identity file /home/ravi/.ssh/id_dsa type -1\r\ndebug1: identity file /home/ravi/.ssh/id_dsa-cert type -1\r\ndebug1: identity file /home/ravi/.ssh/id_ecdsa type -1\r\ndebug1: identity file /home/ravi/.ssh/id_ecdsa-cert type -1\r\ndebug1: identity file /home/ravi/.ssh/id_ed25519 type -1\r\ndebug1: identity file /home/ravi/.ssh/id_ed25519-cert type -1\r\ndebug1: identity file /home/ravi/.ssh/id_xmss type -1\r\ndebug1: identity file /home/ravi/.ssh/id_xmss-cert type -1\r\ndebug1: Local version string SSH-2.0-OpenSSH_8.0\r\nkex_exchange_identification: Connection closed by remote host", "unreachable": true } PLAY RECAP 
********************************************************************************************************************************************************************************************************************************** builder500.cloud.gluster.org : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0 --------------------------------------------------------------------------------------------------------------------------------- -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 19:43:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 19:43:57 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23178 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 8 19:43:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 08 Aug 2019 19:43:58 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #744 from Worker Ant --- REVIEW: https://review.gluster.org/23178 ([WIP]client_t.c: removal of dead code.) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 03:01:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 03:01:22 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 Amgad changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amgad.saleh at nokia.com --- Comment #5 from Amgad --- This is a serious bug and blocking deployments -- I don't see it in the 6.x stream! what release and when it's released? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 03:29:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 03:29:40 +0000 Subject: [Bugs] [Bug 1739320] New: The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 Bug ID: 1739320 Summary: The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. Product: GlusterFS Version: 6 Hardware: All OS: Linux Status: NEW Component: glusterd Severity: urgent Assignee: bugs at gluster.org Reporter: amgad.saleh at nokia.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: When creating a volume using IPv6, failed with an error that bricks are on same hostname while they are not. The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. Version-Release number of selected component (if applicable): 6.3-1 and 6.4-1 How reproducible: Steps to Reproduce: 1. 
Create a volume with replica 3 using the command: gluster --mode=script volume create vol_b6b4f444031cb86c969f3fc744f2e999 replica 3 2001:db8:1234::10:/root/test/a 2001:db8:1234::5:/root/test/a 2001:db8:1234::14:/root/test/a 2. Error happens that all bricks on the same hostname 3. check those addresses using nslookup which shows the opposite, those IP belongs to different hostnames Actual results: =============== # gluster --mode=script volume create vol_b6b4f444031cb86c969f3fc744f2e999 replica 3 2001:db8:1234::10:/root/test/a 2001:db8:1234::5:/root/test/a 2001:db8:1234::14:/root/test/a volume create: vol_b6b4f444031cb86c969f3fc744f2e999: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Bricks should be on different nodes to have best fault tolerant configuration. Use 'force' at the end of the command if you want to override this behavior. # nslookup 2001:db8:1234::10 Server: 2001:db8:1234::5 Address: 2001:db8:1234::5#53 0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa name = roger-1812-we-01. # nslookup 2001:db8:1234::5 Server: 2001:db8:1234::5 Address: 2001:db8:1234::5#53 5.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa name = roger-1903-we-01. # nslookup 2001:db8:1234::14 Server: 2001:db8:1234::5 Address: 2001:db8:1234::5#53 4.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.4.3.2.1.8.b.d.0.1.0.0.2.ip6.arpa name = roger-1812-cwes-01. Expected results: Volume should succeed Additional info: Here are the code snippets from glusterfs 6.4, which has the problem For some reason, the result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while actually they are not. ================ xlators/mgmt/glusterd/src/glusterd-volume-ops.c gf_ai_compare_t glusterd_compare_addrinfo(struct addrinfo *first, struct addrinfo *next) { int ret = -1; struct addrinfo *tmp1 = NULL; struct addrinfo *tmp2 = NULL; char firstip[NI_MAXHOST] = {0.}; char nextip[NI_MAXHOST] = { 0, }; for (tmp1 = first; tmp1 != NULL; tmp1 = tmp1->ai_next) { ret = getnameinfo(tmp1->ai_addr, tmp1->ai_addrlen, firstip, NI_MAXHOST, NULL, 0, NI_NUMERICHOST); if (ret) return GF_AI_COMPARE_ERROR; for (tmp2 = next; tmp2 != NULL; tmp2 = tmp2->ai_next) { ret = getnameinfo(tmp2->ai_addr, tmp2->ai_addrlen, nextip, NI_MAXHOST, NULL, 0, NI_NUMERICHOST); if (ret) return GF_AI_COMPARE_ERROR; if (!strcmp(firstip, nextip)) { return GF_AI_COMPARE_MATCH; } } } return GF_AI_COMPARE_NO_MATCH; } ... if (GF_AI_COMPARE_MATCH == ret) goto found_bad_brick_order; ... found_bad_brick_order: gf_msg(this->name, GF_LOG_INFO, 0, GD_MSG_BAD_BRKORDER, "Bad brick order found"); if (type == GF_CLUSTER_TYPE_DISPERSE) { snprintf(err_str, sizeof(found_string), found_string, "disperse"); } else { snprintf(err_str, sizeof(found_string), found_string, "replicate"); } .... const char found_string[2048] = "Multiple bricks of a %s " "volume are present on the same server. This " "setup is not optimal. Bricks should be on " "different nodes to have best fault tolerant " "configuration. Use 'force' at the end of the " "command if you want to override this " "behavior. "; -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
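The comparison logic quoted above boils down to getaddrinfo() plus getnameinfo(..., NI_NUMERICHOST) on every resolved address. A standalone sketch (not part of glusterd) that prints what that code would see for the three brick hosts from the report can help confirm whether the numeric strings really come out identical on the affected system:

/* Standalone check: resolve each brick address and print the
 * NI_NUMERICHOST form of every addrinfo entry, mirroring what
 * glusterd_compare_addrinfo() compares. Addresses are the ones from the
 * report. */
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

static void dump(const char *host)
{
    struct addrinfo hints, *res, *p;
    char ip[NI_MAXHOST];

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;             /* accept both v4 and v6 */

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        fprintf(stderr, "getaddrinfo(%s) failed\n", host);
        return;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        if (getnameinfo(p->ai_addr, p->ai_addrlen, ip, sizeof(ip),
                        NULL, 0, NI_NUMERICHOST) == 0)
            printf("%s -> %s\n", host, ip);
    }
    freeaddrinfo(res);
}

int main(void)
{
    dump("2001:db8:1234::10");
    dump("2001:db8:1234::5");
    dump("2001:db8:1234::14");
    return 0;
}

If the three addresses print as distinct numeric strings here but glusterd still reports a match, the problem is more likely in how the brick host strings are parsed before reaching this comparison; if they print identically, it points at the resolver configuration on that node.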
From bugzilla at redhat.com Fri Aug 9 04:39:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:39:55 +0000 Subject: [Bugs] [Bug 1739334] New: Multiple disconnect events being propagated for the same child Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Bug ID: 1739334 Summary: Multiple disconnect events being propagated for the same child Product: GlusterFS Version: 7 OS: Linux Status: NEW Component: rpc Keywords: Regression Severity: high Priority: high Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: amgad.saleh at nokia.com, amukherj at redhat.com, bugs at gluster.org, ravishankar at redhat.com, rgowdapp at redhat.com, rhinduja at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com Depends On: 1703423, 1716979 Target Milestone: --- Classification: Community Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1703423 [Bug 1703423] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1716979 [Bug 1716979] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:39:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:39:55 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739334 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 [Bug 1739334] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:40:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:40:59 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Comment #0 is|1 |0 private| | Keywords|Regression | Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:43:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:43:14 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23179 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 04:43:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:43:15 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23179 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) posted (#1) for review on release-7 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 04:44:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:44:16 +0000 Subject: [Bugs] [Bug 1739335] New: Multiple disconnect events being propagated for the same child Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Bug ID: 1739335 Summary: Multiple disconnect events being propagated for the same child Product: GlusterFS Version: 6 OS: Linux Status: NEW Component: rpc Keywords: Regression Severity: high Priority: high Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: amgad.saleh at nokia.com, amukherj at redhat.com, bugs at gluster.org, ravishankar at redhat.com, rgowdapp at redhat.com, rhinduja at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com Depends On: 1703423, 1716979 Blocks: 1739334 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1716979 +++ +++ This bug was initially created as a clone of Bug #1703423 +++ Description of problem: Issue was reported upstream by a user via https://github.com/gluster/glusterfs/issues/648 I'm seeing that if I kill a brick in a replica 3 system, AFR keeps getting child_down event repeatedly for the same child. This seems to be a regression in behaviour as it does not occur in rhgs-3.4.0. In 3.4.0, I get exactly one GF_EVENT_CHILD_DOWN for 1 disconnect. Version-Release number of selected component (if applicable): rhgs-3.5 branch (source install) How reproducible: Always. Steps to Reproduce: 1. Create a replica 3 volume and start it. 2. Put a break point in __afr_handle_child_down_event() in glustershd process. 3. Kill any one brick. Actual results: The break point keeps getting hit once every 3 seconds or so repeatedly. Expected results: Only 1 event per one disconnect. Additional info: I haven't checked if the same happens for GF_EVENT_CHILD_UP as well. I think this is regression that needs to be fixed. If this is not a bug please feel free to close stating why. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1703423 [Bug 1703423] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1716979 [Bug 1716979] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739334 [Bug 1739334] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 04:44:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:44:16 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739335 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 [Bug 1739335] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:44:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:44:16 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739335 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 [Bug 1739335] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 04:44:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:44:56 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Regression | Status|NEW |ASSIGNED CC|amukherj at redhat.com, | |rhinduja at redhat.com, | |rhs-bugs at redhat.com, | |sankarshan at redhat.com, | |sheggodu at redhat.com | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:47:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:47:00 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23180 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 04:47:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 04:47:01 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23180 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) posted (#1) for review on release-6 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 05:02:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:02:48 +0000 Subject: [Bugs] [Bug 1739336] New: Multiple disconnect events being propagated for the same child Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 Bug ID: 1739336 Summary: Multiple disconnect events being propagated for the same child Product: GlusterFS Version: 5 OS: Linux Status: NEW Component: rpc Keywords: Regression Severity: high Priority: high Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: amgad.saleh at nokia.com, amukherj at redhat.com, bugs at gluster.org, ravishankar at redhat.com, rgowdapp at redhat.com, rhinduja at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com Depends On: 1703423, 1716979 Blocks: 1739334, 1739335 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1716979 +++ +++ This bug was initially created as a clone of Bug #1703423 +++ Description of problem: Issue was reported upstream by a user via https://github.com/gluster/glusterfs/issues/648 I'm seeing that if I kill a brick in a replica 3 system, AFR keeps getting child_down event repeatedly for the same child. This seems to be a regression in behaviour as it does not occur in rhgs-3.4.0. In 3.4.0, I get exactly one GF_EVENT_CHILD_DOWN for 1 disconnect. Version-Release number of selected component (if applicable): rhgs-3.5 branch (source install) How reproducible: Always. Steps to Reproduce: 1. Create a replica 3 volume and start it. 2. Put a break point in __afr_handle_child_down_event() in glustershd process. 3. Kill any one brick. Actual results: The break point keeps getting hit once every 3 seconds or so repeatedly. Expected results: Only 1 event per one disconnect. Additional info: I haven't checked if the same happens for GF_EVENT_CHILD_UP as well. I think this is regression that needs to be fixed. If this is not a bug please feel free to close stating why. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1703423 [Bug 1703423] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1716979 [Bug 1716979] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739334 [Bug 1739334] Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739335 [Bug 1739335] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:02:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:02:48 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739336 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 [Bug 1739336] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 05:02:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:02:48 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739336 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 [Bug 1739336] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 05:02:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:02:48 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739336 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 [Bug 1739336] Multiple disconnect events being propagated for the same child -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:03:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:03:26 +0000 Subject: [Bugs] [Bug 1739336] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Regression | Status|NEW |ASSIGNED CC|amukherj at redhat.com, | |rhinduja at redhat.com, | |rhs-bugs at redhat.com, | |sankarshan at redhat.com, | |sheggodu at redhat.com | Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:05:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:05:14 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #6 from Ravishankar N --- I've sent the backport to the current release branches: https://review.gluster.org/#/q/topic:ref-1716979+(status:open+OR+status:merged) -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:05:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:05:16 +0000 Subject: [Bugs] [Bug 1739336] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23181 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 05:05:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:05:17 +0000 Subject: [Bugs] [Bug 1739336] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23181 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) posted (#1) for review on release-5 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 05:08:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:08:48 +0000 Subject: [Bugs] [Bug 1739337] New: DHT: severe memory leak in dht rename Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 Bug ID: 1739337 Summary: DHT: severe memory leak in dht rename Product: GlusterFS Version: 7 Status: NEW Component: distribute Severity: high Priority: high Assignee: bugs at gluster.org Reporter: nbalacha at redhat.com CC: bugs at gluster.org, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com, tdesala at redhat.com Depends On: 1722512, 1722698 Blocks: 1726294 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1722698 +++ +++ This bug was initially created as a clone of Bug #1722512 +++ Description of problem: The dht rename codepath has a severe leak if it needs to create a linkto file. Version-Release number of selected component (if applicable): How reproducible: Consistently Steps to Reproduce: 1. Create a 2x3 distribute replicate volume and fuse mount it. 2. Create 2 directories, dir1 and dir1-new in the root of the volume 3. Find 2 filenames which will hash to different subvols when created in these directories. For example, in my setup dir1/file-1 and dir1-new/newfile-1 hash to different subvols. This is necessary as the leak is in the path which creates a linkto file. 4. Run the following script and watch the memory usage for the mount process using top. Actual results: Memory rises steadily. Statedumps show that the number of active inodes keeps increasing. Expected results: Memory should not increase as there is a single file on the volume. Additional info: --- Additional comment from Nithya Balachandran on 2019-06-20 14:10:03 UTC --- (In reply to Nithya Balachandran from comment #0) > Description of problem: > > The dht rename codepath has a severe leak if it needs to create a linkto > file. > Version-Release number of selected component (if applicable): > > > How reproducible: > Consistently > > Steps to Reproduce: > 1. Create a 2x3 distribute replicate volume and fuse mount it. > 2. Create 2 directories, dir1 and dir1-new in the root of the volume > 3. Find 2 filenames which will hash to different subvols when created in > these directories. For example, in my setup dir1/file-1 and > dir1-new/newfile-1 hash to different subvols. This is necessary as the leak > is in the path which creates a linkto file. > 4. Run the following script and watch the memory usage for the mount process > using top. 
> > Forgot to mention the script in the description: while (true); do for in in {1..20000}; do touch /mnt/fuse1/dir1/file-1; mv -f /mnt/fuse1/dir1/file-1 /mnt/fuse1/dir1-new/newfile-1; done ;rm -rf /mnt/fuse1/dir1-new/*; done --- Additional comment from Worker Ant on 2019-06-21 03:38:32 UTC --- REVIEW: https://review.gluster.org/22912 (cluster/dht: Fixed a memleak in dht_rename_cbk) posted (#1) for review on master by N Balachandran --- Additional comment from Worker Ant on 2019-07-02 11:52:04 UTC --- REVIEW: https://review.gluster.org/22912 (cluster/dht: Fixed a memleak in dht_rename_cbk) merged (#5) on master by N Balachandran Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1722512 [Bug 1722512] DHT: severe memory leak in dht rename https://bugzilla.redhat.com/show_bug.cgi?id=1722698 [Bug 1722698] DHT: severe memory leak in dht rename https://bugzilla.redhat.com/show_bug.cgi?id=1726294 [Bug 1726294] DHT: severe memory leak in dht rename -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:08:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:08:48 +0000 Subject: [Bugs] [Bug 1722698] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1722698 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739337 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 [Bug 1739337] DHT: severe memory leak in dht rename -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 05:08:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:08:48 +0000 Subject: [Bugs] [Bug 1726294] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726294 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739337 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 [Bug 1739337] DHT: severe memory leak in dht rename -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:11:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:11:40 +0000 Subject: [Bugs] [Bug 1739337] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23182 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
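A C equivalent of the reproducer script quoted in comment #1 above (the mount point and file names are the ones from the report; whether the two paths hash to different subvolumes, and therefore force a linkto file, still depends on the volume's layout):

/* Repeatedly create dir1/file-1 and rename it over dir1-new/newfile-1 on a
 * fuse mount, then watch the mount process RSS with top. Sketch only. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char *src = "/mnt/fuse1/dir1/file-1";
    const char *dst = "/mnt/fuse1/dir1-new/newfile-1";

    for (int i = 0; i < 20000; i++) {
        int fd = open(src, O_CREAT | O_WRONLY, 0644);   /* same effect as touch */
        if (fd < 0) { perror("open"); return 1; }
        close(fd);
        if (rename(src, dst) != 0) { perror("rename"); return 1; }
    }
    return 0;
}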
From bugzilla at redhat.com Fri Aug 9 05:11:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:11:42 +0000 Subject: [Bugs] [Bug 1739337] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23182 (cluster/dht: Fixed a memleak in dht_rename_cbk) posted (#1) for review on release-7 by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 05:59:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 05:59:23 +0000 Subject: [Bugs] [Bug 1654778] Please update GlusterFS documentation to describe how to do a non-root install In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1654778 spamecha at redhat.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |CLOSED Resolution|--- |NOTABUG Last Closed| |2019-08-09 05:59:23 --- Comment #3 from spamecha at redhat.com --- Its mentioned in docs https://docs.gluster.org/en/latest/Developer-guide/Building-GlusterFS/#running-glusterfs -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 06:50:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 06:50:59 +0000 Subject: [Bugs] [Bug 1665880] After the shard feature is enabled, the glfs_read will always return the length of the read buffer, no the actual length readed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1665880 Xiubo Li changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(xiubli at redhat.com | |) | --- Comment #3 from Xiubo Li --- (In reply to Krutika Dhananjay from comment #2) > Hi Xiubo, > > There's a similar bug raised by ovirt team for which I sent a patch at > https://review.gluster.org/c/glusterfs/+/23175 to fix it. > Could you check if this patch fixes the issue seen in gluster-block as well? > @Krutika Tested it, I couldn't reproduce it any more with this. Thanks BRs -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 07:06:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 07:06:56 +0000 Subject: [Bugs] [Bug 1554286] Xattr not updated if increasing the retention of a WORM/Retained file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1554286 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23184 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 07:06:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 07:06:57 +0000 Subject: [Bugs] [Bug 1554286] Xattr not updated if increasing the retention of a WORM/Retained file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1554286 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23184 (WIP: xlator update on changing access time of a WORM-Aretained file) posted (#1) for review on master by Vishal Pandey -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 07:16:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 07:16:53 +0000 Subject: [Bugs] [Bug 1739360] New: [GNFS] gluster crash with nfs.nlm off Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739360 Bug ID: 1739360 Summary: [GNFS] gluster crash with nfs.nlm off Product: GlusterFS Version: mainline Hardware: x86_64 Status: NEW Component: nfs Assignee: bugs at gluster.org Reporter: xiechanglong at cmss.chinamobile.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: gnfs crash with nfs.nlm disable [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/run/gluster/n'. Program terminated with signal 11, Segmentation fault. #0 nlm_priv (this=this at entry=0x7eff74017450) at nlm4.c:2746 2746 gf_proc_dump_write (key, "%s\n", client->caller_name); Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.5.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libacl-2.2.51-12.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7_3.2.x86_64 openssl-libs-1.0.2k-16.1.el7.bclinux.x86_64 pcre-8.32-15.el7_2.1.x86_64 sssd-client-1.14.0-43.el7_3.18.x86_64 zlib-1.2.7-17.el7.x86_64 (gdb) bt #0 nlm_priv (this=this at entry=0x7eff74017450) at nlm4.c:2746 #1 0x00007eff78e400fd in nfs_priv (this=0x7eff74017450) at nfs.c:1702 #2 0x00007eff873cf66d in gf_proc_dump_xlator_info (top=) at statedump.c:502 #3 0x00007eff873cfc00 in gf_proc_dump_info (signum=signum at entry=10, ctx=0x7eff87fd4010) at statedump.c:837 #4 0x00007eff8789c05e in glusterfs_sigwaiter (arg=) at glusterfsd.c:2083 #5 0x00007eff86202dc5 in start_thread () from /lib64/libpthread.so.0 #6 0x00007eff85a2c76d in clone () from /lib64/libc.so.6 Version-Release number of selected component (if applicable): master branch How reproducible: 1) gluster v set nfs.nlm off 2) gluster v set nfs.disable on 3) gluster v set nfs.disable off 4) kill -SIGUSR1 // gnfs crash here Steps to Reproduce: 1. 2. 3. Actual results: gnfs crash Expected results: gnfs works well with statedump files Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
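The backtrace above, together with the follow-up patch title ("nlm: preinitialize nlm_client_list"), suggests the statedump code walks an NLM client list that was never initialised because nfs.nlm was turned off. The snippet below is a generic illustration of that failure mode using the usual kernel-style list pattern, not the actual gnfs code:

/* Illustration only: iterating a zero-filled, never-initialised list head
 * crashes, while an INIT_LIST_HEAD'ed empty list is a safe no-op. */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

#define INIT_LIST_HEAD(h) do { (h)->next = (h); (h)->prev = (h); } while (0)

static struct list_head nlm_client_list;     /* static zero-fill: next == NULL */

static void dump_list(struct list_head *head)
{
    struct list_head *p;
    /* If head->next is NULL, the first p->next dereference below faults,
     * which is the kind of SIGSEGV nlm_priv() hits when nfs.nlm is off. */
    for (p = head->next; p != head; p = p->next)
        printf("entry %p\n", (void *)p);
}

int main(void)
{
    /* dump_list(&nlm_client_list);   would crash: list head never set up */
    INIT_LIST_HEAD(&nlm_client_list); /* what "preinitialize" provides */
    dump_list(&nlm_client_list);      /* safe no-op over an empty list */
    return 0;
}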
From bugzilla at redhat.com Fri Aug 9 07:26:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 07:26:36 +0000 Subject: [Bugs] [Bug 1739360] [GNFS] gluster crash with nfs.nlm off In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739360 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23185 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 07:26:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 07:26:37 +0000 Subject: [Bugs] [Bug 1739360] [GNFS] gluster crash with nfs.nlm off In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739360 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23185 (nlm: preinitialize nlm_client_list) posted (#1) for review on master by Xie Changlong -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 08:47:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 08:47:48 +0000 Subject: [Bugs] [Bug 1524058] gluster peer command stops working with unhelpful error messages when DNS doens't work In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1524058 --- Comment #3 from Vishal Pandey --- @nh2 Can you please try to reproduct this issue again on the latest releases ? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 08:52:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 08:52:20 +0000 Subject: [Bugs] [Bug 1739399] New: [Ganesha]: truncate operation not updating the ctime Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739399 Bug ID: 1739399 Summary: [Ganesha]: truncate operation not updating the ctime Product: GlusterFS Version: 7 Status: ASSIGNED Component: posix Keywords: Triaged Severity: high Assignee: jthottan at redhat.com Reporter: jthottan at redhat.com CC: amukherj at redhat.com, bugs at gluster.org, dang at redhat.com, ffilz at redhat.com, grajoria at redhat.com, jthottan at redhat.com, khiremat at redhat.com, kkeithle at redhat.com, mbenjamin at redhat.com, msaini at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, skoduri at redhat.com, storage-qa-internal at redhat.com, vdas at redhat.com Depends On: 1723761 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1723761 +++ +++ This bug was initially created as a clone of Bug #1720163 +++ Description of problem: truncate/00.t and chown/00.t tests are failing in posix compliance when ran over 3.5.0 build v3 mount.These used to pass in 3.4.0 release ------------------- # prove -r /home/ntfs-3g-pjd-fstest/tests/chown/00.t /home/ntfs-3g-pjd-fstest/tests/chown/00.t .. Failed 1/171 subtests Test Summary Report ------------------- /home/ntfs-3g-pjd-fstest/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1) Failed test: 118 Files=1, Tests=171, 17 wallclock secs ( 0.03 usr 0.01 sys + 0.52 cusr 0.73 csys = 1.29 CPU) Result: FAIL # prove -r /home/ntfs-3g-pjd-fstest/tests/truncate/00.t /home/ntfs-3g-pjd-fstest/tests/truncate/00.t .. 
Failed 1/21 subtests Test Summary Report ------------------- /home/ntfs-3g-pjd-fstest/tests/truncate/00.t (Wstat: 0 Tests: 21 Failed: 1) Failed test: 15 Files=1, Tests=21, 2 wallclock secs ( 0.01 usr 0.01 sys + 0.07 cusr 0.10 csys = 0.19 CPU) Result: FAIL Version-Release number of selected component (if applicable): # rpm -qa | grep ganesha nfs-ganesha-gluster-2.7.3-4.el7rhgs.x86_64 nfs-ganesha-debuginfo-2.7.3-4.el7rhgs.x86_64 glusterfs-ganesha-6.0-5.el7rhgs.x86_64 nfs-ganesha-2.7.3-4.el7rhgs.x86_64 # cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.7 Beta (Maipo) How reproducible: 3/3 Steps to Reproduce: 1.Create 4 node Ganesha cluster 2.Create 12*3 Distributed-Replicate Volume. 3.Export the volume via Ganesha 4.Mount the volume on client via v3.0 5.Run posix compliance test Actual results: =============== truncate/00.t and chown/00.t tests were failing with 3.5.0. These test used to pass with older LIVE bits 3.5.0 ------ Test Summary Report ------------------- /home/ntfs-3g-pjd-fstest/tests/chmod/00.t (Wstat: 0 Tests: 106 Failed: 2) Failed tests: 31, 39 /home/ntfs-3g-pjd-fstest/tests/chown/00.t (Wstat: 0 Tests: 171 Failed: 1) Failed test: 118 /home/ntfs-3g-pjd-fstest/tests/ftruncate/00.t (Wstat: 0 Tests: 26 Failed: 1) Failed test: 24 /home/ntfs-3g-pjd-fstest/tests/mknod/03.t (Wstat: 0 Tests: 12 Failed: 9) Failed tests: 1-3, 5-7, 9-11 /home/ntfs-3g-pjd-fstest/tests/truncate/00.t (Wstat: 0 Tests: 21 Failed: 1) Failed test: 15 3.4.0 ----- Test Summary Report ------------------- /root/ntfs-3g-pjd-fstest/tests/chmod/00.t (Wstat: 0 Tests: 106 Failed: 2) Failed tests: 31, 39 /root/ntfs-3g-pjd-fstest/tests/ftruncate/00.t (Wstat: 0 Tests: 26 Failed: 1) Failed test: 24 /root/ntfs-3g-pjd-fstest/tests/link/00.t (Wstat: 0 Tests: 82 Failed: 4) Failed tests: 57-58, 64-65 /root/ntfs-3g-pjd-fstest/tests/mknod/03.t (Wstat: 0 Tests: 12 Failed: 9) Failed tests: 1-3, 5-7, 9-11 Expected results: ========== truncate/00.t and chown/00.t tests should pass Additional info: --- Additional comment from Jiffin on 2019-06-13 10:29:54 UTC --- Failures cases: ctime is not getting updated properly for truncate operation on v3 and v4 mounts(truncate) ctime is not getting updated properly for mkdir in v3 operation (chown), it seems to be hard to believe for me every other operation apart from mkdir is passing(create,mkfifo, symlink) If possible can u please try to olders nfs-clients than rhel 7.7 --- Additional comment from Manisha Saini on 2019-06-13 19:44:29 UTC --- (In reply to Jiffin from comment #5) > Failures cases: > > ctime is not getting updated properly for truncate operation on v3 and v4 > mounts(truncate) > > ctime is not getting updated properly for mkdir in v3 operation (chown), it > seems to be hard to believe for me every other operation apart from mkdir is > passing(create,mkfifo, symlink) > > If possible can u please try to olders nfs-clients than rhel 7.7 Hi Jiffin, Posix compliance on 3.4.0 was ran with RHEL 7.7 clients only.Hence it does not seems to be RHEL client issue since its passing with 3.4.0+ RHEL 7.7 but failing with 3.5.0+ RHEL 7.7 --- Additional comment from Jiffin on 2019-06-18 09:56:20 UTC --- As I mentioned before the failure for truncate operation is genuine. But I was able to pass test case for chown if I increase the sleep from 1s to 2s, so most probably a timing issue. 
--- Additional comment from RHEL Product and Program Management on 2019-06-20 06:14:28 UTC --- This BZ is being approved for the RHGS 3.5.0 release, upon receipt of the 3 ACKs (PM,Devel,QA) for the release flag 'rhgs-3.5.0', and on being attached to an official RHGS 3.5.0 BZ Tracker --- Additional comment from Jiffin on 2019-06-24 09:26:33 UTC --- The test was in the following part # successful truncate(2) updates ctime. expect 0 create ${n0} 0644 ctime1=`${fstest} stat ${n0} ctime` sleep 1 expect 0 truncate ${n0} 123 ctime2=`${fstest} stat ${n0} ctime` test_check $ctime1 -lt $ctime2 ---> failure. The ctime1 and ctime2 was same while running this test nfs mounts. expect 0 unlink ${n0} But when I tried to create the same scenario manually and via test script. In both cases the ctime is getting updated properly. # vim /root/ntfs-3g-pjd-fstest/tests/truncate/00.t -sh-4.2# touch foo -sh-4.2# stat foo File: ?foo? Size: 0 Blocks: 0 IO Block: 1048576 regular empty file Device: 2bh/43d Inode: 9680437269025099955 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Context: system_u:object_r:nfs_t:s0 Access: 2019-06-21 15:10:43.000000000 +0530 Modify: 2019-06-21 15:10:43.000000000 +0530 Change: 2019-06-21 15:10:43.000000000 +0530 Birth: - -sh-4.2# truncate -s 0 foo -sh-4.2# stat foo File: ?foo? Size: 0 Blocks: 0 IO Block: 1048576 regular empty file Device: 2bh/43d Inode: 9680437269025099955 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Context: system_u:object_r:nfs_t:s0 Access: 2019-06-21 15:10:43.000000000 +0530 Modify: 2019-06-21 15:11:04.000000000 +0530 Change: 2019-06-21 15:11:04.000000000 +0530 Birth: - # cat test.sh #!/bin/bash touch fqwe stat fqwe sleep 1 truncate -s 123 fqwe stat fqwe # ./test.sh File: ?fqwe? Size: 0 Blocks: 0 IO Block: 1048576 regular empty file Device: 2bh/43d Inode: 13678437881096575140 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Context: system_u:object_r:nfs_t:s0 Access: 2019-06-21 15:18:17.000000000 +0530 Modify: 2019-06-21 15:18:17.000000000 +0530 Change: 2019-06-21 15:18:17.000000000 +0530 Birth: - File: ?fqwe? Size: 0 Blocks: 0 IO Block: 1048576 regular empty file Device: 2bh/43d Inode: 13678437881096575140 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Context: system_u:object_r:nfs_t:s0 Access: 2019-06-21 15:18:17.000000000 +0530 Modify: 2019-06-21 15:18:18.000000000 +0530 Change: 2019-06-21 15:18:18.000000000 +0530 will check via packet traces as well and confirm --- Additional comment from Sunil Kumar Acharya on 2019-06-24 14:50:59 UTC --- Please updated the RDT flag/text appropriately. --- Additional comment from Jiffin on 2019-06-25 10:36:45 UTC --- RCA : with ctime feature posix_ftruncate was not setting up the ctime which resulted in this failure. Thanks Kotresh for pointing out the fix. will post the patch soon. --- Additional comment from Worker Ant on 2019-06-26 11:20:41 UTC --- REVIEW: https://review.gluster.org/22948 (posix : add posix_set_ctime() in posix_ftruncate()) posted (#1) for review on master by jiffin tony Thottan --- Additional comment from Worker Ant on 2019-06-27 09:29:20 UTC --- REVIEW: https://review.gluster.org/22948 (posix : add posix_set_ctime() in posix_ftruncate()) merged (#3) on master by Kotresh HR Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1723761 [Bug 1723761] [Ganesha]: truncate operation not updating the ctime -- You are receiving this mail because: You are on the CC list for the bug. 
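To complement the RCA above (posix_ftruncate() not updating ctime when the ctime feature is enabled), here is a minimal standalone check of the behaviour the failing posix-compliance test expects: ctime must advance after a successful ftruncate(). This is plain POSIX code run against any path on the mount under test; it is not part of the Gluster test suite, and the file name is just an example.

/* ctime-after-ftruncate check, e.g. ./ftruncate_ctime /mnt/nfs/ctime_check_file */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat before, after;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-on-mount>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_CREAT | O_RDWR, 0644);
    if (fd < 0 || fstat(fd, &before) < 0)
        return perror("open/fstat"), 1;

    sleep(1);                     /* same idea as the 'sleep 1' in truncate/00.t */

    if (ftruncate(fd, 123) < 0 || fstat(fd, &after) < 0)
        return perror("ftruncate/fstat"), 1;

    printf("ctime before=%ld after=%ld -> %s\n",
           (long)before.st_ctime, (long)after.st_ctime,
           after.st_ctime > before.st_ctime ? "OK" : "FAIL (ctime not updated)");

    close(fd);
    return after.st_ctime > before.st_ctime ? 0 : 1;
}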
From bugzilla at redhat.com Fri Aug 9 08:52:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 08:52:20 +0000 Subject: [Bugs] [Bug 1723761] [Ganesha]: truncate operation not updating the ctime In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1723761 Jiffin changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739399 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739399 [Bug 1739399] [Ganesha]: truncate operation not updating the ctime -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 08:55:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 08:55:07 +0000 Subject: [Bugs] [Bug 1739399] [Ganesha]: truncate operation not updating the ctime In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739399 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23186 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 08:55:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 08:55:09 +0000 Subject: [Bugs] [Bug 1739399] [Ganesha]: truncate operation not updating the ctime In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739399 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23186 (posix : add posix_set_ctime() in posix_ftruncate()) posted (#1) for review on release-7 by jiffin tony Thottan -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 09:07:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 09:07:44 +0000 Subject: [Bugs] [Bug 1732717] fuse: Limit the number of inode invalidation requests in the queue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732717 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23187 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 09:07:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 09:07:45 +0000 Subject: [Bugs] [Bug 1732717] fuse: Limit the number of inode invalidation requests in the queue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732717 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23187 (fuse: Set limit on invalidate queue size) posted (#1) for review on master by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:01:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:01:30 +0000 Subject: [Bugs] [Bug 1739424] New: Disperse volume : data corruption with ftruncate data in 4+2 config Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Bug ID: 1739424 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config Product: GlusterFS Version: 7 Status: NEW Component: disperse Keywords: Reopened Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: aspandey at redhat.com, bugs at gluster.org, jahernan at redhat.com, kinglongmee at gmail.com, pkarampu at redhat.com Depends On: 1727081 Blocks: 1730914, 1732772, 1732774, 1732778, 1732792 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1727081 +++ Description of problem: LTP ftestxx tests reports data corruption at a 4+2 disperse volume. <<>> ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 for val 2 count 487 xfr 2048 file_max 0xfa000. ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 ftest05 0 TINFO : Stat: size=fa000, ino=120399ba ftest05 0 TINFO : Buf: ftest05 0 TINFO : 64*0, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : ... more ftest05 0 TINFO : Bits array: ftest05 0 TINFO : 0: ftest05 0 TINFO : 0: ftest05 0 TINFO : ddx ftest05 0 TINFO : 8: ftest05 0 TINFO : ecx Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-07-05 00:06:33 UTC --- REVIEW: https://review.gluster.org/22999 (cluster/ec: inherit right mask from top parent) posted (#1) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-07-08 13:27:01 UTC --- REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock which has info) posted (#1) for review on master by Kinglong Mee --- Additional comment from Pranith Kumar K on 2019-07-10 10:30:39 UTC --- (In reply to Kinglong Mee from comment #0) > Description of problem: > > LTP ftestxx tests reports data corruption at a 4+2 disperse volume. > > <<>> > ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 > for val 2 count 487 xfr 2048 file_max 0xfa000. > ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 > ftest05 0 TINFO : Stat: size=fa000, ino=120399ba > ftest05 0 TINFO : Buf: > ftest05 0 TINFO : 64*0, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : ... more > ftest05 0 TINFO : Bits array: > ftest05 0 TINFO : 0: > ftest05 0 TINFO : 0: > ftest05 0 TINFO : ddx > ftest05 0 TINFO : 8: > ftest05 0 TINFO : ecx When I try to run this test, it is choosing /tmp as the directory where the file is created. How to change it to the mount directory? root at localhost - /mnt/ec2 15:11:08 :( ? /opt/ltp/testcases/bin/ftest05 ftest05 1 TPASS : Test passed. > > > Version-Release number of selected component (if applicable): > > > How reproducible: > > > Steps to Reproduce: > 1. > 2. > 3. 
> > Actual results: > > > Expected results: > > > Additional info: --- Additional comment from Kinglong Mee on 2019-07-10 12:47:01 UTC --- (In reply to Pranith Kumar K from comment #3) > (In reply to Kinglong Mee from comment #0) > > Description of problem: > > > > LTP ftestxx tests reports data corruption at a 4+2 disperse volume. > > > > <<>> > > ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 > > for val 2 count 487 xfr 2048 file_max 0xfa000. > > ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 > > ftest05 0 TINFO : Stat: size=fa000, ino=120399ba > > ftest05 0 TINFO : Buf: > > ftest05 0 TINFO : 64*0, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : ... more > > ftest05 0 TINFO : Bits array: > > ftest05 0 TINFO : 0: > > ftest05 0 TINFO : 0: > > ftest05 0 TINFO : ddx > > ftest05 0 TINFO : 8: > > ftest05 0 TINFO : ecx > > > When I try to run this test, it is choosing /tmp as the directory where the > file is created. How to change it to the mount directory? > root at localhost - /mnt/ec2 > 15:11:08 :( ? /opt/ltp/testcases/bin/ftest05 > ftest05 1 TPASS : Test passed. You can run as, ./runltp -p -l /tmp/resut.log -o /tmp/output.log -C /tmp/failed.log -d /mnt/nfs/ -f casefilename-under-runtest When running the test at nfs client, there is a bash scripts running which reboot one node(the cluster node Ganesha.nfsd is not running on) every 600s. --- Additional comment from Kinglong Mee on 2019-07-11 10:32:22 UTC --- valgrind reports some memory leak, ==7925== 300 bytes in 6 blocks are possibly lost in loss record 880 of 1,436 ==7925== at 0x4C29BC3: malloc (vg_replace_malloc.c:299) ==7925== by 0x71828BF: __gf_default_malloc (mem-pool.h:112) ==7925== by 0x7183182: __gf_malloc (mem-pool.c:131) ==7925== by 0x713FB65: gf_strndup (mem-pool.h:189) ==7925== by 0x713FBD5: gf_strdup (mem-pool.h:206) ==7925== by 0x7144465: loc_copy (xlator.c:1276) ==7925== by 0x18EDBF1C: ec_loc_from_loc (ec-helpers.c:760) ==7925== by 0x18F02FE5: ec_manager_open (ec-inode-read.c:778) ==7925== by 0x18EE4905: __ec_manager (ec-common.c:3094) ==7925== by 0x18EE4A0F: ec_manager (ec-common.c:3112) ==7925== by 0x18F037F3: ec_open (ec-inode-read.c:929) ==7925== by 0x18ED5E85: ec_gf_open (ec.c:1146) --- Additional comment from Worker Ant on 2019-07-11 11:05:58 UTC --- REVIEW: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#1) for review on master by Kinglong Mee --- Additional comment from Kinglong Mee on 2019-07-12 00:47:27 UTC --- ganesha.nfsd crash when healing name, Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N N'. Program terminated with signal 11, Segmentation fault. 
#0 0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0, name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685 1685 loc.inode = inode_new(parent->table); Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-12.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 gssproxy-0.7.0-21.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-59.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 lz4-1.7.5-2.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt #0 0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0, name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685 #1 0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050 #2 0x00007f0d5ae94455 in ec_synctask_heal_wrap (opaque=0x7f0d24e3c028) at ec-heal.c:3139 #3 0x00007f0d6d1268c9 in synctask_wrap () at syncop.c:369 #4 0x00007f0d6c6bf010 in ?? () from /lib64/libc.so.6 #5 0x0000000000000000 in ?? () (gdb) frame 1 #1 0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050 3050 ret = ec_heal_name(frame, ec, loc->parent, (char *)loc->name, (gdb) p loc $1 = (loc_t *) 0x7f0d24e3c358 (gdb) p *loc $2 = { path = 0x7f0d57537d00 "/nfsshare/ltp-eZQlnozjnX/ftegVRmbT/ftest05.20436/b", name = 0x7f0d57537d31 "b", inode = 0x7f0d24255b28, parent = 0x0, gfid = "\263\341\223\031\301\245I\260\234\334\017\to%\305^", pargfid = '\000' } --- Additional comment from Xavi Hernandez on 2019-07-13 14:09:15 UTC --- Please, don't use the same bug for different issues. --- Additional comment from Worker Ant on 2019-07-14 13:03:20 UTC --- REVISION POSTED: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#2) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-07-16 17:54:25 UTC --- REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock when it has info) merged (#4) on master by Amar Tumballi --- Additional comment from Ashish Pandey on 2019-07-17 05:27:44 UTC --- There are two patches associated with this BZ - https://review.gluster.org/#/c/glusterfs/+/22999/ - No merged and under review https://review.gluster.org/#/c/glusterfs/+/23010/ - Merged I would like to keep this bug open till both the patches get merged. 
-- Ashish --- Additional comment from Worker Ant on 2019-07-18 07:28:12 UTC --- REVIEW: https://review.gluster.org/23069 ((WIP)cluster/ec: Always read from good-mask) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-23 06:20:08 UTC --- REVIEW: https://review.gluster.org/23073 (cluster/ec: fix data corruption) posted (#4) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-26 07:11:59 UTC --- REVIEW: https://review.gluster.org/23069 (cluster/ec: Always read from good-mask) merged (#6) on master by Pranith Kumar Karampuri --- Additional comment from Pranith Kumar K on 2019-08-02 07:35:34 UTC --- Found one case which needs to be fixed. --- Additional comment from Worker Ant on 2019-08-02 07:38:12 UTC --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-07 06:15:15 UTC --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#2) on master by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1730914 [Bug 1730914] [GSS] Sometimes truncate and discard could cause data corruption when executed while self-heal is running https://bugzilla.redhat.com/show_bug.cgi?id=1732772 [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1732774 [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1732778 [Bug 1732778] [GSS] Sometimes truncate and discard could cause data corruption when executed while self-heal is running https://bugzilla.redhat.com/show_bug.cgi?id=1732792 [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:01:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:01:30 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739424 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. 
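For readers who cannot run the full LTP suite, the corruption pattern reported by ftest05 above (a "bad verify" after truncation) can be probed with a much smaller standalone check: write a known byte pattern, truncate the file down and back up, and verify that data below the truncation point is intact while data above it reads back as zeros. This is only a simplified probe run against a file on the disperse mount; it is not the LTP test, uses no Gluster-specific APIs, and will not necessarily reproduce the race (which also involves self-heal while a node is rebooted).

/* simplified truncate/verify probe, e.g. ./trunc_verify /mnt/ec/probe_file */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_MAX (1024 * 1024)   /* full size to restore after truncation */
#define TRUNC_TO (300 * 1024)    /* intermediate truncation point */

int main(int argc, char **argv)
{
    char *buf = malloc(FILE_MAX);
    char *chk = malloc(FILE_MAX);
    int fd, i, bad = 0;

    if (argc != 2 || !buf || !chk)
        return fprintf(stderr, "usage: %s <file-on-mount>\n", argv[0]), 1;

    memset(buf, 2, FILE_MAX);    /* pattern byte, like the '2' values in the ftest05 dump */

    fd = open(argv[1], O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return perror("open"), 1;
    if (pwrite(fd, buf, FILE_MAX, 0) != FILE_MAX)
        return perror("pwrite"), 1;
    if (ftruncate(fd, TRUNC_TO) < 0 || ftruncate(fd, FILE_MAX) < 0)
        return perror("ftruncate"), 1;
    if (pread(fd, chk, FILE_MAX, 0) != FILE_MAX)
        return perror("pread"), 1;

    for (i = 0; i < FILE_MAX; i++) {
        char expect = (i < TRUNC_TO) ? 2 : 0;   /* old data below, zeros above */
        if (chk[i] != expect) {
            printf("bad verify at offset 0x%x: got %d, expected %d\n", i, chk[i], expect);
            bad = 1;
            break;
        }
    }
    if (!bad)
        printf("verify OK\n");
    close(fd);
    return bad;
}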
From bugzilla at redhat.com Fri Aug 9 10:01:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:01:30 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739424 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:01:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:01:30 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739424 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:01:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:01:30 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739424 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:02:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:02:42 +0000 Subject: [Bugs] [Bug 1739426] New: Open fd heal should filter O_APPEND/O_EXCL Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 Bug ID: 1739426 Summary: Open fd heal should filter O_APPEND/O_EXCL Product: GlusterFS Version: 7 Status: NEW Component: disperse Severity: medium Priority: medium Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: atumball at redhat.com, bugs at gluster.org Depends On: 1733935 Blocks: 1734303, 1735514 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1733935 +++ Description of problem: Problem: when a file needs to be re-opened O_APPEND and O_EXCL flags are not filtered in EC. - O_APPEND should be filtered because EC doesn't send O_APPEND below EC for open to make sure writes happen on the individual fragments instead of at the end of the file. - O_EXCL should be filtered because shd could have created the file so even without O_EXCL open should succeed. Fix: Filter out these two flags in reopen. Version-Release number of selected component (if applicable): How reproducible: Found while reading code. Steps to Reproduce: 1. 2. 3. 
Actual results: Expected results: Additional info: --- Additional comment from Amar Tumballi on 2019-07-30 05:25:19 UTC --- https://review.gluster.org/#/c/glusterfs/+/23121/ posted. --- Additional comment from Worker Ant on 2019-07-30 05:39:45 UTC --- REVIEW: https://review.gluster.org/23121 (cluster/ec: Fix reopen flags to avoid misbehavior) merged (#4) on master by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1734303 [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1735514 [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:02:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:02:42 +0000 Subject: [Bugs] [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739426 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:02:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:02:42 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739426 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:02:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:02:42 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739426 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. 
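The fix referenced above ("cluster/ec: Fix reopen flags to avoid misbehavior") filters these two flags when an fd is re-opened. A small sketch of that idea follows; the helper name is hypothetical and this is not the actual Gluster code, only the flag-masking it describes.

#include <fcntl.h>

/* Hypothetical helper: flags to use when re-opening an fd during heal.
 * O_APPEND is dropped because EC writes each fragment at an explicit offset,
 * and O_EXCL is dropped because self-heal may already have created the file. */
static int reopen_flags(int open_flags)
{
    return open_flags & ~(O_APPEND | O_EXCL);
}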
From bugzilla at redhat.com Fri Aug 9 10:03:48 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 09 Aug 2019 10:03:48 +0000
Subject: [Bugs] [Bug 1739427] New: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file
Message-ID:

https://bugzilla.redhat.com/show_bug.cgi?id=1739427

Bug ID: 1739427
Summary: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file
Product: GlusterFS
Version: 7
Status: NEW
Component: disperse
Keywords: Reopened
Assignee: bugs at gluster.org
Reporter: pkarampu at redhat.com
CC: bugs at gluster.org, jahernan at redhat.com
Depends On: 1730715
Blocks: 1731448, 1732779
Target Milestone: ---
Classification: Community

+++ This bug was initially created as a clone of Bug #1730715 +++

Description of problem:
When a write not aligned to the stripe size is done concurrently with other writes on a sparse file of a disperse volume, an EIO error can be returned in some cases.

Version-Release number of selected component (if applicable): mainline

How reproducible: randomly

Steps to Reproduce:
1. Create a disperse volume
2. Create an empty file
3. Write to two non-overlapping areas of the file with unaligned offsets

Actual results:
In some cases the write to the lower offset fails with EIO.

Expected results:
Both writes should succeed.

Additional info:
EC doesn't allow concurrent writes on overlapping areas; they are serialized. However, non-overlapping writes are serviced in parallel. When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. Suppose we have a 4+2 disperse volume. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it. For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read the missing data from the first chunk (i.e. offsets 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that 3 bricks process the write before the read, and the other 3 process the read before the write. The first 3 bricks will return 10 bytes, while the other three will return 0 (because the file on those bricks has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs at least 4 consistent answers to recover the data. So this read fails with EIO. This error is propagated to the parent write, which is aborted, and EIO is returned to the application.
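The scenario described above can be exercised from a client with two concurrent unaligned writes to a freshly created (and therefore sparse) file: one small write at a low offset and one at a much higher offset. The sketch below issues the two pwrite() calls from separate threads against a path on the disperse mount. Offsets and sizes are illustrative only, and hitting the race window is timing dependent, so the EIO may appear only after many runs.

/* two concurrent unaligned writes; build with: cc -pthread sparse_writes.c
 * run as e.g.: ./a.out /mnt/ec/sparse_file */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int fd;

struct job { off_t off; size_t len; };

static void *writer(void *arg)
{
    struct job *j = arg;
    char buf[32];

    memset(buf, 'x', sizeof(buf));
    if (pwrite(fd, buf, j->len, j->off) < 0)
        perror("pwrite");            /* EIO here is the reported failure */
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t t1, t2;
    struct job low  = { 10, 1 };              /* unaligned, low offset */
    struct job high = { 1024 * 1024, 1 };     /* unaligned, 1M offset  */

    if (argc != 2)
        return fprintf(stderr, "usage: %s <file-on-mount>\n", argv[0]), 1;

    fd = open(argv[1], O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return perror("open"), 1;

    pthread_create(&t1, NULL, writer, &low);
    pthread_create(&t2, NULL, writer, &high);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    close(fd);
    return 0;
}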
--- Additional comment from Worker Ant on 2019-07-17 12:58:56 UTC --- REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on master by Xavi Hernandez --- Additional comment from Worker Ant on 2019-07-24 10:20:48 UTC --- REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) merged (#4) on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-27 06:41:19 UTC --- REVIEW: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on release-6 by lidi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1731448 [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1732779 [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:03:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:03:48 +0000 Subject: [Bugs] [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739427 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:03:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:03:48 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739427 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:03:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:03:48 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739427 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:07:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:07:22 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23188 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:07:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:07:23 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23188 (cluster/ec: inherit healing from lock when it has info) posted (#1) for review on release-7 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:07:30 +0000 Subject: [Bugs] [Bug 1739430] New: ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Bug ID: 1739430 Summary: ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time Product: GlusterFS Version: 7 Status: NEW Component: ctime Keywords: Reopened Severity: high Priority: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: atumball at redhat.com, bugs at gluster.org, pasik at iki.fi, pkarampu at redhat.com Depends On: 1593542 Blocks: 1715422, 1733885 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1593542 +++ Description of problem: Upgrade scenario: Currently for older files, the ctime gets updated during {a|m|c}time modification fop and eventually becomes consistent. With any {a|m|c}time modification, the ctime is initialized with latest time which is incorrect. So how do we handle this upgrade scenario. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: 1. Create EC/replica volume, mount it, create a file. 2. Enable ctime feature 3. touch the created file {m|a|c}time will be latest. 
Only access time should have been updated. Actual results: {a|m|c}time gets updated. Expected results: Only access time should have been updated. Additional info: --- Additional comment from Worker Ant on 2019-06-24 19:11:36 UTC --- REVIEW: https://review.gluster.org/22936 (ctime: Set mdata xattr on legacy files) posted (#1) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-07-22 06:57:08 UTC --- REVIEW: https://review.gluster.org/22936 (ctime: Set mdata xattr on legacy files) merged (#14) on master by Atin Mukherjee --- Additional comment from Worker Ant on 2019-07-22 15:30:46 UTC --- REVIEW: https://review.gluster.org/23091 (features/utime: Fix mem_put crash) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-23 01:29:55 UTC --- REVIEW: https://review.gluster.org/23091 (features/utime: Fix mem_put crash) merged (#1) on master by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1593542 [Bug 1593542] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time https://bugzilla.redhat.com/show_bug.cgi?id=1733885 [Bug 1733885] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:07:30 +0000 Subject: [Bugs] [Bug 1593542] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1593542 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739430 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:07:30 +0000 Subject: [Bugs] [Bug 1733885] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733885 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739430 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:08:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:08:02 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Reopened | Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
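Reading the patch title above ("ctime: Set mdata xattr on legacy files"), the intent appears to be that the mdata times for a legacy file should be seeded from what the backend already reports rather than from the current clock. The actual on-disk mdata format and the Gluster-internal helpers are not shown here; the fragment below only illustrates that idea with plain stat(2), and every name in it is hypothetical.

#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical illustration only: seed a/m/c times for a legacy file from the
 * existing inode instead of "now", so enabling ctime does not rewrite history. */
struct legacy_mdata {
    long atime;
    long mtime;
    long ctime;
};

static int seed_mdata_from_backend(const char *backend_path, struct legacy_mdata *md)
{
    struct stat st;

    if (stat(backend_path, &st) < 0)
        return -1;

    md->atime = (long)st.st_atime;   /* existing times, not time(NULL) */
    md->mtime = (long)st.st_mtime;
    md->ctime = (long)st.st_ctime;
    return 0;
}

int main(int argc, char **argv)
{
    struct legacy_mdata md;

    if (argc == 2 && seed_mdata_from_backend(argv[1], &md) == 0)
        printf("atime=%ld mtime=%ld ctime=%ld\n", md.atime, md.mtime, md.ctime);
    return 0;
}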
From bugzilla at redhat.com Fri Aug 9 10:08:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:08:25 +0000 Subject: [Bugs] [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23189 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:08:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:08:26 +0000 Subject: [Bugs] [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23189 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on release-7 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:09:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:09:30 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23190 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:09:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:09:31 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23190 (cluster/ec: Always read from good-mask) posted (#1) for review on release-7 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:10:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:10:35 +0000 Subject: [Bugs] [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23191 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:10:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:10:36 +0000 Subject: [Bugs] [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23191 (cluster/ec: Fix reopen flags to avoid misbehavior) posted (#1) for review on release-7 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:11:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:11:39 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23192 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:11:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:11:40 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23192 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on release-7 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:14:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:14:56 +0000 Subject: [Bugs] [Bug 1739436] New: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 Bug ID: 1739436 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. Product: GlusterFS Version: 7 Status: NEW Component: ctime Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org Depends On: 1734299 Blocks: 1734305, 1737745 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1734299 +++ Description of problem: Ctime heals the ctime xattr ("trusted.glusterfs.mdata") in lookup if it's not present. In a multi client scenario, there is a race which results in updating the ctime xattr to older value. e.g. Let c1 and c2 be two clients and file1 be the file which doesn't have the ctime xattr. Let the ctime of file1 be t1. (from backend, ctime heals time attributes from backend when not present). Now following operations are done on mount c1 -> ls -l /mnt1/file1 | c2 -> ls -l /mnt2/file1;echo "append" >> /mnt2/file1; The race is that the both c1 and c2 didn't fetch the ctime xattr in lookup, so both of them tries to heal ctime to time 't1'. 
If c2 wins the race and appends the file before c1 heals it, it sets the time to 't1' and updates it to 't2' (because of append). Now c1 proceeds to heal and sets it to 't1' which is incorrect. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: 1. Create single brick gluster volume and start it 2. Mount at /mnt1 and /mnt2 3. Disable ctime gluster volume set ctime off 4. Create a file touch /mnt/file1 5. Enable ctime gluster volume set ctime on 6. Put a breakpoint at gf_utime_set_mdata_lookup_cbk on '/mnt1' 7. ls -l /mnt1/file1 This hits the break point, allow for root gfid and don't continue on stbuf->ia_gfid equals to file1's gfid 8. ls -l /mnt2/file1 9. The ctime xattr is healed from /mnt2. Capture it. getfattr -d -m . -e hex //file1 | grep mdata 10. echo "append" >> /mnt2/file1 and capture mdata getfattr -d -m . -e hex //file1 | grep mdata 11. Continue the break point at step 7 and capture the mdata Actual results: mdata xattr at step 11 is equal to step 9 (Went back in time) Expected results: mdata xattr at step 11 should be equal to step 10 Additional info: --- Additional comment from Worker Ant on 2019-07-30 08:14:18 UTC --- REVIEW: https://review.gluster.org/23131 (posix/ctime: Fix race during lookup ctime xattr heal) posted (#1) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-08-01 02:59:49 UTC --- REVIEW: https://review.gluster.org/23131 (posix/ctime: Fix race during lookup ctime xattr heal) merged (#2) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734299 [Bug 1734299] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1734305 [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1737745 [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:14:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:14:56 +0000 Subject: [Bugs] [Bug 1734299] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734299 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739436 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:14:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:14:56 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739436 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:14:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:14:56 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739436 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:15:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:15:11 +0000 Subject: [Bugs] [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:16:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:16:56 +0000 Subject: [Bugs] [Bug 1739437] New: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 Bug ID: 1739437 Summary: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on Product: GlusterFS Version: 7 Status: NEW Component: ctime Severity: high Priority: medium Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: atumball at redhat.com, bugs at gluster.org, khiremat at redhat.com, kinglongmee at gmail.com Depends On: 1737288 Blocks: 1737705, 1737746 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1737288 +++ Description of problem: I have a 4+2 disperse volume with ctime on, and export a dir from nfs-ganesha, storage.ctime: on features.utime: on When I copy a local file to nfs client, stat shows bad ctime for the file. # stat /mnt/nfs/test* File: ?/mnt/nfs/test1.sh? Size: 166 Blocks: 4 IO Block: 1048576 regular file Device: 27h/39d Inode: 10744358902712050257 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-05 09:49:00.000000000 +0800 Modify: 2019-08-05 09:49:00.000000000 +0800 Change: 2061-07-23 21:54:08.000000000 +0800 Birth: - File: ?/mnt/nfs/test2.sh? 
Size: 214 Blocks: 4 IO Block: 1048576 regular file
Device: 27h/39d Inode: 12073556847735387788 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-08-05 09:49:00.000000000 +0800
Modify: 2019-08-05 09:49:00.000000000 +0800
Change: 2061-07-23 21:54:08.000000000 +0800
Birth: -

# ps a
342188 pts/0 D+ 0:00 cp -i test1.sh test2.sh /mnt/nfs/

# gdb glusterfsd
(gdb) p *stbuf
$1 = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0,
  ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 174138658,
  ia_mtime = 2889352448, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0,
  ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0,
  ia_attributes_mask = 0, ia_gfid = '\000' , ia_type = IA_INVAL, ia_prot = {
    suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000',
    owner = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
    group = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'},
    other = { read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}

It is caused by the nfs client creating the copied file in EXCLUSIVE mode, which sets a verifier; the verifier is stored in the file's atime and mtime.

The nfs client sets the verifier as:

if (flags & O_EXCL) {
    data->arg.create.createmode = NFS3_CREATE_EXCLUSIVE;
    data->arg.create.verifier[0] = cpu_to_be32(jiffies);
    data->arg.create.verifier[1] = cpu_to_be32(current->pid);
}

verifier[0] is stored in the file's atime, and verifier[1] in the mtime. (Consistent with this, byte-swapping the ia_mtime value 2889352448 = 0xAC380500 seen above gives 0x000538AC = 342188, which is exactly the PID of the cp process in the ps output.) But at setattr the utime code at storage/posix also applies the mtime to the ctime, and the ctime is not allowed to move back to an earlier time:

/* Earlier, mdata was updated only if the existing time is less
 * than the time to be updated. This would fail the scenarios
 * where mtime can be set to any time using the syscall. Hence
 * just updating without comparison. But the ctime is not
 * allowed to changed to older date.
 */

The following code can be used to find those PIDs which may cause a bad ctime for a copied file.
==========================================================================

#include <stdio.h>

int swap_endian(int val)
{
    val = ((val << 8) & 0xFF00FF00) | ((val >> 8) & 0x00FF00FF);
    return (val << 16) | (val >> 16);
}

// time of 2020/01/01 0:0:0
#define TO2020 1577808000

int main(int argc, char **argv)
{
    unsigned int i = 0, val = 0;

    for (i = 0; i < 500000; i++) {
        val = swap_endian(i);
        if (val > TO2020)
            printf("%u %u\n", i, val);
    }
    return 0;
}

--- Additional comment from Worker Ant on 2019-08-05 03:18:00 UTC ---

REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) posted (#1) for review on master by Kinglong Mee

--- Additional comment from Worker Ant on 2019-08-06 06:06:15 UTC ---

REVIEW: https://review.gluster.org/23154 (features/utime: always update ctime at setattr) merged (#2) on master by Kotresh HR

Referenced Bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=1737288
[Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on
https://bugzilla.redhat.com/show_bug.cgi?id=1737705
[Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on
https://bugzilla.redhat.com/show_bug.cgi?id=1737746
[Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on
-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
From bugzilla at redhat.com Fri Aug 9 10:16:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:16:56 +0000 Subject: [Bugs] [Bug 1737288] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737288 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739437 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:16:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:16:56 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739437 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:16:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:16:56 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739437 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:17:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:17:10 +0000 Subject: [Bugs] [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:19:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:19:10 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23193 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:19:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:19:11 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23193 (ctime: Set mdata xattr on legacy files) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:20:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:20:14 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23194 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:20:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:20:15 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23194 (features/utime: Fix mem_put crash) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:21:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:21:19 +0000 Subject: [Bugs] [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23195 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:21:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:21:25 +0000 Subject: [Bugs] [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23195 (posix/ctime: Fix race during lookup ctime xattr heal) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:22:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:22:27 +0000 Subject: [Bugs] [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23196 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:22:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:22:28 +0000 Subject: [Bugs] [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23196 (features/utime: always update ctime at setattr) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:27:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:27:41 +0000 Subject: [Bugs] [Bug 1739442] New: Unable to create geo-rep session on a non-root setup. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 Bug ID: 1739442 Summary: Unable to create geo-rep session on a non-root setup. Product: GlusterFS Version: 7 Hardware: x86_64 OS: Linux Status: NEW Component: geo-replication Keywords: Regression Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: avishwan at redhat.com, bugs at gluster.org, csaba at redhat.com, khiremat at redhat.com, kiyer at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1734734, 1734738 Blocks: 1737712, 1737716 Target Milestone: --- Classification: Community Description of problem: Unable to create a non-root geo-rep session on a geo-rep setup. Version-Release number of selected component (if applicable): gluster-7 How reproducible: Always Steps to Reproduce: 1.Create a non-root geo-rep setup. 2.Try to create a non-root geo-rep session. Actual results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep create push-pem gluster command not found on 10.70.43.185 for user geoaccount. geo-replication command failed Expected results: # gluster volume geo-replication master-rep geoaccount at 10.70.43.185::slave-rep Creating geo-replication session between master-rep & geoaccount at 10.70.43.185::slave-rep has been successful Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1734734 [Bug 1734734] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1734738 [Bug 1734738] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1737712 [Bug 1737712] Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1737716 [Bug 1737716] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:27:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:27:41 +0000 Subject: [Bugs] [Bug 1734738] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734738 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739442 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 [Bug 1739442] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:27:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:27:41 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739442 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 [Bug 1739442] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:27:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:27:41 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739442 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 [Bug 1739442] Unable to create geo-rep session on a non-root setup. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:28:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:28:00 +0000 Subject: [Bugs] [Bug 1739442] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Regression | Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:29:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:29:53 +0000 Subject: [Bugs] [Bug 1739442] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23198 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:29:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:29:54 +0000 Subject: [Bugs] [Bug 1739442] Unable to create geo-rep session on a non-root setup. 
In-Reply-To: 
References: 
Message-ID: 

https://bugzilla.redhat.com/show_bug.cgi?id=1739442

Worker Ant changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |POST

--- Comment #1 from Worker Ant ---
REVIEW: https://review.gluster.org/23198 (geo-rep: Fix mount broker setup issue) posted (#1) for review on release-7 by Kotresh HR

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From bugzilla at redhat.com Fri Aug 9 10:46:19 2019
From: bugzilla at redhat.com (bugzilla at redhat.com)
Date: Fri, 09 Aug 2019 10:46:19 +0000
Subject: [Bugs] [Bug 1739446] New: [Disperse] : Client side heal is not removing dirty flag for some of the files.
Message-ID: 

https://bugzilla.redhat.com/show_bug.cgi?id=1739446

            Bug ID: 1739446
           Summary: [Disperse] : Client side heal is not removing dirty flag for some of the files.
           Product: GlusterFS
           Version: 6
            Status: NEW
         Component: disperse
          Severity: medium
          Priority: medium
          Assignee: bugs at gluster.org
          Reporter: pkarampu at redhat.com
                CC: aspandey at redhat.com, atumball at redhat.com, bugs at gluster.org, jahernan at redhat.com, nchilaka at redhat.com, pkarampu at redhat.com, vavuthu at redhat.com
        Depends On: 1593224
            Blocks: 1600918, 1693223
  Target Milestone: ---
    Classification: Community

+++ This bug was initially created as a clone of Bug #1593224 +++

Description of problem:
While server-side heal is disabled, client-side heal is not removing the dirty flag for some files.

Version-Release number of selected component (if applicable):

How reproducible:
50%

Steps to Reproduce:
1. Create a 4+2 volume and mount it
2. Touch 100 files
3. Kill one brick
4. Write some data to all files using dd
5. Bring the brick UP
6. Append some data to all the files using dd; this will trigger heal on all the files
7. Read data from all the files using dd

At the end all files should be healed. However, I have observed that 2-3 files are still showing up in heal info. When I looked at the getxattr output, all the xattrs were the same and the dirty flag was still present for the data fop.

Actual results:

Expected results:

Additional info:

--- Additional comment from Worker Ant on 2018-11-28 12:51:02 UTC ---

REVIEW: https://review.gluster.org/21744 (cluster/ec: Don't enqueue an entry if it is already healing) posted (#2) for review on master by Ashish Pandey

--- Additional comment from Ashish Pandey on 2018-12-06 10:23:52 UTC ---

There are two things we need to fix.

First, if there are a number of entries to be healed and a client touches all of them while SHD is off, heal is triggered for all the files, but not all the files are placed in the queue. This is because, for a single file, a number of fops come in and each of them places the same file in the healing queue, which fills the queue quickly, so new files do not get a chance.

Second, when a file has started healing, sometimes we see that the dirty flag is not removed even though version and size have been healed and are the same. We need to find out why this is happening. If shd is OFF and a client accesses this file, it will not find any discrepancies, since version and size on all the bricks are the same; hence heal will not be triggered for this file and the dirty flag will remain as it is.
--- Additional comment from Ashish Pandey on 2018-12-11 06:51:11 UTC ---

While debugging the failure of this patch and thinking about how to incorporate the comments given by Pranith and Xavi, I found that there are some design constraints on implementing the idea of not enqueuing an entry if it is already healing.

Consider a 2+1 config and the following scenario:

1 - Create the volume and disable the self-heal daemon.
2 - Create a file and write some data while all the bricks are UP.
3 - Kill one brick and write some data to the same file.
4 - Bring the brick UP.
5 - Now, to trigger heal, we do "chmod 0666 file". This does a stat on the file, which finds that the brick is not healthy and triggers the heal.
6 - A synctask for the heal is created and started, which calls ec_heal_do, which in turn calls ec_heal_metadata and ec_heal_data.
7 - A setattr fop is also sent on the file to set the permissions.

Now, a possible sequence of steps is:

a > stat - sees the unhealthy file and triggers heal.
b > ec_heal_metadata - takes the lock, heals the metadata and the metadata part of trusted.ec.version, and releases the lock on the file. [At this point setattr is waiting for the lock.]
c > setattr - takes the lock and finds that the brick is still unhealthy, as the data version is not healed and is mismatching. It marks dirty for the metadata version and unlocks the file.
d > ec_heal_data - takes the locks and heals the data.

Now, if we restrict heal to being triggered by only one fop, then after step d the file will still contain the dirty flag and mismatched metadata versions. If we keep the heal request from every fop in a queue and check after every heal whether another heal is needed, we will end up triggering a heal for every fop, which defeats the purpose of the patch.

Xavi, Pranith, please provide your comments. Am I correct in my understanding?

--- Additional comment from Ashish Pandey on 2018-12-18 11:23:15 UTC ---

I have found a bug which is the actual cause of the dirty flag remaining set even after the heal has happened and version and size all match. This bug is only visible when shd is disabled, as shd will clear the dirty flag if it has nothing to heal.

1 - Let's say we have disabled shd.
2 - Create a file and then kill a brick.
3 - Write data, around 1GB, to the file; it will be healed after the brick comes UP.
4 - Bring the brick UP.
5 - Do "chmod 0777 file"; this will trigger heal.
6 - Immediately start a write on this file, appending 512 bytes using dd.

Now, while data healing is happening, the write from the mount (step 6) comes in and takes the lock. It sees that the healing flag is set on the file's version xattr, so it sends the write to all the bricks. Before releasing the lock it also updates version and size on ALL the bricks, including the brick which is healing. However, in ec_update_info we consult lock->good_mask to decide whether to unset the "dirty" flag set by this write fop, even though the write succeeded on all the bricks. So the +1 added to the dirty flag by this write fop remains as it is.

Data heal then takes the lock again and, after completing the data heal, unsets the dirty flag by decreasing it by the _same_ number it found at the start of healing. That number does not include the increment made by the write fop in step 6. So, after healing, a dirty flag remains set on the file. This flag will never be unset if shd is not enabled.
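[Editor's note: the accounting described in the comment above can be summarised with a tiny toy model. This is illustrative C only, not GlusterFS source; the counter and the "snapshot" variable are simplified stand-ins for the trusted.ec.dirty xattr bookkeeping done by EC and the heal synctask.]

/* Toy model of the dirty-counter leak: heal remembers the dirty value it saw
 * when it started and subtracts exactly that amount when it finishes, so the
 * +1 added by a concurrent write that never clears its own increment leaks. */
#include <stdio.h>

int main(void)
{
    int dirty = 1;             /* pending heal recorded while the brick was down */
    int heal_snapshot = dirty; /* heal starts and notes how much to clear later */

    /* Concurrent write during heal: pre-op adds +1, but the post-op decides
     * (based on the good_mask check described above) not to take it back. */
    dirty += 1;

    /* Heal finishes and subtracts only what it saw at the start. */
    dirty -= heal_snapshot;

    printf("dirty after heal = %d\n", dirty); /* prints 1: the flag stays set */
    return 0;
}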
--- Additional comment from Worker Ant on 2019-03-27 11:15:29 UTC --- REVIEW: https://review.gluster.org/21744 (cluster/ec: Don't enqueue an entry if it is already healing) merged (#11) on master by Xavi Hernandez --- Additional comment from Worker Ant on 2019-06-20 12:11:27 UTC --- REVIEW: https://review.gluster.org/22907 (cluster/ec: Prevent double pre-op xattrops) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-06-22 05:06:51 UTC --- REVIEW: https://review.gluster.org/22907 (cluster/ec: Prevent double pre-op xattrops) merged (#4) on master by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1593224 [Bug 1593224] [Disperse] : Client side heal is not removing dirty flag for some of the files. https://bugzilla.redhat.com/show_bug.cgi?id=1600918 [Bug 1600918] [Disperse] : Client side heal is not removing dirty flag for some of the files. https://bugzilla.redhat.com/show_bug.cgi?id=1693223 [Bug 1693223] [Disperse] : Client side heal is not removing dirty flag for some of the files. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:46:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:46:19 +0000 Subject: [Bugs] [Bug 1593224] [Disperse] : Client side heal is not removing dirty flag for some of the files. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1593224 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739446 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739446 [Bug 1739446] [Disperse] : Client side heal is not removing dirty flag for some of the files. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:46:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:46:19 +0000 Subject: [Bugs] [Bug 1693223] [Disperse] : Client side heal is not removing dirty flag for some of the files. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1693223 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739446 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739446 [Bug 1739446] [Disperse] : Client side heal is not removing dirty flag for some of the files. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1739449] New: Disperse volume : data corruption with ftruncate data in 4+2 config Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Bug ID: 1739449 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config Product: GlusterFS Version: 6 Status: NEW Component: disperse Keywords: Reopened Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: aspandey at redhat.com, bugs at gluster.org, jahernan at redhat.com, kinglongmee at gmail.com, pkarampu at redhat.com Depends On: 1739424, 1727081 Blocks: 1730914, 1732772, 1732774, 1732778, 1732792 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1739424 +++ +++ This bug was initially created as a clone of Bug #1727081 +++ Description of problem: LTP ftestxx tests reports data corruption at a 4+2 disperse volume. <<>> ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 for val 2 count 487 xfr 2048 file_max 0xfa000. ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 ftest05 0 TINFO : Stat: size=fa000, ino=120399ba ftest05 0 TINFO : Buf: ftest05 0 TINFO : 64*0, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : 2, ftest05 0 TINFO : ... more ftest05 0 TINFO : Bits array: ftest05 0 TINFO : 0: ftest05 0 TINFO : 0: ftest05 0 TINFO : ddx ftest05 0 TINFO : 8: ftest05 0 TINFO : ecx Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-07-05 00:06:33 UTC --- REVIEW: https://review.gluster.org/22999 (cluster/ec: inherit right mask from top parent) posted (#1) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-07-08 13:27:01 UTC --- REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock which has info) posted (#1) for review on master by Kinglong Mee --- Additional comment from Pranith Kumar K on 2019-07-10 10:30:39 UTC --- (In reply to Kinglong Mee from comment #0) > Description of problem: > > LTP ftestxx tests reports data corruption at a 4+2 disperse volume. > > <<>> > ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 > for val 2 count 487 xfr 2048 file_max 0xfa000. > ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 > ftest05 0 TINFO : Stat: size=fa000, ino=120399ba > ftest05 0 TINFO : Buf: > ftest05 0 TINFO : 64*0, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : 2, > ftest05 0 TINFO : ... more > ftest05 0 TINFO : Bits array: > ftest05 0 TINFO : 0: > ftest05 0 TINFO : 0: > ftest05 0 TINFO : ddx > ftest05 0 TINFO : 8: > ftest05 0 TINFO : ecx When I try to run this test, it is choosing /tmp as the directory where the file is created. How to change it to the mount directory? root at localhost - /mnt/ec2 15:11:08 :( ? /opt/ltp/testcases/bin/ftest05 ftest05 1 TPASS : Test passed. > > > Version-Release number of selected component (if applicable): > > > How reproducible: > > > Steps to Reproduce: > 1. > 2. > 3. 
> > Actual results: > > > Expected results: > > > Additional info: --- Additional comment from Kinglong Mee on 2019-07-10 12:47:01 UTC --- (In reply to Pranith Kumar K from comment #3) > (In reply to Kinglong Mee from comment #0) > > Description of problem: > > > > LTP ftestxx tests reports data corruption at a 4+2 disperse volume. > > > > <<>> > > ftest05 1 TFAIL : ftest05.c:395: Test[0] bad verify @ 0x3800 > > for val 2 count 487 xfr 2048 file_max 0xfa000. > > ftest05 0 TINFO : Test[0]: last_trunc = 0x4d800 > > ftest05 0 TINFO : Stat: size=fa000, ino=120399ba > > ftest05 0 TINFO : Buf: > > ftest05 0 TINFO : 64*0, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : 2, > > ftest05 0 TINFO : ... more > > ftest05 0 TINFO : Bits array: > > ftest05 0 TINFO : 0: > > ftest05 0 TINFO : 0: > > ftest05 0 TINFO : ddx > > ftest05 0 TINFO : 8: > > ftest05 0 TINFO : ecx > > > When I try to run this test, it is choosing /tmp as the directory where the > file is created. How to change it to the mount directory? > root at localhost - /mnt/ec2 > 15:11:08 :( ? /opt/ltp/testcases/bin/ftest05 > ftest05 1 TPASS : Test passed. You can run as, ./runltp -p -l /tmp/resut.log -o /tmp/output.log -C /tmp/failed.log -d /mnt/nfs/ -f casefilename-under-runtest When running the test at nfs client, there is a bash scripts running which reboot one node(the cluster node Ganesha.nfsd is not running on) every 600s. --- Additional comment from Kinglong Mee on 2019-07-11 10:32:22 UTC --- valgrind reports some memory leak, ==7925== 300 bytes in 6 blocks are possibly lost in loss record 880 of 1,436 ==7925== at 0x4C29BC3: malloc (vg_replace_malloc.c:299) ==7925== by 0x71828BF: __gf_default_malloc (mem-pool.h:112) ==7925== by 0x7183182: __gf_malloc (mem-pool.c:131) ==7925== by 0x713FB65: gf_strndup (mem-pool.h:189) ==7925== by 0x713FBD5: gf_strdup (mem-pool.h:206) ==7925== by 0x7144465: loc_copy (xlator.c:1276) ==7925== by 0x18EDBF1C: ec_loc_from_loc (ec-helpers.c:760) ==7925== by 0x18F02FE5: ec_manager_open (ec-inode-read.c:778) ==7925== by 0x18EE4905: __ec_manager (ec-common.c:3094) ==7925== by 0x18EE4A0F: ec_manager (ec-common.c:3112) ==7925== by 0x18F037F3: ec_open (ec-inode-read.c:929) ==7925== by 0x18ED5E85: ec_gf_open (ec.c:1146) --- Additional comment from Worker Ant on 2019-07-11 11:05:58 UTC --- REVIEW: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#1) for review on master by Kinglong Mee --- Additional comment from Kinglong Mee on 2019-07-12 00:47:27 UTC --- ganesha.nfsd crash when healing name, Core was generated by `/usr/bin/ganesha.nfsd -L /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N N'. Program terminated with signal 11, Segmentation fault. 
#0 0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0, name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685 1685 loc.inode = inode_new(parent->table); Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 dbus-libs-1.10.24-12.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 glibc-2.17-260.el7.x86_64 gssproxy-0.7.0-21.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-34.el7.x86_64 libacl-2.2.51-14.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libblkid-2.23.2-59.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-36.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libnfsidmap-0.25-19.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-59.el7.x86_64 lz4-1.7.5-2.el7.x86_64 openssl-libs-1.0.2k-16.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt #0 0x00007f0d5ae8c5a9 in ec_heal_name (frame=0x7f0d57c6ca28, ec=0x7f0d5b62d280, parent=0x0, name=0x7f0d57537d31 "b", participants=0x7f0d0dfffe30 "\001\001\001") at ec-heal.c:1685 #1 0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050 #2 0x00007f0d5ae94455 in ec_synctask_heal_wrap (opaque=0x7f0d24e3c028) at ec-heal.c:3139 #3 0x00007f0d6d1268c9 in synctask_wrap () at syncop.c:369 #4 0x00007f0d6c6bf010 in ?? () from /lib64/libc.so.6 #5 0x0000000000000000 in ?? () (gdb) frame 1 #1 0x00007f0d5ae93cae in ec_heal_do (this=0x7f0d5b65ac00, data=0x7f0d24e3c028, loc=0x7f0d24e3c358, partial=0) at ec-heal.c:3050 3050 ret = ec_heal_name(frame, ec, loc->parent, (char *)loc->name, (gdb) p loc $1 = (loc_t *) 0x7f0d24e3c358 (gdb) p *loc $2 = { path = 0x7f0d57537d00 "/nfsshare/ltp-eZQlnozjnX/ftegVRmbT/ftest05.20436/b", name = 0x7f0d57537d31 "b", inode = 0x7f0d24255b28, parent = 0x0, gfid = "\263\341\223\031\301\245I\260\234\334\017\to%\305^", pargfid = '\000' } --- Additional comment from Xavi Hernandez on 2019-07-13 14:09:15 UTC --- Please, don't use the same bug for different issues. --- Additional comment from Worker Ant on 2019-07-14 13:03:20 UTC --- REVISION POSTED: https://review.gluster.org/23029 (cluster/ec: do loc_copy from ctx->loc in fd->lock) posted (#2) for review on master by Kinglong Mee --- Additional comment from Worker Ant on 2019-07-16 17:54:25 UTC --- REVIEW: https://review.gluster.org/23010 (cluster/ec: inherit healing from lock when it has info) merged (#4) on master by Amar Tumballi --- Additional comment from Ashish Pandey on 2019-07-17 05:27:44 UTC --- There are two patches associated with this BZ - https://review.gluster.org/#/c/glusterfs/+/22999/ - No merged and under review https://review.gluster.org/#/c/glusterfs/+/23010/ - Merged I would like to keep this bug open till both the patches get merged. 
-- Ashish --- Additional comment from Worker Ant on 2019-07-18 07:28:12 UTC --- REVIEW: https://review.gluster.org/23069 ((WIP)cluster/ec: Always read from good-mask) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-23 06:20:08 UTC --- REVIEW: https://review.gluster.org/23073 (cluster/ec: fix data corruption) posted (#4) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-26 07:11:59 UTC --- REVIEW: https://review.gluster.org/23069 (cluster/ec: Always read from good-mask) merged (#6) on master by Pranith Kumar Karampuri --- Additional comment from Pranith Kumar K on 2019-08-02 07:35:34 UTC --- Found one case which needs to be fixed. --- Additional comment from Worker Ant on 2019-08-02 07:38:12 UTC --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-07 06:15:15 UTC --- REVIEW: https://review.gluster.org/23147 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#2) on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-09 10:07:23 UTC --- REVIEW: https://review.gluster.org/23188 (cluster/ec: inherit healing from lock when it has info) posted (#1) for review on release-7 by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-09 10:09:31 UTC --- REVIEW: https://review.gluster.org/23190 (cluster/ec: Always read from good-mask) posted (#1) for review on release-7 by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-09 10:11:40 UTC --- REVIEW: https://review.gluster.org/23192 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on release-7 by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1730914 [Bug 1730914] [GSS] Sometimes truncate and discard could cause data corruption when executed while self-heal is running https://bugzilla.redhat.com/show_bug.cgi?id=1732772 [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1732774 [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1732778 [Bug 1732778] [GSS] Sometimes truncate and discard could cause data corruption when executed while self-heal is running https://bugzilla.redhat.com/show_bug.cgi?id=1732792 [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1739424 [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739449 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1727081] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727081 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739449 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739449 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739449 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:47:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:47:31 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739449 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 10:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:48:31 +0000 Subject: [Bugs] [Bug 1739450] New: Open fd heal should filter O_APPEND/O_EXCL Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 Bug ID: 1739450 Summary: Open fd heal should filter O_APPEND/O_EXCL Product: GlusterFS Version: 6 Status: NEW Component: disperse Severity: medium Priority: medium Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: atumball at redhat.com, bugs at gluster.org Depends On: 1739426, 1733935 Blocks: 1734303, 1735514 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1739426 +++ +++ This bug was initially created as a clone of Bug #1733935 +++ Description of problem: Problem: when a file needs to be re-opened O_APPEND and O_EXCL flags are not filtered in EC. - O_APPEND should be filtered because EC doesn't send O_APPEND below EC for open to make sure writes happen on the individual fragments instead of at the end of the file. - O_EXCL should be filtered because shd could have created the file so even without O_EXCL open should succeed. Fix: Filter out these two flags in reopen. Version-Release number of selected component (if applicable): How reproducible: Found while reading code. Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: --- Additional comment from Amar Tumballi on 2019-07-30 05:25:19 UTC --- https://review.gluster.org/#/c/glusterfs/+/23121/ posted. --- Additional comment from Worker Ant on 2019-07-30 05:39:45 UTC --- REVIEW: https://review.gluster.org/23121 (cluster/ec: Fix reopen flags to avoid misbehavior) merged (#4) on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-08-09 10:10:36 UTC --- REVIEW: https://review.gluster.org/23191 (cluster/ec: Fix reopen flags to avoid misbehavior) posted (#1) for review on release-7 by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1734303 [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1735514 [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1739426 [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:48:31 +0000 Subject: [Bugs] [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739450 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
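[Editor's note: a rough sketch of what "filtering" the flags at reopen time means for bug 1739450 above. This is not the merged GlusterFS patch; ec_reopen_flags() is a hypothetical helper used only to show which bits are dropped and why.]

/* Sketch of the reopen-flag filtering idea. Hypothetical helper, for
 * illustration only. */
#include <fcntl.h>

static int ec_reopen_flags(int open_flags)
{
    /* O_APPEND: EC writes each fragment at a computed offset, so letting the
     * brick append at its local end-of-file would misplace fragments.
     * O_EXCL: the self-heal daemon may already have created the file on the
     * brick, so the reopen must not insist on exclusive creation. */
    return open_flags & ~(O_APPEND | O_EXCL);
}

With such a filter, a descriptor originally opened with O_WRONLY|O_APPEND|O_EXCL would be reopened on a healed brick with plain O_WRONLY.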
From bugzilla at redhat.com Fri Aug 9 10:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:48:31 +0000 Subject: [Bugs] [Bug 1733935] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733935 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739450 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:48:31 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739450 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:48:31 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739450 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:49:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:49:22 +0000 Subject: [Bugs] [Bug 1739451] New: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 Bug ID: 1739451 Summary: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file Product: GlusterFS Version: 6 Status: NEW Component: disperse Keywords: Reopened Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: bugs at gluster.org, jahernan at redhat.com Depends On: 1730715, 1739427 Blocks: 1731448, 1732779 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1739427 +++ +++ This bug was initially created as a clone of Bug #1730715 +++ Description of problem: When a write not aligned to the stripe size is done concurrently with other wirtes on a sparse file of a disperse volume, EIO error can be returned in some cases. Version-Release number of selected component (if applicable): mainline How reproducible: randomly Steps to Reproduce: 1. Create a disperse volume 2. Create an empty file 3. Write to two non-overlapping areas of the file with unaligned offsets Actual results: In some cases the write to the lower offset fails with EIO. Expected results: Both writes should succeed. Additional info: EC doesn't allow concurrent writes on overlapping areas, they are serialized. However non-overlapping writes are serviced in parallel. 
When a write is not aligned, EC first needs to read the entire chunk from disk, apply the modified fragment and write it again. Suppose we have a 4+2 disperse volume. The problem appears on sparse files because a write to an offset implicitly creates data on offsets below it. For example, if a file is empty and we read 10 bytes from offset 10, read() will return 0 bytes. Now, if we write one byte at offset 1M and retry the same read, the system call will return 10 bytes (all containing 0's). So if we have two writes, the first one at offset 10 and the second one at offset 1M, EC will send both in parallel because they do not overlap. However, the first one will try to read missing data from the first chunk (i.e. offset 0 to 9) to recombine the entire chunk and do the final write. This read will happen in parallel with the write to 1M. What could happen is that 3 bricks process the write before the read, and the other 3 process the read before the write. First 3 bricks will return 10 bytes, while the latest three will return 0 (because the file on the brick has not been expanded yet). When EC tries to recombine the answers from the bricks, it can't, because it needs at least 4 consistent answers to recover the data. So this read fails with EIO error. This error is propagated to the parent write, which is aborted and EIO is returned to the application. --- Additional comment from Worker Ant on 2019-07-17 12:58:56 UTC --- REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on master by Xavi Hernandez --- Additional comment from Worker Ant on 2019-07-24 10:20:48 UTC --- REVIEW: https://review.gluster.org/23066 (cluster/ec: fix EIO error for concurrent writes on sparse files) merged (#4) on master by Pranith Kumar Karampuri --- Additional comment from Worker Ant on 2019-07-27 06:41:19 UTC --- REVIEW: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on release-6 by lidi --- Additional comment from Worker Ant on 2019-08-09 10:08:26 UTC --- REVIEW: https://review.gluster.org/23189 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#1) for review on release-7 by Pranith Kumar Karampuri Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1731448 [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1732779 [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1739427 [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
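[Editor's note: the sparse-file behaviour the description relies on can be demonstrated on any local filesystem with a few lines of C. This shows only the POSIX semantics involved, not the EC race itself; the file name is arbitrary.]

/* A read past EOF returns 0 bytes, but after an unrelated write at a much
 * higher offset the same read returns zero-filled data. Plain POSIX. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[10];
    int fd = open("sparse-demo.bin", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return 1;

    ssize_t before = pread(fd, buf, sizeof(buf), 10);  /* file is still empty */
    pwrite(fd, "x", 1, 1024 * 1024);                   /* write at offset 1M  */
    ssize_t after = pread(fd, buf, sizeof(buf), 10);   /* now inside the file */

    printf("read at offset 10: before=%zd bytes, after=%zd bytes\n",
           before, after);  /* typically prints before=0, after=10 */

    close(fd);
    unlink("sparse-demo.bin");
    return 0;
}

If half of the bricks answer the low-offset read before the high-offset write and the other half answer it afterwards, the client cannot assemble enough consistent fragments, which is exactly the EIO case described above.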
From bugzilla at redhat.com Fri Aug 9 10:49:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:49:22 +0000 Subject: [Bugs] [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739451 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:49:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:49:22 +0000 Subject: [Bugs] [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1739451 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 10:49:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:49:22 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739451 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 10:49:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 10:49:22 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1739451 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 11:26:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:26:11 +0000 Subject: [Bugs] [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 --- Comment #4 from Worker Ant --- REVISION POSTED: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#2) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 11:26:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:26:12 +0000 Subject: [Bugs] [Bug 1730715] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730715 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID|Gluster.org Gerrit 23113 | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 11:26:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:26:13 +0000 Subject: [Bugs] [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23113 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:26:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:26:14 +0000 Subject: [Bugs] [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23113 (cluster/ec: fix EIO error for concurrent writes on sparse files) posted (#2) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:27:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:27:18 +0000 Subject: [Bugs] [Bug 1739446] [Disperse] : Client side heal is not removing dirty flag for some of the files. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739446 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23199 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:27:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:27:19 +0000 Subject: [Bugs] [Bug 1739446] [Disperse] : Client side heal is not removing dirty flag for some of the files. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739446 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23199 (cluster/ec: Prevent double pre-op xattrops) posted (#1) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:28:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:28:23 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23200 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:28:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:28:24 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23200 (cluster/ec: inherit healing from lock when it has info) posted (#1) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:29:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:29:29 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23201 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:29:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:29:30 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23201 (cluster/ec: Always read from good-mask) posted (#1) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Fri Aug 9 11:30:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:30:36 +0000 Subject: [Bugs] [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23202 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:30:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:30:37 +0000 Subject: [Bugs] [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23202 (cluster/ec: Fix reopen flags to avoid misbehavior) posted (#1) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:31:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:31:42 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23203 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:31:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:31:43 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23203 (cluster/ec: Update lock->good_mask on parent fop failure) posted (#1) for review on release-6 by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:36:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:36:23 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-09 11:36:23 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23179 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) merged (#1) on release-7 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 11:41:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:41:27 +0000 Subject: [Bugs] [Bug 1726294] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726294 Bug 1726294 depends on bug 1739337, which changed state. Bug 1739337 Summary: DHT: severe memory leak in dht rename https://bugzilla.redhat.com/show_bug.cgi?id=1739337 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 11:41:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 11:41:26 +0000 Subject: [Bugs] [Bug 1739337] DHT: severe memory leak in dht rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739337 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-09 11:41:26 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23182 (cluster/dht: Fixed a memleak in dht_rename_cbk) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 9 13:32:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 13:32:44 +0000 Subject: [Bugs] [Bug 1739399] [Ganesha]: truncate operation not updating the ctime In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739399 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-09 13:32:44 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23186 (posix : add posix_set_ctime() in posix_ftruncate()) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 13:45:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 13:45:48 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 14:16:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 14:16:42 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 16:23:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:48 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:48 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:48 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:52 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:52 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:55 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 16:23:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:59 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:23:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:23:59 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:02 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:03 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:03 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:05 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 9 16:34:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:05 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:08 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:09 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 9 16:34:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 09 Aug 2019 16:34:12 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Fixed In Version| |glusterfs-6.0-12 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 11 04:31:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 04:31:46 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #7 from Amgad --- Does that mean, it's not yet in 6.x or 5.x? When is the release with the fix is due? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 11 04:34:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 04:34:04 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #1 from Amgad --- Any response? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Sun Aug 11 05:29:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 05:29:16 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hgowtham at redhat.com Flags| |needinfo?(hgowtham at redhat.c | |om) --- Comment #8 from Ravishankar N --- Yes, that is correct. The release schedule is at https://www.gluster.org/release-schedule/. I'm not sure of the dates, Hari should be able to tell you if it is valid. I'm adding a need-info on him. That said, Amgad, could you explain why this bug is blocking your deployments? I do not see this as a blocker. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 11 15:55:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 15:55:35 +0000 Subject: [Bugs] [Bug 1739884] New: glusterfsd process crashes with SIGSEGV Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Bug ID: 1739884 Summary: glusterfsd process crashes with SIGSEGV Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: transport Severity: high Assignee: bugs at gluster.org Reporter: cfeller at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: glusterfsd process crashes with SIGSEGV. A glusterfsd process crashed during an normal workload. Version-Release number of selected component (if applicable): glusterfs-6.4-1.el7.x86_64 glusterfs-server-6.4-1.el7.x86_64 How reproducible: Seldom, but twice in less than 24 hours. This Gluster setup had been running reliably for several weeks, but crashed twice for the same reason in less than 24 hours. (I captured the data on the first crash, but it crashed a second time before I created the bug report on the first one.) Steps to Reproduce: ? Additional info: This is a two node cluster, behind the same switch connected via fiber SFPs, 10GE. First crash: ########################### # journalctl -u glusterd -- Logs begin at Thu 2019-08-08 05:48:10 PDT, end at Thu 2019-08-08 17:40:01 PDT. -- Aug 08 05:48:31 gluster00 systemd[1]: Starting GlusterFS, a clustered file-system server... Aug 08 05:48:31 gluster00 systemd[1]: Started GlusterFS, a clustered file-system server. 
Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: pending frames: Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: frame : type(1) op(XATTROP) Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: frame : type(1) op(INODELK) Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: frame : type(1) op(XATTROP) Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: frame : type(1) op(FXATTROP) Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: frame : type(1) op(TRUNCATE) Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: patchset: git://git.gluster.org/glusterfs.git Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: signal received: 11 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: time of crash: Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: 2019-08-09 00:05:00 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: configuration details: Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: argp 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: backtrace 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: dlfcn 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: libpthread 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: llistxattr 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: setfsid 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: spinlock 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: epoll.h 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: xattr.h 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: st_atim.tv_nsec 1 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: package-string: glusterfs 6.4 Aug 08 17:05:00 gluster00 export-brick0-srv[6563]: --------- ########################### # log from first node [2019-08-09 00:04:58.959057] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-gv0-server: disconnecting connection from CTX_ID:783468e6-b1c9-461d-861a-469c4aba45a6-GRAPH_ID:0-PID:30955-HOST:gluster01-PC_NAME:gv0-client-0-RECON_NO:-0 [2019-08-09 00:04:58.959250] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-gv0-server: Shutting down connection CTX_ID:783468e6-b1c9-461d-861a-469c4aba45a6-GRAPH_ID:0-PID:30955-HOST:gluster01-PC_NAME:gv0-client-0-RECON_NO:-0 [2019-08-09 00:04:58.992241] I [addr.c:54:compare_addr_and_update] 0-/export/brick0/srv: allowed = "*", received addr = "192.168.0.21" [2019-08-09 00:04:58.992294] I [login.c:110:gf_auth] 0-auth/login: allowed user names: ad85e1b1-89f4-44ba-b098-b941f0b0a0bb [2019-08-09 00:04:58.992304] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gv0-server: accepted client from CTX_ID:f0c57ea3-fd1d-433a-985c-e6e3dfa014f1-GRAPH_ID:0-PID:30974-HOST:gluster01-PC_NAME:gv0-client-0-RECON_NO:-0 (version: 6.4) with subvol /export/brick0/srv [2019-08-09 00:05:00.953515] E [MSGID: 101064] [event-epoll.c:618:event_dispatch_epoll_handler] 0-epoll: generation mismatch on idx=5, gen=9316, slot->gen=9317, slot->fd=19 [2019-08-09 00:05:00.970841] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8b4d6) [0x7fb437d884d6] -->/usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0xa48a) [0x7fb42c0e848a] -->/usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0x81fc) [0x7fb42c0e61fc] ) 0-socket: invalid argument: this->private [Invalid argument] pending frames: frame : type(1) op(XATTROP) frame : type(1) op(INODELK) frame : type(1) op(XATTROP) frame : type(1) op(FXATTROP) frame : type(1) op(TRUNCATE) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-08-09 00:05:00 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 
xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.4 /lib64/libglusterfs.so.0(+0x26e00)[0x7fb437d23e00] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fb437d2e804] /lib64/libc.so.6(+0x36340)[0x7fb436363340] /usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0xa4cc)[0x7fb42c0e84cc] /lib64/libglusterfs.so.0(+0x8b4d6)[0x7fb437d884d6] /lib64/libpthread.so.0(+0x7dd5)[0x7fb436b63dd5] /lib64/libc.so.6(clone+0x6d)[0x7fb43642b02d] --------- # log from second node (crashed ~5 minutes later) [2019-08-09 00:09:34.882722] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-gv0-server: disconnecting connection from CTX_ID:3e3f26ce-f682-4631-a058-11fd08414c81-GRAPH_ID:0-PID:31527-HOST:gluster01-PC_NAME:gv0-client-1-RECON_NO:-0 [2019-08-09 00:09:34.882878] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-gv0-server: Shutting down connection CTX_ID:3e3f26ce-f682-4631-a058-11fd08414c81-GRAPH_ID:0-PID:31527-HOST:gluster01-PC_NAME:gv0-client-1-RECON_NO:-0 [2019-08-09 00:09:39.916899] I [addr.c:54:compare_addr_and_update] 0-/export/brick0/srv: allowed = "*", received addr = "192.168.0.21" [2019-08-09 00:09:39.916929] I [login.c:110:gf_auth] 0-auth/login: allowed user names: ad85e1b1-89f4-44ba-b098-b941f0b0a0bb [2019-08-09 00:09:39.916946] I [MSGID: 115029] [server-handshake.c:550:server_setvolume] 0-gv0-server: accepted client from CTX_ID:1401c14a-b2f7-421a-89a1-9acfdaffeda0-GRAPH_ID:0-PID:31660-HOST:gluster01-PC_NAME:gv0-client-1-RECON_NO:-0 (version: 6.4) with subvol /export/brick0/srv pending frames: frame : type(1) op(LOOKUP) frame : type(1) op(OPEN) frame : type(1) op(READ) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-08-09 00:10:34 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.4 /lib64/libglusterfs.so.0(+0x26e00)[0x7f50f2b20e00] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f50f2b2b804] /lib64/libc.so.6(+0x36340)[0x7f50f1160340] /usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0xa4cc)[0x7f50e6ee54cc] /lib64/libglusterfs.so.0(+0x8b4d6)[0x7f50f2b854d6] /lib64/libpthread.so.0(+0x7dd5)[0x7f50f1960dd5] /lib64/libc.so.6(clone+0x6d)[0x7f50f122802d] --------- ########################### # backtrace in core dump on first node: (gdb) bt #0 0x00007fb42c0e84cc in socket_event_handler () from /usr/lib64/glusterfs/6.4/rpc-transport/socket.so #1 0x00007fb437d884d6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 #2 0x00007fb436b63dd5 in start_thread (arg=0x7fb413fff700) at pthread_create.c:307 #3 0x00007fb43642b02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) ########################### # backtrace in core dump on second node: (gdb) bt #0 0x00007f50e6ee54cc in socket_event_handler () from /usr/lib64/glusterfs/6.4/rpc-transport/socket.so #1 0x00007f50f2b854d6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 #2 0x00007f50f1960dd5 in start_thread (arg=0x7f50dd365700) at pthread_create.c:307 #3 0x00007f50f122802d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
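The backtraces above come out without argument or source-line information because the core was read without matching debug symbols. As a rough sketch of how such a core could be inspected in more depth — assuming a matching glusterfs-debuginfo package is available for the installed 6.4 RPMs and that /path/to/core is the saved core file (both are assumptions, not details from the report):

# install debug symbols matching the running glusterfs packages (assumed available in the debuginfo repo)
debuginfo-install -y glusterfs
# dump a full backtrace of every thread from the saved core of the brick process
gdb -batch \
    -ex 'set pagination off' \
    -ex 'thread apply all bt full' \
    /usr/sbin/glusterfsd /path/to/core > glusterfsd-bt-full.txt

With symbols loaded, the crashing frame inside socket_event_handler() should resolve to an exact line in rpc-transport/socket.c, which is typically the detail needed to take such a report further.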
From bugzilla at redhat.com Sun Aug 11 16:09:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 16:09:31 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #1 from Chad Feller --- Less than 24 hours later, a second crash nearly identical with the exception that this time around, the glusterfsd process on both nodes crashed within a few seconds of each other. I had just added four new bricks and was about 8 hours into a rebalance when it crashed: ########################### # log from first node: [2019-08-09 17:06:48.926301] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-gv0-server: disconnecting connection from CTX_ID:b36e17ac-62e5-4171-bfdc-1707aa5ec5a9-GRAPH_ID:0-PID:31299-HOST:gluster01-PC_NAME:gv0-client-2-RECON_NO:-0 [2019-08-09 17:06:48.926480] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-gv0-server: Shutting down connection CTX_ID:b36e17ac-62e5-4171-bfdc-1707aa5ec5a9-GRAPH_ID:0-PID:31299-HOST:gluster01-PC_NAME:gv0-client-2-RECON_NO:-0 [2019-08-09 17:06:48.956140] I [dict.c:541:dict_get] (-->/usr/lib64/glusterfs/6.4/xlator/features/worm.so(+0x7241) [0x7f5f80443241] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1c219) [0x7f5f80669219] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f5f900860b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:49.038307] I [dict.c:541:dict_get] (-->/lib64/libglusterfs.so.0(default_fremovexattr+0xe7) [0x7f5f901251f7] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1d5dd) [0x7f5f8066a5dd] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f5f900860b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:49.099221] I [dict.c:541:dict_get] (-->/usr/lib64/glusterfs/6.4/xlator/features/worm.so(+0x7241) [0x7f5f80443241] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1c219) [0x7f5f80669219] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f5f900860b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] pending frames: frame : type(1) op(INODELK) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-08-09 17:06:49 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.4 /lib64/libglusterfs.so.0(+0x26e00)[0x7f5f90091e00] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f5f9009c804] /lib64/libc.so.6(+0x36340)[0x7f5f8e6d1340] /usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0xa4cc)[0x7f5f844564cc] /lib64/libglusterfs.so.0(+0x8b4d6)[0x7f5f900f64d6] /lib64/libpthread.so.0(+0x7dd5)[0x7f5f8eed1dd5] /lib64/libc.so.6(clone+0x6d)[0x7f5f8e79902d] --------- ########################### # log from second node: [2019-08-09 17:06:48.926010] I [MSGID: 115036] [server.c:499:server_rpc_notify] 0-gv0-server: disconnecting connection from CTX_ID:b36e17ac-62e5-4171-bfdc-1707aa5ec5a9-GRAPH_ID:0-PID:31299-HOST:gluster01-PC_NAME:gv0-client-3-RECON_NO:-0 [2019-08-09 17:06:48.926231] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-gv0-server: Shutting down connection CTX_ID:b36e17ac-62e5-4171-bfdc-1707aa5ec5a9-GRAPH_ID:0-PID:31299-HOST:gluster01-PC_NAME:gv0-client-3-RECON_NO:-0 [2019-08-09 17:06:48.955802] I [dict.c:541:dict_get] (-->/usr/lib64/glusterfs/6.4/xlator/features/worm.so(+0x7241) [0x7f47b8c64241] 
-->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1c219) [0x7f47b8e8a219] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f47c88a70b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:49.038024] I [dict.c:541:dict_get] (-->/lib64/libglusterfs.so.0(default_fremovexattr+0xe7) [0x7f47c89461f7] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1d5dd) [0x7f47b8e8b5dd] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f47c88a70b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:49.098930] I [dict.c:541:dict_get] (-->/usr/lib64/glusterfs/6.4/xlator/features/worm.so(+0x7241) [0x7f47b8c64241] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1c219) [0x7f47b8e8a219] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f47c88a70b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:50.042220] I [MSGID: 115072] [server-rpc-fops_v2.c:1681:server4_setattr_cbk] 0-gv0-server: 349203: SETATTR /munin/var/lib (418e15b7-70d5-4f11-86ac-c1d8aeb6aa12), client: CTX_ID:d6150421-3ac2-4dbc-a665-d619f13dbb15-GRAPH_ID:0-PID:538-HOST:munin-PC_NAME:gv0-client-3-RECON_NO:-0, error-xlator: gv0-access-control [Operation not permitted] [2019-08-09 17:06:50.962416] I [MSGID: 115072] [server-rpc-fops_v2.c:1681:server4_setattr_cbk] 0-gv0-server: 349225: SETATTR /munin/var/lib (418e15b7-70d5-4f11-86ac-c1d8aeb6aa12), client: CTX_ID:d6150421-3ac2-4dbc-a665-d619f13dbb15-GRAPH_ID:0-PID:538-HOST:munin-PC_NAME:gv0-client-3-RECON_NO:-0, error-xlator: gv0-access-control [Operation not permitted] [2019-08-09 17:06:52.073700] I [dict.c:541:dict_get] (-->/lib64/libglusterfs.so.0(default_fremovexattr+0xe7) [0x7f47c89461f7] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1d5dd) [0x7f47b8e8b5dd] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f47c88a70b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] [2019-08-09 17:06:52.222678] I [dict.c:541:dict_get] (-->/usr/lib64/glusterfs/6.4/xlator/features/worm.so(+0x7241) [0x7f47b8c64241] -->/usr/lib64/glusterfs/6.4/xlator/features/locks.so(+0x1c219) [0x7f47b8e8a219] -->/lib64/libglusterfs.so.0(dict_get+0x94) [0x7f47c88a70b4] ) 0-dict: !this || key=trusted.glusterfs.enforce-mandatory-lock [Invalid argument] pending frames: patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-08-09 17:06:52 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.4 /lib64/libglusterfs.so.0(+0x26e00)[0x7f47c88b2e00] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f47c88bd804] /lib64/libc.so.6(+0x36340)[0x7f47c6ef2340] /usr/lib64/glusterfs/6.4/rpc-transport/socket.so(+0xa4cc)[0x7f47bcc774cc] /lib64/libglusterfs.so.0(+0x8b4d6)[0x7f47c89174d6] /lib64/libpthread.so.0(+0x7dd5)[0x7f47c76f2dd5] /lib64/libc.so.6(clone+0x6d)[0x7f47c6fba02d] --------- ########################### # backtrace in core dump on first node: (gdb) bt #0 0x00007f5f844564cc in socket_event_handler () from /usr/lib64/glusterfs/6.4/rpc-transport/socket.so #1 0x00007f5f900f64d6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 #2 0x00007f5f8eed1dd5 in start_thread (arg=0x7f5f66ffd700) at pthread_create.c:307 #3 0x00007f5f8e79902d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) ########################### # backtrace in core dump on second node: (gdb) bt #0 0x00007f47bcc774cc in 
socket_event_handler () from /usr/lib64/glusterfs/6.4/rpc-transport/socket.so #1 0x00007f47c89174d6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0 #2 0x00007f47c76f2dd5 in start_thread (arg=0x7f47b1053700) at pthread_create.c:307 #3 0x00007f47c6fba02d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 (gdb) -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 11 19:21:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 11 Aug 2019 19:21:14 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #9 from Amgad --- Thanks Ravi. The link is showing the initial release date and maintenance (30th). Does that mean, 6.5-1 will include the fix coming on August 30th? The bug is blocking because the impact showed during testing is very serious! Ravi: Would you kindly provide a feedback for Bug 1739320 as well? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 04:21:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 04:21:10 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com --- Comment #2 from Ravishankar N --- CC'ing glusterd maintainer to take a look. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 05:13:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:13:28 +0000 Subject: [Bugs] [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 Anoop C S changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|spamecha at redhat.com |anoopcs at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 05:13:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:13:47 +0000 Subject: [Bugs] [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23206 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 12 05:13:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:13:49 +0000 Subject: [Bugs] [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23206 (performance/md-cache: Do not skip caching of null character xattr values) posted (#1) for review on master by Anoop C S -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 05:27:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:27:55 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nbalacha at redhat.com Flags| |needinfo?(s.pleshkov at hostco | |.ru) --- Comment #3 from Nithya Balachandran --- (In reply to Sergey Pleshkov from comment #2) > Server and Client OS - Red Hat Enterprise Linux Server release 7.6 (Maipo) / > Red Hat Enterprise Linux Server release 7.5 (Maipo) > When client had gluster client from RH repo - 3.12 vers - situation was the > same > > if it isn't version bug, would you have suggestions what is could be ? Which > gluster volume options check and so on Please provide the gluster volume info for this volume. Do you have any script/steps we can use to reproduce the leak? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 05:32:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:32:19 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 Sergey Pleshkov changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(s.pleshkov at hostco | |.ru) | --- Comment #4 from Sergey Pleshkov --- [root at LSY-GL-01 host]# gluster volume info PROD Volume Name: PROD Type: Replicate Volume ID: f54a0ce9-d2ec-4d44-a1f8-c53cf1c49a52 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: lsy-gl-01:/diskForData/prod Brick2: lsy-gl-02:/diskForData/prod Brick3: lsy-gl-03:/diskForData/prod Options Reconfigured: performance.readdir-ahead: off client.event-threads: 24 server.event-threads: 24 server.allow-insecure: on features.shard-block-size: 64MB features.shard: on network.ping-timeout: 5 transport.address-family: inet nfs.disable: on performance.client-io-threads: off performance.io-thread-count: 24 cluster.heal-timeout: 120 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
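For a leak investigation like the one above, the usual data point alongside the volume info is a pair of statedumps of the fuse client taken some time apart. A minimal sketch, assuming the PROD volume is mounted by a single glusterfs client process on the affected host (the pgrep pattern and sleep interval are assumptions to adapt):

# statedumps are written here; the directory must exist
mkdir -p /var/run/gluster
# pid of the fuse client for the PROD mount (adjust the pattern to the real mount)
PID=$(pgrep -f 'glusterfs.*PROD' | head -n1)
grep VmRSS /proc/"$PID"/status        # note the resident size before
kill -USR1 "$PID"                     # ask the client to write a statedump
sleep 10
ls -lt /var/run/gluster | head        # newest glusterdump.<pid>.dump.* file

Repeating the same steps after the workload has run for a while and comparing the per-xlator memory-accounting counters between the two dumps usually narrows down which allocation type keeps growing; the same procedure is what gets requested for the glusterd leak in bug 1734027 further down.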
From bugzilla at redhat.com Mon Aug 12 05:35:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:35:50 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(totalworlddominat | |ion at gmail.com) --- Comment #5 from Nithya Balachandran --- (In reply to Alex from comment #4) > GLUSTERD version affected: 6.4 > > Hi, > I've only mentioned 3.12 for the background, but if you read further you'll > see this is a bug on 6.4. > Thanks for reopening this. Please provide the following: 1. gluster volume info 2. ps of the process that is consuming memory 3. statedump of the process that is consuming memory by doing the following: kill -SIGUSR1 <pid> The statedump will be created in /var/run/gluster. This directory must exist - please create it if it does not. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 05:36:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 05:36:26 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #5 from Sergey Pleshkov --- This problem arose on one production client, so I can't immediately check which steps can repeat the problem without interrupting business processes. I will try to reproduce the behavior on the test cluster and let you know. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 06:06:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:06:31 +0000 Subject: [Bugs] [Bug 1740017] New: tests/bugs/replicate/bug-880898.t created a core file. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740017 Bug ID: 1740017 Summary: tests/bugs/replicate/bug-880898.t created a core file. Product: GlusterFS Version: mainline Status: NEW Component: tests Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Problem: https://build.gluster.org/job/centos7-regression/7337/consoleFull indicates the shd crashing for this .t. On looking at the core, I see the crash is at the time of shd init and glusterfs context is null: (gdb) bt (gdb) p ctx $2 = (glusterfs_ctx_t *) 0xf00000000 The .t is killing all gluster processes immediately after volume start, so it looks like a race between shd coming up and it being killed. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 06:07:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:07:29 +0000 Subject: [Bugs] [Bug 1740017] tests/bugs/replicate/bug-880898.t created a core file. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740017 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |Triaged Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug.
You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 06:09:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:09:50 +0000 Subject: [Bugs] [Bug 1740017] tests/bugs/replicate/bug-880898.t created a core file. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740017 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23207 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 06:09:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:09:51 +0000 Subject: [Bugs] [Bug 1740017] tests/bugs/replicate/bug-880898.t created a core file. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740017 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23207 (tests: fix bug-880898.t crash) posted (#1) for review on master by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 06:34:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:34:33 +0000 Subject: [Bugs] [Bug 1740017] tests/bugs/replicate/bug-880898.t created a core file. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740017 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-12 06:34:33 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23207 (tests: fix bug-880898.t crash) merged (#1) on master by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 06:37:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 06:37:23 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(hgowtham at redhat.c | |om) | --- Comment #10 from hari gowtham --- Hi Amgad, The 5.x and 6.x has reached its slowed out phase (releases after .3 or .4). So we are supposed to have the next release in two months. The dates have been changed and more info about the change can be found here: https://lists.gluster.org/pipermail/gluster-devel/2019-August/056521.html And this month's releases of 5 and 6 are in the testing phase, announcement will be expected by the end of the day. As we didn't get to know these blocker issues during the bugs gathering part (https://lists.gluster.org/pipermail/gluster-devel/2019-August/056500.html) we missed out these bugs for this particular release. Please do, come forward with the bugs by then so we can plan things better, make them easier and thus get a better product. As we might miss backporting a few bugs we have fixed. It would be great, If you can come forward and help us. @Ravi, thanks for back-ports. I will check the backports and take them in. The backports of this bug will be a part of the next release. 
As to when the next release will be, it is supposed to be in two months from now. We will look into if we can plan another release for the next month. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 07:19:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 07:19:58 +0000 Subject: [Bugs] [Bug 1739336] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739336 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-12 07:19:58 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23181 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) merged (#1) on release-5 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 07:19:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 07:19:58 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Bug 1739335 depends on bug 1739336, which changed state. Bug 1739336 Summary: Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739336 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 07:19:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 07:19:59 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Bug 1739334 depends on bug 1739336, which changed state. Bug 1739336 Summary: Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739336 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 09:06:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 09:06:40 +0000 Subject: [Bugs] [Bug 1740077] New: Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 Bug ID: 1740077 Summary: Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked Product: GlusterFS Version: 7 Status: NEW Component: locks Assignee: spalai at redhat.com Reporter: spalai at redhat.com CC: atumball at redhat.com, bugs at gluster.org, prasanna.kalever at redhat.com, spalai at redhat.com, xiubli at redhat.com Depends On: 1717824 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1717824 +++ Description of problem: In Glusterfs, we have support the fencing feature support. 
With this we can support the ALUA feature in LIO/TCMU now. The fencing doc: https://review.gluster.org/#/c/glusterfs-specs/+/21925/6/accepted/fencing.md The fencing test example: https://review.gluster.org/#/c/glusterfs/+/21496/12/tests/basic/fencing/fence-basic.c The LIO/tcmu-runner PR supporting the ALUA feature is: https://github.com/open-iscsi/tcmu-runner/pull/554. But currently, when testing it based on the above PR in tcmu-runner by shutting down one of the HA nodes and starting it again after 2~3 minutes, on all the HA nodes we can see that the glfs_file_lock() call gets stuck; the following is from /var/log/tcmu-runner.log: ==== 2019-06-06 13:50:15.755 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:50:15.757 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 13:50:55.845 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:50:55.847 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1 2019-06-06 13:57:50.102 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:57:50.103 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 13:57:50.121 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:57:50.132 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1 2019-06-06 14:09:03.654 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 14:09:03.662 1328 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 14:09:06.700 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 ==== The lock operation never returns. I am using the following glusterfs built by myself: # rpm -qa|grep glusterfs glusterfs-extra-xlators-7dev-0.0.el7.x86_64 glusterfs-api-devel-7dev-0.0.el7.x86_64 glusterfs-7dev-0.0.el7.x86_64 glusterfs-server-7dev-0.0.el7.x86_64 glusterfs-cloudsync-plugins-7dev-0.0.el7.x86_64 glusterfs-resource-agents-7dev-0.0.el7.noarch glusterfs-api-7dev-0.0.el7.x86_64 glusterfs-devel-7dev-0.0.el7.x86_64 glusterfs-regression-tests-7dev-0.0.el7.x86_64 glusterfs-gnfs-7dev-0.0.el7.x86_64 glusterfs-client-xlators-7dev-0.0.el7.x86_64 glusterfs-geo-replication-7dev-0.0.el7.x86_64 glusterfs-debuginfo-7dev-0.0.el7.x86_64 glusterfs-fuse-7dev-0.0.el7.x86_64 glusterfs-events-7dev-0.0.el7.x86_64 glusterfs-libs-7dev-0.0.el7.x86_64 glusterfs-cli-7dev-0.0.el7.x86_64 glusterfs-rdma-7dev-0.0.el7.x86_64 How reproducible: 30%. Steps to Reproduce: 1. create one rep volume(HA >= 2) with the mandatory lock enabled 2. create one gluster-blockd target 3. login and do the fio in the client node 4. shut down one of the HA nodes, wait 2~3 minutes and start it again Actual results: the fio never recovers and the rw BW stays at 0kb/s, and we can see tons of logs in the /var/log/tcmu-runner.log file: 2019-06-06 15:01:06.641 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. 2019-06-06 15:01:06.648 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.648 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process.
2019-06-06 15:01:06.655 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.655 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. 2019-06-06 15:01:06.661 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.662 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. Expected results: just before the shutdown node is up, the fio could be recovery. --- Additional comment from Xiubo Li on 2019-06-06 14:39:50 MVT --- --- Additional comment from Xiubo Li on 2019-06-06 14:40:16 MVT --- --- Additional comment from Xiubo Li on 2019-06-06 14:42:10 MVT --- The bt output from the gbd: [root at rhel1 ~]# gdb -p 1325 (gdb) bt #0 0x00007fc7761baf47 in pthread_join () from /lib64/libpthread.so.0 #1 0x00007fc7773de468 in event_dispatch_epoll (event_pool=0x559f03d4b560) at event-epoll.c:847 #2 0x0000559f02419658 in main (argc=21, argv=0x7fff9c6722c8) at glusterfsd.c:2871 (gdb) [root at rhel3 ~]# gdb -p 7669 (gdb) bt #0 0x00007fac80bd9f47 in pthread_join () from /usr/lib64/libpthread.so.0 #1 0x00007fac81dfd468 in event_dispatch_epoll (event_pool=0x55de6f845560) at event-epoll.c:847 #2 0x000055de6f143658 in main (argc=21, argv=0x7ffcafc3eff8) at glusterfsd.c:2871 (gdb) The pl_inode->fop_wind_count is: (gdb) thread 2 [Switching to thread 2 (Thread 0x7fc742184700 (LWP 1829))] #0 0x00007fc7761bd965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) frame 2 #2 0x00007fc76379c13b in pl_lk (frame=frame at entry=0x7fc750001128, this=this at entry=0x7fc75c0128f0, fd=fd at entry=0x7fc73c0977d8, cmd=cmd at entry=6, flock=flock at entry=0x7fc73c076938, xdata=xdata at entry=0x7fc73c071828) at posix.c:2637 2637 ret = pl_lock_preempt(pl_inode, reqlock); (gdb) p pl_inode->fop_wind_count $1 = -30 (gdb) The pstack logs please see the attachments Thanks. BRs --- Additional comment from Susant Kumar Palai on 2019-06-10 12:10:33 MVT --- Just a small update: There are cases where fop_wind_count can go -ve. A basic fix will be never to bring its value down if it is zero. I will update more on this later as I am busy with a few other issues ATM. Susant --- Additional comment from Xiubo Li on 2019-07-17 13:15:51 MVT --- Hi Susant, Is there any new update about this ? Thanks. --- Additional comment from Susant Kumar Palai on 2019-07-17 13:20:56 MVT --- (In reply to Xiubo Li from comment #5) > Hi Susant, > > Is there any new update about this ? > > Thanks. Hey Xiubo, most likely will be sending a patch by end of day today. --- Additional comment from Xiubo Li on 2019-07-17 13:22:28 MVT --- (In reply to Susant Kumar Palai from comment #6) > (In reply to Xiubo Li from comment #5) > > Hi Susant, > > > > Is there any new update about this ? > > > > Thanks. > > Hey Xiubo, most likely will be sending a patch by end of day today. Sure and take your time Susant please :-) Thanks very much. BRs --- Additional comment from Susant Kumar Palai on 2019-07-17 14:13:59 MVT --- Moved to POST by mistake. Resetting. --- Additional comment from Worker Ant on 2019-07-22 14:59:10 MVT --- REVIEW: https://review.gluster.org/23088 (locks/fencing: Address while lock preemption) posted (#1) for review on master by Susant Palai --- Additional comment from Susant Kumar Palai on 2019-07-22 15:02:07 MVT --- Xiubo, if you could test it out, it would be great. 
(Make sure you enable fencing before you create any client) --- Additional comment from Xiubo Li on 2019-07-22 15:42:38 MVT --- (In reply to Susant Kumar Palai from comment #10) > Xiubo, if you could test it out, it would be great. (Make sure you enable > fencing before you create any client) @Susant, Yeah, thanks very much for your work on this. And I will test it late today or tomorrow. BRs Xiubo --- Additional comment from Xiubo Li on 2019-07-23 07:44:00 MVT --- @Susant, There 2 issues are found: 1, From my test the glfs_file_lock() sometimes it will takes around 43 seconds, is it normal ? And why ? 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: lxb--------------glfs_file_lock start ... 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: lxb--------------glfs_file_lock end 2, After the lock is broke and all the FIOs callback will return -1, the -EPERM, not the -EBUSY as we discussed before. Is there any change about the return value ? I am only checking the -EBUSY and -ENOTCONN then only after that the lock state in local tcmu node will be changed. Or the local state in tcmu-runner service will always in LOCKED state, but it actually already lost the lock and should be in UNLOCKED state, so all the IOs will fail. Thanks, BRs --- Additional comment from Susant Kumar Palai on 2019-07-23 10:45:01 MVT --- (In reply to Xiubo Li from comment #12) > @Susant, > > There 2 issues are found: > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > seconds, is it normal ? And why ? > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > lxb--------------glfs_file_lock start > ... > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > lxb--------------glfs_file_lock end I wonder if it is related to draining of fops. Let me do some testing around this. > > 2, After the lock is broke and all the FIOs callback will return -1, the > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > that the lock state in local tcmu node will be changed. Or the local state > in tcmu-runner service will always in LOCKED state, but it actually already > lost the lock and should be in UNLOCKED state, so all the IOs will fail. This is interesting. Will get back after some code checking. > > > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-07-23 10:47:15 MVT --- (In reply to Susant Kumar Palai from comment #13) > (In reply to Xiubo Li from comment #12) > > @Susant, > > > > There 2 issues are found: > > > > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > > seconds, is it normal ? And why ? > > > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > > lxb--------------glfs_file_lock start > > ... > > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > > lxb--------------glfs_file_lock end > > I wonder if it is related to draining of fops. Let me do some testing around > this. > Sure. > > > > > > 2, After the lock is broke and all the FIOs callback will return -1, the > > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > > that the lock state in local tcmu node will be changed. 
Or the local state > > in tcmu-runner service will always in LOCKED state, but it actually already > > lost the lock and should be in UNLOCKED state, so all the IOs will fail. > > This is interesting. Will get back after some code checking. > Please take your time @Susant. Thanks, BRs --- Additional comment from Susant Kumar Palai on 2019-07-23 12:44:22 MVT --- (In reply to Xiubo Li from comment #12) > @Susant, > > There 2 issues are found: > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > seconds, is it normal ? And why ? > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > lxb--------------glfs_file_lock start > ... > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > lxb--------------glfs_file_lock end Checked the time taken for file_lock and it completes immediately for me. ret = glfs_fsetxattr(fd1, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0); if (ret < 0) { LOG_ERR("glfs_fsetxattr", errno); ret = -1; goto out; } time(&before); /* take a write mandatory lock */ ret = glfs_file_lock(fd1, F_SETLKW, &lock, GLFS_LK_MANDATORY); if (ret) { LOG_ERR("glfs_file_lock", errno); goto out; } time(&after); diff = (unsigned long )after - before; fprintf(fp, "time %lu %lu %lu\n", diff, before, after); time 0 1563867824 1563867824 Can you attach the brick log here when you run the test next time? > > 2, After the lock is broke and all the FIOs callback will return -1, the > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > that the lock state in local tcmu node will be changed. Or the local state > in tcmu-runner service will always in LOCKED state, but it actually already > lost the lock and should be in UNLOCKED state, so all the IOs will fail. Please attach the brick log after enabling trace logging. brick-log-level TRACE> > > > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-07-23 12:48:58 MVT --- (In reply to Susant Kumar Palai from comment #15) > (In reply to Xiubo Li from comment #12) > > @Susant, > > > > There 2 issues are found: > > > > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > > seconds, is it normal ? And why ? > > > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > > lxb--------------glfs_file_lock start > > ... > > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > > lxb--------------glfs_file_lock end > > Checked the time taken for file_lock and it completes immediately for me. > > ret = glfs_fsetxattr(fd1, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0); > if (ret < 0) { > LOG_ERR("glfs_fsetxattr", errno); > ret = -1; > goto out; > } > > time(&before); > /* take a write mandatory lock */ > ret = glfs_file_lock(fd1, F_SETLKW, &lock, GLFS_LK_MANDATORY); > if (ret) { > LOG_ERR("glfs_file_lock", errno); > goto out; > } > time(&after); > diff = (unsigned long )after - before; > fprintf(fp, "time %lu %lu %lu\n", diff, before, after); > > time 0 1563867824 1563867824 > > Can you attach the brick log here when you run the test next time? > > > > > 2, After the lock is broke and all the FIOs callback will return -1, the > > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > > that the lock state in local tcmu node will be changed. 
Or the local state > > in tcmu-runner service will always in LOCKED state, but it actually already > > lost the lock and should be in UNLOCKED state, so all the IOs will fail. > > Please attach the brick log after enabling trace logging. brick-log-level TRACE> > > > > > > Sure, I will do that after my current work handy, possibly late today or tomorrow morning. Thanks BRs --- Additional comment from Susant Kumar Palai on 2019-07-25 15:05:47 MVT --- On the permission denied: I did not see any error related to EPERM but saw EBUSY in the brick logs. [2019-07-24 08:15:22.236283] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-07-24 08:15:46.083306] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 29: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.088292] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 31: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.119463] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 33: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.124067] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 35: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.294554] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 37: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.298672] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 39: READV 0 (7db899f8-bf56-4b Is it possible that the lower layer is converting the errnos to EPERM? Can you check gfapi logs and tcmu logs for corresponding error messages and confirm? --- Additional comment from Xiubo Li on 2019-07-25 15:39:23 MVT --- (In reply to Susant Kumar Palai from comment #17) > On the permission denied: > > I did not see any error related to EPERM but saw EBUSY in the brick logs. 
> > > [2019-07-24 08:15:22.236283] E [MSGID: 101191] > [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-07-24 08:15:46.083306] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 29: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.088292] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 31: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.119463] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 33: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.124067] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 35: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.294554] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 37: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.298672] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 39: READV 0 > (7db899f8-bf56-4b > > > Is it possible that the lower layer is converting the errnos to EPERM? Can > you check gfapi logs and tcmu logs for corresponding error messages and > confirm? If so maybe the gfapi is doing this. I will sent you the gfapi logs, the EPERM value comes from the gfapi directly and tcmu-runner do nothing with it. 
Checked the gfapi log, it is also full of: [2019-07-24 08:23:41.042339] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042381] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042556] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042574] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042655] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042671] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042709] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042722] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042784] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] Checked the gfapi source code: 677 out: 678 if (rsp.op_ret == -1) { 679 gf_msg(this->name, GF_LOG_WARNING, gf_error_to_errno(rsp.op_errno), 680 PC_MSG_REMOTE_OP_FAILED, "remote operation failed"); 681 } else if (rsp.op_ret >= 0) { 682 if (local->attempt_reopen) 683 client_attempt_reopen(local->fd, this); 684 } 685 CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret, 686 gf_error_to_errno(rsp.op_errno), &prestat, &poststat, 687 xdata); 688 689 if (xdata) 690 dict_unref(xdata); It seems the return valume is coverted. Thanks, BRs --- Additional comment from Xiubo Li on 2019-07-25 15:42:32 MVT --- (In reply to Xiubo Li from comment #18) > (In reply to Susant Kumar Palai from comment #17) [...] > > Checked the gfapi source code: > > 677 out: > 678 if (rsp.op_ret == -1) { It seems returning the rsp.op_ret here to the callback: static void glfs_async_cbk(glfs_fd_t *fd, ssize_t ret, void *data) Not the rsp.op_errno. > 679 gf_msg(this->name, GF_LOG_WARNING, > gf_error_to_errno(rsp.op_errno), > > 680 PC_MSG_REMOTE_OP_FAILED, "remote operation failed"); > 681 } else if (rsp.op_ret >= 0) { > 682 if (local->attempt_reopen) > 683 client_attempt_reopen(local->fd, this); > 684 } > 685 CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret, > 686 gf_error_to_errno(rsp.op_errno), &prestat, > &poststat, > 687 xdata); > 688 > 689 if (xdata) > 690 dict_unref(xdata); > > > It seems the return valume is coverted. > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-08-01 06:04:52 MVT --- When the ret == -1 and then check the errno directly will works for me now. But I can get both the -EAGAIN and -EBUSY, which only the -EBUSY is expected. Then the problem is why there will always be -EAGAIN every time before acquiring the lock ? 
Thanks BRs --- Additional comment from Worker Ant on 2019-08-02 19:27:15 MVT --- REVIEW: https://review.gluster.org/23088 (locks/fencing: Address hang while lock preemption) merged (#4) on master by Amar Tumballi --- Additional comment from Xiubo Li on 2019-08-04 18:03:30 MVT --- @Susant, Since the Fencing patch has been into the release 6, so this fixing followed should be backported, right ? Thanks. BRs Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 09:06:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 09:06:40 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1740077 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 09:27:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 09:27:46 +0000 Subject: [Bugs] [Bug 1194546] Write behind returns success for a write irrespective of a conflicting lock held by another application In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1194546 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 09:29:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 09:29:58 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Blocks|1740077 | Resolution|--- |CURRENTRELEASE Flags|needinfo?(spalai at redhat.com | |) | |needinfo?(spalai at redhat.com | |) | Last Closed| |2019-08-12 09:29:58 --- Comment #23 from Susant Kumar Palai --- (In reply to Xiubo Li from comment #22) > @Susant, > > Since the Fencing patch has been into the release 6, so this fixing followed > should be backported, right ? > > Thanks. > BRs Will backport to release 6 and 7. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. 
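To put the errno discussion above into code: a minimal sketch of the callback-side check being described (return value of -1, then classify errno) is below. It follows the three-argument glfs_io_cbk form quoted in the comments above; the helper name and the handling chosen for each errno are illustrative assumptions, not the actual tcmu-runner code.

#include <errno.h>
#include <glusterfs/api/glfs.h>

/* Sketch only: classify the failure of an async glfs write the way the
 * discussion above suggests -- op_ret is -1 and the cause is read from
 * errno. The state handling in each branch is hypothetical. */
static void
write_done_cbk(glfs_fd_t *fd, ssize_t ret, void *data)
{
    (void)fd;
    (void)data;

    if (ret >= 0)
        return;                 /* write completed normally */

    switch (errno) {
    case EBUSY:                 /* lock preempted by another client */
    case ENOTCONN:              /* brick connection lost            */
        /* hypothetical: demote the local lock state to UNLOCKED */
        break;
    case EAGAIN:                /* request still being drained      */
        /* hypothetical: keep the state and let the initiator retry */
        break;
    default:
        /* anything else (e.g. the EPERM seen earlier) is treated as
         * a plain I/O failure */
        break;
    }
}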
From bugzilla at redhat.com Mon Aug 12 09:29:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 09:29:58 +0000 Subject: [Bugs] [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On|1717824 | Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 10:43:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 10:43:15 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 12:59:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 12:59:00 +0000 Subject: [Bugs] [Bug 1288227] samba gluster vfs - client can't follow symlinks In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1288227 joao.bauto at neuro.fchampalimaud.org changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |joao.bauto at neuro.fchampalim | |aud.org --- Comment #3 from joao.bauto at neuro.fchampalimaud.org --- I'm seeing the same behaviour as Steve in glusterfs 6.4 and samba 4.9.11. Testing with vfs glusterfs (gfapi), symlinks show up as files without extension on Windows and not showing at all in Linux. Switching to vfs glusterfs_fuse (fuse) both Windows and Linux show symlinks as folders and can follow them. smb.conf clustering = yes allow insecure wide links = yes follow symlinks = yes wide links = yes unix extensions = no [demo] vfs objects = glusterfs glusterfs:volume = data create mode = 0640 directory mode = 0750 glusterfs:logfile = /var/log/samba/glusterfs.log path = /demo [demo2] vfs objects = glusterfs_fuse create mode = 0640 directory mode = 0750 glusterfs:logfile = /var/log/samba/glusterfs_fuse.log path = /mnt/data/demo2 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 12 13:30:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 13:30:56 +0000 Subject: [Bugs] [Bug 1738419] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-12 13:30:56 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size when reads are sent beyond file size) merged (#3) on master by Krutika Dhananjay -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 13:30:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 13:30:57 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Bug 1737141 depends on bug 1738419, which changed state. Bug 1738419 Summary: read() returns more than file size when using direct I/O https://bugzilla.redhat.com/show_bug.cgi?id=1738419 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 14:33:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 14:33:00 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #6 from Alex --- Created attachment 1602963 --> https://bugzilla.redhat.com/attachment.cgi?id=1602963&action=edit Ram usage over a few weeks for node 002 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 14:33:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 14:33:34 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #7 from Alex --- Created attachment 1602964 --> https://bugzilla.redhat.com/attachment.cgi?id=1602964&action=edit Ram usage over a few weeks for node 001 & 003 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 14:35:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 14:35:19 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Alex changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(totalworlddominat | |ion at gmail.com) | --- Comment #8 from Alex --- 1. 
> gluster volume info Volume Name: gluster Type: Replicate Volume ID: 60ae0ddf-67d0-4b23-b694-0250c17a2f04 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 172.27.39.82:/mnt/xfs-drive-gluster/brick Brick2: 172.27.39.81:/mnt/xfs-drive-gluster/brick Brick3: 172.27.39.84:/mnt/xfs-drive-gluster/brick Options Reconfigured: cluster.self-heal-daemon: enable cluster.consistent-metadata: off ssl.dh-param: /etc/ssl/dhparam.pem ssl.ca-list: /etc/ssl/glusterfs.ca ssl.own-cert: /etc/ssl/glusterfs.pem ssl.private-key: /etc/ssl/glusterfs.key ssl.cipher-list: HIGH:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1:TLSv1.2:!3DES:!RC4:!aNULL:!ADH ssl.certificate-depth: 2 server.ssl: on client.ssl: on transport.address-family: inet nfs.disable: on performance.client-io-threads: off features.barrier: disable features.bitrot: on features.scrub: Active auto-delete: enable 2. Over the past month, since a recovery, I've had glusterd grow in ram on node 002 every 24h and every week on 001 and 003. Interestingly, since last week, it seems to have stopped the rapid growth on glusterd and glusterfsd might now be the one consuming more ram. See attached graph of ram over the month, fast ram freeing, or quick vertical lines, are due to the cron that restarted glusterd. glusterfs-001: root 1435 118 41.3 5878344 3379376 ? Ssl jui24 31930:51 /usr/sbin/glusterfsd -s 172.27.39.82 --volfile-id gluster.172.27.39.82.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.82-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/b9ec53e974e8d080.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=2cc7ba6f-5478-4b27-b647-0c1527192f5a --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152 root 45129 0.2 17.8 1890584 1457448 ? Ssl ao?06 17:33 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO glusterfs-002: root 1458 47.5 50.6 5878492 4141664 ? Ssl jui24 12775:43 /usr/sbin/glusterfsd -s 172.27.39.81 --volfile-id gluster.172.27.39.81.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.81-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/dcbebdf486b846e2.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=be4912ac-b0a5-4a02-b8d6-7bccd3e1f807 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152 root 20329 0.0 1.2 506132 99128 ? Ssl 03:00 0:22 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO glusterfs-003: root 1496 60.6 42.1 5878776 3443712 ? Ssl jui24 16308:37 /usr/sbin/glusterfsd -s 172.27.39.84 --volfile-id gluster.172.27.39.84.mnt-xfs-drive-gluster-brick -p /var/run/gluster/vols/gluster/172.27.39.84-mnt-xfs-drive-gluster-brick.pid -S /var/run/gluster/848c5dbe437c2451.socket --brick-name /mnt/xfs-drive-gluster/brick -l /var/log/glusterfs/bricks/mnt-xfs-drive-gluster-brick.log --xlator-option *-posix.glusterd-uuid=180e8f78-fa85-4cb8-8bbd-b0924a16ba60 --process-name brick --brick-port 49152 --xlator-option gluster-server.listen-port=49152 root 58242 0.2 17.6 1816852 1440608 ? Ssl ao?06 19:08 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO 3. The kill -1 doesn't create anything in the /var/run/gluster folder on either the glusterd or glusterfsd PID. Is it creating a different dump than the one generated above via: `gluster volume statedump gluster` ? 
Anyhting I am missing to have 6.4 dump its state? Thanks! -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 15:51:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:51:37 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23212 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 15:51:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:51:39 +0000 Subject: [Bugs] [Bug 1737141] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737141 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #6 from Worker Ant --- REVIEW: https://review.gluster.org/23212 (features/shard: Send correct size when reads are sent beyond file size) posted (#1) for review on release-6 by Krutika Dhananjay -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 15:54:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:54:04 +0000 Subject: [Bugs] [Bug 1740316] New: read() returns more than file size when using direct I/O Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 Bug ID: 1740316 Summary: read() returns more than file size when using direct I/O Product: GlusterFS Version: 7 Status: NEW Component: sharding Keywords: Triaged Severity: high Priority: high Assignee: bugs at gluster.org Reporter: kdhananj at redhat.com QA Contact: bugs at gluster.org CC: atumball at redhat.com, bugs at gluster.org, csaba at redhat.com, kdhananj at redhat.com, khiremat at redhat.com, kwolf at redhat.com, nsoffer at redhat.com, pkarampu at redhat.com, rabhat at redhat.com, rgowdapp at redhat.com, rkavunga at redhat.com, sabose at redhat.com, teigland at redhat.com, tnisan at redhat.com, vjuranek at redhat.com Depends On: 1738419 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1738419 +++ +++ This bug was initially created as a clone of Bug #1737141 +++ Description of problem: When using direct I/O, reading from a file returns more data, padding the file data with zeroes. Here is an example. 
## On a host mounting gluster using fuse $ pwd /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ mount | grep glusterfs voodoo4.tlv.redhat.com:/gv0 on /rhev/data-center/mnt/glusterSD/voodoo4.tlv.redhat.com:_gv0 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072) $ stat metadata File: metadata Size: 501 Blocks: 1 IO Block: 131072 regular file Device: 31h/49d Inode: 13313776956941938127 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 36/ vdsm) Gid: ( 36/ kvm) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-01 22:21:49.186381528 +0300 Modify: 2019-08-01 22:21:49.427404135 +0300 Change: 2019-08-01 22:21:49.969739575 +0300 Birth: - $ cat metadata ALIGNMENT=1048576 BLOCK_SIZE=4096 CLASS=Data DESCRIPTION=gv0 IOOPTIMEOUTSEC=10 LEASERETRIES=3 LEASETIMESEC=60 LOCKPOLICY= LOCKRENEWALINTERVALSEC=5 MASTER_VERSION=1 POOL_DESCRIPTION=4k-gluster POOL_DOMAINS=de566475-5b67-4987-abf3-3dc98083b44c:Active POOL_SPM_ID=-1 POOL_SPM_LVER=-1 POOL_UUID=44cfb532-3144-48bd-a08c-83065a5a1032 REMOTE_PATH=voodoo4.tlv.redhat.com:/gv0 ROLE=Master SDUUID=de566475-5b67-4987-abf3-3dc98083b44c TYPE=GLUSTERFS VERSION=5 _SHA_CKSUM=3d1cb836f4c93679fc5a4e7218425afe473e3cfa $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000340298 s, 1.5 MB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 1+0 records in 1+0 records out 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00398529 s, 1.0 MB/s Checking the copied data, the actual content of the file is padded with zeros to 4096 bytes. ## On the one of the gluster nodes $ pwd /export/vdo0/brick/de566475-5b67-4987-abf3-3dc98083b44c/dom_md $ stat metadata File: metadata Size: 501 Blocks: 16 IO Block: 4096 regular file Device: fd02h/64770d Inode: 149 Links: 2 Access: (0644/-rw-r--r--) Uid: ( 36/ UNKNOWN) Gid: ( 36/ kvm) Context: system_u:object_r:usr_t:s0 Access: 2019-08-01 22:21:50.380425478 +0300 Modify: 2019-08-01 22:21:49.427397589 +0300 Change: 2019-08-01 22:21:50.374425302 +0300 Birth: - $ dd if=metadata bs=4096 count=1 of=/dev/null 0+1 records in 0+1 records out 501 bytes copied, 0.000991636 s, 505 kB/s $ dd if=metadata bs=4096 count=1 of=/dev/null iflag=direct 0+1 records in 0+1 records out 501 bytes copied, 0.0011381 s, 440 kB/s This proves that the issue is in gluster. 
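The same check can be scripted outside of dd; a small C reproducer equivalent to the test above is sketched below (the file path is passed on the command line, and 4096-byte alignment is assumed for the O_DIRECT buffer).

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reads the first block of a file with O_DIRECT and compares the byte
 * count returned by read() against st_size, mirroring the dd test above. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-on-gluster-mount>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct stat st;
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return 1;
    }

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, 4096)) {   /* O_DIRECT needs alignment */
        perror("posix_memalign");
        return 1;
    }

    ssize_t n = read(fd, buf, 4096);
    if (n < 0) {
        perror("read");
        return 1;
    }

    printf("read %zd of %lld bytes\n", n, (long long)st.st_size);

    free(buf);
    close(fd);
    return 0;
}

Compiled with plain gcc and pointed at the metadata file on the FUSE mount, this should report 501 of 501 bytes; with the bug present it reports 4096 instead.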
# gluster volume info gv0 Volume Name: gv0 Type: Replicate Volume ID: cbc5a2ad-7246-42fc-a78f-70175fb7bf22 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: voodoo4.tlv.redhat.com:/export/vdo0/brick Brick2: voodoo5.tlv.redhat.com:/export/vdo0/brick Brick3: voodoo8.tlv.redhat.com:/export/vdo0/brick (arbiter) Options Reconfigured: storage.owner-gid: 36 storage.owner-uid: 36 server.event-threads: 4 client.event-threads: 4 cluster.choose-local: off user.cifs: off features.shard: on cluster.shd-wait-qlength: 10000 cluster.shd-max-threads: 8 cluster.locking-scheme: granular cluster.data-self-heal-algorithm: full cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable network.remote-dio: disable performance.low-prio-threads: 32 performance.io-cache: off performance.read-ahead: off performance.quick-read: off transport.address-family: inet nfs.disable: on performance.client-io-threads: on $ xfs_info /export/vdo0 meta-data=/dev/mapper/vdo0 isize=512 agcount=4, agsize=6553600 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=26214400, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=12800, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 Version-Release number of selected component (if applicable): Server: $ rpm -qa | grep glusterfs glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-server-6.4-1.fc29.x86_64 Client: $ rpm -qa | grep glusterfs glusterfs-client-xlators-6.4-1.fc29.x86_64 glusterfs-6.4-1.fc29.x86_64 glusterfs-rdma-6.4-1.fc29.x86_64 glusterfs-cli-6.4-1.fc29.x86_64 glusterfs-libs-6.4-1.fc29.x86_64 glusterfs-fuse-6.4-1.fc29.x86_64 glusterfs-api-6.4-1.fc29.x86_64 How reproducible: Always. Steps to Reproduce: 1. Provision gluster volume over vdo (did not check without vdo) 2. Create a file of 501 bytes 3. Read the file using direct I/O Actual results: read() returns 4096 bytes, padding the file data with zeroes Expected results: read() returns actual file data (501 bytes) --- Additional comment from Nir Soffer on 2019-08-02 19:21:20 UTC --- David, do you think this can affect sanlock? --- Additional comment from Nir Soffer on 2019-08-02 19:25:02 UTC --- Kevin, do you think this can affect qemu/qemu-img? --- Additional comment from Amar Tumballi on 2019-08-05 05:33:57 UTC --- @Nir, thanks for the report. We will look into this. --- Additional comment from Kevin Wolf on 2019-08-05 09:16:16 UTC --- (In reply to Nir Soffer from comment #2) > Kevin, do you think this can affect qemu/qemu-img? This is not a problem for QEMU as long as the file size is correct. If gluster didn't do the zero padding, QEMU would do it internally. In fact, fixing this in gluster may break the case of unaligned image sizes with QEMU because the image size is rounded up to sector (512 byte) granularity and the gluster driver turns short reads into errors. This would actually affect non-O_DIRECT, too, which already seems to behave this way, so can you just give this a quick test? --- Additional comment from David Teigland on 2019-08-05 15:08:32 UTC --- (In reply to Nir Soffer from comment #1) > David, do you think this can affect sanlock? I don't think so. 
sanlock doesn't use any space that it didn't first write to initialize. --- Additional comment from Worker Ant on 2019-08-08 05:56:04 UTC --- REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size when reads are sent beyond file size) posted (#1) for review on master by Krutika Dhananjay --- Additional comment from Worker Ant on 2019-08-12 13:30:56 UTC --- REVIEW: https://review.gluster.org/23175 (features/shard: Send correct size when reads are sent beyond file size) merged (#3) on master by Krutika Dhananjay Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 [Bug 1738419] read() returns more than file size when using direct I/O -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 15:54:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:54:04 +0000 Subject: [Bugs] [Bug 1738419] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738419 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1740316 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 [Bug 1740316] read() returns more than file size when using direct I/O -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 15:56:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:56:00 +0000 Subject: [Bugs] [Bug 1740316] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23213 -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 15:56:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:56:02 +0000 Subject: [Bugs] [Bug 1740316] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23213 (features/shard: Send correct size when reads are sent beyond file size) posted (#1) for review on release-7 by Krutika Dhananjay -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 15:56:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:56:25 +0000 Subject: [Bugs] [Bug 1740316] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |kdhananj at redhat.com -- You are receiving this mail because: You are the QA Contact for the bug. 
You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 12 15:59:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 15:59:38 +0000 Subject: [Bugs] [Bug 1665880] After the shard feature is enabled, the glfs_read will always return the length of the read buffer, no the actual length readed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1665880 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-12 15:59:38 --- Comment #4 from Krutika Dhananjay --- Patch at https://review.gluster.org/c/glusterfs/+/23175 fixes this issue. I'm therefore closing this bz with the resolution NEXT_RELEASE. Thanks a lot Xiubo Li, for taking the time to test the fix! -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 20:07:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 20:07:23 +0000 Subject: [Bugs] [Bug 1736481] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23214 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 12 20:07:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 12 Aug 2019 20:07:24 +0000 Subject: [Bugs] [Bug 1736481] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23214 (storage/posix: set the op_errno to proper errno during gfid set) posted (#1) for review on release-7 by Raghavendra Bhat -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 05:37:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:37:36 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |high CC| |atumball at redhat.com, | |hgowtham at redhat.com Assignee|bugs at gluster.org |spalai at redhat.com --- Comment #2 from Amar Tumballi --- Thanks for the report. If you still have the core, I request you to post the output of '(gdb) thread apply all bt full'. That gives more details on the crash. Also, if it possible for letting us know a pattern of access (so that we can try to reproduce the same) it would be great. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 13 05:48:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:48:25 +0000 Subject: [Bugs] [Bug 1740413] Gluster volume bricks crashes when running a security scan on glusterfs ports In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740413 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |high Version|unspecified |6 Component|glusterfs |rpc CC| |bugs at gluster.org Assignee|atumball at redhat.com |bugs at gluster.org QA Contact|bmekala at redhat.com | Product|Red Hat Gluster Storage |GlusterFS Severity|unspecified |high --- Comment #2 from Amar Tumballi --- Looks like the bits are for Upstream GlusterFS (based on the version). Moving the product to 'GlusterFS'. Will analyze this. > Every time the security team runs a security scans on the gluster ports, the bricks crashes. When you say security scan, is it nmap with arguments, or other scripts too? Also is it possible to set 'gluster volume set VOLNAME brick-log-level TRACE' before the test, and run the scan ? That would help us to get more details. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 05:56:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:56:46 +0000 Subject: [Bugs] [Bug 1739335] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739335 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-13 05:56:46 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23180 (protocol/client: propagte GF_EVENT_CHILD_PING only for connections to brick) merged (#1) on release-6 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 05:56:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:56:47 +0000 Subject: [Bugs] [Bug 1739334] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739334 Bug 1739334 depends on bug 1739335, which changed state. Bug 1739335 Summary: Multiple disconnect events being propagated for the same child https://bugzilla.redhat.com/show_bug.cgi?id=1739335 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 13 05:59:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:59:47 +0000 Subject: [Bugs] [Bug 1740494] New: Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740494 Bug ID: 1740494 Summary: Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked Product: GlusterFS Version: 6 Status: NEW Component: locks Assignee: spalai at redhat.com Reporter: spalai at redhat.com CC: atumball at redhat.com, bugs at gluster.org, prasanna.kalever at redhat.com, spalai at redhat.com, xiubli at redhat.com Depends On: 1717824 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1717824 +++ Description of problem: In Glusterfs, we have support the fencing feature support. With this we can suppor the ALUA feature in LIO/TCMU now. The fencing doc: https://review.gluster.org/#/c/glusterfs-specs/+/21925/6/accepted/fencing.md The fencing test example: https://review.gluster.org/#/c/glusterfs/+/21496/12/tests/basic/fencing/fence-basic.c The LIO/tcmu-runner PR of supporting the ALUA is : https://github.com/open-iscsi/tcmu-runner/pull/554. But currently when testing it based the above PR in tcmu-runner by shutting down of the HA node, and start it after 2~3 minutes, in all the HA nodes we can see that the glfs_file_lock() get stucked, the following is from the /var/log/tcmu-runner.log: ==== 2019-06-06 13:50:15.755 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:50:15.757 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 13:50:55.845 1316 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:50:55.847 1316 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1 2019-06-06 13:57:50.102 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:57:50.103 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 13:57:50.121 1315 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 13:57:50.132 1315 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block4: lock call done. lock state 1 2019-06-06 14:09:03.654 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block3: lock call state 2 retries 0. tag 65535 reopen 0 2019-06-06 14:09:03.662 1328 [DEBUG] tcmu_acquire_dev_lock:440 glfs/block3: lock call done. lock state 1 2019-06-06 14:09:06.700 1328 [DEBUG] tcmu_acquire_dev_lock:388 glfs/block4: lock call state 2 retries 0. tag 65535 reopen 0 ==== The lock operation is never returned. 
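For context, the call sequence that hangs here is the mandatory-lock acquisition from the fencing spec linked above; a trimmed sketch of it is below. The xattr key fallback and the error handling are assumptions for illustration -- the authoritative version is the glfs_fsetxattr()/glfs_file_lock() test code quoted further down in this report.

#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <glusterfs/api/glfs.h>

#ifndef GF_ENFORCE_MANDATORY_LOCK
/* key from the fencing spec; string value assumed here for illustration */
#define GF_ENFORCE_MANDATORY_LOCK "trusted.glusterfs.enforce-mandatory-lock"
#endif

/* Sketch of the lock acquisition that gets stuck in this report. */
static int
acquire_mandatory_lock(glfs_fd_t *fd)
{
    struct flock lock;

    memset(&lock, 0, sizeof(lock));
    lock.l_type   = F_WRLCK;    /* exclusive write lock */
    lock.l_whence = SEEK_SET;
    lock.l_start  = 0;
    lock.l_len    = 0;          /* whole file */

    /* mark the fd so the locks xlator enforces (and can preempt) the lock */
    if (glfs_fsetxattr(fd, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0) < 0)
        return -errno;

    /* F_SETLKW blocks until any preemption/draining on the server side
     * finishes, which is where the ~43s delay and the hang show up;
     * F_SETLK would presumably fail fast instead and leave retrying to
     * the caller (assumption, not verified here). */
    if (glfs_file_lock(fd, F_SETLKW, &lock, GLFS_LK_MANDATORY) < 0)
        return -errno;

    return 0;
}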
I am using the following glusterfs built by myself: # rpm -qa|grep glusterfs glusterfs-extra-xlators-7dev-0.0.el7.x86_64 glusterfs-api-devel-7dev-0.0.el7.x86_64 glusterfs-7dev-0.0.el7.x86_64 glusterfs-server-7dev-0.0.el7.x86_64 glusterfs-cloudsync-plugins-7dev-0.0.el7.x86_64 glusterfs-resource-agents-7dev-0.0.el7.noarch glusterfs-api-7dev-0.0.el7.x86_64 glusterfs-devel-7dev-0.0.el7.x86_64 glusterfs-regression-tests-7dev-0.0.el7.x86_64 glusterfs-gnfs-7dev-0.0.el7.x86_64 glusterfs-client-xlators-7dev-0.0.el7.x86_64 glusterfs-geo-replication-7dev-0.0.el7.x86_64 glusterfs-debuginfo-7dev-0.0.el7.x86_64 glusterfs-fuse-7dev-0.0.el7.x86_64 glusterfs-events-7dev-0.0.el7.x86_64 glusterfs-libs-7dev-0.0.el7.x86_64 glusterfs-cli-7dev-0.0.el7.x86_64 glusterfs-rdma-7dev-0.0.el7.x86_64 How reproducible: 30%. Steps to Reproduce: 1. create one rep volume(HA >= 2) with the mandantary lock enabled 2. create one gluster-blockd target 3. login and do the fio in the client node 4. shutdown one of the HA nodes, and wait 2 ~3 minutes and start it again Actual results: all the time the fio couldn't recovery and the rw BW will be 0kb/s, and we can see tons of log from /var/log/tcmu-runnner.log file: 2019-06-06 15:01:06.641 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. 2019-06-06 15:01:06.648 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.648 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. 2019-06-06 15:01:06.655 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.655 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. 2019-06-06 15:01:06.661 1328 [DEBUG_SCSI_CMD] tcmu_cdb_print_info:353 glfs/block4: 28 0 0 3 1f 80 0 0 8 0 2019-06-06 15:01:06.662 1328 [DEBUG] alua_implicit_transition:561 glfs/block4: Lock acquisition operation is already in process. Expected results: just before the shutdown node is up, the fio could be recovery. --- Additional comment from Xiubo Li on 2019-06-06 14:39:50 MVT --- --- Additional comment from Xiubo Li on 2019-06-06 14:40:16 MVT --- --- Additional comment from Xiubo Li on 2019-06-06 14:42:10 MVT --- The bt output from the gbd: [root at rhel1 ~]# gdb -p 1325 (gdb) bt #0 0x00007fc7761baf47 in pthread_join () from /lib64/libpthread.so.0 #1 0x00007fc7773de468 in event_dispatch_epoll (event_pool=0x559f03d4b560) at event-epoll.c:847 #2 0x0000559f02419658 in main (argc=21, argv=0x7fff9c6722c8) at glusterfsd.c:2871 (gdb) [root at rhel3 ~]# gdb -p 7669 (gdb) bt #0 0x00007fac80bd9f47 in pthread_join () from /usr/lib64/libpthread.so.0 #1 0x00007fac81dfd468 in event_dispatch_epoll (event_pool=0x55de6f845560) at event-epoll.c:847 #2 0x000055de6f143658 in main (argc=21, argv=0x7ffcafc3eff8) at glusterfsd.c:2871 (gdb) The pl_inode->fop_wind_count is: (gdb) thread 2 [Switching to thread 2 (Thread 0x7fc742184700 (LWP 1829))] #0 0x00007fc7761bd965 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) frame 2 #2 0x00007fc76379c13b in pl_lk (frame=frame at entry=0x7fc750001128, this=this at entry=0x7fc75c0128f0, fd=fd at entry=0x7fc73c0977d8, cmd=cmd at entry=6, flock=flock at entry=0x7fc73c076938, xdata=xdata at entry=0x7fc73c071828) at posix.c:2637 2637 ret = pl_lock_preempt(pl_inode, reqlock); (gdb) p pl_inode->fop_wind_count $1 = -30 (gdb) The pstack logs please see the attachments Thanks. 
BRs --- Additional comment from Susant Kumar Palai on 2019-06-10 12:10:33 MVT --- Just a small update: There are cases where fop_wind_count can go -ve. A basic fix will be never to bring its value down if it is zero. I will update more on this later as I am busy with a few other issues ATM. Susant --- Additional comment from Xiubo Li on 2019-07-17 13:15:51 MVT --- Hi Susant, Is there any new update about this ? Thanks. --- Additional comment from Susant Kumar Palai on 2019-07-17 13:20:56 MVT --- (In reply to Xiubo Li from comment #5) > Hi Susant, > > Is there any new update about this ? > > Thanks. Hey Xiubo, most likely will be sending a patch by end of day today. --- Additional comment from Xiubo Li on 2019-07-17 13:22:28 MVT --- (In reply to Susant Kumar Palai from comment #6) > (In reply to Xiubo Li from comment #5) > > Hi Susant, > > > > Is there any new update about this ? > > > > Thanks. > > Hey Xiubo, most likely will be sending a patch by end of day today. Sure and take your time Susant please :-) Thanks very much. BRs --- Additional comment from Susant Kumar Palai on 2019-07-17 14:13:59 MVT --- Moved to POST by mistake. Resetting. --- Additional comment from Worker Ant on 2019-07-22 14:59:10 MVT --- REVIEW: https://review.gluster.org/23088 (locks/fencing: Address while lock preemption) posted (#1) for review on master by Susant Palai --- Additional comment from Susant Kumar Palai on 2019-07-22 15:02:07 MVT --- Xiubo, if you could test it out, it would be great. (Make sure you enable fencing before you create any client) --- Additional comment from Xiubo Li on 2019-07-22 15:42:38 MVT --- (In reply to Susant Kumar Palai from comment #10) > Xiubo, if you could test it out, it would be great. (Make sure you enable > fencing before you create any client) @Susant, Yeah, thanks very much for your work on this. And I will test it late today or tomorrow. BRs Xiubo --- Additional comment from Xiubo Li on 2019-07-23 07:44:00 MVT --- @Susant, There 2 issues are found: 1, From my test the glfs_file_lock() sometimes it will takes around 43 seconds, is it normal ? And why ? 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: lxb--------------glfs_file_lock start ... 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: lxb--------------glfs_file_lock end 2, After the lock is broke and all the FIOs callback will return -1, the -EPERM, not the -EBUSY as we discussed before. Is there any change about the return value ? I am only checking the -EBUSY and -ENOTCONN then only after that the lock state in local tcmu node will be changed. Or the local state in tcmu-runner service will always in LOCKED state, but it actually already lost the lock and should be in UNLOCKED state, so all the IOs will fail. Thanks, BRs --- Additional comment from Susant Kumar Palai on 2019-07-23 10:45:01 MVT --- (In reply to Xiubo Li from comment #12) > @Susant, > > There 2 issues are found: > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > seconds, is it normal ? And why ? > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > lxb--------------glfs_file_lock start > ... > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > lxb--------------glfs_file_lock end I wonder if it is related to draining of fops. Let me do some testing around this. > > 2, After the lock is broke and all the FIOs callback will return -1, the > -EPERM, not the -EBUSY as we discussed before. 
Is there any change about the > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > that the lock state in local tcmu node will be changed. Or the local state > in tcmu-runner service will always in LOCKED state, but it actually already > lost the lock and should be in UNLOCKED state, so all the IOs will fail. This is interesting. Will get back after some code checking. > > > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-07-23 10:47:15 MVT --- (In reply to Susant Kumar Palai from comment #13) > (In reply to Xiubo Li from comment #12) > > @Susant, > > > > There 2 issues are found: > > > > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > > seconds, is it normal ? And why ? > > > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > > lxb--------------glfs_file_lock start > > ... > > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > > lxb--------------glfs_file_lock end > > I wonder if it is related to draining of fops. Let me do some testing around > this. > Sure. > > > > > > 2, After the lock is broke and all the FIOs callback will return -1, the > > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > > that the lock state in local tcmu node will be changed. Or the local state > > in tcmu-runner service will always in LOCKED state, but it actually already > > lost the lock and should be in UNLOCKED state, so all the IOs will fail. > > This is interesting. Will get back after some code checking. > Please take your time @Susant. Thanks, BRs --- Additional comment from Susant Kumar Palai on 2019-07-23 12:44:22 MVT --- (In reply to Xiubo Li from comment #12) > @Susant, > > There 2 issues are found: > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > seconds, is it normal ? And why ? > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > lxb--------------glfs_file_lock start > ... > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > lxb--------------glfs_file_lock end Checked the time taken for file_lock and it completes immediately for me. ret = glfs_fsetxattr(fd1, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0); if (ret < 0) { LOG_ERR("glfs_fsetxattr", errno); ret = -1; goto out; } time(&before); /* take a write mandatory lock */ ret = glfs_file_lock(fd1, F_SETLKW, &lock, GLFS_LK_MANDATORY); if (ret) { LOG_ERR("glfs_file_lock", errno); goto out; } time(&after); diff = (unsigned long )after - before; fprintf(fp, "time %lu %lu %lu\n", diff, before, after); time 0 1563867824 1563867824 Can you attach the brick log here when you run the test next time? > > 2, After the lock is broke and all the FIOs callback will return -1, the > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > that the lock state in local tcmu node will be changed. Or the local state > in tcmu-runner service will always in LOCKED state, but it actually already > lost the lock and should be in UNLOCKED state, so all the IOs will fail. Please attach the brick log after enabling trace logging. 
brick-log-level TRACE> > > > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-07-23 12:48:58 MVT --- (In reply to Susant Kumar Palai from comment #15) > (In reply to Xiubo Li from comment #12) > > @Susant, > > > > There 2 issues are found: > > > > > > 1, From my test the glfs_file_lock() sometimes it will takes around 43 > > seconds, is it normal ? And why ? > > > > 90456 2019-07-23 10:08:57.444 31411 [INFO] tcmu_glfs_lock:901 glfs/block0: > > lxb--------------glfs_file_lock start > > ... > > 319959 2019-07-23 10:09:40.183 31411 [INFO] tcmu_glfs_lock:905 glfs/block0: > > lxb--------------glfs_file_lock end > > Checked the time taken for file_lock and it completes immediately for me. > > ret = glfs_fsetxattr(fd1, GF_ENFORCE_MANDATORY_LOCK, "set", 8, 0); > if (ret < 0) { > LOG_ERR("glfs_fsetxattr", errno); > ret = -1; > goto out; > } > > time(&before); > /* take a write mandatory lock */ > ret = glfs_file_lock(fd1, F_SETLKW, &lock, GLFS_LK_MANDATORY); > if (ret) { > LOG_ERR("glfs_file_lock", errno); > goto out; > } > time(&after); > diff = (unsigned long )after - before; > fprintf(fp, "time %lu %lu %lu\n", diff, before, after); > > time 0 1563867824 1563867824 > > Can you attach the brick log here when you run the test next time? > > > > > 2, After the lock is broke and all the FIOs callback will return -1, the > > -EPERM, not the -EBUSY as we discussed before. Is there any change about the > > return value ? I am only checking the -EBUSY and -ENOTCONN then only after > > that the lock state in local tcmu node will be changed. Or the local state > > in tcmu-runner service will always in LOCKED state, but it actually already > > lost the lock and should be in UNLOCKED state, so all the IOs will fail. > > Please attach the brick log after enabling trace logging. brick-log-level TRACE> > > > > > > Sure, I will do that after my current work handy, possibly late today or tomorrow morning. Thanks BRs --- Additional comment from Susant Kumar Palai on 2019-07-25 15:05:47 MVT --- On the permission denied: I did not see any error related to EPERM but saw EBUSY in the brick logs. 
[2019-07-24 08:15:22.236283] E [MSGID: 101191] [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-07-24 08:15:46.083306] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 29: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.088292] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 31: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.119463] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 33: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.124067] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 35: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.294554] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 37: READV 0 (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, error-xlator: repvol3-locks [Resource temporarily unavailable] [2019-07-24 08:15:46.298672] E [MSGID: 115068] [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 39: READV 0 (7db899f8-bf56-4b Is it possible that the lower layer is converting the errnos to EPERM? Can you check gfapi logs and tcmu logs for corresponding error messages and confirm? --- Additional comment from Xiubo Li on 2019-07-25 15:39:23 MVT --- (In reply to Susant Kumar Palai from comment #17) > On the permission denied: > > I did not see any error related to EPERM but saw EBUSY in the brick logs. 
> > > [2019-07-24 08:15:22.236283] E [MSGID: 101191] > [event-epoll.c:765:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-07-24 08:15:46.083306] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 29: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.088292] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 31: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.119463] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 33: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.124067] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 35: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.294554] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 37: READV 0 > (7db899f8-bf56-4b89-a4c6-90235e8c720a), client: > CTX_ID:024a059c-7be1-4a19-ba27-8624c6e9c > c9c-GRAPH_ID:0-PID:9399-HOST:rhel3-PC_NAME:repvol3-client-2-RECON_NO:-0, > error-xlator: repvol3-locks [Resource temporarily unavailable] > [2019-07-24 08:15:46.298672] E [MSGID: 115068] > [server-rpc-fops_v2.c:1425:server4_readv_cbk] 0-repvol3-server: 39: READV 0 > (7db899f8-bf56-4b > > > Is it possible that the lower layer is converting the errnos to EPERM? Can > you check gfapi logs and tcmu logs for corresponding error messages and > confirm? If so maybe the gfapi is doing this. I will sent you the gfapi logs, the EPERM value comes from the gfapi directly and tcmu-runner do nothing with it. 
Checked the gfapi log, it is also full of: [2019-07-24 08:23:41.042339] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042381] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042556] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042574] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042655] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042671] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042709] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042722] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-0: remote operation failed [Device or resource busy] [2019-07-24 08:23:41.042784] W [MSGID: 114031] [client-rpc-fops_v2.c:680:client4_0_writev_cbk] 0-repvol3-client-1: remote operation failed [Device or resource busy] Checked the gfapi source code: 677 out: 678 if (rsp.op_ret == -1) { 679 gf_msg(this->name, GF_LOG_WARNING, gf_error_to_errno(rsp.op_errno), 680 PC_MSG_REMOTE_OP_FAILED, "remote operation failed"); 681 } else if (rsp.op_ret >= 0) { 682 if (local->attempt_reopen) 683 client_attempt_reopen(local->fd, this); 684 } 685 CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret, 686 gf_error_to_errno(rsp.op_errno), &prestat, &poststat, 687 xdata); 688 689 if (xdata) 690 dict_unref(xdata); It seems the return valume is coverted. Thanks, BRs --- Additional comment from Xiubo Li on 2019-07-25 15:42:32 MVT --- (In reply to Xiubo Li from comment #18) > (In reply to Susant Kumar Palai from comment #17) [...] > > Checked the gfapi source code: > > 677 out: > 678 if (rsp.op_ret == -1) { It seems returning the rsp.op_ret here to the callback: static void glfs_async_cbk(glfs_fd_t *fd, ssize_t ret, void *data) Not the rsp.op_errno. > 679 gf_msg(this->name, GF_LOG_WARNING, > gf_error_to_errno(rsp.op_errno), > > 680 PC_MSG_REMOTE_OP_FAILED, "remote operation failed"); > 681 } else if (rsp.op_ret >= 0) { > 682 if (local->attempt_reopen) > 683 client_attempt_reopen(local->fd, this); > 684 } > 685 CLIENT_STACK_UNWIND(writev, frame, rsp.op_ret, > 686 gf_error_to_errno(rsp.op_errno), &prestat, > &poststat, > 687 xdata); > 688 > 689 if (xdata) > 690 dict_unref(xdata); > > > It seems the return valume is coverted. > > Thanks, > BRs --- Additional comment from Xiubo Li on 2019-08-01 06:04:52 MVT --- When the ret == -1 and then check the errno directly will works for me now. But I can get both the -EAGAIN and -EBUSY, which only the -EBUSY is expected. Then the problem is why there will always be -EAGAIN every time before acquiring the lock ? 
Thanks BRs --- Additional comment from Worker Ant on 2019-08-02 19:27:15 MVT --- REVIEW: https://review.gluster.org/23088 (locks/fencing: Address hang while lock preemption) merged (#4) on master by Amar Tumballi --- Additional comment from Xiubo Li on 2019-08-04 18:03:30 MVT --- @Susant, Since the Fencing patch has been into the release 6, so this fixing followed should be backported, right ? Thanks. BRs --- Additional comment from Susant Kumar Palai on 2019-08-12 14:29:58 MVT --- (In reply to Xiubo Li from comment #22) > @Susant, > > Since the Fencing patch has been into the release 6, so this fixing followed > should be backported, right ? > > Thanks. > BRs Will backport to release 6 and 7. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 05:59:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:59:47 +0000 Subject: [Bugs] [Bug 1717824] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1717824 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1740494 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1740494 [Bug 1740494] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 05:59:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 05:59:59 +0000 Subject: [Bugs] [Bug 1738763] [EC] : fix coverity issue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738763 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-13 05:59:59 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23176 (cluster/ec: Fix coverity issue.) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 06:13:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 06:13:02 +0000 Subject: [Bugs] [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23216 -- You are receiving this mail because: You are on the CC list for the bug. 
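For reference, the errno handling Xiubo Li describes in the fencing thread above (check errno directly once the async write callback returns -1, and change the local lock state only on EBUSY/ENOTCONN) could look roughly like the sketch below. It uses the three-argument glfs_io_cbk form quoted in the thread and assumes, as the thread reports, that errno carries the op_errno on failure; the case comments and the EAGAIN-retry choice are illustrative assumptions, not actual tcmu-runner code.

```c
/* Sketch only: errno handling for a failed async write. Assumes errno holds
 * the op_errno when ret == -1, as reported in the thread above. */
#include <errno.h>
#include <sys/types.h>
#include <glusterfs/api/glfs.h>

void demo_write_cbk(glfs_fd_t *fd, ssize_t ret, void *data)
{
    (void)fd;
    (void)data;

    if (ret >= 0)
        return;             /* write completed normally */

    switch (errno) {
    case EBUSY:             /* mandatory lock preempted by another node */
    case ENOTCONN:          /* connection to the brick is gone */
        /* mark the local lock state UNLOCKED and re-acquire before retrying */
        break;
    case EAGAIN:            /* "Resource temporarily unavailable" */
        /* transient: retry the I/O without touching the lock state */
        break;
    default:                /* e.g. the unexpected EPERM discussed above */
        break;
    }
}
```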
From bugzilla at redhat.com Tue Aug 13 06:13:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 06:13:03 +0000 Subject: [Bugs] [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23216 (locks/fencing: Address hang while lock preemption) posted (#2) for review on release-7 by Susant Palai -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 06:14:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 06:14:07 +0000 Subject: [Bugs] [Bug 1740494] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740494 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23217 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 06:14:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 06:14:08 +0000 Subject: [Bugs] [Bug 1740494] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740494 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23217 (locks/fencing: Address hang while lock preemption) posted (#2) for review on release-6 by Susant Palai -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 08:15:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:15:51 +0000 Subject: [Bugs] [Bug 1740519] New: event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740519 Bug ID: 1740519 Summary: event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time Product: GlusterFS Version: 7 Status: NEW Component: eventsapi Assignee: bugs at gluster.org Reporter: xiubli at redhat.com Target Milestone: --- Classification: Community Description of problem: event: rename event_XXX with gf_ prefixed I hit one crash issue when using the libgfapi. In the libgfapi it will call glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and the event_dispatch() is defined by libgluster locally, the problem is the name of event_dispatch() is the extremly the same with the one from libevent package form the OS. For example, if a executable program Foo, which will also use and link the libevent and the libgfapi at the same time, I can hit the crash, like: kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000] The link for Foo is: lib_foo_LADD = -levent $(GFAPI_LIBS) It will crash. 
This is because the glfs_poller() is calling the event_dispatch() from the libevent, not the libglsuter. The gfapi link info : GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid If I link Foo like: lib_foo_LADD = $(GFAPI_LIBS) -levent It will works well without any problem. And if Foo call one private lib, such as handler_glfs.so, and the handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't and it will dlopen(handler_glfs.so), then the crash will be hit everytime. The link info will be: foo_LADD = -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like: foo_LADD = $(GFAPI_LIBS) -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS. And in some cases when the --as-needed link option is added(on many dists it is added as default), then the crash is back again, the above workaround won't work. How reproducible: Link libveent and libgfapi at the same time, then run the app. -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:23:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:23:26 +0000 Subject: [Bugs] [Bug 1740519] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740519 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23218 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:23:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:23:27 +0000 Subject: [Bugs] [Bug 1740519] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740519 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23218 (event: rename event_XXX with gf_ prefixed) posted (#1) for review on release-7 by Xiubo Li -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:23:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:23:49 +0000 Subject: [Bugs] [Bug 1740525] New: event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 Bug ID: 1740525 Summary: event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time Product: GlusterFS Version: 6 Status: NEW Component: eventsapi Assignee: bugs at gluster.org Reporter: xiubli at redhat.com Target Milestone: --- Classification: Community Description of problem: event: rename event_XXX with gf_ prefixed I hit one crash issue when using the libgfapi. In the libgfapi it will call glfs_poller() --> event_dispatch() in file api/src/glfs.c:721, and the event_dispatch() is defined by libgluster locally, the problem is the name of event_dispatch() is the extremly the same with the one from libevent package form the OS. 
For example, if a executable program Foo, which will also use and link the libevent and the libgfapi at the same time, I can hit the crash, like: kernel: glfs_glfspoll[68486]: segfault at 1c0 ip 00007fef006fd2b8 sp 00007feeeaffce30 error 4 in libevent-2.0.so.5.1.9[7fef006ed000+46000] The link for Foo is: lib_foo_LADD = -levent $(GFAPI_LIBS) It will crash. This is because the glfs_poller() is calling the event_dispatch() from the libevent, not the libglsuter. The gfapi link info : GFAPI_LIBS = -lacl -lgfapi -lglusterfs -lgfrpc -lgfxdr -luuid If I link Foo like: lib_foo_LADD = $(GFAPI_LIBS) -levent It will works well without any problem. And if Foo call one private lib, such as handler_glfs.so, and the handler_glfs.so will link the GFAPI_LIBS directly, while the Foo won't and it will dlopen(handler_glfs.so), then the crash will be hit everytime. The link info will be: foo_LADD = -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) I can avoid the crash temporarily by linking the GFAPI_LIBS in Foo too like: foo_LADD = $(GFAPI_LIBS) -levent libhandler_glfs_LIBADD = $(GFAPI_LIBS) But this is ugly since the Foo won't use any APIs from the GFAPI_LIBS. And in some cases when the --as-needed link option is added(on many dists it is added as default), then the crash is back again, the above workaround won't work. How reproducible: Link libveent and libgfapi at the same time, then run the app. -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:26:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:26:42 +0000 Subject: [Bugs] [Bug 1740525] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23219 (event: rename event_XXX with gf_ prefixed) posted (#1) for review on release-6 by Xiubo Li -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:26:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:26:41 +0000 Subject: [Bugs] [Bug 1740525] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23219 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 08:40:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:40:29 +0000 Subject: [Bugs] [Bug 1740525] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 Xiubo Li changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org Component|eventsapi |core -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
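The crash described in the two reports above comes down to two shared libraries exporting the same symbol name, event_dispatch, so which definition a call binds to depends purely on link order. A small stand-alone diagnostic such as the one below (an illustration added here, not part of the proposed patch) can show which library the process actually resolved the symbol from, when it is built with the same link flags as the affected application plus -ldl:

```c
/* Diagnostic sketch: report which shared object provides the
 * "event_dispatch" definition that calls in this process bind to. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *sym = dlsym(RTLD_DEFAULT, "event_dispatch");
    Dl_info info;

    if (sym == NULL) {
        printf("event_dispatch is not bound in this process\n");
        return 0;
    }

    if (dladdr(sym, &info) && info.dli_fname)
        printf("event_dispatch (%p) comes from %s\n", sym, info.dli_fname);
    else
        printf("event_dispatch found at %p, origin unknown\n", sym);

    return 0;
}
```

If the printed path belongs to libevent rather than libglusterfs, the process is in exactly the situation described above, which is what renaming the symbols to gf_event_XXX avoids.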
From bugzilla at redhat.com Tue Aug 13 08:41:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 08:41:07 +0000 Subject: [Bugs] [Bug 1740519] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740519 Xiubo Li changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org Component|eventsapi |core -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 13 14:26:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 14:26:35 +0000 Subject: [Bugs] [Bug 1668239] [man page] Gluster(8) - Missing disperse-data parameter Gluster Console Manager man page In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1668239 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 14:26:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 14:26:56 +0000 Subject: [Bugs] [Bug 1718741] GlusterFS having high CPU In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1718741 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 19:14:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 19:14:54 +0000 Subject: [Bugs] [Bug 1194546] Write behind returns success for a write irrespective of a conflicting lock held by another application In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1194546 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23224 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 13 19:14:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 19:14:55 +0000 Subject: [Bugs] [Bug 1194546] Write behind returns success for a write irrespective of a conflicting lock held by another application In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1194546 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #10 from Worker Ant --- REVIEW: https://review.gluster.org/23224 (performance/write-behind: lk and write calls should be ordered) posted (#1) for review on master by Rishubh Jain -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 13 19:41:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 13 Aug 2019 19:41:41 +0000 Subject: [Bugs] [Bug 1718741] GlusterFS having high CPU In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1718741 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(suresh3.mani at gmai | |l.com) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 03:21:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 03:21:50 +0000 Subject: [Bugs] [Bug 1732717] fuse: Limit the number of inode invalidation requests in the queue In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732717 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-14 03:21:50 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23187 (fuse: Set limit on invalidate queue size) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 03:22:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 03:22:15 +0000 Subject: [Bugs] [Bug 1733042] cluster.rc Create separate logdirs for each host instance In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733042 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-14 03:22:15 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23097 (glusterd: create separate logdirs for cluster.rc instances) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 03:56:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 03:56:27 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #11 from Amgad --- Hari Unfortunately, this bug made our platform unstable and we can't wait for 2-month. Could it be backported at least for release 6.x this month as a cherry pick. Appreciate the support! Regards, Amgad -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 05:12:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 05:12:19 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #3 from Amgad --- can someone provide a pointer to the "getnameinfo" source code while looking at the issue -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Wed Aug 14 05:19:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 05:19:45 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |avishwan at redhat.com Flags| |needinfo?(avishwan at redhat.c | |om) --- Comment #4 from Atin Mukherjee --- Aravinda - can you please help here? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 05:30:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 05:30:14 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com Flags| |needinfo?(totalworlddominat | |ion at gmail.com) --- Comment #9 from Atin Mukherjee --- Are you running any monitoring command? Could you attach the cmd_history.log files from all the nodes? Restarting GlusterD with out any constant series of incoming commands wouldn't lead to such massive leak, so there's something terribly wrong going on with your setup. Statedump can be captured with 'kill -SIGUSR1 $(pidof glusterd)' command . If you still fail to see any output file in /var/run/gluster please send us glusterd.log file too along with output of gluster peer status. Also how are you monitoring the memory? Through ps command? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 05:56:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 05:56:12 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #745 from Worker Ant --- REVIEW: https://review.gluster.org/23169 (client-handshake.c: minor changes and removal of dead code.) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:15:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:15:06 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 Aravinda VK changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(avishwan at redhat.c | |om) | --- Comment #5 from Aravinda VK --- I think it is failing while doing strcmp comparison. 
```
if (!strcmp(firstip, nextip)) {
    return GF_AI_COMPARE_MATCH;
}
```
Wrote a small script to compare the hostnames
```
#include <stdio.h>
#include <string.h>

int main() {
    char* first = "roger-1812-we-01";
    char* second = "roger-1903-we-01";
    char* third = "roger-1812-cwes-01";
    printf("First(%s) vs Second(%s): %d\n", first, second, strcmp(first, second));
    printf("First(%s) vs Third(%s): %d\n", first, third, strcmp(first, third));
    printf("Second(%s) vs Third(%s): %d\n", second, third, strcmp(second, third));
}
```
And the output is
First(roger-1812-we-01) vs Second(roger-1903-we-01): -1
First(roger-1812-we-01) vs Third(roger-1812-cwes-01): 20
Second(roger-1903-we-01) vs Third(roger-1812-cwes-01): 1
We should change the comparison to
```
if (strcmp(firstip, nextip) == 0) {
    return GF_AI_COMPARE_MATCH;
}
```
-- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:25:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:25:06 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #6 from Aravinda VK --- Ignore my previous comment. I was wrong. Thanks Amar for pointing that. `!-1` is `0` -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:30:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:30:38 +0000 Subject: [Bugs] [Bug 1734370] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734370 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-14 06:30:38 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23132 (afr: restore timestamp of parent dir during entry-heal) merged (#5) on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug.
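As a side note on bug 1739320 above, getnameinfo() can be exercised outside of gluster to confirm what the resolver actually returns for each brick's IPv6 address. The sketch below does only that; the documentation-range addresses are placeholders to be replaced with the bricks' real addresses.

```c
/* Resolve IPv6 addresses back to hostnames with getnameinfo(), to check
 * whether two different addresses really map to the same name. */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

static void lookup(const char *ip6)
{
    struct sockaddr_in6 sa;
    char host[NI_MAXHOST];
    int ret;

    memset(&sa, 0, sizeof(sa));
    sa.sin6_family = AF_INET6;
    if (inet_pton(AF_INET6, ip6, &sa.sin6_addr) != 1) {
        fprintf(stderr, "bad address: %s\n", ip6);
        return;
    }

    ret = getnameinfo((struct sockaddr *)&sa, sizeof(sa), host, sizeof(host),
                      NULL, 0, NI_NAMEREQD);
    if (ret != 0)
        fprintf(stderr, "%s: %s\n", ip6, gai_strerror(ret));
    else
        printf("%s -> %s\n", ip6, host);
}

int main(void)
{
    lookup("2001:db8::1");   /* placeholder: use the bricks' real addresses */
    lookup("2001:db8::2");
    return 0;
}
```

If both calls print the same hostname, the problem lies in name resolution (DNS or hosts configuration) rather than in the comparison code discussed above.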
From bugzilla at redhat.com Wed Aug 14 06:38:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:38:20 +0000 Subject: [Bugs] [Bug 1741041] New: atime/mtime is not restored after healing for entry self heals Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Bug ID: 1741041 Summary: atime/mtime is not restored after healing for entry self heals Product: GlusterFS Version: 7 Status: NEW Component: replicate Keywords: Triaged Severity: low Priority: medium Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org, nchilaka at redhat.com Depends On: 1572163, 1734370 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1734370 +++ +++ This bug was initially created as a clone of Bug #1572163 +++ Description of problem: atime/mtime is not restored after healing for entry self heals Version-Release number of selected component (if applicable): Build : glusterfs-3.12.2-8.el7rhgs.x86_64 How reproducible: Always Steps to Reproduce: 1) create 1 * 3 volume 2) create a 'dir' from mount 3) kill one brick 4) touch 'dir/file' 5) bring back brick after some time 6) after healing, mtime and ctime of 'dir' will be different, while it should be same. Actual results: After healing time is not same as orginal atime/ctime of the dir >From N1: # ls -lrt /bricks/brick2/b0/ total 0 drwxr-xr-x. 2 root root 18 Apr 26 03:16 dir # >From N2: # ls -lrt /bricks/brick2/b1 total 0 drwxr-xr-x. 2 root root 18 Apr 26 02:47 dir # Expected results: mtime/atime should be same from all the bricks. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1572163 [Bug 1572163] atime/mtime is not restored after healing for entry self heals https://bugzilla.redhat.com/show_bug.cgi?id=1734370 [Bug 1734370] atime/mtime is not restored after healing for entry self heals -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:38:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:38:20 +0000 Subject: [Bugs] [Bug 1734370] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734370 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1741041 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 [Bug 1741041] atime/mtime is not restored after healing for entry self heals -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:38:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:38:55 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
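The reproduction steps in the clone above come down to the healed directory getting a fresh mtime when the missing entry is recreated inside it. Purely as an illustration of the idea behind the fix (capture the parent directory's timestamps before the entry is created and put them back afterwards), and not the actual AFR patch, a userspace version would look like the sketch below; the brick path is taken from the report and stands in for whatever directory is being healed.

```c
/* Illustration only: preserve a directory's atime/mtime across the creation
 * of an entry inside it, which is conceptually what the entry-heal fix does. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(void)
{
    const char *dir = "/bricks/brick2/b0/dir";   /* path from the report */
    struct stat st;
    struct timespec times[2];

    if (stat(dir, &st) != 0) {                   /* capture the old times */
        perror("stat");
        return 1;
    }
    times[0] = st.st_atim;                       /* atime */
    times[1] = st.st_mtim;                       /* mtime */

    /* ... the entry is (re)created here, which bumps the directory's mtime ... */

    if (utimensat(AT_FDCWD, dir, times, 0) != 0) {  /* restore them */
        perror("utimensat");
        return 1;
    }
    return 0;
}
```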
From bugzilla at redhat.com Wed Aug 14 06:40:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:40:50 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23225 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:40:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:40:51 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23225 (afr: restore timestamp of parent dir during entry-heal) posted (#1) for review on release-7 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:40:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:40:59 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- QA Contact|nchilaka at redhat.com |ubansal at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:49:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:49:44 +0000 Subject: [Bugs] [Bug 1741044] New: atime/mtime is not restored after healing for entry self heals Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 Bug ID: 1741044 Summary: atime/mtime is not restored after healing for entry self heals Product: GlusterFS Version: 6 Status: NEW Component: replicate Keywords: Triaged Severity: low Priority: medium Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org, nchilaka at redhat.com Depends On: 1572163, 1734370 Blocks: 1741041 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1734370 +++ +++ This bug was initially created as a clone of Bug #1572163 +++ Description of problem: atime/mtime is not restored after healing for entry self heals Version-Release number of selected component (if applicable): Build : glusterfs-3.12.2-8.el7rhgs.x86_64 How reproducible: Always Steps to Reproduce: 1) create 1 * 3 volume 2) create a 'dir' from mount 3) kill one brick 4) touch 'dir/file' 5) bring back brick after some time 6) after healing, mtime and ctime of 'dir' will be different, while it should be same. Actual results: After healing time is not same as orginal atime/ctime of the dir >From N1: # ls -lrt /bricks/brick2/b0/ total 0 drwxr-xr-x. 2 root root 18 Apr 26 03:16 dir # >From N2: # ls -lrt /bricks/brick2/b1 total 0 drwxr-xr-x. 2 root root 18 Apr 26 02:47 dir # Expected results: mtime/atime should be same from all the bricks. 
Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1572163 [Bug 1572163] atime/mtime is not restored after healing for entry self heals https://bugzilla.redhat.com/show_bug.cgi?id=1734370 [Bug 1734370] atime/mtime is not restored after healing for entry self heals https://bugzilla.redhat.com/show_bug.cgi?id=1741041 [Bug 1741041] atime/mtime is not restored after healing for entry self heals -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:49:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:49:44 +0000 Subject: [Bugs] [Bug 1734370] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734370 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1741044 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 [Bug 1741044] atime/mtime is not restored after healing for entry self heals -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:49:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:49:44 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1741044 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 [Bug 1741044] atime/mtime is not restored after healing for entry self heals -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 06:50:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:50:04 +0000 Subject: [Bugs] [Bug 1741044] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 14 06:52:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:52:00 +0000 Subject: [Bugs] [Bug 1741044] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23226 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 14 06:52:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 06:52:01 +0000 Subject: [Bugs] [Bug 1741044] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23226 (afr: restore timestamp of parent dir during entry-heal) posted (#1) for review on release-6 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 07:15:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 07:15:58 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ksubrahm at redhat.com --- Comment #3 from Vishal Pandey --- Had a discussion with Karthik and he is suspicious of why we need this change. Few reasons why this change might not be needed - -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 07:19:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 07:19:08 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 --- Comment #4 from Vishal Pandey --- (In reply to Vishal Pandey from comment #3) > Had a discussion with Karthik and he is suspicious of why we need this > change. Few reasons why this change might not be needed - 1- Rename will not change the contents or metadata of the files inside src directory 2- Rename will not change the xattrs specific to WORM feature on the files 3- The dir can contain other files as well which may not be WORM-Retained or WORMed yet 4- Also even if we do check a directory if it has worm files or no, it will take a long time in case when the number of files are too large. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 09:19:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 09:19:05 +0000 Subject: [Bugs] [Bug 1428101] cluster/afr: Turn on pgfid tracking by default In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428101 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 09:20:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 09:20:08 +0000 Subject: [Bugs] [Bug 1428101] cluster/afr: Turn on pgfid tracking by default In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428101 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23231 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 14 09:20:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 09:20:09 +0000 Subject: [Bugs] [Bug 1428101] cluster/afr: Turn on pgfid tracking by default In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1428101 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23231 (cluster/afr: Turn on pgfid tracking by default) posted (#1) for review on master by Rishubh Jain -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 11:56:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 11:56:47 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ubansal at redhat.com QA Contact|nchilaka at redhat.com |ubansal at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 11:58:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 11:58:44 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 12:03:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 12:03:20 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 12:07:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 12:07:30 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(aspandey at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 12:10:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 12:10:16 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |sheggodu at redhat.com Flags| |needinfo?(sheggodu at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 14 12:12:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 12:12:04 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(aspandey at redhat.c | |om) | |needinfo?(sheggodu at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 12:33:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 12:33:49 +0000 Subject: [Bugs] [Bug 1402538] Assertion Failed Error messages in rebalance logs during rebalance In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1402538 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|risjain at redhat.com |spalai at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 13:12:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:12:45 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- QA Contact|rhinduja at redhat.com |nchilaka at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 13:12:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:12:54 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nchilaka at redhat.com QA Contact|rhinduja at redhat.com |nchilaka at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 13:15:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:15:50 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 14 13:48:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:48:30 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 13:48:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:48:49 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 13:53:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 13:53:35 +0000 Subject: [Bugs] [Bug 1732790] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732790 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 14:08:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 14:08:59 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 Alex changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(totalworlddominat | |ion at gmail.com) | --- Comment #10 from Alex --- Created attachment 1603770 --> https://bugzilla.redhat.com/attachment.cgi?id=1603770&action=edit kill -SIGUSR1 $(pidof glusterd) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 14:15:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 14:15:22 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #11 from Alex --- Created attachment 1603772 --> https://bugzilla.redhat.com/attachment.cgi?id=1603772&action=edit cmd_history.log node 1 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 14:15:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 14:15:39 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #12 from Alex --- Created attachment 1603773 --> https://bugzilla.redhat.com/attachment.cgi?id=1603773&action=edit cmd_history.log node 2 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 14 14:16:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 14:16:01 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #13 from Alex --- Created attachment 1603774 --> https://bugzilla.redhat.com/attachment.cgi?id=1603774&action=edit cmd_history.log node 3 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 14:20:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 14:20:02 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #14 from Alex --- Statedump worked, my bad, I was thinking kill -1 and not -10... :) I've attached it under the command's name to generate it as its description. I do have a glusterd-exporter for prometheus running. I just stopped them for a few days to see what happens. I've also attached all 3 cmd_history.log. Interestingly, since I've stopped the glusterd-exporter at ~10AM EDT, 15 minutes prior to copying the logs (~14h UTC), the repeating message ("tail" below): [2019-08-14 13:59:35.027247] : volume profile gluster info cumulative : FAILED : Profile on Volume gluster is not started [2019-08-14 13:59:35.193063] : volume status all detail : SUCCESS [2019-08-14 13:59:35.199847] : volume status all detail : SUCCESS ... seem to have stopped at the same time! Is that what you meant by monitoring command? Is it a problem with that exporter not clearing something after getting some data or with gluster accumulating some sort of cache for those commands? Thanks! -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:31:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:31:04 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #3 from Chad Feller --- Hi, I still have the core dumps, so I will include the output of your commands as attachments. I'm also going to attach the cluster configuration as well. Native clients are mounted with the 'reader-thread-count=4' option. Only additional notes are that we had switched from Copper 1GbE to Fiber 10GbE about a week before the first crash. At the same time we also added 18 additional disks per node, which would eventually comprise two additional bricks per node. I've been using custom Ansible playbooks to manage not only the system, but Gluster as well. When I used Ansible (https://docs.ansible.com/ansible/latest/modules/gluster_volume_module.html) to add the two additional bricks at the same time, it incorrectly paired them (bug?). 
Before the incorrect addition, my cluster configuration was as follows:

Brick1: gluster00:/export/brick0/srv
Brick2: gluster01:/export/brick0/srv

After Ansible incorrectly paired the disks, it was something like this, IIRC:

Brick1: gluster00:/export/brick0/srv
Brick2: gluster01:/export/brick0/srv
Brick3: gluster00:/export/brick1/srv
Brick4: gluster00:/export/brick2/srv
Brick5: gluster01:/export/brick1/srv
Brick6: gluster01:/export/brick2/srv

After adding the bricks, I issued a rebalance command (not realizing the incorrect pairing) but about a minute into the rebalance I realized that something was amiss. I immediately realized what happened and issued a:

gluster volume remove-brick gv0 gluster01:/export/brick2/srv gluster01:/export/brick1/srv gluster00:/export/brick2/srv gluster00:/export/brick1/srv start

After the remove completed, I did a commit to confirm the remove-brick command. After the commit I was back to the original configuration:

Brick1: gluster00:/export/brick0/srv
Brick2: gluster01:/export/brick0/srv

While the data was intact, my directory permissions and file ownership were wiped out due to a bug that may have been related to Bug #1716848. After correcting the directory permissions and ownership, the cluster ran fine for several hours, and I had planned to reattempt the brick add (via Ansible) but with one brick pair at a time so I didn't end up with mismatched brick pairs again. At the end of the day however, before I was able to re-add the brick pair, Gluster crashed with the first core dump. It was still in the two brick setup, as I had not yet re-attempted the brick add. (Note: I reformatted the bricks before attempting to re-use them.)

I rebooted the cluster, and upon coming back up, the self heal daemon resync'd everything. After examining the volume, I was happy with everything so I went ahead and added a brick pair via Ansible. It worked and everything was paired correctly. I then added the next pair to Ansible and ran the playbook again. Again, everything paired correctly. At this point I had the correct brick setup:

Brick1: gluster00:/export/brick0/srv
Brick2: gluster01:/export/brick0/srv
Brick3: gluster00:/export/brick1/srv
Brick4: gluster01:/export/brick1/srv
Brick5: gluster00:/export/brick2/srv
Brick6: gluster01:/export/brick2/srv

From here I issued a rebalance command and watched. Everything was working fine for about 10 hours, which was when the second crash happened. That is, the second crash happened in the middle of a rebalance. After everything came back up, the self heal daemon did its thing. I examined the volume, saw no issues and went ahead and started the rebalance again. This time the rebalance ran to completion (took somewhere between 1-2 days). I've had zero crashes since then.

Not sure if there is any pattern of access that caused any of this, but the timing of it around some administrative work is interesting and why I covered it in such detail above. I should also note that I'm also using both munin-glusterfs (https://github.com/burner1024/munin-gluster) and gluster-prometheus (https://github.com/gluster/gluster-prometheus) plugins on the nodes for monitoring (although Munin is legacy at this point, and will be going away once Prometheus is fully built out).

-- You are receiving this mail because: You are on the CC list for the bug. 
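For a two-node replica 2 volume like the one described in the comment above, re-adding the bricks one replica pair at a time with the bare CLI looks roughly like the sketch below. The volume name gv0 and the brick paths are taken from the comment; the exact sequence is illustrative, not the reporter's actual Ansible-driven procedure.

    # Add one replica pair at a time so each pair spans both nodes as intended.
    gluster volume add-brick gv0 gluster00:/export/brick1/srv gluster01:/export/brick1/srv
    gluster volume add-brick gv0 gluster00:/export/brick2/srv gluster01:/export/brick2/srv

    # Spread existing data onto the new bricks, then watch progress until completion.
    gluster volume rebalance gv0 start
    gluster volume rebalance gv0 status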
From bugzilla at redhat.com Wed Aug 14 21:35:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:35:44 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #4 from Chad Feller --- Created attachment 1603901 --> https://bugzilla.redhat.com/attachment.cgi?id=1603901&action=edit gluster00 crash #1 core dump gluster00 crash #1 core dump -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:37:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:37:52 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|gluster00 crash #1 core |gdb backtrace from #1603901|dump |gluster00 crash #1 description| | --- Comment #5 from Chad Feller --- Comment on attachment 1603901 --> https://bugzilla.redhat.com/attachment.cgi?id=1603901 gdb backtrace from gluster00 crash #1 Output of '(gdb) thread apply all bt full' from crash #1 on gluster00 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:39:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:39:47 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #6 from Chad Feller --- Created attachment 1603902 --> https://bugzilla.redhat.com/attachment.cgi?id=1603902&action=edit gdb backtrace from gluster01 crash #1 Output of '(gdb) thread apply all bt full' from crash #1 on gluster01 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:41:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:41:51 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #7 from Chad Feller --- Created attachment 1603903 --> https://bugzilla.redhat.com/attachment.cgi?id=1603903&action=edit gdb backtrace from gluster00 crash #2 Output of '(gdb) thread apply all bt full' from crash #2 on gluster00 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:43:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:43:22 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #8 from Chad Feller --- Created attachment 1603904 --> https://bugzilla.redhat.com/attachment.cgi?id=1603904&action=edit gdb backtrace from gluster01 crash #2 Output of '(gdb) thread apply all bt full' from crash #2 on gluster01 -- You are receiving this mail because: You are on the CC list for the bug. 
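The backtraces in the attachments above come from running gdb's 'thread apply all bt full' against the saved core files. A non-interactive way to produce the same output is sketched here; the binary and core paths are placeholders, and the matching glusterfs debuginfo packages need to be installed for the symbols to resolve.

    # Dump a full backtrace of every thread from a glusterfsd core file into a text file.
    gdb -batch \
        -ex "set pagination off" \
        -ex "thread apply all bt full" \
        /usr/sbin/glusterfsd /path/to/core.glusterfsd > glusterfsd-bt-full.txt 2>&1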
From bugzilla at redhat.com Wed Aug 14 21:46:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:46:09 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #9 from Chad Feller --- Created attachment 1603905 --> https://bugzilla.redhat.com/attachment.cgi?id=1603905&action=edit lvm config LVM configuration -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:46:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:46:42 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #10 from Chad Feller --- Created attachment 1603906 --> https://bugzilla.redhat.com/attachment.cgi?id=1603906&action=edit gluster volume info output of 'gluster volume info' -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:47:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:47:36 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #11 from Chad Feller --- Created attachment 1603907 --> https://bugzilla.redhat.com/attachment.cgi?id=1603907&action=edit gluster volume status client-list output of 'gluster volume status gv0 client-list' -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 14 21:49:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 14 Aug 2019 21:49:29 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #12 from Chad Feller --- Created attachment 1603909 --> https://bugzilla.redhat.com/attachment.cgi?id=1603909&action=edit gluster volume status detail output of 'gluster volume status detail' -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 15 03:37:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 03:37:14 +0000 Subject: [Bugs] [Bug 1741402] New: READDIRP incorrectly updates posix-acl inode ctx Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741402 Bug ID: 1741402 Summary: READDIRP incorrectly updates posix-acl inode ctx Product: GlusterFS Version: 6 OS: Linux Status: NEW Component: posix-acl Severity: urgent Priority: medium Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: anoopcs at redhat.com, asender at testlabs.com.au, atumball at redhat.com, bugs at gluster.org, homma at allworks.co.jp, jthottan at redhat.com, nbalacha at redhat.com, pgurusid at redhat.com, rgowdapp at redhat.com, spalai at redhat.com Depends On: 1668286 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1668286 +++ Description of problem: On FUSE client with mount option use-readdirp=on (default) and acl, access to a file is denied for about a second after listing the directory in which the file resides. 
Version-Release number of selected component (if applicable): glusterfs-fuse.x86_64 5.2-1.el7 from centos-gluster5 repository How reproducible: Always, with mount option use-readdirp=on and acl Steps to Reproduce: 1. Mount GlusterFS volume with acl and use-readdirp=on 2. Chdir to the mounted directory 3. Execute the following commands: echo TEST > foo; echo -n "[`date -u --rfc-3339=ns`] "; cat foo; ls -l; while :; do echo -n "[`date -u --rfc-3339=ns`] "; cat foo && break; usleep 200000; done Actual results: Access is denied for about a second after executing ls: [2019-01-22 10:24:18.802855191+00:00] TEST total 1 -rw-rw-r-- 1 centos centos 5 Jan 22 16:30 bar -rw-rw-r-- 1 centos centos 5 Jan 22 19:24 foo [2019-01-22 10:24:18.825725474+00:00] cat: foo: Permission denied [2019-01-22 10:24:19.029015958+00:00] cat: foo: Permission denied [2019-01-22 10:24:19.232249483+00:00] cat: foo: Permission denied [2019-01-22 10:24:19.435580108+00:00] cat: foo: Permission denied [2019-01-22 10:24:19.638781941+00:00] cat: foo: Permission denied [2019-01-22 10:24:19.843016193+00:00] TEST Gluster log on the client: [2019-01-22 10:24:18.826671] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: e16e1d3e-7518-4323-982f-1ad348f9608f, req(uid:1000,gid:1000,perm:4,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:-) [Permission denied] [2019-01-22 10:24:18.826711] W [fuse-bridge.c:1124:fuse_fd_cbk] 0-glusterfs-fuse: 930: OPEN() /centos/test/foo => -1 (Permission denied) [2019-01-22 10:24:19.030036] W [fuse-bridge.c:1124:fuse_fd_cbk] 0-glusterfs-fuse: 931: OPEN() /centos/test/foo => -1 (Permission denied) [2019-01-22 10:24:19.233301] W [fuse-bridge.c:1124:fuse_fd_cbk] 0-glusterfs-fuse: 932: OPEN() /centos/test/foo => -1 (Permission denied) [2019-01-22 10:24:19.436612] W [fuse-bridge.c:1124:fuse_fd_cbk] 0-glusterfs-fuse: 933: OPEN() /centos/test/foo => -1 (Permission denied) [2019-01-22 10:24:19.639804] W [fuse-bridge.c:1124:fuse_fd_cbk] 0-glusterfs-fuse: 934: OPEN() /centos/test/foo => -1 (Permission denied) The message "I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: e16e1d3e-7518-4323-982f-1ad348f9608f, req(uid:1000,gid:1000,perm:4,ngrps:4), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:-) [Permission denied]" repeated 4 times between [2019-01-22 10:24:18.826671] and [2019-01-22 10:24:19.639797] Expected results: Access to the file is always granted. Additional info: In readdir-ahead.c, rda_fill_fd_cbk() replaces dentries and zeroes out iatts exept for ia_gfid and ia_type. Then in posix-acl.c, posix_acl_readdirp_cbk() updates its inode ctx by that zeroed permission, and permission is denied. --- Additional comment from asender at testlabs.com.au on 2019-02-13 04:43:45 UTC --- We need a procedure to downgrade from 5 to 4 without causing any further disruptions. I think this bug report should be a blocker. --- Additional comment from Raghavendra G on 2019-02-13 05:09:10 UTC --- (In reply to homma from comment #0) > Additional info: > > In readdir-ahead.c, rda_fill_fd_cbk() replaces dentries and zeroes out iatts > exept for ia_gfid and ia_type. > Then in posix-acl.c, posix_acl_readdirp_cbk() updates its inode ctx by that > zeroed permission, and permission is denied. 
The expectation is kernel would do a fresh lookup for getting other attributes like permissions and that's what Glusterfs indicates kernel too - that only entry information (mapping of path to inode/gfid) is valid and the attributes are not valid. How did you conclude zeroed out permissions are set on posix-acl? Did you see a call like setattr or any setxattr updating posix acls? If yes, whether these zeroed out attributes were sent from kernel? --- Additional comment from asender at testlabs.com.au on 2019-02-13 05:20:50 UTC --- gluster volume info Volume Name: common Type: Replicate Volume ID: 359a079c-0c67-4a07-aa92-65d746ae6440 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30063:/export/common/common Brick2: hplintnfs30065:/export/common/common Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on Volume Name: external Type: Replicate Volume ID: b76d3a71-6c0c-4df3-9411-baa30a586489 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30065:/export/external/external Brick2: hplintnfs30064:/export/external/external Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on nfs.log-level: debug Volume Name: input Type: Replicate Volume ID: 399caee1-4acc-48bc-9416-5510dc056280 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30063:/export/input/input Brick2: hplintnfs30065:/export/input/input Options Reconfigured: transport.address-family: inet performance.readdir-ahead: enable nfs.disable: on performance.cache-size: 1GB performance.client-io-threads: on performance.io-cache: on performance.io-thread-count: 16 performance.read-ahead: disable server.allow-insecure: on cluster.lookup-optimize: on client.event-threads: 4 server.event-threads: 4 cluster.readdir-optimize: on performance.write-behind-window-size: 1MB Volume Name: logs Type: Replicate Volume ID: a5afa578-441b-4392-887a-2e3d71a27408 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30063:/export/logs/logs Brick2: hplintnfs30065:/export/logs/logs Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on Volume Name: output Type: Replicate Volume ID: bf333aa2-7260-4a8c-aa8b-cb9aeac16d36 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30063:/export/output/output Brick2: hplintnfs30065:/export/output/output Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on Volume Name: report Type: Replicate Volume ID: caf38a37-9228-4d2a-b636-6a168ce89183 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30065:/export/report/report Brick2: hplintnfs30064:/export/report/report Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on Volume Name: statement Type: Replicate Volume ID: 238e520d-d493-4b0e-89e2-15707847e1e7 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: hplintnfs30065:/export/statement/statement Brick2: hplintnfs30063:/export/statement/statement Options Reconfigured: transport.address-family: inet performance.readdir-ahead: on nfs.disable: on --- Additional comment from asender at testlabs.com.au on 2019-02-13 05:33:15 UTC --- Applications 
returning permission denied. 2019-02-07 15:00:35 DEBUG - stderr: gpg: can't open `/data/common/direct-entry-files/first-data-returns/sftp/512733_ADEF020701.gpg': Permission denied --- Additional comment from asender at testlabs.com.au on 2019-02-13 05:38:26 UTC --- Could someone kindly provide a "rollback" procedure with minimal impact. Can we set Gluster back to version 4 compatibility mode and downgrade.? Prefer non-impacting, but whatever is the safest. We need to go back to version 4. --- Additional comment from on 2019-02-13 08:25:41 UTC --- (In reply to Raghavendra G from comment #2) > (In reply to homma from comment #0) > > Additional info: > > > > In readdir-ahead.c, rda_fill_fd_cbk() replaces dentries and zeroes out iatts > > exept for ia_gfid and ia_type. > > Then in posix-acl.c, posix_acl_readdirp_cbk() updates its inode ctx by that > > zeroed permission, and permission is denied. > > The expectation is kernel would do a fresh lookup for getting other > attributes like permissions and that's what Glusterfs indicates kernel too - > that only entry information (mapping of path to inode/gfid) is valid and > the attributes are not valid. How did you conclude zeroed out permissions > are set on posix-acl? Did you see a call like setattr or any setxattr > updating posix acls? If yes, whether these zeroed out attributes were sent > from kernel? In the client log, 'ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:-)' indicates that owner, group, and permissions are all zero in posix-acl ctx. With gdb, the following output is obtained when executing the above commands (see 'Steps to Reproduce'). The ctx is zeroed out (uid = 0, gid = 0, perm = 32768) when updated by READDIRP, while it has correct values (uid = 1000, gid = 1000, perm = 33204) when updated by LOOKUP. (gdb) break posix-acl.c:1196 Breakpoint 1 at 0x7fbdc0fecb28: file posix-acl.c, line 1196. (gdb) commands Type commands for breakpoint(s) 1, one per line. End with a line saying just "end". >print *loc >print *(struct posix_acl_ctx *)loc.inode._ctx[13].ptr1 >continue >end (gdb) break posix-acl.c:1200 Breakpoint 2 at 0x7fbdc0fec953: file posix-acl.c, line 1200. (gdb) commands Type commands for breakpoint(s) 2, one per line. End with a line saying just "end". >print *loc >print *(struct posix_acl_ctx *)loc.inode._ctx[13].ptr1 >continue >end (gdb) set pagination off (gdb) continue Continuing. 
[Switching to Thread 0x7fbdbb7fe700 (LWP 7156)] Breakpoint 1, posix_acl_open (frame=frame at entry=0x7fbdac01e8b8, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32769, fd=fd at entry=0x7fbdac009d88, xdata=xdata at entry=0x0) at posix-acl.c:1196 1196 STACK_WIND(frame, posix_acl_open_cbk, FIRST_CHILD(this), $1 = {path = 0x7fbdac007f20 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $2 = {uid = 1000, gid = 1000, perm = 33204, fop = GF_FOP_LOOKUP, acl_access = 0x0, acl_default = 0x0} Breakpoint 1, posix_acl_open (frame=frame at entry=0x7fbdac014fe8, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac00ab28, xdata=xdata at entry=0x0) at posix-acl.c:1196 1196 STACK_WIND(frame, posix_acl_open_cbk, FIRST_CHILD(this), $3 = {path = 0x7fbdac009090 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $4 = {uid = 1000, gid = 1000, perm = 33204, fop = GF_FOP_LOOKUP, acl_access = 0x0, acl_default = 0x0} Breakpoint 2, posix_acl_open (frame=frame at entry=0x7fbdac013978, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac014638, xdata=xdata at entry=0x0) at posix-acl.c:1200 1200 STACK_UNWIND_STRICT(open, frame, -1, EACCES, NULL, NULL); $5 = {path = 0x7fbdac009090 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $6 = {uid = 0, gid = 0, perm = 32768, fop = GF_FOP_READDIRP, acl_access = 0x0, acl_default = 0x0} Breakpoint 2, posix_acl_open (frame=frame at entry=0x7fbdac0126a8, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac017b48, xdata=xdata at entry=0x0) at posix-acl.c:1200 1200 STACK_UNWIND_STRICT(open, frame, -1, EACCES, NULL, NULL); $7 = {path = 0x7fbdac007f20 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $8 = {uid = 0, gid = 0, perm = 32768, fop = GF_FOP_READDIRP, acl_access = 0x0, acl_default = 0x0} Breakpoint 2, posix_acl_open (frame=frame at entry=0x7fbdac017b48, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac014fe8, xdata=xdata at entry=0x0) at posix-acl.c:1200 1200 STACK_UNWIND_STRICT(open, frame, -1, EACCES, NULL, NULL); $9 = {path = 0x7fbdac009090 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $10 = {uid = 0, gid = 0, perm = 32768, fop = GF_FOP_READDIRP, acl_access = 0x0, acl_default = 0x0} Breakpoint 2, posix_acl_open (frame=frame at entry=0x7fbdac014fe8, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac0126a8, xdata=xdata at entry=0x0) at posix-acl.c:1200 1200 STACK_UNWIND_STRICT(open, frame, -1, EACCES, NULL, NULL); $11 = {path = 0x7fbdac007f20 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $12 = {uid = 0, gid = 0, perm = 32768, fop = GF_FOP_READDIRP, acl_access = 0x0, acl_default = 0x0} Breakpoint 2, posix_acl_open (frame=frame at entry=0x7fbdac0126a8, this=this at 
entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac017b48, xdata=xdata at entry=0x0) at posix-acl.c:1200 1200 STACK_UNWIND_STRICT(open, frame, -1, EACCES, NULL, NULL); $13 = {path = 0x7fbdac009090 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $14 = {uid = 0, gid = 0, perm = 32768, fop = GF_FOP_READDIRP, acl_access = 0x0, acl_default = 0x0} Breakpoint 1, posix_acl_open (frame=frame at entry=0x7fbdac018b38, this=this at entry=0x7fbdbc01dc00, loc=loc at entry=0x7fbdac000f30, flags=flags at entry=32768, fd=fd at entry=0x7fbdac0179d8, xdata=xdata at entry=0x0) at posix-acl.c:1196 1196 STACK_WIND(frame, posix_acl_open_cbk, FIRST_CHILD(this), $15 = {path = 0x7fbdac007f20 "/centos/test/foo", name = 0x0, inode = 0x7fbdac001e98, parent = 0x0, gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", pargfid = '\000' } $16 = {uid = 1000, gid = 1000, perm = 33204, fop = GF_FOP_LOOKUP, acl_access = 0x0, acl_default = 0x0} --- Additional comment from asender at testlabs.com.au on 2019-02-13 22:23:50 UTC --- I also have the errors from client: data-common.log-20190210:[2019-02-07 04:00:34.845536] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 9cdaca25-8b70-4d5c-ab7c-23711af54f29, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:34.903357] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 3ddd4e64-38a2-456d-82c7-8361fd2f12a0, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:34.933803] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: ce5220ed-2ac0-436d-9c3b-6978fba22409, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:34.968269] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 01f64a01-6657-408e-80bc-2daa2fa4c3d6, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.001639] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: dd58bcb9-123c-4149-a101-87c145c8d75e, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] 
data-common.log-20190210:[2019-02-07 04:00:35.029941] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: afba3b44-82a4-4cc8-8412-2e6640b3ee41, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.062942] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 6e2761ce-4ae6-4a57-b77a-aca9d675d4a7, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.088658] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 95150a45-e2af-43be-97c5-3d6c68f2cc45, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.115121] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: beba071f-3107-4955-b132-6871c5a4b4a7, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.142953] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 08ce9205-f7c9-40da-bae7-0a6f313a2a4b, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.169342] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 219ae3ee-6783-4194-9952-32b859b6e9e6, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.195090] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: eec300d6-5e58-4dc4-9779-cb5872bcfde3, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.222026] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: e7ebe43c-611b-48c0-91da-bc8cff0e257d, req(uid:582601439,gid:582600513,perm:4,ngrps:1), 
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.248727] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 75e84ea7-ee13-499e-a9ed-9924ca795220, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.273546] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: bfb17595-25fd-48b1-b63c-5de711575e77, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.298552] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 41133cbc-0287-41e2-b38b-18e73d986b86, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 04:00:35.325860] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: c926e9f8-ec94-47de-a669-7f0b1623297e, req(uid:582601439,gid:582600513,perm:4,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:READDIRP, acl:(tag:1,perm:0,id:4294967295)(tag:2,perm:7,id:582601439)(tag:4,perm:5,id:4294967295)(tag:16,perm:0,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:[2019-02-07 22:42:43.312845] I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 7f53991d-0db7-4f1c-8deb-35cd5bcda822, req(uid:582601182,gid:582600513,perm:1,ngrps:7), ctx(uid:0,gid:0,in-groups:0,perm:770,updated-fop:LOOKUP, acl:(tag:1,perm:7,id:4294967295)(tag:2,perm:7,id:1000)(tag:2,perm:7,id:582601439)(tag:2,perm:7,id:582601746)(tag:4,perm:0,id:4294967295)(tag:16,perm:7,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied] data-common.log-20190210:The message "I [MSGID: 139001] [posix-acl.c:263:posix_acl_log_permit_denied] 0-posix-acl-autoload: client: -, gfid: 7f53991d-0db7-4f1c-8deb-35cd5bcda822, req(uid:582601182,gid:582600513,perm:1,ngrps:7), ctx(uid:0,gid:0,in-groups:0,perm:770,updated-fop:LOOKUP, acl:(tag:1,perm:7,id:4294967295)(tag:2,perm:7,id:1000)(tag:2,perm:7,id:582601439)(tag:2,perm:7,id:582601746)(tag:4,perm:0,id:4294967295)(tag:16,perm:7,id:4294967295)(tag:32,perm:0,id:4294967295) [Permission denied]" repeated 3 times between [2019-02-07 22:42:43.312845] and [2019-02-07 22:42:43.314192] --- Additional comment from Nithya Balachandran on 2019-02-15 13:32:27 UTC --- (In reply to asender at testlabs.com.au from comment #5) > Could someone kindly provide a "rollback" procedure with minimal impact. Can > we set Gluster back to version 4 compatibility mode and downgrade.? Prefer > non-impacting, but whatever is the safest. 
> > We need to go back to version 4. >From Kaushal on IRC: sendro, To rollback do the following. 1. Kill glusterds on all the nodes. 2. Edit /var/lib/gluster/glusterd.info and manually change opversion to what you want. Do this on all the nodes. 3. Downgrade glusterfs-server to the version you want. 4. Restart glusterd. --- Additional comment from Jiffin on 2019-02-19 04:35:16 UTC --- As far as I understand, rda_fill_fd_cbk() sets iatt to zero and that info is stored in its context not passed to the other layers. I tried to reproduce, but was not able to hit till now.(turned on performance.readdir-ahead). Prior to this bug myself have seen similar issue when, the permission of acl ctx gets zeroed after readdir operations. The issue was very much spurious and there was no specific steps to hit that issue --- Additional comment from on 2019-02-26 13:31:31 UTC --- (In reply to Jiffin from comment #9) > As far as I understand, rda_fill_fd_cbk() sets iatt to zero and that info is > stored in its context not passed to the other layers. > I tried to reproduce, but was not able to hit till now.(turned on > performance.readdir-ahead). > Prior to this bug myself have seen similar issue when, the permission of acl > ctx gets zeroed after readdir operations. The issue was > very much spurious and there was no specific steps to hit that issue I think rda_fill_fd_cbk() passes entries with zeroed iatts to other xlators. On entry of rda_fill_fd_cbk(), 'entries' holds dentries obtained by READDIRP operation. After setting iatt to zero, it calls STACK_UNWIND_STRICT with modified 'serve_entries', not the original 'entries'. Then posix_acl_readdirp_cbk() receives that modified entries information. (gdb) b rda_fill_fd_cbk Breakpoint 1 at 0x7fef2451f9d0: file readdir-ahead.c, line 424. (gdb) b readdir-ahead.c:537 b posix_acl_readdirp_cbk Breakpoint 2 at 0x7fef2451fcd9: file readdir-ahead.c, line 537. (gdb) b posix_acl_readdirp_cbk Breakpoint 3 at 0x7fef1f7990b0: file posix-acl.c, line 1654. (gdb) c Continuing. [Switching to Thread 0x7fef25b37700 (LWP 12060)] Breakpoint 1, rda_fill_fd_cbk (frame=frame at entry=0x7fef2005c628, cookie=0x7fef2006a4d8, this=0x7fef200132e0, op_ret=op_ret at entry=4, op_errno=op_errno at entry=2, entries=entries at entry=0x7fef25b36710, xdata=xdata at entry=0x0) at readdir-ahead.c:424 424 { (gdb) p *entries.next.next.next $1 = {{list = {next = 0x7fef200011a0, prev = 0x7fef20000f40}, {next = 0x7fef200011a0, prev = 0x7fef20000f40}}, d_ino = 10966013112435171471, d_off = 28, d_len = 3, d_type = 8, d_stat = {ia_flags = 6143, ia_ino = 10966013112435171471, ia_dev = 51792, ia_rdev = 0, ia_size = 5, ia_nlink = 1, ia_uid = 1000, ia_gid = 1000, ia_blksize = 4096, ia_blocks = 1, ia_atime = 1551186297, ia_mtime = 1551186488, ia_ctime = 1551186488, ia_btime = 0, ia_atime_nsec = 517274116, ia_mtime_nsec = 150035482, ia_ctime_nsec = 153035462, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, group = {read = 1 '\001', write = 1 '\001', exec = 0 '\000'}, other = {read = 1 '\001', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x7fef200658f8, inode = 0x7fef100032c8, d_name = 0x7fef20001140 "foo"} (gdb) c Continuing. 
Breakpoint 2, rda_fill_fd_cbk (frame=frame at entry=0x7fef2005c628, cookie=, this=0x7fef200132e0, op_ret=op_ret at entry=4, op_errno=op_errno at entry=2, entries=entries at entry=0x7fef25b36710, xdata=xdata at entry=0x0) at readdir-ahead.c:537 537 STACK_UNWIND_STRICT(readdirp, stub->frame, ret, op_errno, (gdb) l 532 op_errno = 0; 533 534 UNLOCK(&ctx->lock); 535 536 if (serve) { 537 STACK_UNWIND_STRICT(readdirp, stub->frame, ret, op_errno, 538 &serve_entries, xdata); 539 gf_dirent_free(&serve_entries); 540 call_stub_destroy(stub); 541 } (gdb) p &serve_entries $2 = (gf_dirent_t *) 0x7fef25b364c0 (gdb) p *serve_entries.next.next.next $3 = {{list = {next = 0x7fef200011a0, prev = 0x7fef20000f40}, {next = 0x7fef200011a0, prev = 0x7fef20000f40}}, d_ino = 10966013112435171471, d_off = 28, d_len = 3, d_type = 8, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x7fef200658f8, inode = 0x7fef100032c8, d_name = 0x7fef20001140 "foo"} (gdb) c Continuing. Breakpoint 3, posix_acl_readdirp_cbk (frame=0x7fef1000b8c8, cookie=0x7fef1000c9e8, this=0x7fef2001dc00, op_ret=4, op_errno=2, entries=0x7fef25b364c0, xdata=0x0) at posix-acl.c:1654 1654 { (gdb) p entries $4 = (gf_dirent_t *) 0x7fef25b364c0 (gdb) p *entries.next.next.next $5 = {{list = {next = 0x7fef200011a0, prev = 0x7fef20000f40}, {next = 0x7fef200011a0, prev = 0x7fef20000f40}}, d_ino = 10966013112435171471, d_off = 28, d_len = 3, d_type = 8, d_stat = {ia_flags = 0, ia_ino = 0, ia_dev = 0, ia_rdev = 0, ia_size = 0, ia_nlink = 0, ia_uid = 0, ia_gid = 0, ia_blksize = 0, ia_blocks = 0, ia_atime = 0, ia_mtime = 0, ia_ctime = 0, ia_btime = 0, ia_atime_nsec = 0, ia_mtime_nsec = 0, ia_ctime_nsec = 0, ia_btime_nsec = 0, ia_attributes = 0, ia_attributes_mask = 0, ia_gfid = "\341n\035>u\030C#\230/\032\323H\371`\217", ia_type = IA_IFREG, ia_prot = {suid = 0 '\000', sgid = 0 '\000', sticky = 0 '\000', owner = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, group = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}, other = {read = 0 '\000', write = 0 '\000', exec = 0 '\000'}}}, dict = 0x7fef200658f8, inode = 0x7fef100032c8, d_name = 0x7fef20001140 "foo"} --- Additional comment from on 2019-03-29 11:16:37 UTC --- The problem still exists on release 5.5. THe cause of the problem may be that posix_acl_readdirp_cbk() updates its ctx without checking that dentries contain valid iatts. If so, please change the component to posix-acl. --- Additional comment from on 2019-07-01 12:05:16 UTC --- The problem persists on release 5.6. We cannot update fuse client to release 5 (or even 6) on our production environment because of this problem. I have examined the source code in more detail. rda_fill_fd_cbk() passes entries with zeroed iatts to other xlators, but for example, md-cache invalidates its cache entry when iatt with ia_ctime=0 is passed. 
In md-cache.c, function mdc_inode_iatt_set_validate(): if (!iatt || !iatt->ia_ctime) { gf_msg_callingfn("md-cache", GF_LOG_TRACE, 0, 0, "invalidating iatt(NULL)" "(%s)", uuid_utoa(inode->gfid)); mdc->ia_time = 0; mdc->valid = 0; gen = __mdc_get_generation(this, mdc); mdc->invalidation_time = (gen & 0xffffffff); goto unlock; } On the other hand, posix-acl updates its ctx without checking the content of the passed iatts. In posix-acl.c, function posix_acl_ctx_update(): ctx = __posix_acl_ctx_get(inode, this, _gf_true); if (!ctx) { ret = -1; goto unlock; } ctx->uid = buf->ia_uid; ctx->gid = buf->ia_gid; ctx->perm = st_mode_from_ia(buf->ia_prot, buf->ia_type); ctx->fop = fop; I think posix-acl.c should be modified not to update its ctx, when iatt with ia_ctime=0 is passed. --- Additional comment from Amar Tumballi on 2019-07-01 12:10:31 UTC --- Homma, Thanks for the detailed check on this. Your analysis seems right. Does the patch fix the issues for you? Will be sending a patch for the issue soon. --- Additional comment from on 2019-07-05 10:31:19 UTC --- Amar, Thank you for your prompt reply. I will try to see if the patch solves the problem when it has been sent. --- Additional comment from Worker Ant on 2019-07-05 10:44:31 UTC --- REVIEW: https://review.gluster.org/23003 (system/posix-acl: update ctx only if iatt is non-NULL) posted (#1) for review on master by Amar Tumballi --- Additional comment from on 2019-07-08 08:23:27 UTC --- Amar, I have built posix-acl.so applying your modification to release 5.6 source code, and confirmed that the problem is solved. Waiting for this fix to be officially released. --- Additional comment from Amar Tumballi on 2019-07-08 08:28:42 UTC --- Just note that it is 'your' fix. Ie, you identified it, I just sent it to repo. Once it gets merged, you can see yourself as contributor to glusterfs. --- Additional comment from Worker Ant on 2019-07-16 04:44:46 UTC --- REVIEW: https://review.gluster.org/23003 (system/posix-acl: update ctx only if iatt is non-NULL) merged (#4) on master by jiffin tony Thottan Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1668286 [Bug 1668286] READDIRP incorrectly updates posix-acl inode ctx -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 03:37:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 03:37:14 +0000 Subject: [Bugs] [Bug 1668286] READDIRP incorrectly updates posix-acl inode ctx In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1668286 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1741402 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741402 [Bug 1741402] READDIRP incorrectly updates posix-acl inode ctx -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 15 03:37:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 03:37:32 +0000 Subject: [Bugs] [Bug 1741402] READDIRP incorrectly updates posix-acl inode ctx In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741402 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 08:09:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 08:09:58 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23234 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 08:09:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 08:09:59 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1636 from Worker Ant --- REVIEW: https://review.gluster.org/23234 (storage/posix) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 09:15:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 09:15:38 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #6 from Csaba Henk --- Hi Sergey, can you please take statedumps at regular intervals during your test (say, in every 30 minutes, but feel free to adjust in light of the dynamic of the situation) so that we can observe the progress, and tar 'em up and attach to the bug? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 15 09:49:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 09:49:02 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23235 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 09:49:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 09:49:04 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1637 from Worker Ant --- REVIEW: https://review.gluster.org/23235 (cluster/afr) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 15 12:58:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 12:58:30 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23236 -- You are receiving this mail because: You are the assignee for the bug. 
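Returning to the READDIRP/posix-acl problem tracked a few messages above as bugs 1668286 and 1741402: the change that was merged is titled "update ctx only if iatt is non-NULL", i.e. posix-acl simply declines to overwrite its cached owner/permissions when readdir-ahead hands it a zeroed iatt. A minimal sketch of that guard, written against the posix_acl_ctx_update() excerpt quoted in the bug, is shown below. It is not the literal merged patch: the function name is a renamed stand-in, the real function's exact signature is not shown in the excerpts, and locking around the context access is omitted.

    /* Sketch: skip the ACL context update when the iatt carries no real
     * attributes. readdir-ahead zeroes everything except ia_gfid/ia_type,
     * so ia_ctime == 0 is used as the "no valid stat" marker, mirroring
     * the md-cache check quoted earlier in the bug. */
    static int
    posix_acl_ctx_update_sketch(inode_t *inode, xlator_t *this,
                                struct iatt *buf, glusterfs_fop_t fop)
    {
        struct posix_acl_ctx *ctx = NULL;

        if (!buf || !buf->ia_ctime)
            return 0;               /* keep whatever LOOKUP already cached */

        ctx = __posix_acl_ctx_get(inode, this, _gf_true);
        if (!ctx)
            return -1;

        ctx->uid = buf->ia_uid;
        ctx->gid = buf->ia_gid;
        ctx->perm = st_mode_from_ia(buf->ia_prot, buf->ia_type);
        ctx->fop = fop;
        return 0;
    }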
From bugzilla at redhat.com Thu Aug 15 12:58:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 12:58:31 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1638 from Worker Ant --- REVIEW: https://review.gluster.org/23236 (storage/posix) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug.

From bugzilla at redhat.com Thu Aug 15 14:26:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 14:26:05 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23237 -- You are receiving this mail because: You are the assignee for the bug.

From bugzilla at redhat.com Thu Aug 15 14:26:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 15 Aug 2019 14:26:06 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1639 from Worker Ant --- REVIEW: https://review.gluster.org/23237 (features/cloudsync) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug.

From bugzilla at redhat.com Fri Aug 16 02:05:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 02:05:58 +0000 Subject: [Bugs] [Bug 1741734] New: gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741734 Bug ID: 1741734 Summary: gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: gluster-smb Severity: high Assignee: bugs at gluster.org Reporter: 13965432176 at 163.com CC: bugs at gluster.org Target Milestone: --- Classification: Community

Description of problem: After "gluster volume set testvol_distributed-replicated user.cifs on" is run, a client accessing the volume over CIFS gets "Device or resource busy" when writing data.

Version-Release number of selected component (if applicable): gluster-6, extras/hook-scripts/start/post/S30samba-start.sh and ./extras/hook-scripts/set/post/S30samba-set.sh scripts

How reproducible: Enable CIFS access for the volume and access it over CIFS; the SMB client reports "Device or resource busy" when reading or writing a file with cat or echo.

Steps to Reproduce:
1. Create a 2*(2+1) test volume.
2. gluster v set test user.cifs on
3. On the client, mount the test volume via CIFS.
4. Create a file and run echo "11111" >> file; the write fails with "Device or resource busy".

Actual results: The file can neither be written nor read.

Expected results: The file can be written and read as usual.

Additional info: smb client:
mount.cifs //10.10.51.22/gluster-testvol_dispersed /mnt/testvol_dispersed
cd /mnt/testvol_dispersed
cat file
cat: file: Device or resource busy
echo "11111" >> file
-bash: file: Device or resource busy

-- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
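The patch later posted for this bug ("gluster-smb:add smb parameter when access gluster by cifs", review 23240 below) is not reproduced in this archive, so the exact parameter it adds to the S30samba-start.sh/S30samba-set.sh hook scripts is not shown here. As illustration only, a share exported through Samba's vfs_glusterfs module typically looks like the snippet below; "kernel share modes = no" is the setting commonly associated with avoiding "Device or resource busy" errors on such shares, but treat its presence here as an assumption rather than a statement of what the patch changes.

    [gluster-testvol]
        comment = For samba share of volume testvol
        vfs objects = glusterfs
        glusterfs:volume = testvol
        glusterfs:logfile = /var/log/samba/glusterfs-testvol.%M.log
        glusterfs:loglevel = 7
        path = /
        read only = no
        guest ok = yes
        kernel share modes = no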
From bugzilla at redhat.com Fri Aug 16 02:31:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 02:31:57 +0000 Subject: [Bugs] [Bug 1741734] gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741734 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23240 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 02:31:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 02:31:58 +0000 Subject: [Bugs] [Bug 1741734] gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741734 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23240 (gluster-smb:add smb parameter when access gluster by cifs) posted (#1) for review on master by None -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 05:35:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 05:35:37 +0000 Subject: [Bugs] [Bug 1158130] Not possible to disable fopen-keeo-cache when mounting In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1158130 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-16 05:35:37 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/22678 (mount.glusterfs: make fcache-keep-open option take a value) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 05:37:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 05:37:34 +0000 Subject: [Bugs] [Bug 1636297] Make it easy to build / host a project which just builds glusterfs translator In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1636297 --- Comment #11 from Worker Ant --- REVIEW: https://review.gluster.org/23015 (libglusterfs: remove dependency of rpc) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 06:15:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 06:15:07 +0000 Subject: [Bugs] [Bug 1741779] New: Fix spelling errors Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741779 Bug ID: 1741779 Summary: Fix spelling errors Product: GlusterFS Version: mainline Status: NEW Component: geo-replication Assignee: bugs at gluster.org Reporter: sacharya at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Fix spelling errors Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 06:15:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 06:15:43 +0000 Subject: [Bugs] [Bug 1741779] Fix spelling errors In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741779 Shwetha K Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |sacharya at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 06:33:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 06:33:14 +0000 Subject: [Bugs] [Bug 1741779] Fix spelling errors In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741779 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23242 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 06:33:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 06:33:15 +0000 Subject: [Bugs] [Bug 1741779] Fix spelling errors In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741779 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23242 (geo-rep: Fix spelling errors) posted (#1) for review on master by Shwetha K Acharya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 08:33:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 08:33:51 +0000 Subject: [Bugs] [Bug 1385762] Don't create a directory if one with the same gfid exists In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1385762 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23245 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 08:33:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 08:33:52 +0000 Subject: [Bugs] [Bug 1385762] Don't create a directory if one with the same gfid exists In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1385762 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23245 (storage/posix: Skip mkdir if gfid exists) posted (#1) for review on master by Sheetal Pamecha -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 09:10:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 09:10:03 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|VERIFIED |RELEASE_PENDING -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 09:10:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 09:10:03 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|VERIFIED |RELEASE_PENDING -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 09:10:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 09:10:05 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|VERIFIED |RELEASE_PENDING -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 09:10:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 09:10:05 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|VERIFIED |RELEASE_PENDING -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 09:10:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 09:10:22 +0000 Subject: [Bugs] [Bug 1732790] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732790 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|VERIFIED |RELEASE_PENDING -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 10:16:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:16:41 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23247 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 10:55:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:55:46 +0000 Subject: [Bugs] [Bug 1741890] New: geo-rep: Changelog archive file format is incorrect Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Bug ID: 1741890 Summary: geo-rep: Changelog archive file format is incorrect Product: GlusterFS Version: 4.1 Status: NEW Component: geo-replication Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: The created changelog archive file didn't have corresponding year and month. It created as "archive_%Y%m.tar" on python2 only systems. [root at rhs-gp-srv7 xsync]# ls -l total 664564 -rw-r--r--. 1 root root 680509440 Aug 15 16:51 archive_%Y%m.tar [root at rhs-gp-srv7 xsync]# Version-Release number of selected component (if applicable): mainline How reproducible: Always on python2 only machine (centos7) Steps to Reproduce: 1. Create geo-rep session on python2 only machine 2. ls -l /var/lib/misc/gluster/gsyncd///.processed/ Actual results: changelog archive file format is incorrect. Not substituted with corresponding year and month Expected results: changelog archive file name should have correct year and month Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 10:55:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:55:55 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Version|4.1 |mainline -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 10:16:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:16:42 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23247 (geo-rep: Fix worker connection issue) posted (#1) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 10:56:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:56:08 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
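As an illustration of the naming issue in bug 1741890 above: the archive name is an strftime-style template, so the expectation is that %Y%m expands to the current year and month. The one-liner below only demonstrates that expansion with date(1); it is not the gsyncd code path itself.

    date "+archive_%Y%m.tar"
    # expected result for August 2019:  archive_201908.tar
    # observed on python2-only systems: archive_%Y%m.tar (template left unexpanded)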
From bugzilla at redhat.com Fri Aug 16 10:59:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:59:24 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23248 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 10:59:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 10:59:26 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23248 (geo-rep: Fix the name of changelog archive file) posted (#1) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:01:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:01:22 +0000 Subject: [Bugs] [Bug 1732790] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732790 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA Last Closed| |2019-08-16 11:01:22 --- Comment #11 from errata-xmlrpc --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2515 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:01:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:01:23 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 Bug 1732770 depends on bug 1732790, which changed state. Bug 1732790 Summary: fix truncate lock to cover the write in tuncate clean https://bugzilla.redhat.com/show_bug.cgi?id=1732790 What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:01:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:01:25 +0000 Subject: [Bugs] [Bug 1732790] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732790 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Red Hat Product Errata | |RHBA-2019:2515 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 11:04:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:34 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA Last Closed| |2019-08-16 11:04:34 --- Comment #12 from errata-xmlrpc --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:37 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Bug 1732792 depends on bug 1732772, which changed state. Bug 1732772 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1732772 What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:34 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA Last Closed|2019-07-24 11:48:04 |2019-08-16 11:04:34 --- Comment #13 from errata-xmlrpc --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:39 +0000 Subject: [Bugs] [Bug 1732793] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732793 Bug 1732793 depends on bug 1732776, which changed state. Bug 1732776 Summary: I/O error on writes to a disperse volume when replace-brick is executed https://bugzilla.redhat.com/show_bug.cgi?id=1732776 What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 11:04:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:34 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA Last Closed| |2019-08-16 11:04:34 --- Comment #10 from errata-xmlrpc --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:43 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Bug 1731448 depends on bug 1732779, which changed state. Bug 1732779 Summary: [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1732779 What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:34 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA Last Closed| |2019-08-16 11:04:34 --- Comment #11 from errata-xmlrpc --- Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:44 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Bug 1735514 depends on bug 1734303, which changed state. Bug 1734303 Summary: Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1734303 What |Removed |Added ---------------------------------------------------------------------------- Status|RELEASE_PENDING |CLOSED Resolution|--- |ERRATA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 11:04:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:46 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Red Hat Product Errata | |RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:46 +0000 Subject: [Bugs] [Bug 1732776] I/O error on writes to a disperse volume when replace-brick is executed In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732776 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Red Hat Product Errata | |RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:46 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Red Hat Product Errata | |RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 11:04:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:04:46 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Red Hat Product Errata | |RHBA-2019:2514 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 16 11:49:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:49:02 +0000 Subject: [Bugs] [Bug 1741899] New: the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Bug ID: 1741899 Summary: the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it Product: GlusterFS Version: 5 Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Severity: medium Assignee: bugs at gluster.org Reporter: s.pleshkov at hostco.ru CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: I have a gluster volume on 3 nodes (replicate) with following configuration [root at LSY-GL-0(1,2,3) /]# cat /etc/redhat-release Red Hat Enterprise Linux Server release 7.6 (Maipo) [root at LSY-GL-02 host]# gluster volume info TST Volume Name: TST Type: Replicate Volume ID: a96c7b8c-61ec-4a4d-b47e-b445faf6c39b Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: lsy-gl-01:/diskForTestData/tst Brick2: lsy-gl-02:/diskForTestData/tst Brick3: lsy-gl-03:/diskForTestData/tst Options Reconfigured: cluster.favorite-child-policy: size features.shard-block-size: 64MB features.shard: on performance.io-thread-count: 24 client.event-threads: 24 server.event-threads: 24 server.allow-insecure: on network.ping-timeout: 5 transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.heal-timeout: 120 Recently this volume been moved to other disk by command gluster volume replace-brick TST lsy-gl-0(1,2,3):/diskForData/tst lsy-gl-0(1,2,3):/diskForTestData/tst commit force sequentialy, started with lsy-gl-03 node, all nodes been online And now i have this state [root at LSY-GL-02 host]# gluster volume status TST detail Status of volume: TST ------------------------------------------------------------------------------ Brick : Brick lsy-gl-01:/diskForTestData/tst TCP Port : 49154 RDMA Port : 0 Online : Y Pid : 7555 File System : xfs Device : /dev/sdc1 Mount Options : rw,seclabel,relatime,attr2,inode64,noquota Inode Size : 512 Disk Space Free : 399.9GB Total Disk Space : 499.8GB Inode Count : 262143488 Free Inodes : 261684925 ------------------------------------------------------------------------------ Brick : Brick lsy-gl-02:/diskForTestData/tst TCP Port : 49154 RDMA Port : 0 Online : Y Pid : 25732 File System : xfs Device : /dev/sdc1 Mount Options : rw,seclabel,relatime,attr2,inode64,noquota Inode Size : 512 Disk Space Free : 399.9GB Total Disk Space : 499.8GB Inode Count : 262143488 Free Inodes : 261684925 ------------------------------------------------------------------------------ Brick : Brick lsy-gl-03:/diskForTestData/tst TCP Port : 49154 RDMA Port : 0 Online : Y Pid : 25243 File System : xfs Device : /dev/sdc1 Mount Options : rw,seclabel,relatime,attr2,inode64,noquota Inode Size : 512 Disk Space Free : 357.6GB Total Disk Space : 499.8GB Inode Count : 262143488 Free Inodes : 261684112 [root at LSY-GL-02 host]# gluster volume heal TST full Launching heal operation to perform full self heal on volume TST has been successful Use heal info commands to check status. 
[root at LSY-GL-02 host]# gluster volume heal TST info Brick lsy-gl-01:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-02:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-03:/diskForTestData/tst Status: Connected Number of entries: 0 [root at LSY-GL-01 /]# df -Th Filesystem Type Size Used Avail Use% Mounted on LSY-GL-01:/TST fuse.glusterfs 500G 148G 353G 30% /mnt/tst /dev/sdc1 xfs 500G 100G 400G 20% /diskForTestData [root at LSY-GL-02 host]# df -Th Filesystem Type Size Used Avail Use% Mounted on LSY-GL-02:/TST fuse.glusterfs 500G 148G 353G 30% /mnt/tst /dev/sdc1 xfs 500G 100G 400G 20% /diskForTestData [root at LSY-GL-03 host]# df -Th Filesystem Type Size Used Avail Use% Mounted on /dev/sdc1 xfs 500G 143G 358G 29% /diskForTestData LSY-GL-03:/TST fuse.glusterfs 500G 148G 353G 30% /mnt/tst Version-Release number of selected component (if applicable): [root at LSY-GL-0(1,2,3) /]# rpm -qa | grep gluster* glusterfs-libs-5.5-1.el7.x86_64 glusterfs-fuse-5.5-1.el7.x86_64 glusterfs-client-xlators-5.5-1.el7.x86_64 centos-release-gluster5-1.0-1.el7.centos.noarch glusterfs-api-5.5-1.el7.x86_64 glusterfs-cli-5.5-1.el7.x86_64 nfs-ganesha-gluster-2.7.1-1.el7.x86_64 glusterfs-5.5-1.el7.x86_64 glusterfs-server-5.5-1.el7.x86_64 How reproducible: Umm, I will test it again soon and do comment Steps to Reproduce: 1. 2. 3. Actual results: Size of brick on lsy-gl-01, and lsy-gl-02 differ from size brick on lsy-gl-03. Healing full not fixed this situation Expected results: What things I should do to fix it ? Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 11:58:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 11:58:40 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Sergey Pleshkov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |s.pleshkov at hostco.ru --- Comment #1 from Sergey Pleshkov --- Before replace-brick I have split-brain event, but after nodes up it been healed automaticly -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 12:00:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 12:00:48 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #7 from Sergey Pleshkov --- Hi everybody, I will try to reproduce this problem on test environment on next week and will do statedump in process of testing -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 16 14:50:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 14:50:41 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #7 from Amgad --- I did some testing on "getnameinfo" and it works fine. When you pass the IPv6 address, it returns the right IP address. 
I used the following test program:

#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <netdb.h>

#define SIZE 1024

int main(int argc, char *argv[])
{
    char host[SIZE];
    char service[SIZE];
    struct sockaddr_in6 sa;

    sa.sin6_family = AF_INET6;
    inet_pton(AF_INET6, argv[1], &sa.sin6_addr);

    int res = getnameinfo((struct sockaddr *)&sa, sizeof(sa), host, sizeof(host),
                          service, sizeof(service), 0);
    if (res) {
        exit(1);
    } else {
        printf("Hostname: %s\n", host);
        printf("Service: %s\n", service);
    }
    return 0;
}

So I think the problem is in what is passed to: glusterd_compare_addrinfo(struct addrinfo *first, struct addrinfo *next) -- -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 20:52:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 20:52:30 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #8 from Amgad --- Dug more into the glusterd-volume-ops.c file, where the "glusterd_compare_addrinfo" function is called by "glusterd_check_brick_order". In the following code, which prepares what is passed to "glusterd_compare_addrinfo", "getaddrinfo" doesn't seem to return the right address.

brick_list_dup = brick_list_ptr = gf_strdup(brick_list);
/* Resolve hostnames and get addrinfo */
while (i < brick_count) {
    ++i;
    brick = strtok_r(brick_list_dup, " \n", &tmpptr);
    brick_list_dup = tmpptr;
    if (brick == NULL)
        goto check_failed;
    brick = strtok_r(brick, ":", &tmpptr);
    if (brick == NULL)
        goto check_failed;
    ret = getaddrinfo(brick, NULL, NULL, &ai_info);
    if (ret != 0) {
        ret = 0;
        gf_msg(this->name, GF_LOG_ERROR, 0, GD_MSG_HOSTNAME_RESOLVE_FAIL,
               "unable to resolve "
               "host name");
        goto out;
    }
    ai_list_tmp1 = MALLOC(sizeof(addrinfo_list_t));
    if (ai_list_tmp1 == NULL) {
        ret = 0;
        gf_msg(this->name, GF_LOG_ERROR, ENOMEM, GD_MSG_NO_MEMORY,
               "failed to allocate "
               "memory");
        freeaddrinfo(ai_info);
        goto out;
    }
    ai_list_tmp1->info = ai_info;
    cds_list_add_tail(&ai_list_tmp1->list, &ai_list->list);
    ai_list_tmp1 = NULL;
}

I wrote a small program to call it, and it always returns --> "0.0.0.0", so maybe that's why the code later assumes it's the same host. It works for IPv4, though. Also, you have to loop through the list to get the right address. I'll dig more, but I hope that gives some direction to other developers to check. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 23:32:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 23:32:34 +0000 Subject: [Bugs] [Bug 1375431] [RFE] enable sharding and strict-o-direct with virt profile - /var/lib/glusterd/groups/virt In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1375431 Nir Soffer changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nsoffer at redhat.com Flags| |needinfo?(kdhananj at redhat.c | |om) --- Comment #16 from Nir Soffer --- Krutika, according to comment 10, remote-dio and enabling strict-o-direct should be part of the virt group, but this bug was closed without adding them. So it looks like this bug was closed without implementing the requested feature.
We seem to have issues like this: https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10 because strict-o-direct is not part of the virt group. Should we file a new RFE for including it in the virt group? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sat Aug 17 03:40:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 17 Aug 2019 03:40:24 +0000 Subject: [Bugs] [Bug 1741734] gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741734 yinkui <13965432176 at 163.com> changed: What |Removed |Added ---------------------------------------------------------------------------- Version|6 |mainline -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 08:05:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 08:05:25 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23249 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 08:05:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 08:05:26 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1640 from Worker Ant --- REVIEW: https://review.gluster.org/23249 (features/utime - fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 08:25:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 08:25:51 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23250 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 08:25:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 08:25:52 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1641 from Worker Ant --- REVIEW: https://review.gluster.org/23250 (api - fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug.
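Returning to the virt profile discussion in bug 1375431 above, the commands below show how a volume's remote-dio and strict-o-direct settings can be checked and set by hand today. The volume name is a placeholder, and whether these options should instead ship in /var/lib/glusterd/groups/virt is exactly what that comment is asking; the values shown follow the direction of that discussion, not a confirmed recommendation.

    # apply the shipped virt profile
    gluster volume set myvol group virt

    # inspect the two options in question
    gluster volume get myvol network.remote-dio
    gluster volume get myvol performance.strict-o-direct

    # set them explicitly while the profile does not include them
    gluster volume set myvol network.remote-dio disable
    gluster volume set myvol performance.strict-o-direct on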
From bugzilla at redhat.com Sun Aug 18 09:25:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 09:25:16 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23251 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 09:25:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 09:25:17 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1642 from Worker Ant --- REVIEW: https://review.gluster.org/23251 (protocol/client - fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 12:03:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:03:50 +0000 Subject: [Bugs] [Bug 1410100] Package arequal-checksum for broader community use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1410100 Rishubh Jain changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:04:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:04:32 +0000 Subject: [Bugs] [Bug 1410100] Package arequal-checksum for broader community use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1410100 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23252 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:04:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:04:33 +0000 Subject: [Bugs] [Bug 1410100] Package arequal-checksum for broader community use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1410100 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23252 ([WIP]Package arequal-checksum for broader community use) posted (#1) for review on master by Rishubh Jain -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:39:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:39:00 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23255 -- You are receiving this mail because: You are the assignee for the bug. 
From bugzilla at redhat.com Sun Aug 18 12:39:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:39:01 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1643 from Worker Ant --- REVIEW: https://review.gluster.org/23255 (api -fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 12:48:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:48:53 +0000 Subject: [Bugs] [Bug 1668239] [man page] Gluster(8) - Missing disperse-data parameter Gluster Console Manager man page In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1668239 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23258 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:48:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:48:55 +0000 Subject: [Bugs] [Bug 1668239] [man page] Gluster(8) - Missing disperse-data parameter Gluster Console Manager man page In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1668239 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23258 (Updating gluster 8 manual.) posted (#1) for review on master by Rishubh Jain -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:54:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:54:43 +0000 Subject: [Bugs] [Bug 1668239] [man page] Gluster(8) - Missing disperse-data parameter Gluster Console Manager man page In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1668239 --- Comment #2 from Rishubh Jain --- https://review.gluster.org/#/c/glusterfs/+/23258 updates the gluster manual with 'disperse-data <count>' but does not include details about the utilization scenario. Please refer here[https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/administration_guide/chap-red_hat_storage_volumes-creating_dispersed_volumes_1] for the utilization scenario. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 12:56:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:56:16 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23259 -- You are receiving this mail because: You are the assignee for the bug.
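For context on the man page change for bug 1668239 above: disperse-data <count> lets the data brick count be given explicitly instead of being derived from disperse and redundancy. A typical invocation looks like the following (host names and brick paths are placeholders):

    # a 4+2 dispersed volume: 4 data bricks plus 2 redundancy bricks
    gluster volume create testvol disperse-data 4 redundancy 2 \
        server{1..6}:/bricks/testvol/brick1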
From bugzilla at redhat.com Sun Aug 18 12:56:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 12:56:16 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1644 from Worker Ant --- REVIEW: https://review.gluster.org/23259 (cluster/afr - Unused variables) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 14:41:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 14:41:10 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23260 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 14:41:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 14:41:12 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1645 from Worker Ant --- REVIEW: https://review.gluster.org/23260 (mount/fuse - Fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 14:54:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 14:54:14 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23261 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 14:54:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 14:54:15 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1646 from Worker Ant --- REVIEW: https://review.gluster.org/23261 (storage/posix - Fixing a coverity issue) posted (#1) for review on master by None -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 15:56:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 15:56:03 +0000 Subject: [Bugs] [Bug 1743020] New: glusterd start is failed and throwing an error Address already in use Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Bug ID: 1743020 Summary: glusterd start is failed and throwing an error Address already in use Product: GlusterFS Version: mainline Status: NEW Component: rpc Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Some of the .t are failing due to glusterd start failed after kill all gluster processes. 
Version-Release number of selected component (if applicable): How reproducible: Run regression test suite and below test case are failing ./tests/bugs/glusterd/brick-mux-validation.t ./tests/bugs/cli/bug-1077682.t ./tests/basic/glusterd-restart-shd-mux.t ./tests/bugs/core/multiplex-limit-issue-151.t Steps to Reproduce: 1. 2. 3. Actual results: test cases are failing Expected results: test case should not fail Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 15:56:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 15:56:29 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 18 15:59:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 15:59:48 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23211 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 18 15:59:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 18 Aug 2019 15:59:49 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) posted (#9) for review on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 03:45:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 03:45:41 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 03:45:41 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) merged (#9) on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. 
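When reproducing the "Address already in use" failure above, it helps to confirm what is still bound to the glusterd management port after the gluster processes are killed. A rough diagnostic sketch, assuming the default management port 24007 and the default log location:

    # kill the gluster processes, as the failing .t tests do
    pkill gluster

    # check whether the port is still held by a process or lingering sockets
    ss -tlnp | grep 24007
    ss -tan | grep 24007

    # restart glusterd and look for the bind error
    glusterd
    grep -i "already in use" /var/log/glusterfs/glusterd.log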
From bugzilla at redhat.com Mon Aug 19 04:10:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 04:10:06 +0000 Subject: [Bugs] [Bug 1743069] New: bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t fails in brick mux regression spuriously Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743069 Bug ID: 1743069 Summary: bug-1482023-snpashot-issue-with-other-processes-access ing-mounted-path.t fails in brick mux regression spuriously Product: GlusterFS Version: mainline Status: NEW Component: tests Assignee: bugs at gluster.org Reporter: amukherj at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t fails in brick mux nightly run often. While we need to have a permanent fix to this with https://review.gluster.org/#/c/glusterfs/+/22949/ (which is WIP), this bug is to track this open defect. Version-Release number of selected component (if applicable): mainline How reproducible: Often -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 04:13:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 04:13:55 +0000 Subject: [Bugs] [Bug 1743069] bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t fails in brick mux regression spuriously In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743069 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23262 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 04:13:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 04:13:57 +0000 Subject: [Bugs] [Bug 1743069] bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t fails in brick mux regression spuriously In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743069 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23262 (tests: mark bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t as BRICK_MUX_BAD_TEST) posted (#1) for review on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 04:18:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 04:18:10 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |medium CC| |atumball at redhat.com, | |pkarampu at redhat.com, | |ravishankar at redhat.com Assignee|bugs at gluster.org |ksubrahm at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 19 05:49:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 05:49:17 +0000 Subject: [Bugs] [Bug 1743094] New: glusterfs build fails on centos7 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743094 Bug ID: 1743094 Summary: glusterfs build fails on centos7 Product: GlusterFS Version: mainline Status: NEW Component: core Severity: urgent Priority: urgent Assignee: atumball at redhat.com Reporter: atumball at redhat.com CC: bugs at gluster.org, kkeithle at redhat.com, nbalacha at redhat.com, ndevos at redhat.com Target Milestone: --- Classification: Community Description of problem: Noticed that the builds are failing on my centos7 machine with below errors: make[1]: Leaving directory '/root/glusterfs/code/glusterfs/libglusterfs' Making install in rpc make[1]: Entering directory '/root/glusterfs/code/glusterfs/rpc' Making install in xdr make[2]: Entering directory '/root/glusterfs/code/glusterfs/rpc/xdr' Making install in src make[3]: Entering directory '/root/glusterfs/code/glusterfs/rpc/xdr/src' glusterfs3-xdr.x: Too many levels of symbolic links make[3]: *** [Makefile:961: glusterfs3-xdr.h] Error 1 make[3]: Leaving directory '/root/glusterfs/code/glusterfs/rpc/xdr/src' make[2]: *** [Makefile:443: install-recursive] Error 1 make[2]: Leaving directory '/root/glusterfs/code/glusterfs/rpc/xdr' make[1]: *** [Makefile:443: install-recursive] Error 1 make[1]: Leaving directory '/root/glusterfs/code/glusterfs/rpc' make: *** [Makefile:574: install-recursive] Error 1 Version-Release number of selected component (if applicable): lastest master. How reproducible: 100% Steps to Reproduce: 1. git checkout glusterfs (on centos7) 2. ./autogen.sh; ./configure; make 3. Additional info: This started happening after having https://review.gluster.org/23015 at the top of the branch. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 05:53:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 05:53:14 +0000 Subject: [Bugs] [Bug 1743094] glusterfs build fails on centos7 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743094 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23263 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 05:53:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 05:53:16 +0000 Subject: [Bugs] [Bug 1743094] glusterfs build fails on centos7 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743094 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23263 (rpc/xdr: fixes in Makefile) posted (#1) for review on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. 
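"Too many levels of symbolic links" is ELOOP, which in a build tree usually points at a symlink cycle, for example glusterfs3-xdr.x ending up as a link to itself after an interrupted build. The commands below are only a diagnostic sketch for checking that before re-running make; they are not the fix that went in through the Makefile patch above.

    cd rpc/xdr/src

    # see what the .x files actually are; a self-referencing link shows up here
    ls -l *.x

    # list symlinks that cannot be resolved (broken or looping)
    find -L . -maxdepth 1 -type l

    # remove generated/untracked files and rebuild from the top of the tree
    git clean -xfd . && cd ../../.. && make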
From bugzilla at redhat.com Mon Aug 19 06:02:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 06:02:22 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(s.pleshkov at hostco | |.ru) --- Comment #2 from Ravishankar N --- > gluster volume replace-brick TST lsy-gl-0(1,2,3):/diskForData/tst lsy-gl-0(1,2,3):/diskForTestData/tst commit force I assume you ran the replace-brick command thrice, once for each brick. Did you wait for heal count to be zero after each replace-brick? If not, you can end up with incomplete heals. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 06:11:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 06:11:32 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Sergey Pleshkov changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(s.pleshkov at hostco | |.ru) | --- Comment #3 from Sergey Pleshkov --- Hello. Replace brick commands were executed sequentially on all nodes with 12-24 hour pause. Heal count be zero every time. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 06:14:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 06:14:52 +0000 Subject: [Bugs] [Bug 1743069] bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t fails in brick mux regression spuriously In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743069 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23262 (tests: mark bug-1482023-snpashot-issue-with-other-processes-accessing-mounted-path.t as BRICK_MUX_BAD_TEST) merged (#1) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 06:27:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 06:27:17 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #4 from Ravishankar N --- Could you check if there is actual missing data on lsy-gl-03? You might need to compute the checksum of each brick individually. https://github.com/gluster/glusterfs/blob/master/tests/utils/arequal-checksum.c can be used for that. # gcc tests/utils/arequal-checksum.c -o arequal-checksum On each brick, #./arequal-checksum -p /diskForTestData/tst -i .glusterfs (See ./arequal-checksum --help for details). -- You are receiving this mail because: You are on the CC list for the bug. 
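To run the comparison from comment #4 across all three bricks in one go, a small loop like the one below can be used from any host with ssh access to the nodes. The hostnames and brick path are taken from this report; the arequal-checksum binary location is assumed to be the login directory where it was built, so adjust as needed.

    for h in lsy-gl-01 lsy-gl-02 lsy-gl-03; do
        echo "== $h =="
        ssh "$h" "./arequal-checksum -p /diskForTestData/tst -i .glusterfs"
    done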
From bugzilla at redhat.com Mon Aug 19 07:19:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 07:19:37 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- QA Contact|nchilaka at redhat.com |ubansal at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 07:28:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 07:28:48 +0000 Subject: [Bugs] [Bug 1626543] dht/tests: Create a .t to test all possible combinations for file rename In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1626543 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 07:28:48 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/21121 (tests/dht: Add a test file for file renames) merged (#6) on master by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 07:36:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 07:36:49 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23264 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 07:52:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 07:52:58 +0000 Subject: [Bugs] [Bug 1738778] Unable to setup softserve VM In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738778 M. Scherer changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mscherer at redhat.com --- Comment #3 from M. Scherer --- That is not a infra issue, the inventory is wrong: https://github.com/gluster/softserve/blob/master/playbooks/inventory#L6 Regular non infra folks do not have access to that server to serve as a bastion. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 07:36:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 07:36:50 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1647 from Worker Ant --- REVIEW: https://review.gluster.org/23264 (libglusterfs - fixing a coverity issue) posted (#1) for review on master by Barak Sason Rofman -- You are receiving this mail because: You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 5 13:35:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 05 Aug 2019 13:35:15 +0000 Subject: [Bugs] [Bug 1737484] New: geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 Bug ID: 1737484 Summary: geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen Product: GlusterFS Version: mainline Status: NEW Component: geo-replication Keywords: Regression Severity: urgent Assignee: bugs at gluster.org Reporter: avishwan at redhat.com CC: amukherj at redhat.com, avishwan at redhat.com, bugs at gluster.org, csaba at redhat.com, hgowtham at redhat.com, khiremat at redhat.com, ksubrahm at redhat.com, moagrawa at redhat.com, nchilaka at redhat.com, rgowdapp at redhat.com, rhinduja at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com, spalai at redhat.com, storage-qa-internal at redhat.com, sunkumar at redhat.com Depends On: 1729915 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1729915 +++ Description of problem: ======================= had setup a georep session b/w a 4x3 master volume and 4x(4+2) ec volume. I see below issues in my test-bed 1) the volume has two main directories, called IOs and logs, with IOs directory being the place where all the workloads related IOs are happpening. logs directory is hosting a dedicated file for each client which is collecting the resource output every few minutes in append mode. The problem is till now, ie after about 3 days, the logs directory hasn't even been created 2) the syncing has been very slow paced, even after 3 days, slave is yet to catch up. Master had about 1.1TB data while slave has just about 350gb of data 3) I have seen some tracebacks in gsync log as below /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log-20190714 [2019-07-13 12:26:53.408348] E [syncdutils(worker /gluster/brick1/nonfuncvol-sv01):338:log_raise_exception] : FA IL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap tf(*aargs) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1987, in syncjob po = self.sync_engine(pb, self.log_err) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1444, in rsync rconf.ssh_ctl_args + \ AttributeError: 'NoneType' object has no attribute 'split' [2019-07-13 12:26:53.490714] I [repce(agent /gluster/brick1/nonfuncvol-sv01):97:service_loop] RepceServer: terminating on reaching EOF. [2019-07-13 12:26:53.494467] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-07-13 12:27:03.508502] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... 
[root at rhs-gp-srv7 glusterfs]# #less geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log [2019-07-13 13:33:48.859147] I [master(worker /gluster/brick1/nonfuncvol-sv01):1682:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gluster- brick1-nonfuncvol-sv01/xsync/XSYNC-CHANGELOG.1563020888 [2019-07-13 13:40:39.412694] E [syncdutils(worker /gluster/brick3/nonfuncvol-sv04):338:log_raise_exception] : FA IL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap tf(*aargs) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1987, in syncjob po = self.sync_engine(pb, self.log_err) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1444, in rsync rconf.ssh_ctl_args + \ AttributeError: 'NoneType' object has no attribute 'split' [2019-07-13 13:40:39.484643] I [repce(agent /gluster/brick3/nonfuncvol-sv04):97:service_loop] RepceServer: terminating on reaching EOF. Version-Release number of selected component (if applicable): ===================== 6.0.7 rsync-3.1.2-6.el7_6.1.x86_64 Steps to Reproduce: ==================== note: no brickmux enabled 1. created a 4x3 volume on 4 nodes , with below volume settings, which will act as master in georep Volume Name: nonfuncvol Type: Distributed-Replicate Volume ID: 4d44936f-312d-431a-905d-813e8ee63668 Status: Started Snapshot Count: 1 Number of Bricks: 4 x 3 = 12 Transport-type: tcp Bricks: Brick1: rhs-gp-srv5.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-sv01 Brick2: rhs-gp-srv6.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-sv01 Brick3: rhs-gp-srv7.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-sv01 Brick4: rhs-gp-srv8.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-sv02 Brick5: rhs-gp-srv5.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-sv02 Brick6: rhs-gp-srv6.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-sv02 Brick7: rhs-gp-srv7.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-sv03 Brick8: rhs-gp-srv8.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-sv03 Brick9: rhs-gp-srv5.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-sv03 Brick10: rhs-gp-srv6.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-sv04 Brick11: rhs-gp-srv7.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-sv04 Brick12: rhs-gp-srv8.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-sv04 Options Reconfigured: changelog.changelog: on geo-replication.ignore-pid-check: on geo-replication.indexing: on features.barrier: disable cluster.shd-max-threads: 24 client.event-threads: 8 server.event-threads: 8 transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable 2. mounted the volume on 10 clients, started capturing resource info on clients 3. created another 3node cluster to be used as slave, with a 4x(4+2) ecvol as slave 4. started IOs on clients of master, just linux untar 50 times from all clients 5. setup georep from master->slave 6. started georep only after about 4hrs so that master has some data to propagate. 7. left the setup for weekend. Actual results: =================== seen below issues 1) the volume has two main directories, called IOs and logs, with IOs directory being the place where all the workloads related IOs are happpening. 
logs directory is hosting a dedicated file for each client which is collecting the resource output every few minutes in append mode. The problem is till now, ie after about 3 days, the logs directory hasn't even been created 2) the syncing has been very slow paced, even after 3 days, slave is yet to catch up. Master had about 1.1TB data while slave has just about 350gb of data 3) I have seen some tracebacks in gsync log as below /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log-20190714 [2019-07-13 12:26:53.408348] E [syncdutils(worker /gluster/brick1/nonfuncvol-sv01):338:log_raise_exception] : FA IL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap tf(*aargs) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1987, in syncjob po = self.sync_engine(pb, self.log_err) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1444, in rsync rconf.ssh_ctl_args + \ AttributeError: 'NoneType' object has no attribute 'split' [2019-07-13 12:26:53.490714] I [repce(agent /gluster/brick1/nonfuncvol-sv01):97:service_loop] RepceServer: terminating on reaching EOF. [2019-07-13 12:26:53.494467] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-07-13 12:27:03.508502] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... [root at rhs-gp-srv7 glusterfs]# #less geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log [2019-07-13 13:33:48.859147] I [master(worker /gluster/brick1/nonfuncvol-sv01):1682:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gluster- brick1-nonfuncvol-sv01/xsync/XSYNC-CHANGELOG.1563020888 [2019-07-13 13:40:39.412694] E [syncdutils(worker /gluster/brick3/nonfuncvol-sv04):338:log_raise_exception] : FA IL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap tf(*aargs) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1987, in syncjob po = self.sync_engine(pb, self.log_err) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1444, in rsync rconf.ssh_ctl_args + \ AttributeError: 'NoneType' object has no attribute 'split' [2019-07-13 13:40:39.484643] I [repce(agent /gluster/brick3/nonfuncvol-sv04):97:service_loop] RepceServer: terminating on reaching EOF. --- Additional comment from RHEL Product and Program Management on 2019-07-15 10:18:53 UTC --- This bug is automatically being proposed for the next minor release of Red Hat Gluster Storage by setting the release flag 'rhgs?3.5.0' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag. --- Additional comment from nchilaka on 2019-07-15 10:21:43 UTC --- proposing as blocker, as syncing is falling behind significantly, and also tracebacks seen. Can revisit based on RC from dev. 
^C [root at rhs-gp-srv5 bricks]# date;gluster volume geo-replication nonfuncvol rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave status Mon Jul 15 15:50:57 IST 2019 MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ rhs-gp-srv5.lab.eng.blr.redhat.com nonfuncvol /gluster/brick1/nonfuncvol-sv01 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv11.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv5.lab.eng.blr.redhat.com nonfuncvol /gluster/brick2/nonfuncvol-sv02 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv16.lab.eng.blr.redhat.com Active Hybrid Crawl N/A rhs-gp-srv5.lab.eng.blr.redhat.com nonfuncvol /gluster/brick3/nonfuncvol-sv03 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv13.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv7.lab.eng.blr.redhat.com nonfuncvol /gluster/brick1/nonfuncvol-sv01 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv13.lab.eng.blr.redhat.com Active Hybrid Crawl N/A rhs-gp-srv7.lab.eng.blr.redhat.com nonfuncvol /gluster/brick2/nonfuncvol-sv03 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv11.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv7.lab.eng.blr.redhat.com nonfuncvol /gluster/brick3/nonfuncvol-sv04 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv16.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv6.lab.eng.blr.redhat.com nonfuncvol /gluster/brick1/nonfuncvol-sv01 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv16.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv6.lab.eng.blr.redhat.com nonfuncvol /gluster/brick2/nonfuncvol-sv02 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv13.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv6.lab.eng.blr.redhat.com nonfuncvol /gluster/brick3/nonfuncvol-sv04 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv11.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv8.lab.eng.blr.redhat.com nonfuncvol /gluster/brick1/nonfuncvol-sv02 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv11.lab.eng.blr.redhat.com Passive N/A N/A rhs-gp-srv8.lab.eng.blr.redhat.com nonfuncvol /gluster/brick2/nonfuncvol-sv03 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv16.lab.eng.blr.redhat.com Active Hybrid Crawl N/A rhs-gp-srv8.lab.eng.blr.redhat.com nonfuncvol /gluster/brick3/nonfuncvol-sv04 root rhs-gp-srv13.lab.eng.blr.redhat.com::nonfuncvol-slave rhs-gp-srv13.lab.eng.blr.redhat.com Active Hybrid Crawl N/A [root at rhs-gp-srv5 bricks]# slave volinfo Volume Name: gluster_shared_storage Type: Replicate Volume ID: 34c52663-f47b-42e5-a33c-abe5d16382a8 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: rhs-gp-srv16.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick Brick3: rhs-gp-srv13.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick Options Reconfigured: transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on performance.client-io-threads: off cluster.enable-shared-storage: enable Volume Name: nonfuncvol-slave Type: Distributed-Disperse Volume ID: b5753c86-ea76-4e0e-8306-acc1d5237ced Status: Started Snapshot 
Count: 0 Number of Bricks: 4 x (4 + 2) = 24 Transport-type: tcp Bricks: Brick1: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-slave-sv1 Brick2: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-slave-sv1 Brick3: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick1/nonfuncvol-slave-sv1 Brick4: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-slave-sv1 Brick5: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-slave-sv1 Brick6: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick2/nonfuncvol-slave-sv1 Brick7: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-slave-sv2 Brick8: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-slave-sv2 Brick9: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick3/nonfuncvol-slave-sv2 Brick10: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick4/nonfuncvol-slave-sv2 Brick11: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick4/nonfuncvol-slave-sv2 Brick12: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick4/nonfuncvol-slave-sv2 Brick13: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick5/nonfuncvol-slave-sv3 Brick14: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick5/nonfuncvol-slave-sv3 Brick15: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick5/nonfuncvol-slave-sv3 Brick16: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick6/nonfuncvol-slave-sv3 Brick17: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick6/nonfuncvol-slave-sv3 Brick18: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick6/nonfuncvol-slave-sv3 Brick19: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick7/nonfuncvol-slave-sv4 Brick20: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick7/nonfuncvol-slave-sv4 Brick21: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick7/nonfuncvol-slave-sv4 Brick22: rhs-gp-srv13.lab.eng.blr.redhat.com:/gluster/brick8/nonfuncvol-slave-sv4 Brick23: rhs-gp-srv11.lab.eng.blr.redhat.com:/gluster/brick8/nonfuncvol-slave-sv4 Brick24: rhs-gp-srv16.lab.eng.blr.redhat.com:/gluster/brick8/nonfuncvol-slave-sv4 Options Reconfigured: features.read-only: on performance.quick-read: off transport.address-family: inet storage.fips-mode-rchecksum: on nfs.disable: on cluster.enable-shared-storage: enable --- Additional comment from nchilaka on 2019-07-15 10:46:07 UTC --- sosreports and logs @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1729915/ --- Additional comment from nchilaka on 2019-07-15 11:37:59 UTC --- do let me know if you need the setup, ASAP{can wait till EOD} else I would go ahead with further testing which may involve reconfiguring part/complete of testbed. --- Additional comment from Atin Mukherjee on 2019-07-15 12:21:25 UTC --- Hari - since Sunny is on PTO for this week and this is proposed as a blocker, can you please work on this bug? Please don't hesitate to contact Kotresh/Aravinda should you need any help. Nag - Please note that Sunny is on PTO, so we'd have to expect some delay in picking this up, till then don't destroy the setup. --- Additional comment from Atin Mukherjee on 2019-07-15 13:37:05 UTC --- (In reply to Atin Mukherjee from comment #5) > Hari - since Sunny is on PTO for this week and this is proposed as a > blocker, can you please work on this bug? Please don't hesitate to contact > Kotresh/Aravinda should you need any help. I see that Hari is on PTO as well till 17th. Aravinda - would you be able to assist here? Kotresh has couple of bugs in his plate which he's focusing on and hence requested for your help. 
> > Nag - Please note that Sunny is on PTO, so we'd have to expect some delay in > picking this up, till then don't destroy the setup. --- Additional comment from Rochelle on 2019-07-16 05:30:24 UTC --- I'm seeing this while running automation on the latest builds as well: [2019-07-15 10:31:26.713311] E [syncdutils(worker /bricks/brick0/master_brick0):338:log_raise_exception] : FAIL: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 368, in twrap tf(*aargs) File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1987, in syncjob po = self.sync_engine(pb, self.log_err) File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1444, in rsync rconf.ssh_ctl_args + \ AttributeError: 'NoneType' object has no attribute 'split' There was no functionality impact in my case. However there were additional 'No such file or directory' messages in brick logs: mnt-bricks-brick0-master_brick1.log-20190716:[2019-07-15 11:19:49.950747] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fac24305b3b] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x81c1)[0x7fac1b6b01c1] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8a9a)[0x7fac1b6b0a9a] (--> /lib64/libpthread.so.0(+0x7ea5)[0x7fac23142ea5] (--> /lib64/libc.so.6(clone+0x6d)[0x7fac22a088cd] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory mnt-bricks-brick0-master_brick1.log-20190716:[2019-07-15 12:29:27.262096] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7fe169d97b3b] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x81c1)[0x7fe1611421c1] (--> /usr/lib64/glusterfs/6.0/xlator/mount/fuse.so(+0x8a9a)[0x7fe161142a9a] (--> /lib64/libpthread.so.0(+0x7ea5)[0x7fe168bd4ea5] (--> /lib64/libc.so.6(clone+0x6d)[0x7fe16849a8cd] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory gluster-nagios-common-0.2.4-1.el7rhgs.noarch glusterfs-cli-6.0-7.el7rhgs.x86_64 vdsm-gluster-4.30.18-1.0.el7rhgs.x86_64 libvirt-daemon-driver-storage-gluster-4.5.0-23.el7.x86_64 glusterfs-client-xlators-6.0-7.el7rhgs.x86_64 glusterfs-6.0-7.el7rhgs.x86_64 glusterfs-server-6.0-7.el7rhgs.x86_64 glusterfs-events-6.0-7.el7rhgs.x86_64 glusterfs-api-6.0-7.el7rhgs.x86_64 glusterfs-geo-replication-6.0-7.el7rhgs.x86_64 gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64 python2-gluster-6.0-7.el7rhgs.x86_64 glusterfs-libs-6.0-7.el7rhgs.x86_64 glusterfs-fuse-6.0-7.el7rhgs.x86_64 glusterfs-rdma-6.0-7.el7rhgs.x86_64 The volume config was 1x3 on master and slave --- Additional comment from Rochelle on 2019-07-16 05:32:22 UTC --- I updated this BZ as the same traceback is seen in my case as well. Though wrt syncing, there was no issue --- Additional comment from Aravinda VK on 2019-07-16 06:02:17 UTC --- Getting a Forbidden error while accessing sosreport Forbidden You don't have permission to access /sosreports/nchilaka/bug.1729915/servers/master-nodes/rhs-gp-srv6.lab.eng.blr.redhat.com/sosreport-rhs-gp-srv6-georep-issues-15jul2019-2019-07-15-xiqycze.tar.xz on this server. --- Additional comment from nchilaka on 2019-07-16 06:04:13 UTC --- (In reply to Aravinda VK from comment #9) > Getting a Forbidden error while accessing sosreport > > Forbidden > > You don't have permission to access > /sosreports/nchilaka/bug.1729915/servers/master-nodes/rhs-gp-srv6.lab.eng. > blr.redhat.com/sosreport-rhs-gp-srv6-georep-issues-15jul2019-2019-07-15- > xiqycze.tar.xz on this server. 
you can retry now --- Additional comment from nchilaka on 2019-07-16 06:05:04 UTC --- (In reply to Atin Mukherjee from comment #6) > (In reply to Atin Mukherjee from comment #5) > > Hari - since Sunny is on PTO for this week and this is proposed as a > > blocker, can you please work on this bug? Please don't hesitate to contact > > Kotresh/Aravinda should you need any help. > > I see that Hari is on PTO as well till 17th. > > Aravinda - would you be able to assist here? Kotresh has couple of bugs in > his plate which he's focusing on and hence requested for your help. > > > > > Nag - Please note that Sunny is on PTO, so we'd have to expect some delay in > > picking this up, till then don't destroy the setup. understood, acked --- Additional comment from Aravinda VK on 2019-07-16 06:06:14 UTC --- (In reply to nchilaka from comment #10) > (In reply to Aravinda VK from comment #9) > > Getting a Forbidden error while accessing sosreport > > > > Forbidden > > > > You don't have permission to access > > /sosreports/nchilaka/bug.1729915/servers/master-nodes/rhs-gp-srv6.lab.eng. > > blr.redhat.com/sosreport-rhs-gp-srv6-georep-issues-15jul2019-2019-07-15- > > xiqycze.tar.xz on this server. > > you can retry now I can access now. Thanks --- Additional comment from Aravinda VK on 2019-07-17 13:09:48 UTC --- Hi Nag/Rochelle, I failed to reproduce in my setup. Also looked at the setup to find any issue but couldn't find any. Please provide the steps if you have reproducer. One suspect is that gconf.get() is called before gconf is loaded. Still looking for the possible window when this can happen. --- Additional comment from Aravinda VK on 2019-07-17 13:41:44 UTC --- Another possible race condition between gconf.getr and gconf.get gconf.getr gets the realtime config value, that means it checks the config file on disk and gets the value. While gconf.getr loads the latest config, if other thread calls `gconf.get`, then it may get `None` as value. ``` getr: values = {} load_again() ``` If this is the issue, the fix should be to add thread lock for the `gconf._load` function. --- Additional comment from Sunil Kumar Acharya on 2019-07-18 06:51:48 UTC --- Placing needinfo on the comment 13 --- Additional comment from Aravinda VK on 2019-07-18 08:11:17 UTC --- Example program to reproduce the issue: Three threads, one thread calls `gconf.getr` config and two threads call `gconf.get`. 
Create this file in `/usr/libexec/glusterfs/python/syncdaemon/` directory ``` from threading import Thread import gsyncdconfig as gconf import time def worker(): while True: o = gconf.get("rsync-ssh-options") print("Worker 1 rsync-ssh-options=%s" % o) o.split() time.sleep(0.25) def worker2(): while True: o = gconf.getr("log-rsync-performance") print("Worker 2 log-rsync-performance=%s" % o) time.sleep(0.5) gconf.load("/etc/glusterfs/gsyncd.conf", "sample.conf") t1 = Thread(target=worker) t2 = Thread(target=worker) t3 = Thread(target=worker2) t1.start() t2.start() t3.start() t1.join() t2.join() t3.join() ``` Run the script using `python3 crash_reproduce.py` and in another terminal, run the following ``` for i in {1..100}; do echo "[vars] log-rsync-performance=True rsync-ssh-options=" > /usr/libexec/glusterfs/python/syncdaemon/sample.conf; sleep 1; done ``` In between, it produces the following traceback, ``` Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib64/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib64/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "crash_reproduce.py", line 10, in worker o.split() AttributeError: 'NoneType' object has no attribute 'split' ``` I am working on adding thread lock to fix the issue. I will send upstream patch soon. --- Additional comment from Aravinda VK on 2019-07-18 14:43:05 UTC --- Number of worker restarts: ``` for i in {5..8}; do echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; ssh root at rhs-gp-srv${i}.lab.eng.blr.redhat.com grep "Agent listining..." /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log | awk '{print $5}' | sed 's/):72:__init__]//' | sort | uniq -c; done ``` rhs-gp-srv5.lab.eng.blr.redhat.com 84 /gluster/brick1/nonfuncvol-sv01 61 /gluster/brick3/nonfuncvol-sv03 rhs-gp-srv6.lab.eng.blr.redhat.com 48 /gluster/brick1/nonfuncvol-sv01 48 /gluster/brick2/nonfuncvol-sv02 71 /gluster/brick3/nonfuncvol-sv04 rhs-gp-srv7.lab.eng.blr.redhat.com 3 /gluster/brick1/nonfuncvol-sv01 68 /gluster/brick2/nonfuncvol-sv03 34 /gluster/brick3/nonfuncvol-sv04 rhs-gp-srv8.lab.eng.blr.redhat.com 38 /gluster/brick1/nonfuncvol-sv02 2 /gluster/brick3/nonfuncvol-sv04 Number of times worker failed with Python traceback: ``` for i in {5..8}; do echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; ssh root at rhs-gp-srv${i}.lab.eng.blr.redhat.com grep "split" -R /var/log/glusterfs/geo-replication/ | sort | uniq -c; done ``` rhs-gp-srv5.lab.eng.blr.redhat.com 1 /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log-20190714:AttributeError: 'NoneType' object has no attribute 'split' rhs-gp-srv6.lab.eng.blr.redhat.com 3 /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log:AttributeError: 'NoneType' object has no attribute 'split' rhs-gp-srv7.lab.eng.blr.redhat.com 3 /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd.log:AttributeError: 'NoneType' object has no attribute 'split' rhs-gp-srv8.lab.eng.blr.redhat.com Out of many restarts(400+) only 7 times restarted due to Python traceback. 
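To make the proposed fix concrete: comment #14 suggests that the race is between gconf.getr() reloading the config file and gconf.get() reading it, and that a thread lock around gconf._load() should close it. The sketch below shows the general shape of that approach only; it is not the actual upstream patch, and the configparser-based parsing is a simplified stand-in for what gsyncdconfig really does (it only mirrors the [vars] section used by sample.conf in the reproducer).

```python
# Illustration of the proposed thread-lock approach (not the upstream patch):
# serialize reloads and reads so gconf.get() never observes a half-built or
# emptied table while gconf.getr() is re-reading the file from disk.
import configparser
import threading

_load_lock = threading.Lock()
_conf = {}

def _load(config_file):
    # Hold the lock for the whole reload so get() callers in other threads
    # never see the table mid-rebuild.
    global _conf
    with _load_lock:
        parser = configparser.ConfigParser()
        parser.read(config_file)
        _conf = dict(parser["vars"]) if parser.has_section("vars") else {}

def get(name, default_value=None):
    with _load_lock:
        return _conf.get(name, default_value)

def getr(name, config_file, default_value=None):
    # "realtime" getter: reload from disk first, then read under the lock
    _load(config_file)
    return get(name, default_value)
```

With readers and the reload serialized on one lock, the window in which rsync-ssh-options can come back as None, and resource.py's rsync() then fail on .split(), goes away.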
Other reasons observed: Workers failed with ENOTCONN ``` for i in {5..8}; do echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; ssh root at rhs-gp-srv${i}.lab.eng.blr.redhat.com 'grep -R "error=ENOTCONN" /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/gsyncd* | wc -l'; done ``` rhs-gp-srv5.lab.eng.blr.redhat.com 43 rhs-gp-srv6.lab.eng.blr.redhat.com 54 rhs-gp-srv7.lab.eng.blr.redhat.com 41 rhs-gp-srv8.lab.eng.blr.redhat.com 24 Many errors found in master mount logs(/var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/mnt-gluster-*) [2019-07-18 01:41:24.349009] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no read subvols for (null) [2019-07-18 01:41:24.369058] I [MSGID: 109063] [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in (null) (gfid = ec3d7d53-51ae-4cdf-bbef-e3de147b4e75). Holes=1 overlaps=0 [2019-07-18 01:41:24.372697] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no read subvols for /IOs [2019-07-18 01:41:24.372755] I [MSGID: 109063] [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in /IOs (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 [2019-07-18 01:41:24.372805] W [MSGID: 109005] [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs: Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = c44c6cef-6c76-4d3e-a56c-db21c71450be [2019-07-18 01:41:24.374539] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no read subvols for /IOs/kernel [2019-07-18 01:41:24.384280] I [MSGID: 109063] [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in /IOs/kernel (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 [2019-07-18 01:41:24.384340] W [MSGID: 109005] [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs/kernel: Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = c9076528-fea7-459c-b29a-2364b1f43bdc [2019-07-18 01:41:24.386233] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no read subvols for /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com [2019-07-18 01:41:24.386270] I [MSGID: 109063] [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 [2019-07-18 01:41:24.386302] W [MSGID: 109005] [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com: Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = be25fe8e-501a-485b-9b39-7caf2255ff6d [2019-07-18 01:41:24.386419] W [MSGID: 109011] [dht-layout.c:152:dht_layout_search] 0-nonfuncvol-dht: no subvolume for hash (value) = 1716939520 [2019-07-18 01:41:24.387535] W [MSGID: 108027] [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no read subvols for /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24 [2019-07-18 01:41:24.388206] I [MSGID: 109063] [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24 (gfid = 00000000-0000-0000-0000-000000000000). 
Holes=1 overlaps=0 [2019-07-18 01:41:24.388267] W [MSGID: 109005] [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24: Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = d89f373a-08f4-42ab-b1ad-c60c92ee5181 I will send a patch to fix the Python traceback But need help from DHT/AFR team to root cause ENOTCONN and Directory self heal failed errors. --- Additional comment from Atin Mukherjee on 2019-07-19 03:58:42 UTC --- Susant, Karthik - please have a look at comment 17. --- Additional comment from Susant Kumar Palai on 2019-07-19 06:23:27 UTC --- (In reply to Aravinda VK from comment #17) > Number of worker restarts: > > ``` > for i in {5..8}; > do > echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; > ssh root at rhs-gp-srv${i}.lab.eng.blr.redhat.com grep "Agent listining..." > /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. > redhat.com_nonfuncvol-slave/gsyncd.log | awk '{print $5}' | sed > 's/):72:__init__]//' | sort | uniq -c; > done > ``` > > rhs-gp-srv5.lab.eng.blr.redhat.com > 84 /gluster/brick1/nonfuncvol-sv01 > 61 /gluster/brick3/nonfuncvol-sv03 > rhs-gp-srv6.lab.eng.blr.redhat.com > 48 /gluster/brick1/nonfuncvol-sv01 > 48 /gluster/brick2/nonfuncvol-sv02 > 71 /gluster/brick3/nonfuncvol-sv04 > rhs-gp-srv7.lab.eng.blr.redhat.com > 3 /gluster/brick1/nonfuncvol-sv01 > 68 /gluster/brick2/nonfuncvol-sv03 > 34 /gluster/brick3/nonfuncvol-sv04 > rhs-gp-srv8.lab.eng.blr.redhat.com > 38 /gluster/brick1/nonfuncvol-sv02 > 2 /gluster/brick3/nonfuncvol-sv04 > > Number of times worker failed with Python traceback: > > ``` > for i in {5..8}; do echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; ssh > root at rhs-gp-srv${i}.lab.eng.blr.redhat.com grep "split" -R > /var/log/glusterfs/geo-replication/ | sort | uniq -c; done > ``` > > rhs-gp-srv5.lab.eng.blr.redhat.com > 1 > /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. > redhat.com_nonfuncvol-slave/gsyncd.log-20190714:AttributeError: 'NoneType' > object has no attribute 'split' > rhs-gp-srv6.lab.eng.blr.redhat.com > 3 > /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. > redhat.com_nonfuncvol-slave/gsyncd.log:AttributeError: 'NoneType' object has > no attribute 'split' > rhs-gp-srv7.lab.eng.blr.redhat.com > 3 > /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. > redhat.com_nonfuncvol-slave/gsyncd.log:AttributeError: 'NoneType' object has > no attribute 'split' > rhs-gp-srv8.lab.eng.blr.redhat.com > > Out of many restarts(400+) only 7 times restarted due to Python traceback. > > Other reasons observed: > > Workers failed with ENOTCONN > > ``` > for i in {5..8}; do echo rhs-gp-srv${i}.lab.eng.blr.redhat.com; ssh > root at rhs-gp-srv${i}.lab.eng.blr.redhat.com 'grep -R "error=ENOTCONN" > /var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. > redhat.com_nonfuncvol-slave/gsyncd* | wc -l'; done > ``` > rhs-gp-srv5.lab.eng.blr.redhat.com > 43 > rhs-gp-srv6.lab.eng.blr.redhat.com > 54 > rhs-gp-srv7.lab.eng.blr.redhat.com > 41 > rhs-gp-srv8.lab.eng.blr.redhat.com > 24 > > Many errors found in master mount > logs(/var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr. 
> redhat.com_nonfuncvol-slave/mnt-gluster-*) > > [2019-07-18 01:41:24.349009] W [MSGID: 108027] > [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no > read subvols for (null) > [2019-07-18 01:41:24.369058] I [MSGID: 109063] > [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in > (null) (gfid = ec3d7d53-51ae-4cdf-bbef-e3de147b4e75). Holes=1 overlaps=0 > [2019-07-18 01:41:24.372697] W [MSGID: 108027] > [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no > read subvols for /IOs > [2019-07-18 01:41:24.372755] I [MSGID: 109063] > [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in > /IOs (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 > [2019-07-18 01:41:24.372805] W [MSGID: 109005] > [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs: > Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = > c44c6cef-6c76-4d3e-a56c-db21c71450be > [2019-07-18 01:41:24.374539] W [MSGID: 108027] > [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no > read subvols for /IOs/kernel > [2019-07-18 01:41:24.384280] I [MSGID: 109063] > [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in > /IOs/kernel (gfid = 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 > [2019-07-18 01:41:24.384340] W [MSGID: 109005] > [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: /IOs/kernel: > Directory selfheal failed: 1 subvolumes down.Not fixing. gfid = > c9076528-fea7-459c-b29a-2364b1f43bdc > [2019-07-18 01:41:24.386233] W [MSGID: 108027] > [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no > read subvols for /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com > [2019-07-18 01:41:24.386270] I [MSGID: 109063] > [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in > /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com (gfid = > 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 > [2019-07-18 01:41:24.386302] W [MSGID: 109005] > [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: > /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com: Directory selfheal failed: 1 > subvolumes down.Not fixing. gfid = be25fe8e-501a-485b-9b39-7caf2255ff6d > [2019-07-18 01:41:24.386419] W [MSGID: 109011] > [dht-layout.c:152:dht_layout_search] 0-nonfuncvol-dht: no subvolume for hash > (value) = 1716939520 > [2019-07-18 01:41:24.387535] W [MSGID: 108027] > [afr-common.c:2268:afr_attempt_readsubvol_set] 0-nonfuncvol-replicate-2: no > read subvols for /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24 > [2019-07-18 01:41:24.388206] I [MSGID: 109063] > [dht-layout.c:650:dht_layout_normalize] 0-nonfuncvol-dht: Found anomalies in > /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24 (gfid = > 00000000-0000-0000-0000-000000000000). Holes=1 overlaps=0 > [2019-07-18 01:41:24.388267] W [MSGID: 109005] > [dht-selfheal.c:2143:dht_selfheal_directory] 0-nonfuncvol-dht: > /IOs/kernel/dhcp42-223.lab.eng.blr.redhat.com/dir.24: Directory selfheal > failed: 1 subvolumes down.Not fixing. gfid = > d89f373a-08f4-42ab-b1ad-c60c92ee5181 > > > I will send a patch to fix the Python traceback But need help from DHT/AFR > team to root cause ENOTCONN and Directory self heal failed errors. DHT does not heal directories when there is a subvolume down. But here is the thing, if DHT is seeing a ENOTCONN error that means the entire replica set was disconnected. 
Either the bricks were really down(possibly crash) or there was a genuine network issue. Susant --- Additional comment from nchilaka on 2019-07-24 11:12:22 UTC --- (In reply to Aravinda VK from comment #13) > Hi Nag/Rochelle, > > I failed to reproduce in my setup. Also looked at the setup to find any > issue but couldn't find any. Please provide the steps if you have reproducer. > > One suspect is that gconf.get() is called before gconf is loaded. Still > looking for the possible window when this can happen. The steps to reproduce have been mentioned in c#0 in detail nothing additional to add. --- Additional comment from nchilaka on 2019-07-24 11:12:56 UTC --- adding back needinfo on other requestees --- Additional comment from nchilaka on 2019-07-24 11:13:18 UTC --- adding back needinfo on other requestees --- Additional comment from Karthik U S on 2019-07-24 11:30:56 UTC --- The need-info was raised for the ENOTCONN errors seen during directory self heal, for which Susant had responded in comment #19. @Aravinda did you get a chance to check this? AFR was failing to set the read subvols which indicates either there is no readable copy available due to heal pending on all the bricks for the entry or all the bricks being down or n/w issue as Susant mentioned. --- Additional comment from Sunil Kumar Acharya on 2019-07-25 09:30:24 UTC --- Mohit, Setting needinfo to check ENOTCONN errors we are seeing. --- Additional comment from Kotresh HR on 2019-07-30 15:16:23 UTC --- This looks to me unstable setup. I could not find the geo-rep slave logs and few gluster mount logs are missing. But from what ever logs available from master, I could see lot number of ping timer expiration and disconnects. Sample log output of disconnects: [root at kotresh Master $]find . | grep geo-replication | xargs grep "last 42 seconds, disconnecting" 2>/dev/null | tail -f ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 08:29:11.761442] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the last 42 seconds, disconnecting. ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 08:32:09.811456] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the last 42 seconds, disconnecting. ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat.com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 08:33:26.833158] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the last 42 seconds, disconnecting. Number of disconnects from available logs on Master cluster: [root at kotresh Master $]find . | grep geo-replication | xargs grep "last 42 seconds, disconnecting" 2>/dev/null | wc -l 51729 [root at kotresh Master $] 51729 is a huge number of disconnects. Please redo the test with same volume type combination and see if you still observe the issue. --- Additional comment from nchilaka on 2019-07-31 06:33:37 UTC --- (In reply to Kotresh HR from comment #25) > This looks to me unstable setup. 
I could not find the geo-rep slave logs and > few gluster mount logs are missing. > But from what ever logs available from master, I could see lot number of > ping timer expiration and disconnects. > > Sample log output of disconnects: > > [root at kotresh Master $]find . | grep geo-replication | xargs grep "last 42 > seconds, disconnecting" 2>/dev/null | tail -f > ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/ > glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat. > com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 > 08:29:11.761442] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] > 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the > last 42 seconds, disconnecting. > ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/ > glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat. > com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 > 08:32:09.811456] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] > 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the > last 42 seconds, disconnecting. > ./sosreport-rhs-gp-srv8-georep-issues-15jul2019-2019-07-15-nsbwhbp/var/log/ > glusterfs/geo-replication/nonfuncvol_rhs-gp-srv13.lab.eng.blr.redhat. > com_nonfuncvol-slave/mnt-gluster-brick2-nonfuncvol-sv03.log:[2019-07-15 > 08:33:26.833158] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] > 0-nonfuncvol-client-7: server 10.70.36.12:49153 has not responded in the > last 42 seconds, disconnecting. > > Number of disconnects from available logs on Master cluster: > [root at kotresh Master $]find . | grep geo-replication | xargs grep "last 42 > seconds, disconnecting" 2>/dev/null | wc -l > 51729 > [root at kotresh Master $] > > 51729 is a huge number of disconnects. > > Please redo the test with same volume type combination and see if you still > observe the issue. have restarted the tests --- Additional comment from nchilaka on 2019-08-01 07:40:57 UTC --- I have reproduced the issue where one of the main parent directory is not yet synced and also the slave sync is lagging behind a lot as previously one of the parent directory is not yet created even after about close to 4days eg: I had 2 IO patterns running, linux untar and small files however smallfiles dir is still not created on salve [root at rhs-gp-srv11 IOs]# pwd /mnt/slave-mount/IOs [root at rhs-gp-srv11 IOs]# ls kernel also data on master is about 1.4tb and on slave is less than 300gb cluster details: master node: rhs-gp-srv5 slave:rhs-gp-srv11 (mounted the slave vol locally on this itself) Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1729915 [Bug 1729915] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 16 15:44:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 15:44:42 +0000 Subject: [Bugs] [Bug 1740413] Gluster volume bricks crashes when running a security scan on glusterfs ports In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740413 --- Comment #3 from Marvin --- Hello, sorry for the delay on providing this log. I still don't have more information about the scripts used on the security scan but I will provide it as soon as I get it from the security team. Thank you for your help. 
Brick log with TRACE log level enabled (Part 1): [2019-08-04 10:24:04.372080] T [MSGID: 0] [posix.c:4388:pl_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-locks to shared-access-control [2019-08-04 10:24:04.372096] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-access-control to shared-bitrot-stub [2019-08-04 10:24:04.372110] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-bitrot-stub to shared-changelog [2019-08-04 10:24:04.372136] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-changelog to shared-trash [2019-08-04 10:24:04.372150] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-trash to shared-posix [2019-08-04 10:24:04.372211] T [MSGID: 0] [posix-inode-fd-ops.c:2287:posix_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-posix returned 0 [2019-08-04 10:24:04.372228] T [MSGID: 0] [posix.c:4379:pl_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-locks returned 0 [2019-08-04 10:24:04.372242] T [MSGID: 0] [upcall.c:1211:up_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-upcall returned 0 [2019-08-04 10:24:04.372255] T [MSGID: 0] [defaults.c:1642:default_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-threads returned 0 [2019-08-04 10:24:04.372282] T [rpcsvc.c:1533:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108 [2019-08-04 10:24:04.372298] T [rpcsvc.c:1069:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24 [2019-08-04 10:24:04.372348] T [rpcsvc.c:1585:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1d89a, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 14) to rpc-transport (tcp.shared-server) [2019-08-04 10:24:04.372436] D [client_t.c:433:gf_client_unref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x5afc3) [0x7fb1c2a7efc3] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0xadeb) [0x7fb1c2a2edeb] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7b) [0x7fb1d7b0fd2b] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 1 [2019-08-04 10:24:14.266296] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk [2019-08-04 10:24:04.372271] T [MSGID: 0] [io-stats.c:2354:io_stats_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-stats returned 0 [2019-08-04 10:24:14.266290] T [MSGID: 0] [posix-helpers.c:1469:posix_janitor_task] 0-shared-posix: janitor cleaning out /opt/data/shared/.glusterfs/landfill [2019-08-04 10:24:14.266439] D [MSGID: 0] [posix-metadata.c:118:posix_fetch_mdata_xattr] 0-shared-posix: No such attribute:trusted.glusterfs.mdata for file /opt/data/shared/.glusterfs/landfill gfid: null [2019-08-04 10:24:14.372639] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:11) in:1, out:0, err:0 [2019-08-04 10:24:14.372670] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (11) is already connected [2019-08-04 10:24:14.372676] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:14.372685] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:14.372699] T [rpcsvc.c:744:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 49090 [2019-08-04 10:24:14.372706] T [rpcsvc-auth.c:445:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v3 [2019-08-04 10:24:14.372713] T [rpcsvc.c:549:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x120987, Ver: 2, Program: 1298437, ProgVers: 400, Proc: 14) from rpc-transport (tcp.shared-server) [2019-08-04 10:24:14.372724] T [auth-glusterfs.c:363:auth_glusterfs_v3_authenticate] 0-rpc-service: Auth Info: pid: 8258, uid: 694, gid: 692, owner: 0000000000000000, flags: 0 [2019-08-04 10:24:14.372729] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:14.372744] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:11) socket_event_poll_in returned 0 [2019-08-04 10:24:14.372770] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:14.372848] D [client_t.c:324:gf_client_ref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x34dd5) [0x7fb1c2a58dd5] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x115ed) [0x7fb1c2a355ed] -->/lib64/libglusterfs.so.0(gf_client_ref+0x6e) [0x7fb1d7b0fbde] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 2 [2019-08-04 10:24:14.372866] T [MSGID: 0] [server-rpc-fops_v2.c:2802:server4_statfs_resume] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-server to /opt/data/shared [2019-08-04 10:24:14.372879] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from /opt/data/shared to shared-io-stats [2019-08-04 10:24:14.372887] T [MSGID: 0] [io-stats.c:2906:io_stats_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-io-stats to shared-quota [2019-08-04 10:24:14.372895] T [MSGID: 0] [quota.c:4557:quota_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-quota to shared-index [2019-08-04 10:24:14.372902] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-index to shared-barrier [2019-08-04 10:24:14.372908] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-barrier to shared-marker [2019-08-04 10:24:14.372914] T 
[MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-marker to shared-selinux [2019-08-04 10:24:14.372921] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-selinux to shared-io-threads [2019-08-04 10:24:14.372931] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-shared-io-threads: STATFS scheduled as fast priority fop [2019-08-04 10:24:14.372970] T [MSGID: 0] [defaults.c:2325:default_statfs_resume] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-io-threads to shared-upcall [2019-08-04 10:24:14.372991] T [MSGID: 0] [upcall.c:1232:up_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-upcall to shared-leases [2019-08-04 10:24:14.373001] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-leases to shared-read-only [2019-08-04 10:24:14.373013] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-read-only to shared-worm [2019-08-04 10:24:14.373020] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-worm to shared-locks [2019-08-04 10:24:14.373028] T [MSGID: 0] [posix.c:4388:pl_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-locks to shared-access-control [2019-08-04 10:24:14.373035] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-access-control to shared-bitrot-stub [2019-08-04 10:24:14.373041] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-bitrot-stub to shared-changelog [2019-08-04 10:24:14.373047] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-changelog to shared-trash [2019-08-04 10:24:14.373054] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-trash to shared-posix [2019-08-04 10:24:14.373089] T [MSGID: 0] [posix-inode-fd-ops.c:2287:posix_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-posix returned 0 [2019-08-04 10:24:14.373097] T [MSGID: 0] [posix.c:4379:pl_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-locks returned 0 [2019-08-04 10:24:14.373104] T [MSGID: 0] [upcall.c:1211:up_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-upcall returned 0 [2019-08-04 10:24:14.373134] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk [2019-08-04 10:24:14.373129] T [MSGID: 0] [defaults.c:1642:default_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-threads returned 0 [2019-08-04 10:24:14.373132] T [MSGID: 0] [io-stats.c:2354:io_stats_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-stats returned 0 [2019-08-04 10:24:14.373161] T [rpcsvc.c:1533:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108 [2019-08-04 10:24:14.373168] T [rpcsvc.c:1069:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24 [2019-08-04 10:24:14.373192] T [rpcsvc.c:1585:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1d89b, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 14) to rpc-transport (tcp.shared-server) [2019-08-04 10:24:14.373231] D [client_t.c:433:gf_client_unref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x5afc3) [0x7fb1c2a7efc3] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0xadeb) [0x7fb1c2a2edeb] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7b) [0x7fb1d7b0fd2b] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 1 [2019-08-04 10:24:18.703786] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:18.703846] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:18.703876] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:18.703887] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:18.703894] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:18.703910] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:18.703916] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:18.703934] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -12 [2019-08-04 10:24:18.703949] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:18.703954] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:18.703954] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:18.703968] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:18.703967] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (-1) is already connected [2019-08-04 10:24:18.704080] E [socket.c:2317:__socket_proto_state_machine] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb1d68eddd5] -->/lib64/libglusterfs.so.0(+0x8c286) [0x7fb1d7b13286] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xc972) [0x7fb1cbe74972] ) 0-socket: invalid argument: this->private [Invalid argument] [2019-08-04 10:24:18.704093] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:-1) socket_event_poll_in returned -1 [2019-08-04 10:24:18.704102] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:-1) 10.7.3.217 non-SSL 
(errno:-1:Unknown error -1) [2019-08-04 10:24:18.704107] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:-1) (non-SSL) [2019-08-04 10:24:18.704132] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8c286) [0x7fb1d7b13286] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa48a) [0x7fb1cbe7248a] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0x81fc) [0x7fb1cbe701fc] ) 0-socket: invalid argument: this->private [Invalid argument] [2019-08-04 10:24:18.704759] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:18.704776] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:18.704803] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:60044 [2019-08-04 10:24:18.704822] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:18.704827] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:18.704834] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:18.704838] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:18.704840] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:18.704847] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:18.704851] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:18.704858] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:18.704863] T [socket.c:2349:__socket_proto_state_machine] 0-tcp.shared-server: partial fragment header read [2019-08-04 10:24:18.704870] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned 0 [2019-08-04 10:24:24.371032] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:11) in:1, out:0, err:0 [2019-08-04 10:24:24.371067] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (11) is already connected [2019-08-04 10:24:24.371074] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:24.371084] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:24.371100] T [rpcsvc.c:744:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 49090 [2019-08-04 10:24:24.371109] T [rpcsvc-auth.c:445:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v3 [2019-08-04 10:24:24.371117] T [rpcsvc.c:549:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x120988, Ver: 2, Program: 1298437, ProgVers: 400, Proc: 14) from rpc-transport (tcp.shared-server) [2019-08-04 10:24:24.371128] T [auth-glusterfs.c:363:auth_glusterfs_v3_authenticate] 0-rpc-service: Auth Info: pid: 8323, uid: 694, gid: 692, owner: 0000000000000000, flags: 0 [2019-08-04 10:24:24.371135] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:24.371146] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:11) socket_event_poll_in 
returned 0 [2019-08-04 10:24:24.371159] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:24.371229] D [client_t.c:324:gf_client_ref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x34dd5) [0x7fb1c2a58dd5] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x115ed) [0x7fb1c2a355ed] -->/lib64/libglusterfs.so.0(gf_client_ref+0x6e) [0x7fb1d7b0fbde] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 2 [2019-08-04 10:24:24.371253] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk The message "T [MSGID: 0] [server-rpc-fops_v2.c:2802:server4_statfs_resume] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-server to /opt/data/shared" repeated 2 times between [2019-08-04 10:23:34.373258] and [2019-08-04 10:24:24.371247] [2019-08-04 10:24:24.371252] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from /opt/data/shared to shared-io-stats [2019-08-04 10:24:24.371291] T [MSGID: 0] [io-stats.c:2906:io_stats_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-io-stats to shared-quota [2019-08-04 10:24:24.371300] T [MSGID: 0] [quota.c:4557:quota_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-quota to shared-index [2019-08-04 10:24:24.371309] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-index to shared-barrier [2019-08-04 10:24:24.371316] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-barrier to shared-marker [2019-08-04 10:24:24.371323] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-marker to shared-selinux [2019-08-04 10:24:24.371331] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-selinux to shared-io-threads [2019-08-04 10:24:24.371342] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-shared-io-threads: STATFS scheduled as fast priority fop [2019-08-04 10:24:24.371419] T [MSGID: 0] [defaults.c:2325:default_statfs_resume] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-io-threads to shared-upcall [2019-08-04 10:24:24.371442] T [MSGID: 0] [upcall.c:1232:up_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-upcall to shared-leases [2019-08-04 10:24:24.371451] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-leases to shared-read-only [2019-08-04 10:24:24.371458] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-read-only to shared-worm [2019-08-04 10:24:24.371465] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-worm to shared-locks [2019-08-04 10:24:24.371473] T [MSGID: 0] [posix.c:4388:pl_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-locks to shared-access-control [2019-08-04 10:24:24.371480] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-access-control to shared-bitrot-stub [2019-08-04 10:24:24.371486] T [MSGID: 0] 
[defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-bitrot-stub to shared-changelog [2019-08-04 10:24:24.371492] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-changelog to shared-trash [2019-08-04 10:24:24.371499] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, winding from shared-trash to shared-posix [2019-08-04 10:24:24.371531] T [MSGID: 0] [posix-inode-fd-ops.c:2287:posix_statfs] 0-stack-trace: stack-address: 0x7fb18c000aa8, shared-posix returned 0 [2019-08-04 10:24:24.371539] T [MSGID: 0] [posix.c:4379:pl_statfs_cbk] 0-stack-trace: stack-address: 0x7fb18c000aa8, shared-locks returned 0 [2019-08-04 10:24:24.371545] T [MSGID: 0] [upcall.c:1211:up_statfs_cbk] 0-stack-trace: stack-address: 0x7fb18c000aa8, shared-upcall returned 0 [2019-08-04 10:24:24.371561] T [rpcsvc.c:1533:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108 [2019-08-04 10:24:24.371568] T [rpcsvc.c:1069:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24 [2019-08-04 10:24:24.371593] T [rpcsvc.c:1585:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1d89c, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 14) to rpc-transport (tcp.shared-server) [2019-08-04 10:24:24.371647] D [client_t.c:433:gf_client_unref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x5afc3) [0x7fb1c2a7efc3] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0xadeb) [0x7fb1c2a2edeb] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7b) [0x7fb1d7b0fd2b] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 1 [2019-08-04 10:24:25.278101] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. 
About to flush least recently used log message to disk The message "T [MSGID: 0] [io-stats.c:2354:io_stats_statfs_cbk] 0-stack-trace: stack-address: 0x7fb18c000aa8, shared-io-stats returned 0" repeated 2 times between [2019-08-04 10:23:34.373484] and [2019-08-04 10:24:24.371555] [2019-08-04 10:24:25.278096] T [MSGID: 0] [posix-helpers.c:1469:posix_janitor_task] 0-shared-posix: janitor cleaning out /opt/data/shared/.glusterfs/landfill [2019-08-04 10:24:25.278248] D [MSGID: 0] [posix-metadata.c:118:posix_fetch_mdata_xattr] 0-shared-posix: No such attribute:trusted.glusterfs.mdata for file /opt/data/shared/.glusterfs/landfill gfid: null [2019-08-04 10:24:26.709885] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.709931] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.709938] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.709957] D [socket.c:692:__socket_rwv] 0-tcp.shared-server: EOF on socket 5 (errno:22:Invalid argument); returning ENODATA [2019-08-04 10:24:26.709963] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.709972] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.709977] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.709992] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.711716] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.711739] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.711753] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34494 [2019-08-04 10:24:26.711772] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.711786] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.711796] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.711800] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.711810] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.711824] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.711830] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.712216] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295620 is serviced using standard calloc() (0x7fb1b8004f20) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.712243] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.712252] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-788331683) received from 10.7.3.217:34494 [2019-08-04 10:24:26.712257] T [socket.c:2928:socket_event_handler] 
0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.712267] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.712281] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.712287] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8004f20) allocated with standard calloc() [2019-08-04 10:24:26.712319] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.712835] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.712858] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.712871] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34496 [2019-08-04 10:24:26.712890] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.712895] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.712904] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.712908] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.712915] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.712928] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.712934] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.713088] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295622 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.713101] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.713109] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (134415197) received from 10.7.3.217:34496 [2019-08-04 10:24:26.713114] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.713123] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.713128] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.713133] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.713157] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.713633] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.713649] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.713674] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34498 [2019-08-04 10:24:26.713689] T 
[socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.713694] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.713702] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.713706] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.713711] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.713721] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.713730] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.713878] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295622 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.713890] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.713898] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (134415197) received from 10.7.3.217:34498 [2019-08-04 10:24:26.713902] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.713912] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.713916] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.713921] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.713947] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.714466] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.714479] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.714489] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34500 [2019-08-04 10:24:26.714504] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.714509] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.714517] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.714522] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.714526] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.714535] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.714539] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.714694] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295622 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size 
[2019-08-04 10:24:26.714709] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.714716] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (134415197) received from 10.7.3.217:34500 [2019-08-04 10:24:26.714720] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.714729] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.714733] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.714738] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.714760] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.715392] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.715404] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.715412] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34502 [2019-08-04 10:24:26.715431] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.715436] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.715444] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.715448] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.715453] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.715464] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.715468] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.715624] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295622 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.715639] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.715647] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (134415197) received from 10.7.3.217:34502 [2019-08-04 10:24:26.715652] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.715660] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.715665] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.715670] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.715694] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.716172] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.716186] T 
[socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.716195] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34504 [2019-08-04 10:24:26.716216] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.716221] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.716228] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.716232] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.716234] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.716244] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.716248] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.716397] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369296132 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.716407] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.716415] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-1543306403) received from 10.7.3.217:34504 [2019-08-04 10:24:26.716419] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.716428] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.716437] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.716442] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.716464] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.717115] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.717130] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.717140] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34506 [2019-08-04 10:24:26.717308] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.717314] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.717323] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.717327] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.717332] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.717336] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected 
[2019-08-04 10:24:26.717340] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.717480] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295620 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.717490] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.717497] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-100466339) received from 10.7.3.217:34506 [2019-08-04 10:24:26.717501] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.717509] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.717514] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.717518] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.717540] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.718269] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.718286] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.718299] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34508 [2019-08-04 10:24:26.718319] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.718327] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.718339] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.718346] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.718348] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.718360] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.718369] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.718516] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295620 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.718527] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.718534] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-100466339) received from 10.7.3.217:34508 [2019-08-04 10:24:26.718539] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.718548] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.718552] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.718557] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf 
(0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.718582] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.719198] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.719215] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.719225] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34510 [2019-08-04 10:24:26.719241] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.719246] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.719254] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.719258] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.719266] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.719279] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.719286] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.719446] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295620 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.719458] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.719466] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-1107099299) received from 10.7.3.217:34510 [2019-08-04 10:24:26.719470] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.719479] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.719483] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.719488] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.719511] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.719975] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.719990] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.720004] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34512 [2019-08-04 10:24:26.720019] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.720024] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.720031] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.720035] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: 
(sock:5) returning to wait on socket [2019-08-04 10:24:26.720040] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.720050] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.720054] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.720198] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295620 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.720209] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.720216] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-2080177827) received from 10.7.3.217:34512 [2019-08-04 10:24:26.720221] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.720229] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.720234] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.720239] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.720262] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.720613] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.720637] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.720647] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34514 [2019-08-04 10:24:26.720661] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.720666] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.720674] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.720678] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.720681] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.720693] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.720697] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.720848] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.720859] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.720866] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1812136029) received from 10.7.3.217:34514 [2019-08-04 10:24:26.720871] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.720883] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: 
disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.720888] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.720892] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.720914] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.721297] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.721308] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.721317] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34516 [2019-08-04 10:24:26.721344] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.721349] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.721358] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.721362] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.721371] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.721383] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.721388] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.721533] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.721544] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.721551] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1241710685) received from 10.7.3.217:34516 [2019-08-04 10:24:26.721555] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.721564] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.721569] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.721574] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
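The repeating pattern in the trace above — accept, "reading over non-SSL", a multi-hundred-megabyte iobuf allocation, "wrong MSG-TYPE", disconnect, "transport ... destroyed" — is what one would expect when a scanner writes arbitrary bytes to the brick port and the server parses them as an ONC RPC record over TCP: the first four bytes are taken as the record-marking fragment header (so a random value becomes the fragment length the transport tries to buffer), and the word following the xid is taken as the message type, which is then neither CALL nor REPLY. The sketch below only illustrates that decoding, assuming a standard RFC 5531 record marker and an invented HTTP-style probe; it is not GlusterFS source, and all names and bytes in it are made up.

```c
/*
 * Minimal sketch -- NOT GlusterFS code. Shows how a scanner's probe bytes,
 * read as an ONC RPC record over TCP (RFC 5531 record marking), decode into
 * an implausible fragment length and an invalid msg_type. Probe bytes and
 * all identifiers here are illustrative assumptions.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohl() */

int main(void)
{
    /* Hypothetical first bytes sent by a port scanner: an HTTP probe. */
    const unsigned char probe[] = "GET / HTTP/1.0\r\n";

    uint32_t be;

    /* Bytes 0-3: record marker = last-fragment bit + 31-bit fragment length. */
    memcpy(&be, probe, 4);
    uint32_t marker    = ntohl(be);
    uint32_t last_frag = marker >> 31;
    uint32_t frag_len  = marker & 0x7fffffffu;  /* length the server would buffer */

    /* Bytes 4-7: xid; bytes 8-11: msg_type (0 = CALL, 1 = REPLY). */
    memcpy(&be, probe + 4, 4);
    uint32_t xid = ntohl(be);
    memcpy(&be, probe + 8, 4);
    int32_t msg_type = (int32_t)ntohl(be);

    printf("last_frag=%u frag_len=%u (~%u MB) xid=0x%08x msg_type=%d\n",
           last_frag, frag_len, frag_len >> 20, xid, msg_type);
    return 0;
}
```

For this sample probe the decoded fragment length is over 1 GB and msg_type is a large out-of-range value, which mirrors the pattern (though not the exact numbers) of the ~369 MB calloc()-backed iobuf requests and the "wrong MSG-TYPE" errors logged above before each disconnect.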
From bugzilla at redhat.com Fri Aug 16 15:45:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 16 Aug 2019 15:45:19 +0000 Subject: [Bugs] [Bug 1740413] Gluster volume bricks crashes when running a security scan on glusterfs ports In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740413 --- Comment #4 from Marvin --- Part 2 [2019-08-04 10:24:26.721598] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.722058] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.722076] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.722086] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34518 [2019-08-04 10:24:26.722102] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.722107] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.722115] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.722119] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.722122] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.722134] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.722138] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.722281] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.722290] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.722297] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-1576861603) received from 10.7.3.217:34518 [2019-08-04 10:24:26.722302] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.722310] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.722314] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.722319] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.722340] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.723181] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.723210] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.723230] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34520 [2019-08-04 10:24:26.723250] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.723257] T 
[socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.723267] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.723272] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.723276] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.723289] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.723294] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.723440] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.723451] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.723460] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (-1576861603) received from 10.7.3.217:34520 [2019-08-04 10:24:26.723468] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.723480] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.723485] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.723490] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.723517] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.724010] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.724027] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.724042] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34522 [2019-08-04 10:24:26.724056] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.724061] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.724073] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.724077] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.724079] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.724090] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.724097] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.724278] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.724294] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 
10:24:26.724304] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1711472733) received from 10.7.3.217:34522 [2019-08-04 10:24:26.724310] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.724321] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.724326] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.724332] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.724363] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.724764] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.724786] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.724802] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34524 [2019-08-04 10:24:26.724825] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.724831] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.724841] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.724846] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.724849] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.724858] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.724862] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.725011] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 369295364 is serviced using standard calloc() (0x7fb1b8015f70) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.725026] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.725037] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1510146141) received from 10.7.3.217:34524 [2019-08-04 10:24:26.725045] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.725059] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.725064] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.725068] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1b8015f70) allocated with standard calloc() [2019-08-04 10:24:26.725092] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:26.725438] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.725450] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 
10:24:26.725459] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34526 [2019-08-04 10:24:26.725486] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.725492] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:26.725502] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:26.725506] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:26.725514] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:26.725524] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:26.725531] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.725557] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 3211524 is serviced using standard calloc() (0x7fb1bc002160) as it exceeds the maximum available buffer size [2019-08-04 10:24:26.725570] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:26.725582] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (4103) received from 10.7.3.217:34526 [2019-08-04 10:24:26.725590] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -1 [2019-08-04 10:24:26.725600] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:26.725605] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:26.725611] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002160) allocated with standard calloc() [2019-08-04 10:24:26.725655] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b80053f0 destroyed [2019-08-04 10:24:26.726718] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:26.726736] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.726747] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34528 [2019-08-04 10:24:26.827220] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 19 [2019-08-04 10:24:26.827254] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 19, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.827269] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34536 [2019-08-04 10:24:26.927721] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 20 [2019-08-04 10:24:26.927768] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 20, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:26.927797] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34542 [2019-08-04 10:24:27.028127] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for 
socket 21 [2019-08-04 10:24:27.028167] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 21, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:27.028190] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34548 [2019-08-04 10:24:27.128717] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 22 [2019-08-04 10:24:27.128787] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 22, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:27.128814] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34554 [2019-08-04 10:24:27.229151] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 23 [2019-08-04 10:24:27.229177] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 23, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:27.229192] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34568 [2019-08-04 10:24:27.329675] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 24 [2019-08-04 10:24:27.329713] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 24, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:27.329737] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34576 [2019-08-04 10:24:27.430084] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 25 [2019-08-04 10:24:27.430116] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 25, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:27.430132] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:34584 [2019-08-04 10:24:27.530284] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:27.530310] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:5) socket is not connected, completing connection [2019-08-04 10:24:27.530326] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_complete_connection() returned 1 [2019-08-04 10:24:27.530332] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:5) returning to wait on socket [2019-08-04 10:24:27.530348] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:27.530366] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:27.530373] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.530392] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -12 [2019-08-04 10:24:27.530405] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:5) in:1, out:0, err:0 [2019-08-04 10:24:27.530408] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:5) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:27.530412] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (5) is already connected [2019-08-04 10:24:27.530415] D 
[socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:5) (non-SSL) [2019-08-04 10:24:27.530427] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:5) socket_event_poll_in returned -12 [2019-08-04 10:24:27.530441] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08d9b0 destroyed [2019-08-04 10:24:27.530457] T [socket.c:231:socket_dump_info] 0- (errno:-1:Unknown error -1): $$$ server: disconnecting from (af:2,sock:-1) 10.7.3.217 non-SSL (errno:-1:Unknown error -1) [2019-08-04 10:24:27.530465] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:-1) (non-SSL) [2019-08-04 10:24:27.530514] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8c286) [0x7fb1d7b13286] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa48a) [0x7fb1cbe7248a] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0x81fc) [0x7fb1cbe701fc] ) 0-socket: invalid argument: this->private [Invalid argument] [2019-08-04 10:24:27.580375] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:19) in:1, out:0, err:0 [2019-08-04 10:24:27.580398] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:19) socket is not connected, completing connection [2019-08-04 10:24:27.580411] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:19) socket_complete_connection() returned 1 [2019-08-04 10:24:27.580417] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:19) returning to wait on socket [2019-08-04 10:24:27.580425] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:19) in:1, out:0, err:0 [2019-08-04 10:24:27.580431] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (19) is already connected [2019-08-04 10:24:27.580436] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.580447] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.580453] T [socket.c:2349:__socket_proto_state_machine] 0-tcp.shared-server: partial fragment header read [2019-08-04 10:24:27.630659] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:20) in:1, out:0, err:0 [2019-08-04 10:24:27.630697] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:20) socket is not connected, completing connection [2019-08-04 10:24:27.630725] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:20) socket_complete_connection() returned 1 [2019-08-04 10:24:27.630737] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:20) returning to wait on socket [2019-08-04 10:24:27.630763] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:20) in:1, out:0, err:0 [2019-08-04 10:24:27.630768] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (20) is already connected [2019-08-04 10:24:27.630773] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.630817] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 67174404 is serviced using standard calloc() (0x7fb1bc002340) as it exceeds the maximum available buffer size [2019-08-04 10:24:27.630834] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.630842] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.630854] T 
[socket.c:2231:__socket_read_frag] 0-tcp.shared-server: partial read on non-blocking socket [2019-08-04 10:24:27.630862] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:20) socket_event_poll_in returned 0 [2019-08-04 10:24:27.680809] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:21) in:1, out:0, err:0 [2019-08-04 10:24:27.680846] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:21) socket is not connected, completing connection [2019-08-04 10:24:27.680874] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:21) socket_complete_connection() returned 1 [2019-08-04 10:24:27.680885] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:21) returning to wait on socket [2019-08-04 10:24:27.680910] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:21) in:1, out:0, err:0 [2019-08-04 10:24:27.680920] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (21) is already connected [2019-08-04 10:24:27.680931] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.680979] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 486539780 is serviced using standard calloc() (0x7fb1bc091530) as it exceeds the maximum available buffer size [2019-08-04 10:24:27.681010] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.681019] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1650549872) received from 10.7.3.217:34548 [2019-08-04 10:24:27.681024] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:21) socket_event_poll_in returned -1 [2019-08-04 10:24:27.681038] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:21) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:27.681044] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:21) (non-SSL) [2019-08-04 10:24:27.681050] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc091530) allocated with standard calloc() [2019-08-04 10:24:27.681087] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08f130 destroyed [2019-08-04 10:24:27.730907] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:22) in:1, out:0, err:0 [2019-08-04 10:24:27.730937] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:22) socket is not connected, completing connection [2019-08-04 10:24:27.730950] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:22) socket_complete_connection() returned 1 [2019-08-04 10:24:27.730955] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:22) returning to wait on socket [2019-08-04 10:24:27.730961] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:22) in:1, out:0, err:0 [2019-08-04 10:24:27.730965] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (22) is already connected [2019-08-04 10:24:27.730970] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.730984] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:22) socket_event_poll_in returned -12 [2019-08-04 10:24:27.730997] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:22) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:27.731002] D [socket.c:2946:socket_event_handler] 
0-transport: EPOLLERR - disconnecting (sock:22) (non-SSL) [2019-08-04 10:24:27.731000] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:22) in:1, out:0, err:0 [2019-08-04 10:24:27.731013] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08f960 destroyed [2019-08-04 10:24:27.731017] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (-1) is already connected [2019-08-04 10:24:27.731077] E [socket.c:2317:__socket_proto_state_machine] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb1d68eddd5] -->/lib64/libglusterfs.so.0(+0x8c286) [0x7fb1d7b13286] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xc972) [0x7fb1cbe74972] ) 0-socket: invalid argument: this->private [Invalid argument] [2019-08-04 10:24:27.731093] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:-1) socket_event_poll_in returned -1 [2019-08-04 10:24:27.731102] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:-1) 10.7.3.217 non-SSL (errno:-1:Unknown error -1) [2019-08-04 10:24:27.731108] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:-1) (non-SSL) [2019-08-04 10:24:27.731138] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8c286) [0x7fb1d7b13286] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa48a) [0x7fb1cbe7248a] -->/usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0x81fc) [0x7fb1cbe701fc] ) 0-socket: invalid argument: this->private [Invalid argument] [2019-08-04 10:24:27.781095] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:23) in:1, out:0, err:0 [2019-08-04 10:24:27.781133] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:23) socket is not connected, completing connection [2019-08-04 10:24:27.781171] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:23) socket_complete_connection() returned 1 [2019-08-04 10:24:27.781183] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:23) returning to wait on socket [2019-08-04 10:24:27.781189] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:23) in:1, out:0, err:0 [2019-08-04 10:24:27.781203] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (23) is already connected [2019-08-04 10:24:27.781208] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.781240] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 944261959 is serviced using standard calloc() (0x7fb1bc0329b0) as it exceeds the maximum available buffer size [2019-08-04 10:24:27.781253] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.781261] E [socket.c:2252:__socket_read_frag] 0-rpc: wrong MSG-TYPE (1299275112) received from 10.7.3.217:34568 [2019-08-04 10:24:27.781265] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:23) socket_event_poll_in returned -1 [2019-08-04 10:24:27.781275] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:23) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:27.781279] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:23) (non-SSL) [2019-08-04 10:24:27.781284] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc0329b0) allocated with standard calloc() [2019-08-04 10:24:27.781317] T [socket.c:4583:fini] 0-tcp.shared-server: transport 
0x7fb1bc090190 destroyed [2019-08-04 10:24:27.831241] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:24) in:1, out:0, err:0 [2019-08-04 10:24:27.831259] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:24) socket is not connected, completing connection [2019-08-04 10:24:27.831271] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:24) socket_complete_connection() returned 1 [2019-08-04 10:24:27.831275] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:24) returning to wait on socket [2019-08-04 10:24:27.831281] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:24) in:1, out:0, err:0 [2019-08-04 10:24:27.831285] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (24) is already connected [2019-08-04 10:24:27.831290] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.831309] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.831323] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.831328] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.831333] T [socket.c:1651:__socket_read_request] 0-tcp.shared-server: partial read on non-blocking socket [2019-08-04 10:24:27.831338] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:24) socket_event_poll_in returned 0 [2019-08-04 10:24:27.881538] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:27.881557] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:25) socket is not connected, completing connection [2019-08-04 10:24:27.881576] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:25) socket_complete_connection() returned 1 [2019-08-04 10:24:27.881581] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:25) returning to wait on socket [2019-08-04 10:24:27.881587] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:27.881591] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (25) is already connected [2019-08-04 10:24:27.881596] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.881647] D [MSGID: 0] [iobuf.c:568:iobuf_get2] 0-iobuf: request for iobuf of size 989855748 is serviced using standard calloc() (0x7fb1bc0915a0) as it exceeds the maximum available buffer size [2019-08-04 10:24:27.881662] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.881669] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.881677] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.881681] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:27.881686] T [socket.c:1419:__socket_read_simple_msg] 0-tcp.shared-server: partial read on non-blocking socket. 
[2019-08-04 10:24:27.881692] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:25) socket_event_poll_in returned 0 [2019-08-04 10:24:34.370870] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:11) in:1, out:0, err:0 [2019-08-04 10:24:34.370906] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (11) is already connected [2019-08-04 10:24:34.370913] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:34.370923] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:34.370942] T [rpcsvc.c:744:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 49090 [2019-08-04 10:24:34.370956] T [rpcsvc-auth.c:445:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v3 [2019-08-04 10:24:34.370956] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:11) in:1, out:0, err:0 [2019-08-04 10:24:34.370971] T [rpcsvc.c:549:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x120989, Ver: 2, Program: 1298437, ProgVers: 400, Proc: 14) from rpc-transport (tcp.shared-server) [2019-08-04 10:24:34.370984] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (11) is already connected [2019-08-04 10:24:34.371003] T [auth-glusterfs.c:363:auth_glusterfs_v3_authenticate] 0-rpc-service: Auth Info: pid: 9855, uid: 694, gid: 692, owner: 0000000000000000, flags: 0 [2019-08-04 10:24:34.371008] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:34.371018] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:34.371033] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:34.371053] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:11) socket_event_poll_in returned 0 [2019-08-04 10:24:34.371077] T [rpcsvc.c:744:rpcsvc_handle_rpc_call] 0-rpcsvc: Client port: 49090 [2019-08-04 10:24:34.371089] T [rpcsvc-auth.c:445:rpcsvc_auth_request_init] 0-rpc-service: Auth handler: AUTH_GLUSTERFS-v3 [2019-08-04 10:24:34.371096] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GlusterFS 4.x v1 - STATFS for 10.7.1.209:49090 [2019-08-04 10:24:34.371099] T [rpcsvc.c:549:rpcsvc_request_create] 0-rpc-service: received rpc-message (XID: 0x120990, Ver: 2, Program: 123451501, ProgVers: 1, Proc: 2) from rpc-transport (tcp.shared-server) [2019-08-04 10:24:34.371126] T [auth-glusterfs.c:363:auth_glusterfs_v3_authenticate] 0-rpc-service: Auth Info: pid: 0, uid: 0, gid: 0, owner: 00000000, flags: 0 [2019-08-04 10:24:34.371144] T [rpcsvc.c:375:rpcsvc_program_actor] 0-rpc-service: Actor found: GF-DUMP - PING for 10.7.1.209:49090 [2019-08-04 10:24:34.371151] T [rpcsvc.c:1533:rpcsvc_submit_generic] 0-rpc-service: Tx message: 12 [2019-08-04 10:24:34.371157] T [rpcsvc.c:1069:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 36, payload: 12, rpc hdr: 24 [2019-08-04 10:24:34.371173] D [client_t.c:324:gf_client_ref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x34dd5) [0x7fb1c2a58dd5] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x115ed) [0x7fb1c2a355ed] -->/lib64/libglusterfs.so.0(gf_client_ref+0x6e) [0x7fb1d7b0fbde] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 2 [2019-08-04 10:24:34.371179] T 
[rpcsvc.c:1585:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1d89e, Program: GF-DUMP, ProgVers: 1, Proc: 2) to rpc-transport (tcp.shared-server) [2019-08-04 10:24:34.371197] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:11) socket_event_poll_in returned 0 [2019-08-04 10:24:34.371195] T [MSGID: 0] [server-rpc-fops_v2.c:2802:server4_statfs_resume] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-server to /opt/data/shared [2019-08-04 10:24:34.371210] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from /opt/data/shared to shared-io-stats [2019-08-04 10:24:34.371220] T [MSGID: 0] [io-stats.c:2906:io_stats_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-io-stats to shared-quota [2019-08-04 10:24:34.371229] T [MSGID: 0] [quota.c:4557:quota_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-quota to shared-index [2019-08-04 10:24:34.371237] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-index to shared-barrier [2019-08-04 10:24:34.371245] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-barrier to shared-marker [2019-08-04 10:24:34.371253] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-marker to shared-selinux [2019-08-04 10:24:34.371260] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-selinux to shared-io-threads [2019-08-04 10:24:34.371272] D [MSGID: 0] [io-threads.c:376:iot_schedule] 0-shared-io-threads: STATFS scheduled as fast priority fop [2019-08-04 10:24:34.371305] T [MSGID: 0] [defaults.c:2325:default_statfs_resume] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-io-threads to shared-upcall [2019-08-04 10:24:34.371321] T [MSGID: 0] [upcall.c:1232:up_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-upcall to shared-leases [2019-08-04 10:24:34.371329] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-leases to shared-read-only [2019-08-04 10:24:34.371336] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-read-only to shared-worm [2019-08-04 10:24:34.371344] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-worm to shared-locks [2019-08-04 10:24:34.371352] T [MSGID: 0] [posix.c:4388:pl_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-locks to shared-access-control [2019-08-04 10:24:34.371359] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-access-control to shared-bitrot-stub [2019-08-04 10:24:34.371365] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-bitrot-stub to shared-changelog [2019-08-04 10:24:34.371372] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-changelog to shared-trash [2019-08-04 10:24:34.371383] T [MSGID: 0] [defaults.c:3158:default_statfs] 0-stack-trace: stack-address: 0x7fb188000aa8, winding from shared-trash to shared-posix [2019-08-04 10:24:34.371423] T [MSGID: 0] [posix-inode-fd-ops.c:2287:posix_statfs] 
0-stack-trace: stack-address: 0x7fb188000aa8, shared-posix returned 0 [2019-08-04 10:24:34.371436] T [MSGID: 0] [posix.c:4379:pl_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-locks returned 0 [2019-08-04 10:24:34.371460] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk [2019-08-04 10:24:34.371447] T [MSGID: 0] [upcall.c:1211:up_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-upcall returned 0 [2019-08-04 10:24:34.371458] T [MSGID: 0] [defaults.c:1642:default_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-threads returned 0 [2019-08-04 10:24:34.371494] T [rpcsvc.c:1533:rpcsvc_submit_generic] 0-rpc-service: Tx message: 108 [2019-08-04 10:24:34.371503] T [rpcsvc.c:1069:rpcsvc_record_build_header] 0-rpc-service: Reply fraglen 132, payload: 108, rpc hdr: 24 [2019-08-04 10:24:34.371529] T [rpcsvc.c:1585:rpcsvc_submit_generic] 0-rpc-service: submitted reply for rpc-message (XID: 0x1d89d, Program: GlusterFS 4.x v1, ProgVers: 400, Proc: 14) to rpc-transport (tcp.shared-server) [2019-08-04 10:24:34.371577] D [client_t.c:433:gf_client_unref] (-->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0x5afc3) [0x7fb1c2a7efc3] -->/usr/lib64/glusterfs/6.1/xlator/protocol/server.so(+0xadeb) [0x7fb1c2a2edeb] -->/lib64/libglusterfs.so.0(gf_client_unref+0x7b) [0x7fb1d7b0fd2b] ) 0-client_t: CTX_ID:ca89ac1a-1d4d-4aea-b264-fce7b1378aa4-GRAPH_ID:2-PID:22025-HOST:gfs-migration701-PC_NAME:shared-client-1-RECON_NO:-2: ref-count 1 [2019-08-04 10:24:35.943492] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:19) in:1, out:0, err:0 [2019-08-04 10:24:35.943549] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (19) is already connected [2019-08-04 10:24:35.943562] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:35.943590] D [socket.c:692:__socket_rwv] 0-tcp.shared-server: EOF on socket 19 (errno:11:Resource temporarily unavailable); returning ENODATA [2019-08-04 10:24:35.943601] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:19) socket_event_poll_in returned -1 [2019-08-04 10:24:35.943615] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:19) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:35.943640] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:19) (non-SSL) [2019-08-04 10:24:35.943655] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08e0d0 destroyed [2019-08-04 10:24:35.943716] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 5 [2019-08-04 10:24:35.943730] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 5, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.943746] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36410 [2019-08-04 10:24:35.943755] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:20) in:1, out:0, err:0 [2019-08-04 10:24:35.943771] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (20) is already connected [2019-08-04 10:24:35.943777] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:35.943789] D [socket.c:692:__socket_rwv] 0-tcp.shared-server: EOF on socket 20 
(errno:22:Invalid argument); returning ENODATA [2019-08-04 10:24:35.943794] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:20) socket_event_poll_in returned -1 [2019-08-04 10:24:35.943812] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:20) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:35.943817] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:20) (non-SSL) [2019-08-04 10:24:35.943828] D [logging.c:2006:_gf_msg_internal] 0-logging-infra: Buffer overflow of a buffer whose size limit is 5. About to flush least recently used log message to disk [2019-08-04 10:24:34.371487] T [MSGID: 0] [io-stats.c:2354:io_stats_statfs_cbk] 0-stack-trace: stack-address: 0x7fb188000aa8, shared-io-stats returned 0 [2019-08-04 10:24:35.943824] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc002340) allocated with standard calloc() [2019-08-04 10:24:35.943879] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc08e900 destroyed [2019-08-04 10:24:35.943953] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 19 [2019-08-04 10:24:35.943965] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 19, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.943978] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36412 [2019-08-04 10:24:35.944146] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 20 [2019-08-04 10:24:35.944157] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 20, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.944166] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36414 [2019-08-04 10:24:35.944326] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 21 [2019-08-04 10:24:35.944336] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 21, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.944345] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36416 [2019-08-04 10:24:35.944512] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:24) in:1, out:0, err:0 [2019-08-04 10:24:35.944517] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (24) is already connected [2019-08-04 10:24:35.944522] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:35.944532] D [socket.c:692:__socket_rwv] 0-tcp.shared-server: EOF on socket 24 (errno:61:No data available); returning ENODATA [2019-08-04 10:24:35.944537] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:24) socket_event_poll_in returned -1 [2019-08-04 10:24:35.944533] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 22 [2019-08-04 10:24:35.944544] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:24) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:35.944561] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:24) (non-SSL) [2019-08-04 10:24:35.944571] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc0909c0 destroyed [2019-08-04 10:24:35.944559] T 
[socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 22, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.944591] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36418 [2019-08-04 10:24:35.944846] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:35.944858] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (25) is already connected [2019-08-04 10:24:35.944864] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:35.944872] D [socket.c:692:__socket_rwv] 0-tcp.shared-server: EOF on socket 25 (errno:61:No data available); returning ENODATA [2019-08-04 10:24:35.944886] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 23 [2019-08-04 10:24:35.944894] W [socket.c:1410:__socket_read_simple_msg] 0-tcp.shared-server: reading from socket failed. Error (No data available), peer (10.7.3.217:34584) [2019-08-04 10:24:35.944911] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 23, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.944913] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:25) socket_event_poll_in returned -1 [2019-08-04 10:24:35.944927] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36420 [2019-08-04 10:24:35.944930] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:25) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:35.944936] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:25) (non-SSL) [2019-08-04 10:24:35.944940] D [MSGID: 0] [iobuf.c:683:__iobuf_put] 0-iobuf: freeing the iobuf (0x7fb1bc0915a0) allocated with standard calloc() [2019-08-04 10:24:35.944971] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1bc0866e0 destroyed [2019-08-04 10:24:35.945018] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 24 [2019-08-04 10:24:35.945029] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 24, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.945040] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36422 [2019-08-04 10:24:35.945202] T [socket.c:961:__socket_nodelay] 0-shared-server: NODELAY enabled for socket 25 [2019-08-04 10:24:35.945212] T [socket.c:1050:__socket_keepalive] 0-shared-server: Keep-alive enabled for socket: 25, (idle: 20, interval: 2, max-probes: 9, timeout: 42) [2019-08-04 10:24:35.945220] T [socket.c:3086:socket_server_event_handler] 0-tcp.shared-server: XXX server:10.7.4.110:49156, client:10.7.3.217:36424 [2019-08-04 10:24:35.945259] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:35.945264] T [socket.c:2891:socket_event_handler] 0-tcp.shared-server: server (sock:25) socket is not connected, completing connection [2019-08-04 10:24:35.945274] T [socket.c:2898:socket_event_handler] 0-tcp.shared-server: (sock:25) socket_complete_connection() returned 1 [2019-08-04 10:24:35.945279] T [socket.c:2902:socket_event_handler] 0-tcp.shared-server: (sock:25) returning to wait on socket [2019-08-04 10:24:35.945284] T [socket.c:2884:socket_event_handler] 
0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:35.945288] T [socket.c:2910:socket_event_handler] 0-tcp.shared-server: Server socket (25) is already connected [2019-08-04 10:24:35.945292] T [socket.c:520:__socket_ssl_readv] 0-tcp.shared-server: ***** reading over non-SSL [2019-08-04 10:24:35.945302] T [socket.c:2928:socket_event_handler] 0-tcp.shared-server: (sock:25) socket_event_poll_in returned -12 [2019-08-04 10:24:35.945311] T [socket.c:231:socket_dump_info] 0-tcp.shared-server: $$$ server: disconnecting from (af:2,sock:25) 10.7.3.217 non-SSL (errno:0:Success) [2019-08-04 10:24:35.945315] D [socket.c:2946:socket_event_handler] 0-transport: EPOLLERR - disconnecting (sock:25) (non-SSL) [2019-08-04 10:24:35.945322] T [socket.c:4583:fini] 0-tcp.shared-server: transport 0x7fb1b800d8a0 destroyed [2019-08-04 10:24:35.945317] T [socket.c:2884:socket_event_handler] 0-tcp.shared-server: server (sock:25) in:1, out:0, err:0 [2019-08-04 10:24:35.945359] D [logging.c:1813:gf_log_flush_extra_msgs] 0-logging-infra: Log buffer size reduced. About to flush 5 extra log messages [2019-08-04 10:24:24.371553] T [MSGID: 0] [defaults.c:1642:default_statfs_cbk] 0-stack-trace: stack-address: 0x7fb18c000aa8, shared-io-threads returned 0 [2019-08-04 10:24:35.945380] D [logging.c:1816:gf_log_flush_extra_msgs] 0-logging-infra: Just flushed 5 extra log messages pending frames: patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2019-08-04 10:24:35 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 6.1 /lib64/libglusterfs.so.0(+0x26db0)[0x7fb1d7aaddb0] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7fb1d7ab87b4] /lib64/libc.so.6(+0x36280)[0x7fb1d60ed280] /usr/lib64/glusterfs/6.1/rpc-transport/socket.so(+0xa4cc)[0x7fb1cbe724cc] /lib64/libglusterfs.so.0(+0x8c286)[0x7fb1d7b13286] /lib64/libpthread.so.0(+0x7dd5)[0x7fb1d68eddd5] /lib64/libc.so.6(clone+0x6d)[0x7fb1d61b4ead] --------- -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 09:33:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 09:33:26 +0000 Subject: [Bugs] [Bug 1740413] Gluster volume bricks crashes when running a security scan on glusterfs ports In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740413 M. Scherer changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |mscherer at redhat.com --- Comment #5 from M. Scherer --- Hi, in the future, could the logs be attached as files rather than used as comment ? This got blocked on mailman queue due to the size of the message :/ -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 10:08:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:08:51 +0000 Subject: [Bugs] [Bug 1702316] Cannot upgrade 5.x volume to 6.1 because of unused 'crypt' and 'bd' xlators In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1702316 Dmitry Melekhov changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |dm at belkam.com --- Comment #6 from Dmitry Melekhov --- Just got the same problem during upgrade from 5 to 6 and the same solution. 
It is not clear to me why it was closed as not a bug. There is nothing about it in the 6 release notes, so the upgrade should work with default values. Thank you! -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 10:11:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:11:58 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 10:16:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:16:31 +0000 Subject: [Bugs] [Bug 1743195] New: can't start gluster after upgrade from 5 to 6 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743195 Bug ID: 1743195 Summary: can't start gluster after upgrade from 5 to 6 Product: GlusterFS Version: 6 Status: NEW Component: core Assignee: bugs at gluster.org Reporter: dm at belkam.com CC: bugs at gluster.org Target Milestone: --- Classification: Community There is a bug report, https://bugzilla.redhat.com/show_bug.cgi?id=1702316, which describes the same situation, and the solution from it works for us. But! It is not clear to me why it was closed as not a bug. There is nothing about it in the 6 release notes, so the upgrade should work with default values. Because I can't reopen that bug, I created a new bug report. Thank you! -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 10:25:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:25:31 +0000 Subject: [Bugs] [Bug 1507896] glfs_init returns incorrect errno on failure In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1507896 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23265 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 10:25:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:25:32 +0000 Subject: [Bugs] [Bug 1507896] glfs_init returns incorrect errno on failure In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1507896 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23265 (libgfapi: return correct errno on invalid volume name) posted (#1) for review on master by Sheetal Pamecha -- You are receiving this mail because: You are on the CC list for the bug.
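The review above targets the libgfapi behaviour described in bug 1507896: when initialization fails (for example because the requested volume does not exist), glfs_init() should leave a meaningful errno for the caller. Below is a minimal caller-side sketch showing where that errno is observed. The volume name, server address, and log path are placeholder values, the include path may be <glusterfs/api/glfs.h> or <api/glfs.h> depending on the installation, and the exact errno returned for an invalid volume depends on the fix under review.

/* Hypothetical libgfapi client that checks errno when glfs_init() fails.
 * Build (paths/flags may differ per distribution):
 *   gcc check_init.c -o check_init $(pkg-config --cflags --libs glusterfs-api)
 */
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <glusterfs/api/glfs.h>   /* on some installs: <api/glfs.h> */

int main(void)
{
    /* "no-such-volume" and "gfs-server.example.com" are made-up values. */
    glfs_t *fs = glfs_new("no-such-volume");
    if (!fs) {
        fprintf(stderr, "glfs_new: %s\n", strerror(errno));
        return 1;
    }

    glfs_set_volfile_server(fs, "tcp", "gfs-server.example.com", 24007);
    glfs_set_logging(fs, "/tmp/gfapi.log", 7);

    if (glfs_init(fs) != 0) {
        /* The bug is about which errno shows up here for an invalid volume. */
        fprintf(stderr, "glfs_init failed: errno=%d (%s)\n",
                errno, strerror(errno));
        glfs_fini(fs);
        return 1;
    }

    glfs_fini(fs);
    return 0;
}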
From bugzilla at redhat.com Mon Aug 19 10:26:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:26:43 +0000 Subject: [Bugs] [Bug 1743200] New: ./tests/bugs/glusterd/bug-1595320.t is failing Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Bug ID: 1743200 Summary: ./tests/bugs/glusterd/bug-1595320.t is failing Product: GlusterFS Version: mainline Status: NEW Component: glusterd Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Sometimes ./tests/bugs/glusterd/bug-1595320.t fails while counting brick processes after a kill signal has been sent to the brick process. Version-Release number of selected component (if applicable): How reproducible: Always Steps to Reproduce: 1. Run ./tests/bugs/glusterd/bug-1595320.t in a loop on a softserve VM 2. 3. Actual results: ./tests/bugs/glusterd/bug-1595320.t is failing Expected results: ./tests/bugs/glusterd/bug-1595320.t should not fail Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 10:27:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:27:02 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 10:35:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:35:51 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23266 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 10:35:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 10:35:52 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) posted (#1) for review on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:20:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:20:36 +0000 Subject: [Bugs] [Bug 1739442] Unable to create geo-rep session on a non-root setup.
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739442 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 11:20:36 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23198 (geo-rep: Fix mount broker setup issue) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:20:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:20:37 +0000 Subject: [Bugs] [Bug 1737712] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737712 Bug 1737712 depends on bug 1739442, which changed state. Bug 1739442 Summary: Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1739442 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:20:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:20:38 +0000 Subject: [Bugs] [Bug 1737716] Unable to create geo-rep session on a non-root setup. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737716 Bug 1737716 depends on bug 1739442, which changed state. Bug 1739442 Summary: Unable to create geo-rep session on a non-root setup. https://bugzilla.redhat.com/show_bug.cgi?id=1739442 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:27:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:27:44 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23193 (ctime: Set mdata xattr on legacy files) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:08 +0000 Subject: [Bugs] [Bug 1739437] nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739437 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 11:28:08 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23196 (features/utime: always update ctime at setattr) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 19 11:28:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:08 +0000 Subject: [Bugs] [Bug 1737705] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737705 Bug 1737705 depends on bug 1739437, which changed state. Bug 1739437 Summary: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on https://bugzilla.redhat.com/show_bug.cgi?id=1739437 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:09 +0000 Subject: [Bugs] [Bug 1737746] ctime: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737746 Bug 1737746 depends on bug 1739437, which changed state. Bug 1739437 Summary: nfs client gets bad ctime for copied file which is on glusterfs disperse volume with ctime on https://bugzilla.redhat.com/show_bug.cgi?id=1739437 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:32 +0000 Subject: [Bugs] [Bug 1739430] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739430 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 11:28:32 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23194 (features/utime: Fix mem_put crash) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:33 +0000 Subject: [Bugs] [Bug 1733885] ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733885 Bug 1733885 depends on bug 1739430, which changed state. Bug 1739430 Summary: ctime: Upgrade/Enabling ctime feature wrongly updates older files with latest {a|m|c}time https://bugzilla.redhat.com/show_bug.cgi?id=1739430 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:55 +0000 Subject: [Bugs] [Bug 1739436] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739436 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-19 11:28:55 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23195 (posix/ctime: Fix race during lookup ctime xattr heal) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:56 +0000 Subject: [Bugs] [Bug 1734305] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734305 Bug 1734305 depends on bug 1739436, which changed state. Bug 1739436 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1739436 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:28:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:28:57 +0000 Subject: [Bugs] [Bug 1737745] ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737745 Bug 1737745 depends on bug 1739436, which changed state. Bug 1739436 Summary: ctime: When healing ctime xattr for legacy files, if multiple clients access and modify the same file, the ctime might be updated incorrectly. https://bugzilla.redhat.com/show_bug.cgi?id=1739436 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:38:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:38:54 +0000 Subject: [Bugs] [Bug 1743218] New: glusterd start is failed and throwing an error Address already in use Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Bug ID: 1743218 Summary: glusterd start is failed and throwing an error Address already in use Product: GlusterFS Version: 7 Status: NEW Component: rpc Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Depends On: 1743020 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743020 +++ Description of problem: Some of the .t are failing due to glusterd start failed after kill all gluster processes. Version-Release number of selected component (if applicable): How reproducible: Run regression test suite and below test case are failing ./tests/bugs/glusterd/brick-mux-validation.t ./tests/bugs/cli/bug-1077682.t ./tests/basic/glusterd-restart-shd-mux.t ./tests/bugs/core/multiplex-limit-issue-151.t Steps to Reproduce: 1. 2. 3. 
Actual results: test cases are failing Expected results: test case should not fail Additional info: --- Additional comment from Worker Ant on 2019-08-18 15:59:49 UTC --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) posted (#9) for review on master by MOHIT AGRAWAL --- Additional comment from Worker Ant on 2019-08-19 03:45:41 UTC --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) merged (#9) on master by MOHIT AGRAWAL Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 [Bug 1743020] glusterd start is failed and throwing an error Address already in use -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 11:38:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:38:54 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743218 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 [Bug 1743218] glusterd start is failed and throwing an error Address already in use -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:39:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:39:09 +0000 Subject: [Bugs] [Bug 1743218] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 11:43:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:43:54 +0000 Subject: [Bugs] [Bug 1743219] New: glusterd start is failed and throwing an error Address already in use Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 Bug ID: 1743219 Summary: glusterd start is failed and throwing an error Address already in use Product: GlusterFS Version: 6 Status: NEW Component: rpc Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Depends On: 1743020 Blocks: 1743218 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743020 +++ Description of problem: Some of the .t are failing due to glusterd start failed after kill all gluster processes. Version-Release number of selected component (if applicable): How reproducible: Run regression test suite and below test case are failing ./tests/bugs/glusterd/brick-mux-validation.t ./tests/bugs/cli/bug-1077682.t ./tests/basic/glusterd-restart-shd-mux.t ./tests/bugs/core/multiplex-limit-issue-151.t Steps to Reproduce: 1. 2. 3. 
Actual results: test cases are failing Expected results: test case should not fail Additional info: --- Additional comment from Worker Ant on 2019-08-18 15:59:49 UTC --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) posted (#9) for review on master by MOHIT AGRAWAL --- Additional comment from Worker Ant on 2019-08-19 03:45:41 UTC --- REVIEW: https://review.gluster.org/23211 (rpc: glusterd start is failed and throwing an error Address already in use) merged (#9) on master by MOHIT AGRAWAL Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 [Bug 1743020] glusterd start is failed and throwing an error Address already in use https://bugzilla.redhat.com/show_bug.cgi?id=1743218 [Bug 1743218] glusterd start is failed and throwing an error Address already in use -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 11:43:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:43:54 +0000 Subject: [Bugs] [Bug 1743020] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743020 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743219 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 [Bug 1743219] glusterd start is failed and throwing an error Address already in use -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:43:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:43:54 +0000 Subject: [Bugs] [Bug 1743218] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1743219 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 [Bug 1743219] glusterd start is failed and throwing an error Address already in use -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 11:44:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:44:09 +0000 Subject: [Bugs] [Bug 1743219] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 11:46:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:46:45 +0000 Subject: [Bugs] [Bug 1743219] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23268 -- You are receiving this mail because: You are on the CC list for the bug. 
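For background on the "Address already in use" failure tracked in the bug 1743020/1743218/1743219 thread above: when glusterd is restarted immediately after all gluster processes are killed, its listening port can still be held by connections lingering in TIME_WAIT, and a plain bind() then fails with EADDRINUSE unless the listener opts into address reuse. The sketch below illustrates the generic socket-level technique (SO_REUSEADDR before bind); it is an illustration of the approach under that assumption, not the actual patch merged through the reviews referenced above, and port 24007 is used only as an example value.

/* Minimal illustration of avoiding EADDRINUSE on a quick listener restart
 * by setting SO_REUSEADDR before bind(). Not glusterd's real code path.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static int open_listener(uint16_t port)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0)
        return -1;

    int on = 1;
    /* Allow rebinding while old connections linger in TIME_WAIT. */
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(sock);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sock, 128) < 0) {
        fprintf(stderr, "listen on %u failed: %s\n",
                (unsigned int)port, strerror(errno));
        close(sock);
        return -1;
    }
    return sock;
}

int main(void)
{
    int fd = open_listener(24007);  /* example port only */
    if (fd >= 0)
        close(fd);
    return fd >= 0 ? 0 : 1;
}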
From bugzilla at redhat.com Mon Aug 19 11:46:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 11:46:46 +0000 Subject: [Bugs] [Bug 1743219] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23268 (rpc: glusterd start is failed and throwing an error Address already in use) posted (#1) for review on release-6 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 12:04:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 12:04:25 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 zhou lin changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org Component|replicate |replicate Version|cns-1.0 |4.1 Product|Red Hat Gluster Storage |GlusterFS -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 12:05:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 12:05:32 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #8 from zhou lin --- update from Red Hat Gluster storage->Glusterfs the version in use is glusterfs 3.12.15, however,there is no such version for me to choose -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 12:06:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 12:06:27 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 zhou lin changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org Component|replicate |replicate Version|unspecified |4.1 Product|Red Hat Gluster Storage |GlusterFS -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 12:48:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 12:48:16 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 Karthik U S changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(zz.sh.cynthia at gma | |il.com) --- Comment #9 from Karthik U S --- >From the outputs provided, the file "mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp" is present only on brick "mn-0.local:/mnt/bricks/services/brick" without the gfid-link under .glusterfs dir. On the other two bricks the file itself is not present. 
By looking at the state of the file one possible scenario that I can think of which can lead to this is: - File creation succeeded on 1st brick - Gfid assignment succeeds but gfid-link creation fails on that brick - On the 2nd & 3rd brick file creation itself will fail Are there frequent disconnects of the bricks? I do not see the following message in the source code in version 3.12.15. [2019-08-18 09:04:45.976644] I [MSGID: 108026] [afr-self-heald.c:446:afr_shd_index_heal] 0-services-replicate-0: ret = -2,purge gfid:5f2dc6c6-3fc4-41ba-9800-a560b483de13 Is this something added by you or anyone from your team? Is this a source install or rpm install? Give the outputs of "gluster --version" and "rpm -qa | grep gluster". This is an INFO log according to the log level in the message. So how did you come to the conclusion that entry purge failed? Was this file present on the brick even after this message was logged? SHD caught this entry during heal, so most probably this file or its parent had some pending marker on them. Or did a full heal was run on this volume? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 19 13:36:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 13:36:29 +0000 Subject: [Bugs] [Bug 1743215] glusterd-utils: 0-management: xfs_info exited with non-zero exit status [Permission denied] In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743215 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- CC|kkeithle at redhat.com |bugs at gluster.org Component|glusterfs |glusterd Version|30 |6 Assignee|kkeithle at redhat.com |bugs at gluster.org Product|Fedora |GlusterFS QA Contact|extras-qa at fedoraproject.org | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 19 14:10:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 19 Aug 2019 14:10:26 +0000 Subject: [Bugs] [Bug 1543996] truncates read-only files on copy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1543996 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(rtalur at redhat.com |needinfo?(spalai at redhat.com |) |) |needinfo?(kdhananj at redhat.c | |om) | --- Comment #10 from Kaleb KEITHLEY --- Any update? This has been open since February. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 01:26:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 01:26:30 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 01:26:30 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) merged (#1) on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. 
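The state Karthik describes for bug 1741783 (file present on one brick but with no gfid-link under .glusterfs) can be checked directly on that brick. A small sketch, assuming the brick path named in the comment and treating the gfid shown in the comments as just an example value:

BRICK=/mnt/bricks/services/brick
F="$BRICK/mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp"
getfattr -n trusted.gfid -e hex "$F"      # prints trusted.gfid=0x<32 hex digits>
stat -c %h "$F"                           # link count 1 => no gfid hard link exists on this brick
# for a gfid aabbccdd-... the backend link would normally be:
#   $BRICK/.glusterfs/aa/bb/aabbccdd-...
ls -l "$BRICK"/.glusterfs/aa/bb/ 2>/dev/null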
From bugzilla at redhat.com Tue Aug 20 03:14:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 03:14:34 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1648 from Worker Ant --- REVIEW: https://review.gluster.org/23236 (storage/posix - Moved pointed validity check in order to avoid possible seg-fault) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 03:15:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 03:15:42 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #746 from Worker Ant --- REVIEW: https://review.gluster.org/23178 (client_t.c: removal of dead code.) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 04:05:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 04:05:19 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1649 from Worker Ant --- REVIEW: https://review.gluster.org/23251 (protocol/client - fixing a coverity issue) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 05:30:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 05:30:01 +0000 Subject: [Bugs] [Bug 1730409] core file generated - when EC volume stop and start is executed for 10 loops on a EC+Brickmux setup In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1730409 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 05:30:01 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23060 (posix: In brick_mux brick is crashed while start/stop volume in loop) merged (#23) on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 05:30:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 05:30:44 +0000 Subject: [Bugs] [Bug 1738778] Unable to setup softserve VM In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738778 --- Comment #4 from Ravishankar N --- Deleting that line worked. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 20 05:57:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 05:57:51 +0000 Subject: [Bugs] [Bug 1543996] truncates read-only files on copy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1543996 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|spalai at redhat.com |jthottan at redhat.com Flags|needinfo?(spalai at redhat.com |needinfo- |) |needinfo?(jthottan at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 06:09:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 06:09:32 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |khiremat at redhat.com Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 06:12:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 06:12:55 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #12 from hari gowtham --- we have backported the patches for this bug to every active release branch. About having a release 6 alone, we need to check with others involved in the process and get back to you. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 06:21:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 06:21:15 +0000 Subject: [Bugs] [Bug 1737484] geo-rep syncing significantly behind and also only one of the directories are synced with tracebacks seen In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737484 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23247 (geo-rep: Fix worker connection issue) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 06:30:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 06:30:39 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1650 from Worker Ant --- REVIEW: https://review.gluster.org/23260 (mount/fuse - Fixing a coverity issue) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 20 07:32:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 07:32:08 +0000 Subject: [Bugs] [Bug 1743218] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23267 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 07:32:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 07:32:09 +0000 Subject: [Bugs] [Bug 1743218] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 07:32:09 --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23267 (rpc: glusterd start is failed and throwing an error Address already in use) merged (#2) on release-7 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 08:11:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:11:24 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23270 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 08:11:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:11:25 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1651 from Worker Ant --- REVIEW: https://review.gluster.org/23270 (geo-replication: fixing a coverity issue (resource leak)) posted (#1) for review on master by Barak Sason Rofman -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 08:46:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:46:05 +0000 Subject: [Bugs] [Bug 1423442] group files to set volume options should have comments In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1423442 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|amukherj at redhat.com |bsasonro at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 20 08:56:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:56:39 +0000 Subject: [Bugs] [Bug 1743573] New: fuse client hung when issued a lookup "ls" on an ec volume Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743573 Bug ID: 1743573 Summary: fuse client hung when issued a lookup "ls" on an ec volume Product: GlusterFS Version: mainline Status: NEW Component: disperse Keywords: Regression Severity: urgent Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: amukherj at redhat.com, aspandey at redhat.com, bugs at gluster.org, csaba at redhat.com, nchilaka at redhat.com, pkarampu at redhat.com, rgowdapp at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, sheggodu at redhat.com, storage-qa-internal at redhat.com, vdas at redhat.com Depends On: 1731896 Blocks: 1696809 Target Milestone: --- Classification: Community Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1731896 [Bug 1731896] fuse client hung when issued a lookup "ls" on an ec volume -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 08:57:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:57:12 +0000 Subject: [Bugs] [Bug 1743573] fuse client hung when issued a lookup "ls" on an ec volume In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743573 Pranith Kumar K changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks|1696809 | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 08:58:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 08:58:46 +0000 Subject: [Bugs] [Bug 1743573] fuse client hung when issued a lookup "ls" on an ec volume In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743573 --- Comment #1 from Pranith Kumar K --- (gdb) p $4->locks[0] $5 = {lock = 0x7f3da4abc1d8, fop = 0x7f3d74317e18, owner_list = {next = 0x7f3d74317ed0, prev = 0x7f3d74317ed0}, wait_list = {next = 0x7f3da4abc208, prev = 0x7f3da4abc208}, update = {false, false}, dirty = { false, false}, optimistic_changelog = false, base = 0x0, size = 0, waiting_flags = 0, fl_start = 0, fl_end = 9223372036854775807} (gdb) p $4->locks[0].lock $6 = (ec_lock_t *) 0x7f3da4abc1d8 (gdb) p *$4->locks[0].lock $7 = {ctx = 0x7f3db7cbff70, timer = 0x0, owners = {next = 0x7f3da4abc1e8, prev = 0x7f3da4abc1e8}, waiting = {next = 0x7f3da4abc1f8, prev = 0x7f3da4abc1f8}, frozen = {next = 0x7f3d74317ee0, prev = 0x7f3d74317ee0}, mask = 0, good_mask = 18446744073709551615, healing = 0, refs_owners = 0, refs_pending = 0, waiting_flags = 0, acquired = false, unlock_now = false, release = true, query = true, fd = 0x0, loc = {path = 0x7f3d75084a40 "/IOs/kernel/rhs-client45.lab.eng.blr.redhat.com/dir.2/linux-5.2.7/Documentation/devicetree/bindings/rtc", name = 0x7f3d75084aa4 "rtc", inode = 0x7f3d98014768, parent = 0x7f3d99faad38, gfid = "\310\a\376|-\205K\v\215\000\b\363>\241\021i", pargfid = "\345\330}\212\242{Nr\233\064\373\030MD\361", }, {type = ENTRYLK_WRLCK, flock = { l_type = 1, l_whence = 0, l_start = 0, l_len = 0, l_pid = 0, l_owner = {len = 0, data = '\000' }}}} (gdb) p &$4->locks[0].lock->owners $8 = (struct list_head *) 0x7f3da4abc1e8 (gdb) p &$4->locks[0].lock->waiting $9 = (struct 
list_head *) 0x7f3da4abc1f8 (gdb) p &$4->locks[0].lock->frozen $10 = (struct list_head *) 0x7f3da4abc208 This seems to suggest that the fop is stuck in frozen list which can only happen if lock->release is set to true. Problem: Mount-1 Mount-2 1)Tries to acquire lock on 'dir1' 1)Tries to acquire lock on 'dir1' 2)Lock is granted on brick-0 2)Lock gets EAGAIN on brick-0 and leads to blocking lock on brick-0 3)Gets a lock-contention 3) Doesn't matter what happens on mount-2 notification, marks lock->release from here on. to true. 4)New fop comes on 'dir1' which will be put in frozen list as lock->release is set to true. 5) Lock acquisition from step-2 fails because 3 bricks went down in 4+2 setup. Fop on mount-1 which is put in frozen list will hang because no codepath will move it from frozen list to any other list and the lock will not be retried. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 09:02:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 09:02:46 +0000 Subject: [Bugs] [Bug 1743573] fuse client hung when issued a lookup "ls" on an ec volume In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743573 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23272 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 09:02:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 09:02:47 +0000 Subject: [Bugs] [Bug 1743573] fuse client hung when issued a lookup "ls" on an ec volume In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743573 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23272 (cluster/ec: Mark release only when it is acquired) posted (#1) for review on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 09:35:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 09:35:46 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1652 from Worker Ant --- REVIEW: https://review.gluster.org/23264 (libglusterfs - fixing a coverity issue) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. 
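The race Pranith lays out above for bug 1743573 is timing-dependent, but the ingredients can be lined up from a shell. A rough, unverified sketch; volume name, brick paths and server names are placeholders, and the hang only appears if the bricks die while the blocking lock from the second mount is still pending:

gluster volume create ecv disperse 6 redundancy 2 server{1..6}:/bricks/ecv force
gluster volume start ecv
mount -t glusterfs server1:/ecv /mnt/m1          # "mount-1"
mount -t glusterfs server1:/ecv /mnt/m2          # "mount-2"
mkdir -p /mnt/m1/dir1
while true; do ls /mnt/m1/dir1 > /dev/null; done &
while true; do touch /mnt/m2/dir1/f; done &      # keeps the lock on dir1 contended
# now, on exactly 3 of the 6 servers:  pkill glusterfsd
# a later 'ls /mnt/m1/dir1' hanging matches the fop stuck in the frozen list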
From bugzilla at redhat.com Tue Aug 20 09:36:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 09:36:54 +0000 Subject: [Bugs] [Bug 1737291] features/locks: avoid use after freed of frame for blocked lock In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737291 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 09:36:54 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23155 (features/locks: avoid use after freed of frame for blocked lock) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:06:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:06:07 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #13 from hari gowtham --- And can you use the night rpms available at http://artifacts.ci.centos.org/gluster/nightly/ The latest rpm has the fix in it. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 10:10:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:10:45 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:11:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:11:02 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:11:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:11:44 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. 
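For Hari's suggestion to verify with the nightly builds, something along these lines should work on a CentOS test box. The exact .repo file name under that directory is an assumption, so check the published listing first:

curl -s http://artifacts.ci.centos.org/gluster/nightly/ | grep -o 'href="[^"]*"'            # see what is published
yum-config-manager --add-repo http://artifacts.ci.centos.org/gluster/nightly/master.repo    # assumed file name
yum install -y glusterfs-server glusterfs-fuse
gluster --version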
From bugzilla at redhat.com Tue Aug 20 10:12:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:12:17 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 --- Comment #8 from nchilaka --- Created attachment 1606039 --> https://bugzilla.redhat.com/attachment.cgi?id=1606039&action=edit python script to test -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:12:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:12:38 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 --- Comment #8 from nchilaka --- Created attachment 1606040 --> https://bugzilla.redhat.com/attachment.cgi?id=1606040&action=edit python script to test -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:12:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:12:53 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:13:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:13:03 +0000 Subject: [Bugs] [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 10:13:03 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23206 (performance/md-cache: Do not skip caching of null character xattr values) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:16:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:16:04 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 nchilaka changed: What |Removed |Added ---------------------------------------------------------------------------- QA Contact|nchilaka at redhat.com |ubansal at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 20 10:30:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:30:04 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:41:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:41:19 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23274 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:41:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:41:20 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#1) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:49:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:49:58 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 --- Comment #2 from Kotresh HR --- Discussion at the patch which is worth mentioning here Kinglong Mee: @Kotresh HR, I cannot reproduce this problem as you description, Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above I test at glusterfs mount and nfs mount, all get right result as, # sh test.sh File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2019-08-08 18:39:16.099467595 +0800 Change: 2019-08-08 18:39:16.100847925 +0800 Birth: - File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2020-01-01 12:00:00.000000000 +0800 Change: 2019-08-08 18:39:17.126800759 +0800 Birth: - --------------- Sorry, this is not happening with your patch[1]. Because we don't update ctime to mtime which was previously being done. But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. 
But now it's not. [root at f281 glusterfs]# stat /mastermnt/file3 File: /mastermnt/file3 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 2eh/46d Inode: 13563962387061186202 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-08 17:44:31.341319550 +0530 Modify: 2019-08-08 17:44:31.341319550 +0530 Change: 2019-08-08 17:44:31.342550008 +0530 <<<< ctime is different Birth: - [root at f281 So I was trying to fix this issue. And other thing, ideally updating atime|mtime should update ctime with current time, which was not happening in "posix_update_utime_in_mdata" but was happening as part of "posix_set_ctime" in posix_setattr. ----------------------------- |But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. Yes, you are right. when file is created, all times should be same. With the patch[1], those times are different. For nfs, a create of a file, nfs client sends a create rpc, and a setattr(set time of server). Ganesha.nfsd gets the CLOCK_REALTIME for mtime/atime, and the utime xlator gets the realtime for ctime, so that, we cannot gets all times same when creating file. I think we should let utime xlator gets the realtime for all times(ctime/atime/mtime), ganesha.nfsd does not do that. |With this patch, it is clean. I am inclined to take this patch in if this solves the original nfs problem you reported. Could you please test that out and let me know? With this patch, the nfs problem of bad ctime is not exist now. [1] https://review.gluster.org/#/c/23154/ -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:50:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:50:58 +0000 Subject: [Bugs] [Bug 1743627] New: ctime: If atime is updated via utimensat syscall ctime is not getting updated Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Bug ID: 1743627 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated Product: Red Hat Gluster Storage Version: rhgs-3.5 Status: NEW Component: core Assignee: atumball at redhat.com Reporter: khiremat at redhat.com QA Contact: rhinduja at redhat.com CC: bugs at gluster.org, nchilaka at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1738786 Target Milestone: --- Classification: Red Hat +++ This bug was initially created as a clone of Bug #1738786 +++ Description of problem: When atime|mtime is updated via utime family of syscalls, ctime is not updated. 
Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above Additional info: --- Additional comment from Worker Ant on 2019-08-08 07:43:19 UTC --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on master by Kotresh HR --- Additional comment from Kotresh HR on 2019-08-20 10:49:58 UTC --- Discussion at the patch which is worth mentioning here Kinglong Mee: @Kotresh HR, I cannot reproduce this problem as you description, Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above I test at glusterfs mount and nfs mount, all get right result as, # sh test.sh File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2019-08-08 18:39:16.099467595 +0800 Change: 2019-08-08 18:39:16.100847925 +0800 Birth: - File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2020-01-01 12:00:00.000000000 +0800 Change: 2019-08-08 18:39:17.126800759 +0800 Birth: - --------------- Sorry, this is not happening with your patch[1]. Because we don't update ctime to mtime which was previously being done. But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. [root at f281 glusterfs]# stat /mastermnt/file3 File: /mastermnt/file3 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 2eh/46d Inode: 13563962387061186202 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-08 17:44:31.341319550 +0530 Modify: 2019-08-08 17:44:31.341319550 +0530 Change: 2019-08-08 17:44:31.342550008 +0530 <<<< ctime is different Birth: - [root at f281 So I was trying to fix this issue. And other thing, ideally updating atime|mtime should update ctime with current time, which was not happening in "posix_update_utime_in_mdata" but was happening as part of "posix_set_ctime" in posix_setattr. ----------------------------- |But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. Yes, you are right. when file is created, all times should be same. With the patch[1], those times are different. For nfs, a create of a file, nfs client sends a create rpc, and a setattr(set time of server). Ganesha.nfsd gets the CLOCK_REALTIME for mtime/atime, and the utime xlator gets the realtime for ctime, so that, we cannot gets all times same when creating file. I think we should let utime xlator gets the realtime for all times(ctime/atime/mtime), ganesha.nfsd does not do that. |With this patch, it is clean. I am inclined to take this patch in if this solves the original nfs problem you reported. 
Could you please test that out and let me know? With this patch, the nfs problem of bad ctime is not exist now. [1] https://review.gluster.org/#/c/23154/ Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:50:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:50:58 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743627 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:51:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:51:48 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |high Hardware|Unspecified |x86_64 OS|Unspecified |Linux Severity|unspecified |medium -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:52:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:52:00 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|atumball at redhat.com |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:53:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:53:22 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST -- You are receiving this mail because: You are on the CC list for the bug. 
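The utimensat/ctime behaviour discussed above for bug 1738786/1743627 can be checked numerically rather than by eyeballing stat output. A minimal sketch, assuming a FUSE mount of a ctime-enabled volume at /mnt:

touch /mnt/file1
before=$(stat -c %Z /mnt/file1)                  # ctime as seconds since the epoch
sleep 1
touch -m -d "2020-01-01 12:00:00" /mnt/file1     # update mtime only
after=$(stat -c %Z /mnt/file1)
[ "$after" -gt "$before" ] && echo "ctime updated" || echo "ctime unchanged (the reported bug)"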
From bugzilla at redhat.com Tue Aug 20 10:58:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:58:33 +0000 Subject: [Bugs] [Bug 1743634] New: geo-rep: Changelog archive file format is incorrect Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Bug ID: 1743634 Summary: geo-rep: Changelog archive file format is incorrect Product: Red Hat Gluster Storage Version: rhgs-3.5 Hardware: x86_64 OS: Linux Status: NEW Component: geo-replication Severity: medium Assignee: sunkumar at redhat.com Reporter: khiremat at redhat.com QA Contact: rhinduja at redhat.com CC: avishwan at redhat.com, bugs at gluster.org, csaba at redhat.com, khiremat at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1741890 Target Milestone: --- Classification: Red Hat +++ This bug was initially created as a clone of Bug #1741890 +++ Description of problem: The created changelog archive file didn't have corresponding year and month. It created as "archive_%Y%m.tar" on python2 only systems. [root at rhs-gp-srv7 xsync]# ls -l total 664564 -rw-r--r--. 1 root root 680509440 Aug 15 16:51 archive_%Y%m.tar [root at rhs-gp-srv7 xsync]# Version-Release number of selected component (if applicable): mainline How reproducible: Always on python2 only machine (centos7) Steps to Reproduce: 1. Create geo-rep session on python2 only machine 2. ls -l /var/lib/misc/gluster/gsyncd///.processed/ Actual results: changelog archive file format is incorrect. Not substituted with corresponding year and month Expected results: changelog archive file name should have correct year and month Additional info: --- Additional comment from Worker Ant on 2019-08-16 10:59:26 UTC --- REVIEW: https://review.gluster.org/23248 (geo-rep: Fix the name of changelog archive file) posted (#1) for review on master by Kotresh HR Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 [Bug 1741890] geo-rep: Changelog archive file format is incorrect -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:58:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:58:33 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743634 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 [Bug 1743634] geo-rep: Changelog archive file format is incorrect -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 10:59:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:59:18 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |Regression Status|NEW |ASSIGNED Assignee|sunkumar at redhat.com |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 20 10:59:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 10:59:25 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |blocker? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 11:01:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:01:52 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 11:42:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:42:54 +0000 Subject: [Bugs] [Bug 1743652] New: CentOs 6 GlusterFS client creates files with time 01/01/1970 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Bug ID: 1743652 Summary: CentOs 6 GlusterFS client creates files with time 01/01/1970 Product: GlusterFS Version: mainline Hardware: x86_64 OS: Linux Status: NEW Component: ctime Severity: high Priority: medium Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: alexis.fernandez at altafonte.com, atumball at redhat.com, baoboadev at gmail.com, bugs at gluster.org, khiremat at redhat.com, rkavunga at redhat.com Depends On: 1726175 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1726175 +++ Description of problem: CentOs 6 gluster client with glusterfs volume mounted creates files with time creation "01/01/1970". Files created by user apache, or with user root with vim, or nano, are created with bad date. But if create with touch, the date is correct. Version-Release number of selected component (if applicable): glusterfs-fuse-6.3-1.el6.x86_64 How reproducible: Create file in mountpoint with vim, or nano. Steps to Reproduce: 1. yum install centos-release-gluster6 2. yum install glusterfs-client 3. mount -t glusterfs IP:/remotevol /mnt/localdir 4. cd /mnt/localdir 5. vim asdasdad 6. :wq! 7. 
ls -lah asdasdad Actual results: -rw-r--r-- 1 root root 0 ene 1 1970 test Expected results: -rw-r--r-- 1 root root 0 jul 1 2019 test --- Additional comment from baoboa on 2019-07-02 15:23:52 UTC --- same behavior for a centos6 client server: glusterfs-server.x86_64 6.3-1.el7 @centos-gluster6 client: glusterfs-fuse.x86_64 6.3-1.el6 @centos-gluster6 kernel version: 2.6.32-573.3.1.el6.x86_64 mount -t glusterfs server:myvol /mnt/myvol touch /mnt/myvol/test -> correct time -rw-r--r-- 1 root root 12 Jul 1 11:59 test vi /mnt/myvol/test2 -> wrong time (1970) -rw-r--r-- 1 root root 7 Dec 18 1970 test2 REM: this not the case for a centos7 client, the creation time is correct recover correct time if ctime is deactivated "gluster volume set myvol features.ctime off" ls /mnt/myvol/ -rw-r--r-- 1 root root 7 Jul 2 17:16 test2 -rw-r--r-- 1 root root 12 Jul 1 11:59 test https://review.gluster.org/#/c/glusterfs/+/22651/ this review look related to this bug/regression --- Additional comment from on 2019-07-16 08:38:30 UTC --- Thanks baoboa, I can confirm the value "gluster volume set myvol features.ctime off" fix the issue with the date. Thanks. --- Additional comment from Worker Ant on 2019-08-20 10:41:20 UTC --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#1) for review on master by Kotresh HR Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 11:42:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:42:54 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743652 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 11:43:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:43:15 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 11:45:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:45:41 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 --- Comment #4 from Worker Ant --- REVISION POSTED: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#2) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. 
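The workaround confirmed in the comments for bug 1726175/1743652 can be exercised end to end roughly as follows. Volume and server names are examples, and a shell redirection stands in for the vi/nano case (plain touch did not show the bad date):

gluster volume set myvol features.ctime off      # workaround from the comments above
# on the CentOS 6 client:
mount -t glusterfs server:/myvol /mnt/myvol
echo test > /mnt/myvol/newfile                   # create through a write path, as vi/nano do
ls -l /mnt/myvol/newfile                         # date should now be current, not Jan 1 1970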
From bugzilla at redhat.com Tue Aug 20 11:45:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:45:44 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23274 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 11:45:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 11:45:45 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#2) for review on master by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 12:27:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 12:27:26 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 M. Scherer changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(mscherer at redhat.c | |om) | --- Comment #10 from M. Scherer --- I am not sure I understand what you mean by "setting them up". I expect the setup to be done with Ansible, using our playbooks, rather than by giving people direct access (experience has shown that when people have a way to bypass automation, they bypass it sooner or later, which causes us trouble down the line). So far, the only patch I have found is https://review.gluster.org/#/c/build-jobs/+/23172/ which is not really something that should be merged, since that job replicates work Jenkins already does. What I would expect is a job that simply runs generic-package.sh on the builder, and nothing more. -- You are receiving this mail because: You are on the CC list for the bug.
From bugzilla at redhat.com Tue Aug 20 12:38:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 12:38:54 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #5 from Sergey Pleshkov --- [root at LSY-GL-03 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs Entry counts Regular files : 359953 Directories : 13244 Symbolic links : 511 Other : 0 Total : 373708 Metadata checksums Regular files : 800d132fc8dbd2d3 Directories : 2a067038668ee0 Symbolic links : 9edfcc852 Other : 3e9 Checksums Regular files : 523a264a8cb047533c6d72eee606bf2 Directories : 4f697d5629707031 Symbolic links : 173f1e2800747538 Other : 0 Total : 9aa921a4bd429a8 [root at LSY-GL-02 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs Entry counts Regular files : 359215 Directories : 13244 Symbolic links : 511 Other : 0 Total : 372970 Metadata checksums Regular files : 8098f54e92802273 Directories : 2a067038668ee0 Symbolic links : 9edfcc852 Other : 3e9 Checksums Regular files : d992a16c2b695ebaef21668a320a96ac Directories : 52134d6004145c08 Symbolic links : 173f1e2800747538 Other : 0 Total : 739f94ae1d03e126 [root at LSY-GL-01 host]# ./arequal-checksum -p /diskForTestData/tst -i .glusterfs Entry counts Regular files : 359215 Directories : 13244 Symbolic links : 511 Other : 0 Total : 372970 Metadata checksums Regular files : 812d17da8db2d6f3 Directories : 2a067038668ee0 Symbolic links : 9edfcc852 Other : 3e9 Checksums Regular files : b980694e409c76a1df19442db9576bc1 Directories : 26433d161d1e130e Symbolic links : 173f1e2800747538 Other : 0 Total : 57e50e5de4a17b56 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 12:43:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 12:43:42 +0000 Subject: [Bugs] [Bug 1423442] group files to set volume options should have comments In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1423442 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23277 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 12:43:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 12:43:43 +0000 Subject: [Bugs] [Bug 1423442] group files to set volume options should have comments In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1423442 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #5 from Worker Ant --- REVIEW: https://review.gluster.org/23277 (cli - group files to set volume options supports comments) posted (#1) for review on master by Barak Sason Rofman -- You are receiving this mail because: You are on the CC list for the bug. 
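arequal-checksum reports only aggregate counts and checksums, so the output above shows that lsy-gl-03 has about 738 extra regular files but not which ones. One way to narrow that down, assuming the brick path from the comment and passwordless ssh between the nodes:

for h in lsy-gl-01 lsy-gl-02 lsy-gl-03; do
    ssh "$h" 'cd /diskForTestData/tst && find . -path ./.glusterfs -prune -o -type f -print | sort' > "$h.files"
done
diff lsy-gl-01.files lsy-gl-03.files | head -n 20   # the extra entries present only on lsy-gl-03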
From bugzilla at redhat.com Tue Aug 20 12:44:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 12:44:30 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #6 from Sergey Pleshkov --- [root at LSY-GL-03 host]# gluster volume heal TST info Brick lsy-gl-01:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-02:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-03:/diskForTestData/tst Status: Connected Number of entries: 0 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 13:14:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 13:14:56 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #7 from Sergey Pleshkov --- Did compare of folder ?ontent and find this strange anomaly with size of folders and files on bricks. lsy-lg-02 /diskForTestData/tst/.shard/.remove_me: total 48K 0 . 48K .. 0 1b69424e-47ca-44b9-b475-9f073956fd10 0 5459b172-600a-4464-8fcd-8e987a62fb37 /diskForTestData/tst/smb_conf: total 44K 4.0K . 0 .. 8.0K failover-dns.conf 4.0K ganesha.conf 4.0K krb5.conf 4.0K mnt-gvol.mount 4.0K mnt-prod.mount 4.0K mnt-tst.mount 0 mount-restart-scripts 4.0K resolv.conf 4.0K resolv.dnsmasq 4.0K smb.conf 0 user.map lsy-gl-03 /diskForTestData/tst/.shard/.remove_me: total 32K 0 . 32K .. 0 1b69424e-47ca-44b9-b475-9f073956fd10 0 5459b172-600a-4464-8fcd-8e987a62fb37 /diskForTestData/tst/smb_conf: total 80K 4.0K . 0 .. 8.0K failover-dns.conf 8.0K ganesha.conf 8.0K krb5.conf 8.0K mnt-gvol.mount 8.0K mnt-prod.mount 8.0K mnt-tst.mount 0 mount-restart-scripts 8.0K resolv.conf 8.0K resolv.dnsmasq 8.0K smb.conf 4.0K user.map Also find difference in .gluster folder like this: lsy-lg-02 /diskForTestData/tst/.glusterfs/e5/25: total 80K 0 . 12K .. 44K e5250ec5-b28e-4015-a3b3-8c9287b961ef 8.0K e525238c-3ee1-4581-941f-29b50a2159f9 8.0K e5254136-413b-4008-aa2a-871e22fd0e89 8.0K e5257805-e240-401a-a71b-c39718095b9a lsy-gl-03 /diskForTestData/tst/.glusterfs/e5/25: total 65M 0 . 12K .. 44K e5250ec5-b28e-4015-a3b3-8c9287b961ef 8.0K e525238c-3ee1-4581-941f-29b50a2159f9 8.0K e5254136-413b-4008-aa2a-871e22fd0e89 8.0K e5257805-e240-401a-a71b-c39718095b9a 65M e525b876-7fd1-46ba-93fa-293e27db983c -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 20 16:32:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:32:35 +0000 Subject: [Bugs] [Bug 1743782] New: Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 Bug ID: 1743782 Summary: Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: md-cache Severity: high Priority: high Assignee: bugs at gluster.org Reporter: anoopcs at redhat.com CC: atumball at redhat.com, bugs at gluster.org, gdeschner at redhat.com, madam at redhat.com, pgurusid at redhat.com, ryan at magenta.tv Depends On: 1726205 Blocks: 1732376 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1726205 +++ Description of problem: Windows client errors out while copying large file into GlusterFS volume share configured with fruit and streams_xattr VFS modules. See attachment for error message. Version-Release number of selected component (if applicable): master How reproducible: Always Steps to Reproduce: 1. Create a basic distribute-replicate volume 2. Enable "group samba" volume set on the volume 3. Set up a Samba share with fruit and streams_xattr VFS modules vfs objects = fruit streams_xattr glusterfs 4. Connect to the share from a Windows client 5. Try to copy a large file(probably with size > 600M) into share Actual results: Windows client fails to copy large file with error(see attachment). Expected results: Copy completes successfully. Additional info(root cause): Problem lies in md-cache layer where it fails to update cache for xattrs with null value("\0"). Following steps reproduce the core issue on a plain FUSE mount: # touch /mnt/glusterfs/foobar # setfattr -n "user.DosStream.Zone.Identifier:\$DATA" -v "\0" /mnt/glusterfs/foobar # echo $? 0 # getfattr -d -m . -e hex /mnt/glusterfs/foobar getfattr: Removing leading '/' from absolute path names # file: mnt/glusterfs/foobar security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f74 3a733000 /mnt/glusterfs/foobar: user.DosStream.Zone.Identifier:$DATA: No such attribute # getfattr -d -m . -e hex /brick/brick1/foobar getfattr: Removing leading '/' from absolute path names # file: brick/brick1/foobar security.selinux=0x73797374656d5f753a6f626a6563745f723a676c757374657264 5f627269636b5f743a733000 trusted.gfid=0xde7d450691b24107b0c03fac58d9e49e trusted.gfid2path.17f514a2c19aaa57=0x30303030303030302d303030302d303030 302d303030302d3030303030303030303030312f666f6f626172 user.DosStream.Zone.Identifier:$DATA=0x00 # gluster v set vol performance.cache-samba-metadata off volume set: success # getfattr -d -m . -e hex /mnt/glusterfs/foobar getfattr: Removing leading '/' from absolute path names # file: mnt/glusterfs/foobar security.selinux=0x73797374656d5f753a6f626a6563745f723a6675736566735f74 3a733000 user.DosStream.Zone.Identifier:$DATA=0x00 --- Additional comment from on 2019-07-04 16:22:06 IST --- Just raised this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1727062 Not sure if there are any similarities? 
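For reference, the root-cause steps above can also be driven programmatically. The following is a minimal Python sketch of the same round-trip check, not part of the original report; the mount point, scratch file name and xattr name are assumptions mirroring the manual setfattr/getfattr reproduction.

#!/usr/bin/env python3
# Round-trip check for a null-byte xattr value through a glusterfs FUSE mount.
# Assumptions: the volume is FUSE-mounted at MOUNT and a scratch file may be
# created there. This only illustrates the md-cache behaviour described above.
import os

MOUNT = "/mnt/glusterfs"              # assumed mount point
path = os.path.join(MOUNT, "foobar")  # scratch file, mirrors the manual repro
name = "user.DosStream.Zone.Identifier:$DATA"
value = b"\x00"                       # the null-byte payload streams_xattr writes

open(path, "a").close()               # touch the file
os.setxattr(path, name, value)        # same as: setfattr -n "<name>" -v "\0" <path>

try:
    got = os.getxattr(path, name)     # read it back through the same mount
    print("read back %r (%d bytes) - cache kept the null value" % (got, len(got)))
except OSError as err:
    # With the md-cache issue described above, this raises ENODATA ("No such
    # attribute") even though the brick holds user.DosStream...=0x00, because
    # the null value never entered the cache.
    print("getxattr failed: %s" % err)

With performance.cache-samba-metadata disabled (or with the fix applied), the read-back is expected to return the single 0x00 byte.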
--- Additional comment from Anoop C S on 2019-07-05 17:11:08 IST --- --- Additional comment from Worker Ant on 2019-08-12 10:43:49 IST --- REVIEW: https://review.gluster.org/23206 (performance/md-cache: Do not skip caching of null character xattr values) posted (#1) for review on master by Anoop C S --- Additional comment from Worker Ant on 2019-08-20 15:43:03 IST --- REVIEW: https://review.gluster.org/23206 (performance/md-cache: Do not skip caching of null character xattr values) merged (#5) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba https://bugzilla.redhat.com/show_bug.cgi?id=1732376 [Bug 1732376] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 16:32:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:32:35 +0000 Subject: [Bugs] [Bug 1726205] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726205 Anoop C S changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1743782 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 [Bug 1743782] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 16:34:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:34:31 +0000 Subject: [Bugs] [Bug 1743782] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 Anoop C S changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |anoopcs at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 16:35:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:35:46 +0000 Subject: [Bugs] [Bug 1743782] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23279 -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 20 16:35:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:35:47 +0000 Subject: [Bugs] [Bug 1743782] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23279 (performance/md-cache: Do not skip caching of null character xattr values) posted (#1) for review on release-6 by Anoop C S -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 09:35:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 09:35:46 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1653 from Worker Ant --- REVIEW: https://review.gluster.org/23255 (api: fixing a coverity issue) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 20 16:54:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:54:08 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-20 16:54:08 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) merged (#7) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 20 16:54:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 20 Aug 2019 16:54:09 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Bug 1743627 depends on bug 1738786, which changed state. Bug 1738786 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1738786 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 21 01:20:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 01:20:09 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 zhou lin changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(zz.sh.cynthia at gma | |il.com) | --- Comment #10 from zhou lin --- This issue happened during the system startup phase, so it is probable that connects/disconnects happened between the client and the server. Yes, the reason I say the purge fails is that the file mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp is still present in the brick directory. I have tried a full heal, but it still does not work. I do not quite understand "parent had some pending marker on them" - is there any method to confirm this? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 01:29:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 01:29:10 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #11 from zhou lin --- [root at mn-0:/mnt/bricks/services/brick/db/upgrade] # glusterfs --version glusterfs 3.12.15 Repository revision: git://git.gluster.org/glusterfs.git Copyright (c) 2006-2016 Red Hat, Inc. GlusterFS comes with ABSOLUTELY NO WARRANTY. It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation. [root at mn-0:/mnt/bricks/services/brick/db/upgrade] # -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 01:51:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 01:51:55 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #12 from zhou lin --- When I enable trace-level logging for the services brick process, I find that posix_lookup returns a failure. Is it because of the missing link file? Could we do something to restore the missing link here?
services-changetimerecorder [2019-08-21 01:38:08.304603] T [MSGID: 0] [changetimerecorder.c:360:ctr_lookup] 0-stack-trace: stack-address: 0x7f01b000b740, winding from services-changetimerecorder to services-trash [2019-08-21 01:38:08.304607] T [MSGID: 0] [defaults.c:2574:default_lookup] 0-stack-trace: stack-address: 0x7f01b000b740, winding from services-trash to services-posix [2019-08-21 01:38:08.304652] D [MSGID: 0] [posix.c:370:posix_lookup] 0-stack-trace: stack-address: 0x7f01b000b740, services-posix returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304663] D [MSGID: 0] [changetimerecorder.c:309:ctr_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-changetimerecorder returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304669] D [MSGID: 0] [bit-rot-stub.c:2930:br_stub_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-bitrot-stub returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304675] D [MSGID: 0] [posix-acl.c:1017:posix_acl_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-access-control returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304681] D [MSGID: 0] [posix.c:2639:pl_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-locks returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304697] D [MSGID: 0] [upcall.c:796:up_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-upcall returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304703] D [MSGID: 0] [defaults.c:1266:default_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-io-threads returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304709] T [marker.c:2913:marker_lookup_cbk] 0-services-marker: lookup failed with No such file or directory [2019-08-21 01:38:08.304713] D [MSGID: 0] [marker.c:2948:marker_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-marker returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304720] D [MSGID: 0] [index.c:2015:index_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-index returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304726] D [MSGID: 0] [io-stats.c:2216:io_stats_lookup_cbk] 0-stack-trace: stack-address: 0x7f01b000b740, services-io-stats returned -1 error: No such file or directory [No such file or directory] [2019-08-21 01:38:08.304734] D [MSGID: 115050] [server-rpc-fops.c:185:server_lookup_cbk] 0-services-server: 1438388: LOOKUP /db/upgrade/mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp (5f2dc6c6-3fc4-41ba-9800-a560b483de13), client: mn-0-3851-2019/08/20-16:35:22:730333-services-client-0-0-0, error-xlator: services-posix [No such file or directory] -- You are receiving this mail because: You are on the CC list for the bug. 
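Both questions raised above - whether the parent directory still carries pending AFR markers (comment #10) and whether the file's .glusterfs gfid link is missing (comment #12) - can be checked directly on the bricks. Below is a hedged Python sketch, not an official tool; the brick and file paths are examples taken from the prompt and log above, and it has to run as root on each brick host.

#!/usr/bin/env python3
# Brick-side diagnostics for bug 1741783: (1) dump any trusted.afr.* pending
# markers on the parent directory, (2) verify that the file's gfid hard-link
# exists under .glusterfs. Paths below are assumptions/examples.
import os
import uuid

BRICK = "/mnt/bricks/services/brick"            # assumed brick root
PARENT = os.path.join(BRICK, "db/upgrade")      # parent of the problem file
FILE = os.path.join(PARENT,
        "mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp")

# 1. Pending AFR markers: each trusted.afr.<volume>-client-N value is three
#    32-bit big-endian counters (data, metadata, entry); non-zero entry counters
#    on the parent mean it still blames some brick for this directory's entries.
for name in os.listxattr(PARENT):
    if name.startswith("trusted.afr."):
        raw = os.getxattr(PARENT, name)
        counters = [int.from_bytes(raw[i:i + 4], "big") for i in range(0, len(raw), 4)]
        print("pending marker", name, counters)

# 2. gfid link: a regular file with gfid aabbcccc-... normally has a hard link
#    at <brick>/.glusterfs/aa/bb/<gfid>; if it is gone, brick-side lookups by
#    gfid fail with ENOENT, matching the trace above.
gfid = uuid.UUID(bytes=os.getxattr(FILE, "trusted.gfid"))
link = os.path.join(BRICK, ".glusterfs", str(gfid)[:2], str(gfid)[2:4], str(gfid))
if not os.path.exists(link):
    print("gfid link missing:", link)
elif os.stat(link).st_ino != os.lstat(FILE).st_ino:
    print("gfid link present but not hard-linked to the file")
else:
    print("gfid link ok:", link)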
From bugzilla at redhat.com Wed Aug 21 04:43:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 04:43:40 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1654 from Worker Ant --- REVIEW: https://review.gluster.org/23234 (storage/posix - fixing a coverity issue) merged (#11) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 06:01:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:01:00 +0000 Subject: [Bugs] [Bug 1743988] New: Setting cluster.heal-timeout requires volume restart Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Bug ID: 1743988 Summary: Setting cluster.heal-timeout requires volume restart Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: selfheal Severity: low Assignee: bugs at gluster.org Reporter: glenk1973 at hotmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Setting the `cluster.heal-timeout` requires a volume restart to take effect. Version-Release number of selected component (if applicable): 6.5 How reproducible: Every time Steps to Reproduce: 1. Provision a 3-peer replica volume (I used three docker containers). 2. Set `cluster.favorite-child-policy` to `mtime`. 3. Mount the volume on one of the containers (say `gluster-0`, serving as a server and a client). 4. Stop the self-heal daemon. 5. Set `cluster.entry-self-heal`, `cluster.data-self-heal` and `cluster.metadata-self-heal` to off. 6. Set `cluster.quorum-type` to none. 7. Write "first write" to file `test.txt` on the mounted volume. 8. Kill the brick process `gluster-2`. 9. Write "second write" to `test.txt`. 10. Force start the volume (`gluster volume start force`) 11. Kill brick processes `gluster-0` and `gluster-1`. 12. Write "third write" to `test.txt`. 13. Force start the volume. 14. Verify that "split-brain" appears in the output of `gluster volume heal info` command. 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. 17. Issue `gluster volume heal info` command after 70 seconds. 18. Verify that the output at step 17 does not contain "split-brain". 19. Verify that the content of `test.txt` is "third write". Actual results: The output at step 17 contains "split-brain". Expected results: The output at step 17 should _not_ contain "split-brain". Additional info: According to what Ravishankar N said on Slack (https://gluster.slack.com/archives/CH9M2KF60/p1566346818102000), changing volume options such as `cluster.heal-timeout` should not require a process restart. If I add a `gluster volume start force` command immediately after step 16 above, then I get the Expected results. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 06:04:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:04:23 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 --- Comment #1 from Glen K --- I should add that `cluster.quorum-type` is set to `none` for the test. -- You are receiving this mail because: You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 06:13:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:13:59 +0000 Subject: [Bugs] [Bug 1740077] Fencing: Added the tcmu-runner ALUA feature support but after one of node is rebooted the glfs_file_lock() get stucked In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740077 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-21 06:13:59 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23216 (locks/fencing: Address hang while lock preemption) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 06:14:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:14:22 +0000 Subject: [Bugs] [Bug 1740519] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740519 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23218 (event: rename event_XXX with gf_ prefixed) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 06:20:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:20:24 +0000 Subject: [Bugs] [Bug 1732875] GlusterFS 7.0 tracker In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732875 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23174 (doc: Added initial release notes for release-7) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 06:57:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 06:57:26 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #8 from Ravishankar N --- Looks like the discrepancy is due to the no. of files (738 to be specific) amongst the bricks. The directories and symlinks and their checksums match on all 3 bricks. The only fix I can think of is to find out (manually) which are the files that differ in size and forcefully trigger a heal on them. You could go through "Hack: How to trigger heal on *any* file/directory" section of my blog-post https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide-part-3/ -- You are receiving this mail because: You are on the CC list for the bug. 
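The 738-file figure in comment #8 matches the arequal entry counts earlier in this thread: 359953 - 359215 = 738 regular files. One way to do the "find out manually which files differ" step is to dump a path/size listing from each brick and diff the listings between nodes. The following Python sketch is an illustration, not part of the bug report; it assumes the brick path used in this bug and root access on each node, and it skips .glusterfs the same way arequal was run with "-i .glusterfs".

#!/usr/bin/env python3
# Dump "relative-path <TAB> size" for every entry on a brick so the listings
# from the three nodes can be sorted and diffed to locate the mismatched files.
import os
import sys

BRICK = sys.argv[1] if len(sys.argv) > 1 else "/diskForTestData/tst"  # assumed brick path

for root, dirs, files in os.walk(BRICK):
    if root == BRICK and ".glusterfs" in dirs:
        dirs.remove(".glusterfs")          # mirror arequal's -i .glusterfs
    for f in files:
        full = os.path.join(root, f)
        try:
            print("%s\t%d" % (os.path.relpath(full, BRICK), os.lstat(full).st_size))
        except OSError:
            pass                           # file vanished or unreadable; skip

# Example usage (assumed): redirect the output to a file on each node, then
#   diff <(sort lsy-gl-01.txt) <(sort lsy-gl-03.txt)
# and trigger heal on the paths that differ, as described in the blog post above.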
From bugzilla at redhat.com Wed Aug 21 07:03:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 07:03:13 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #9 from Ravishankar N --- Also note that sharding is currently supported only for single writer use case, typically for backing store for oVirt. (https://github.com/gluster/glusterfs/issues/290) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 09:42:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 09:42:22 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #13 from Susant Kumar Palai --- >From the current report, it is not clear which thread crashed and for what reason. Please do the following. # Install gluster-debuginfo package of the same version as your glusterfs server. # type the following command into terminal : "gdb glusterfsd $core-file" and copy the text from the terminal which should have the crashed thread bt. It looks something like this. Core was generated by `/usr/local/sbin/glusterfs --process-name fuse --volfile-server=vm3 --volfile-id'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00007f12be8c5d9b in dht_create (frame=0x7f12a4001e48, this=0x0, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, params=0x7f12ac0020c8) at dht-common.c:8677 8677 conf = this->private; [Current thread is 1 (Thread 0x7f12ce3ff700 (LWP 30689))] Missing separate debuginfos, use: dnf debuginfo-install glibc-2.26-28.fc27.x86_64 openssl-libs-1.1.0h-3.fc27.x86_64 pcre2-10.31-7.fc27.x86_64 sssd-client-1.16.2-4.fc27.x86_64 (gdb) bt #0 0x00007f12be8c5d9b in dht_create (frame=0x7f12a4001e48, this=0x0, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, params=0x7f12ac0020c8) at dht-common.c:8677 #1 0x00007f12be633d8a in gf_utime_create (frame=0x7f12a4006cb8, this=0x7f12b0010b70, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at utime-autogen-fops.c:172 #2 0x00007f12ce11b86c in default_create (frame=0x7f12a4006cb8, this=0x7f12b0012890, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at defaults.c:2601 #3 0x00007f12be201846 in ra_create (frame=0x7f12a4001608, this=0x7f12b0014540, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at read-ahead.c:194 #4 0x00007f12ce11b86c in default_create (frame=0x7f12a4001608, this=0x7f12b0016120, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at defaults.c:2601 #5 0x00007f12bddd896f in ioc_create (frame=0x7f12a4008ff8, this=0x7f12b0018240, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at io-cache.c:919 #6 0x00007f12ce11b86c in default_create (frame=0x7f12a4008ff8, this=0x7f12b0019e30, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at defaults.c:2601 #7 0x00007f12ce11b86c in default_create (frame=0x7f12a4008ff8, this=0x7f12b001ba30, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, 
xdata=0x7f12ac0020c8) at defaults.c:2601 #8 0x00007f12bd798698 in mdc_create (frame=0x7f12a4000cb8, this=0x7f12b001d610, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at md-cache.c:1969 #9 0x00007f12ce1102ed in default_create_resume (frame=0x7f12b8008328, this=0x7f12b001f210, loc=0x7f12b8009130, flags=34881, mode=33188, umask=0, fd=0x7f12b8001948, xdata=0x7f12ac0020c8) at defaults.c:1873 #10 0x00007f12ce05fbf8 in call_resume_wind (stub=0x7f12b80090e8) at call-stub.c:2033 #11 0x00007f12ce071bee in call_resume (stub=0x7f12b80090e8) at call-stub.c:2555 #12 0x00007f12bd5809c6 in iot_worker (data=0x7f12b002efc0) at io-threads.c:232 #13 0x00007f12ccbf550b in start_thread () from /lib64/libpthread.so.0 #14 0x00007f12cc4a516f in clone () from /lib64/libc.so.6 And then get another report with "thread apply all bt". Once we check that info we will figure out the next steps. Regards, Susant -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 09:54:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 09:54:36 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1655 from Worker Ant --- REVIEW: https://review.gluster.org/23237 (features/cloudsync - fix a coverity issue) merged (#5) on master by Barak Sason Rofman -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 09:56:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 09:56:54 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |Triaged Status|NEW |ASSIGNED CC| |ravishankar at redhat.com Assignee|bugs at gluster.org |ravishankar at redhat.com Flags| |needinfo?(glenk1973 at hotmail | |.com) --- Comment #2 from Ravishankar N --- Okay, so after some investigation, I don't think this is an issue. When you change the heal-timeout, it does get propagated to the self-heal daemon. But since the default value is 600 seconds, the threads that do the heal only wake up after that time. Once it wakes up, subsequent runs do seem to honour the new heal-timeout value. On a glusterfs 6.5 setup: #gluster v create testvol replica 2 127.0.0.2:/home/ravi/bricks/brick{1..2} force #gluster v set testvol client-log-level DEBUG #gluster v start testvol #gluster v set testvol heal-timeout 5 #tail -f /var/log/glusterfs/glustershd.log|grep finished You don't see anything in the log yet about the crawls. But once you manually launch heal, the threads are woken up and further crawls happen every 5 seconds. #gluster v heal testvol Now in glustershd.log: [2019-08-21 09:55:02.024160] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:02.024271] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:08.023252] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. 
[2019-08-21 09:55:08.023358] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:14.024438] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:14.024546] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. Glen, could you check if that works for you? i.e. after setting the heal-timeout, manually launch heal via `gluster v heal testvol`. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 11:04:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 11:04:09 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #13 from Karthik U S --- I tried it locally and it is not removing the entry or creating the link file and replicating the file onto the other bricks. If it does not have the gfid xattr set as well, it assigns the gfid and creates the file on the other bricks. In this case it has the xattr but the gfid-link is missing. So while doing lookup it sees the gfid and when lookup is done on that gfid it will fail as the gfid link is missing. I'm checking the code to come up with the best way to handle this case. For the time bring you can either delete the file (if it is not required) since the file is not present on quorum number of bricks or you can delete the "trusted.gfid" xattr on the brick and then do lookup on the file. That should assign new gfid and create the gfid-link file and replicate it on other bricks. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 11:42:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 11:42:32 +0000 Subject: [Bugs] [Bug 1740316] read() returns more than file size when using direct I/O In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740316 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-21 11:42:32 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23213 (features/shard: Send correct size when reads are sent beyond file size) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are the QA Contact for the bug. You are on the CC list for the bug. 
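The manual fix suggested in comment #13 of bug 1741783 above (drop the stale trusted.gfid xattr on the brick copy, then look the file up from a client) can be scripted roughly as follows. This is only a sketch of that suggestion: both paths are assumptions, the removexattr must run as root on the brick host, a backup of the file should be kept, and per that comment it only applies when the file is not present on a quorum of bricks.

#!/usr/bin/env python3
# Sketch of the manual repair from comment #13: remove trusted.gfid on the brick
# copy, then stat the file through a client mount so the next named lookup
# assigns a fresh gfid, recreates the .glusterfs link and (per that comment)
# replicates the file to the other bricks. Paths are examples/assumptions.
import os

brick_copy = ("/mnt/bricks/services/brick/db/upgrade/"
              "mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp")
mount_copy = ("/mnt/services/db/upgrade/"      # assumed FUSE mount path of the volume
              "mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp")

os.removexattr(brick_copy, "trusted.gfid")     # equivalent to: setfattr -x trusted.gfid <path>
os.stat(mount_copy)                            # named lookup from the client side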
From bugzilla at redhat.com Wed Aug 21 11:45:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 11:45:59 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-21 11:45:59 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23225 (afr: restore timestamp of parent dir during entry-heal) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 14:00:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 14:00:32 +0000 Subject: [Bugs] [Bug 1741734] gluster-smb:glusto-test access gluster by cifs test write report Device or resource busy In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741734 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-21 14:00:32 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23240 (gluster-smb:add smb parameter when access gluster by cifs) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 14:01:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 14:01:42 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1656 from Worker Ant --- REVIEW: https://review.gluster.org/23249 (features/utime - fixing a coverity issue) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 14:02:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 14:02:20 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1657 from Worker Ant --- REVIEW: https://review.gluster.org/23261 (storage/posix - Fixing a coverity issue) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 21 15:50:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 15:50:43 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #14 from Chad Feller --- I had run debuginfo-install glusterfs-server But it looks like it only installed the debuginfo packages for glusterfs dependencies and not glusterfs itself. Is this a gluster packaging bug? I'll rerun the backtrace with the glusterfs-debuginfo package installed. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 21 18:15:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 18:15:39 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Glen K changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(glenk1973 at hotmail | |.com) | --- Comment #3 from Glen K --- In my steps above, I set the heal-timeout while the self-heal daemon is stopped: ... 4. Stop the self-heal daemon. ... 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. ... I would expect that the configuration would certainly take effect after a restart of the self-heal daemon. Yes, launching heal manually causes the heal to happen right away, but the purpose of the test is to verify the heal happens automatically. From a user perspective, the current behaviour of the heal-timeout setting appears to be at odds with the "configuration changes take effect without restart" feature; I think it is reasonable to request that changing the heal-timeout setting results in the thread sleeps being reset to the new setting. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 19:55:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 19:55:45 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #15 from Chad Feller --- I wasn't able to pull the gluster-debuginfo package via yum, even with --enablerepo=centos-storage-debuginfo. The baseurl in the CentOS-Storage-common.repo file looks correct, so I'm not sure if there is a problem with the repodata file or something else... Anyway, I manually downloaded the glusterfs-debuginfo file installed it and am attaching the output of: gdb glusterfsd $core-file As well as the thread apply all bt commands -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 19:58:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 19:58:00 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #16 from Chad Feller --- Created attachment 1606731 --> https://bugzilla.redhat.com/attachment.cgi?id=1606731&action=edit gdb glusterfsd core.6563 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 19:58:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 19:58:37 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|gdb glusterfsd core.6563 |gluster00: gdb glusterfsd #1606731| |core.6563 description| | -- You are receiving this mail because: You are on the CC list for the bug. 
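Comment #3 of bug 1743988 above describes an automated test that waits for the new cluster.heal-timeout to take effect. A small verification harness along those lines might look like the Python sketch below; the volume name, timeout value and polling deadline are assumptions, and the gluster CLI must be available on the node where it runs.

#!/usr/bin/env python3
# Set a short cluster.heal-timeout, then poll "gluster volume heal <vol> info"
# and report how long it takes for the split-brain entry to disappear.
import subprocess
import time

VOL = "testvol"          # assumed volume name
HEAL_TIMEOUT = 60        # seconds, as in step 15 of the report
DEADLINE = 300           # give up after 5 minutes

subprocess.run(["gluster", "volume", "set", VOL,
                "cluster.heal-timeout", str(HEAL_TIMEOUT)], check=True)
# Per comment #2 above, launching heal once wakes the shd crawl threads right away:
# subprocess.run(["gluster", "volume", "heal", VOL])

start = time.time()
while time.time() - start < DEADLINE:
    out = subprocess.run(["gluster", "volume", "heal", VOL, "info"],
                         capture_output=True, text=True).stdout
    if "split-brain" not in out.lower():
        print("heal done after %.0f seconds" % (time.time() - start))
        break
    time.sleep(10)
else:
    print("still in split-brain after %d seconds" % DEADLINE)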
From bugzilla at redhat.com Wed Aug 21 19:59:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 19:59:33 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #17 from Chad Feller --- Created attachment 1606732 --> https://bugzilla.redhat.com/attachment.cgi?id=1606732&action=edit gluster00: gdb glusterfsd core.31704 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 20:00:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 20:00:19 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #18 from Chad Feller --- Created attachment 1606733 --> https://bugzilla.redhat.com/attachment.cgi?id=1606733&action=edit gluster01: gdb glusterfsd core.6560 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 20:01:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 20:01:06 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #19 from Chad Feller --- Created attachment 1606734 --> https://bugzilla.redhat.com/attachment.cgi?id=1606734&action=edit gluster01: gdb glusterfsd core.23686 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 21:09:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 21:09:22 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|0 |1 #1603901 is| | obsolete| | --- Comment #20 from Chad Feller --- Created attachment 1606755 --> https://bugzilla.redhat.com/attachment.cgi?id=1606755&action=edit gdb backtrace from gluster00 crash #1 (full) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 21:10:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 21:10:13 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|0 |1 #1603902 is| | obsolete| | --- Comment #21 from Chad Feller --- Created attachment 1606757 --> https://bugzilla.redhat.com/attachment.cgi?id=1606757&action=edit gdb backtrace from gluster01 crash #1 (full) -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 21 21:10:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 21:10:50 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|0 |1 #1603903 is| | obsolete| | --- Comment #22 from Chad Feller --- Created attachment 1606759 --> https://bugzilla.redhat.com/attachment.cgi?id=1606759&action=edit gdb backtrace from gluster00 crash #2 (full) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 21:11:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 21:11:25 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #23 from Chad Feller --- Created attachment 1606760 --> https://bugzilla.redhat.com/attachment.cgi?id=1606760&action=edit gdb backtrace from gluster01 crash #2 (full) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 21 21:11:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 21 Aug 2019 21:11:25 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Chad Feller changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment|0 |1 #1603904 is| | obsolete| | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 01:17:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 01:17:46 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #14 from zhou lin --- Thanks for your reply. I have tried removing mn-1__dbim-redis.service__database-nosql-cmredis.sync_state.tmp from the mn-0 brick directory, since its size is 0. However, from the GlusterFS user side there is no clue that would reveal this error until that file is accessed, because after the shd heal the entry in xattrop is removed and the gluster volume heal info output becomes empty. Is there any method to find this problem so that some action can be taken to repair it manually (like removing the file from the brick or deleting the "trusted.gfid" xattr)? In our system some applications need to use that file; from their point of view this is a bug, and I just hope there is some way to detect this issue and take action so the applications will not get stuck on this file. Of course, it would be better if the GlusterFS source code itself repaired the file. -- You are receiving this mail because: You are on the CC list for the bug.
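On the detection question in comment #14, one option is to periodically sweep a brick and report every file whose .glusterfs gfid link is broken, generalising the per-file check sketched earlier in this thread. The Python sketch below is illustrative only; the brick path is an assumption and it must run as root on each brick host.

#!/usr/bin/env python3
# Walk a brick and report regular files whose .glusterfs gfid hard-link is
# missing, so they can be repaired before an application trips over them.
import os
import sys
import uuid

BRICK = sys.argv[1] if len(sys.argv) > 1 else "/mnt/bricks/services/brick"  # assumed

for root, dirs, files in os.walk(BRICK):
    if root == BRICK and ".glusterfs" in dirs:
        dirs.remove(".glusterfs")                # skip the housekeeping tree itself
    for f in files:
        path = os.path.join(root, f)
        if os.path.islink(path):
            continue                             # symlinks use symlink-style gfid entries
        try:
            gfid = uuid.UUID(bytes=os.getxattr(path, "trusted.gfid"))
        except OSError:
            print("no trusted.gfid xattr:", path)
            continue
        link = os.path.join(BRICK, ".glusterfs", str(gfid)[:2], str(gfid)[2:4], str(gfid))
        if not os.path.exists(link) or os.stat(link).st_ino != os.lstat(path).st_ino:
            print("broken gfid link:", path)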
From bugzilla at redhat.com Thu Aug 22 04:55:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 04:55:25 +0000 Subject: [Bugs] [Bug 1739360] [GNFS] gluster crash with nfs.nlm off In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739360 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 04:55:25 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23185 (nlm: check if nlm4 is initialized in nlm_priv) merged (#8) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:35:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:35:53 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 05:35:53 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) merged (#3) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:57:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:57:34 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23188 (cluster/ec: inherit healing from lock when it has info) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:57:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:57:59 +0000 Subject: [Bugs] [Bug 1739427] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739427 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 05:57:59 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23189 (cluster/ec: fix EIO error for concurrent writes on sparse files) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:57:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:57:59 +0000 Subject: [Bugs] [Bug 1731448] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1731448 Bug 1731448 depends on bug 1739427, which changed state. 
Bug 1739427 Summary: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1739427 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:00 +0000 Subject: [Bugs] [Bug 1739451] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739451 Bug 1739451 depends on bug 1739427, which changed state. Bug 1739427 Summary: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1739427 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:00 +0000 Subject: [Bugs] [Bug 1732779] [GSS] An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732779 Bug 1732779 depends on bug 1739427, which changed state. Bug 1739427 Summary: An Input/Output error happens on a disperse volume when doing unaligned writes to a sparse file https://bugzilla.redhat.com/show_bug.cgi?id=1739427 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:26 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 --- Comment #5 from Worker Ant --- REVIEW: https://review.gluster.org/23190 (cluster/ec: Always read from good-mask) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:51 +0000 Subject: [Bugs] [Bug 1739426] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739426 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 05:58:51 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23191 (cluster/ec: Fix reopen flags to avoid misbehavior) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 22 05:58:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:52 +0000 Subject: [Bugs] [Bug 1735514] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1735514 Bug 1735514 depends on bug 1739426, which changed state. Bug 1739426 Summary: Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1739426 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:53 +0000 Subject: [Bugs] [Bug 1739450] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739450 Bug 1739450 depends on bug 1739426, which changed state. Bug 1739426 Summary: Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1739426 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:58:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:58:53 +0000 Subject: [Bugs] [Bug 1734303] Open fd heal should filter O_APPEND/O_EXCL In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734303 Bug 1734303 depends on bug 1739426, which changed state. Bug 1739426 Summary: Open fd heal should filter O_APPEND/O_EXCL https://bugzilla.redhat.com/show_bug.cgi?id=1739426 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:19 +0000 Subject: [Bugs] [Bug 1739424] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739424 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 05:59:19 --- Comment #6 from Worker Ant --- REVIEW: https://review.gluster.org/23192 (cluster/ec: Update lock->good_mask on parent fop failure) merged (#4) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:21 +0000 Subject: [Bugs] [Bug 1732774] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732774 Bug 1732774 depends on bug 1739424, which changed state. 
Bug 1739424 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1739424 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:21 +0000 Subject: [Bugs] [Bug 1732792] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732792 Bug 1732792 depends on bug 1739424, which changed state. Bug 1739424 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1739424 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:22 +0000 Subject: [Bugs] [Bug 1739449] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739449 Bug 1739449 depends on bug 1739424, which changed state. Bug 1739424 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1739424 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:23 +0000 Subject: [Bugs] [Bug 1732772] Disperse volume : data corruption with ftruncate data in 4+2 config In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732772 Bug 1732772 depends on bug 1739424, which changed state. Bug 1739424 Summary: Disperse volume : data corruption with ftruncate data in 4+2 config https://bugzilla.redhat.com/show_bug.cgi?id=1739424 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 05:59:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:50 +0000 Subject: [Bugs] [Bug 1736481] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 05:59:50 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23214 (storage/posix: set the op_errno to proper errno during gfid set) merged (#3) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 05:59:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 05:59:50 +0000 Subject: [Bugs] [Bug 1736482] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Bug 1736482 depends on bug 1736481, which changed state. Bug 1736481 Summary: capture stat failure error while setting the gfid https://bugzilla.redhat.com/show_bug.cgi?id=1736481 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 07:08:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 07:08:35 +0000 Subject: [Bugs] [Bug 1744420] New: glusterd crashing with core dump on the latest nightly builds. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Bug ID: 1744420 Summary: glusterd crashing with core dump on the latest nightly builds. Product: GlusterFS Version: mainline Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Severity: urgent Assignee: bugs at gluster.org Reporter: kiyer at redhat.com QA Contact: kiyer at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: While trying to verify glusto-test patches we had observed that glusterd isn't starting during the installation. This was observed for the last 2 nightly builds. https://ci.centos.org/job/gluster_glusto-patch-check/1537/consoleFull # systemctl status glusterd ? glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Active: failed (Result: exit-code) since Wed 2019-08-21 12:19:20 IST; 24h ago Docs: man:glusterd(8) Process: 13852 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=1/FAILURE) Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: spinlock 1 Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: epoll.h 1 Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: xattr.h 1 Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: st_atim.tv_nsec 1 Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: package-string: glusterfs 20190820.95f71df Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com glusterd[13853]: --------- Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: glusterd.service: control process exited, code=exited status=1 Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: Failed to start GlusterFS, a clustered file-system server. Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: Unit glusterd.service entered failed state. Aug 21 12:19:20 dhcp35-114.lab.eng.blr.redhat.com systemd[1]: glusterd.service failed. Program terminated with signal 6, Aborted. 
#0 0x00007fcb85868207 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007fcb85868207 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007fcb858698f8 in __GI_abort () at abort.c:90 #2 0x00007fcb858aad27 in __libc_message (do_abort=do_abort at entry=2, fmt=fmt at entry=0x7fcb859bb312 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 0x00007fcb859499e7 in __GI___fortify_fail (msg=msg at entry=0x7fcb859bb2b8 "buffer overflow detected") at fortify_fail.c:30 #4 0x00007fcb85947b62 in __GI___chk_fail () at chk_fail.c:28 #5 0x00007fcb8594727b in ___vsnprintf_chk (s=, maxlen=, flags=, slen=, format=, args=args at entry=0x7fffd7e97708) at vsnprintf_chk.c:37 #6 0x00007fcb85947198 in ___snprintf_chk (s=s at entry=0x7fffd7e97a40 "", maxlen=maxlen at entry=4096, flags=flags at entry=1, slen=slen at entry=3776, format=format at entry=0x7fcb7b48dd4b "%s") at snprintf_chk.c:35 #7 0x00007fcb7b34efb9 in snprintf (__fmt=0x7fcb7b48dd4b "%s", __n=4096, __s=0x7fffd7e97a40 "") at /usr/include/bits/stdio2.h:64 #8 init (this=0x561402cb9520) at glusterd.c:1450 #9 0x00007fcb87222ea1 in __xlator_init (xl=0x561402cb9520) at xlator.c:597 #10 xlator_init (xl=xl at entry=0x561402cb9520) at xlator.c:623 #11 0x00007fcb8725fb29 in glusterfs_graph_init (graph=graph at entry=0x561402cb50f0) at graph.c:422 #12 0x00007fcb87260195 in glusterfs_graph_activate (graph=graph at entry=0x561402cb50f0, ctx=ctx at entry=0x561402c70010) at graph.c:776 #13 0x00005614017c3182 in glusterfs_process_volfp (ctx=ctx at entry=0x561402c70010, fp=fp at entry=0x561402cb4e70) at glusterfsd.c:2728 #14 0x00005614017c333d in glusterfs_volumes_init (ctx=ctx at entry=0x561402c70010) at glusterfsd.c:2800 #15 0x00005614017bea3a in main (argc=4, argv=) at glusterfsd.c:2962 (gdb) t a a bt Thread 7 (Thread 0x7fcb7c002700 (LWP 13727)): #0 0x00007fcb85926f73 in select () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fcb872a4224 in runner (arg=0x561402cb2bf0) at ../../contrib/timer-wheel/timer-wheel.c:186 #2 0x00007fcb86067dd5 in start_thread (arg=0x7fcb7c002700) at pthread_create.c:307 #3 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 6 (Thread 0x7fcb7e807700 (LWP 13722)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007fcb872337ab in gf_timer_proc (data=0x561402cae280) at timer.c:140 #2 0x00007fcb86067dd5 in start_thread (arg=0x7fcb7e807700) at pthread_create.c:307 #3 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 5 (Thread 0x7fcb7e006700 (LWP 13723)): #0 0x00007fcb8606f361 in do_sigwait (sig=0x7fcb7e0050dc, set=) at ../sysdeps/unix/sysv/linux/sigwait.c:60 #1 __sigwait (set=set at entry=0x7fcb7e0050e0, sig=sig at entry=0x7fcb7e0050dc) at ../sysdeps/unix/sysv/linux/sigwait.c:95 #2 0x00005614017c277b in glusterfs_sigwaiter (arg=) at glusterfsd.c:2463 #3 0x00007fcb86067dd5 in start_thread (arg=0x7fcb7e006700) at pthread_create.c:307 #4 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 4 (Thread 0x7fcb7d805700 (LWP 13724)): #0 0x00007fcb858f6e2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fcb858f6cc4 in __sleep (seconds=0, seconds at entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137 #2 0x00007fcb87250868 in pool_sweeper (arg=) at mem-pool.c:446 #3 0x00007fcb86067dd5 in 
start_thread (arg=0x7fcb7d805700) at pthread_create.c:307 #4 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 3 (Thread 0x7fcb7d004700 (LWP 13725)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007fcb872659b0 in syncenv_task (proc=proc at entry=0x561402caea70) at syncop.c:517 #2 0x00007fcb87266860 in syncenv_processor (thdata=0x561402caea70) at syncop.c:584 #3 0x00007fcb86067dd5 in start_thread (arg=0x7fcb7d004700) at pthread_create.c:307 #4 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 2 (Thread 0x7fcb7c803700 (LWP 13726)): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 #1 0x00007fcb872659b0 in syncenv_task (proc=proc at entry=0x561402caee30) at syncop.c:517 #2 0x00007fcb87266860 in syncenv_processor (thdata=0x561402caee30) at syncop.c:584 ---Type to continue, or q to quit--- #3 0x00007fcb86067dd5 in start_thread (arg=0x7fcb7c803700) at pthread_create.c:307 #4 0x00007fcb8592fead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 Thread 1 (Thread 0x7fcb8772a4c0 (LWP 13721)): #0 0x00007fcb85868207 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007fcb858698f8 in __GI_abort () at abort.c:90 #2 0x00007fcb858aad27 in __libc_message (do_abort=do_abort at entry=2, fmt=fmt at entry=0x7fcb859bb312 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 0x00007fcb859499e7 in __GI___fortify_fail (msg=msg at entry=0x7fcb859bb2b8 "buffer overflow detected") at fortify_fail.c:30 #4 0x00007fcb85947b62 in __GI___chk_fail () at chk_fail.c:28 #5 0x00007fcb8594727b in ___vsnprintf_chk (s=, maxlen=, flags=, slen=, format=, args=args at entry=0x7fffd7e97708) at vsnprintf_chk.c:37 #6 0x00007fcb85947198 in ___snprintf_chk (s=s at entry=0x7fffd7e97a40 "", maxlen=maxlen at entry=4096, flags=flags at entry=1, slen=slen at entry=3776, format=format at entry=0x7fcb7b48dd4b "%s") at snprintf_chk.c:35 #7 0x00007fcb7b34efb9 in snprintf (__fmt=0x7fcb7b48dd4b "%s", __n=4096, __s=0x7fffd7e97a40 "") at /usr/include/bits/stdio2.h:64 #8 init (this=0x561402cb9520) at glusterd.c:1450 #9 0x00007fcb87222ea1 in __xlator_init (xl=0x561402cb9520) at xlator.c:597 #10 xlator_init (xl=xl at entry=0x561402cb9520) at xlator.c:623 #11 0x00007fcb8725fb29 in glusterfs_graph_init (graph=graph at entry=0x561402cb50f0) at graph.c:422 #12 0x00007fcb87260195 in glusterfs_graph_activate (graph=graph at entry=0x561402cb50f0, ctx=ctx at entry=0x561402c70010) at graph.c:776 #13 0x00005614017c3182 in glusterfs_process_volfp (ctx=ctx at entry=0x561402c70010, fp=fp at entry=0x561402cb4e70) at glusterfsd.c:2728 #14 0x00005614017c333d in glusterfs_volumes_init (ctx=ctx at entry=0x561402c70010) at glusterfsd.c:2800 #15 0x00005614017bea3a in main (argc=4, argv=) at glusterfsd.c:2962 Version-Release number of selected component (if applicable): Whatever is the version in upstream. How reproducible: Always Steps to Reproduce: 1.service glusterd start. Actual results: glusterd crashing with core dump. Expected results: glusterd shouldn't crash and core files shouldn't be created. Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
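The "buffer overflow detected" abort in the backtrace above is glibc's _FORTIFY_SOURCE machinery at work: ___vsnprintf_chk() aborts as soon as the length passed to snprintf() (maxlen=4096 in frame #6) is larger than the compiler-known size of the destination object (slen=3776), before any formatting is even attempted. A minimal, self-contained sketch of that failure mode (this is not glusterd code; the struct and sizes below are invented purely to mirror the maxlen/slen values in the trace):

    /* build: gcc -O2 -D_FORTIFY_SOURCE=2 -o fortify_demo fortify_demo.c */
    #include <stdio.h>
    #include <string.h>

    struct conf {
        char pad[320];
        char workdir[3776];   /* hypothetical field, smaller than the 4096 passed below */
    };

    int main(void)
    {
        struct conf c;
        memset(&c, 0, sizeof(c));
        /* The destination object is 3776 bytes but the size argument claims 4096.
         * With fortification enabled glibc routes this through __snprintf_chk(),
         * which calls __chk_fail() ("buffer overflow detected") and raises SIGABRT
         * even though the formatted string itself is short. */
        snprintf(c.workdir, 4096, "%s", "/var/lib/glusterd");
        return 0;
    }

If the call at glusterd.c:1450 follows the same pattern, the fix is to pass the real size of the destination field (sizeof(...)) rather than a larger constant.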
From bugzilla at redhat.com Thu Aug 22 07:11:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 07:11:53 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 --- Comment #4 from Ravishankar N --- (In reply to Glen K from comment #3) > > I would expect that the configuration would certainly take effect after a > restart of the self-heal daemon. In step-4 and 16, I assume you toggled `cluster.self-heal-daemon` off and on respectively. This actually does not kill the shd process per se and just disables/enables the heal crawls. In 6.5, a volume start force does restart shd so changing the order of the tests should do the trick, i.e. 13. Set `cluster.heal-timeout` to `60`. 14. Force start the volume. 15. Verify that "split-brain" appears in the output of `gluster volume heal info` command. > Yes, launching heal manually causes the heal to happen right away, but the > purpose of the test is to verify the heal happens automatically. From a user > perspective, the current behaviour of the heal-timeout setting appears to be > at odds with the "configuration changes take effect without restart" > feature; I think it is reasonable to request that changing the heal-timeout > setting results in the thread sleeps being reset to the new setting. Fair enough, I'll attempt a fix on master, let us see how the review goes. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 08:34:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 08:34:09 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 Susant Kumar Palai changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|spalai at redhat.com |moagrawa at redhat.com --- Comment #24 from Susant Kumar Palai --- Thanks for the report. Here is the crashed thread. Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s gluster00 --volfile-id gv0.gluster00.export-brick1-srv'. Program terminated with signal 11, Segmentation fault. #0 socket_is_connected (this=0x7f5f500af3c0) at socket.c:2619 2619 if (priv->use_ssl) { [?1034h(gdb) quit Form this it looks like "priv" is NULL. Can you print the value of priv for clarity (command to type after opening core with gdb: "p priv"). Assigning this to Mohit to take it further. Susant -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 09:22:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 09:22:28 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Sanju changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |srakonde at redhat.com Flags| |needinfo?(kiyer at redhat.com) --- Comment #1 from Sanju --- Kshithij, Can you please mention all steps of reproducer? i.e, what are the steps performed on the cluster before glusterd crashed? 
I'm not sure but this might be having a relationship with the bugs that are filed under shd-multiplexing feature (saying this just because this is also a sigabrt). Thanks, Sanju -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 09:24:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 09:24:54 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #11 from hari gowtham --- By setup i meant doing the following prerequisites: these two steps are the ones necessary as of now: - `deb.packages.dot-gnupg.tgz`: has the ~/.gnupg dir with the keyring needed to build & sign packages - packages required: build-essential pbuilder devscripts reprepro debhelper dpkg-sig And for the first time we need to do this: # First time create the /var/cache/pbuilder/base.tgz # on debian: sudo pbuilder create --distribution wheezy --mirror ftp://ftp.us.debian.org/debian/ --debootstrapopts "--keyring=/usr/share/keyrings/debian-archive-keyring.gpg" # on raspbian: sudo pbuilder create --distribution wheezy --mirror http://archive.raspbian.org/raspbian/ --debootstrapopts "--keyring=/usr/share/keyrings/raspbian-archive-keyring.gpg" NOTE: In future if any change is made here ( https://github.com/semiosis/glusterfs-debian/tree/wheezy-glusterfs-3.5/debian) then we might have to change it. The reason to go for the above two level implementation was, I wasn't aware of how to make the job run on a particular machine based on the arguments it gets. Like stretch has to be run on rhs-vm-16.storage-dev.lab.eng.bOS.redhat.com(which will be one of the jenkins debian slaves) And we have to run the script on multiple machines based on the number of distributions we want to build. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 09:25:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 09:25:48 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Kshithij Iyer changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kiyer at redhat.com) | --- Comment #2 from Kshithij Iyer --- (In reply to Sanju from comment #1) > Kshithij, > > Can you please mention all steps of reproducer? i.e, what are the steps > performed on the cluster before glusterd crashed? Just installed glusterfs using the nightly builds and tried to start glusterd. That's it! I didn't perform any other steps. > I'm not sure but this might be having a relationship with the bugs that are > filed under shd-multiplexing feature (saying this just because this is also > a sigabrt). > > Thanks, > Sanju -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Thu Aug 22 10:03:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:03:57 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-22 10:03:57 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23248 (geo-rep: Fix the name of changelog archive file) merged (#4) on master by Aravinda VK -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:03:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:03:58 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Bug 1743634 depends on bug 1741890, which changed state. Bug 1741890 Summary: geo-rep: Changelog archive file format is incorrect https://bugzilla.redhat.com/show_bug.cgi?id=1741890 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:04:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:04:42 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Target Release|--- |RHGS 3.5.0 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:04:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:04:44 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 RHEL Product and Program Management changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|rhgs-3.5.0? blocker? |rhgs-3.5.0+ blocker+ Target Release|--- |RHGS 3.5.0 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:17:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:17:30 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #12 from M. Scherer --- Ok, so I will install the packages on the builder we have, and then have it added to jenkins. (and while on it, also have 2nd one, just in case) As for running different job running on specific machine, that's indeed pretty annoying on jenkins. 
I do not have enough experience with jjb, but JobTemplate is likely something that would help for that: https://docs.openstack.org/infra/jenkins-job-builder/definition.html#id2 But afaik, gluster is not dependent on the kernel, so building that with pbuilder in a chroot should be sufficient no matter what Debian, as long as it is a up to date one, no ? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:21:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:21:04 +0000 Subject: [Bugs] [Bug 1744519] New: log aio_error return codes in posix_fs_health_check Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Bug ID: 1744519 Summary: log aio_error return codes in posix_fs_health_check Product: GlusterFS Version: 6 Status: NEW Component: posix Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org, rhinduja at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com Depends On: 1744518 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1744518 +++ Description of problem: in posix_health_check function log aio_error return code in the message while aio operation is failed. Sometime brick is going down to health check thread is failed without logging error codes return by aio system calls.As per aio_error man page it returns a positive error number if the asynchronous I/O operation failed. Version-Release number of selected component (if applicable): How reproducible: Only reproducible in QE environment Steps to Reproduce: 1. 2. 3. Actual results: health check thread is failing without logging aio return codes. Expected results: Log message should print error code in case of error. Additional info: --- Additional comment from RHEL Product and Program Management on 2019-08-22 10:18:31 UTC --- This bug is automatically being proposed for the next minor release of Red Hat Gluster Storage by setting the release flag 'rhgs?3.5.0' to '?'. If this bug should be proposed for a different release, please manually change the proposed release flag. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1744518 [Bug 1744518] log aio_error return codes in posix_fs_health_check -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 10:21:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:21:18 +0000 Subject: [Bugs] [Bug 1744519] log aio_error return codes in posix_fs_health_check In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
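For context on what such a patch would log: aio_error(3) reports the status of an asynchronous request as 0 on success, EINPROGRESS while pending, and a positive errno value when the operation failed, so the health-check thread has to capture that return value and put it into the log message, otherwise the reason for the failure is lost. A rough standalone illustration of the pattern (plain fprintf stands in for gluster's logging; the probe file name and buffer are arbitrary):

    /* build: gcc -o aio_probe aio_probe.c -lrt */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[16] = "health-check";
        int fd = open("/tmp/health_check_probe", O_CREAT | O_WRONLY, 0600);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct aiocb cb;
        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof(buf);

        if (aio_write(&cb) < 0) {
            fprintf(stderr, "aio_write submit failed: %s\n", strerror(errno));
            close(fd);
            return 1;
        }

        const struct aiocb *list[1] = { &cb };
        while (aio_error(&cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);           /* wait for completion */

        int ret = aio_error(&cb);                 /* positive errno on failure */
        if (ret != 0)
            fprintf(stderr, "health check aio_write failed, error=%d (%s)\n",
                    ret, strerror(ret));
        else
            fprintf(stderr, "health check ok, wrote %zd bytes\n", aio_return(&cb));

        close(fd);
        return 0;
    }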
From bugzilla at redhat.com Thu Aug 22 10:23:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:23:34 +0000 Subject: [Bugs] [Bug 1744519] log aio_error return codes in posix_fs_health_check In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Version|6 |mainline -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:27:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:27:05 +0000 Subject: [Bugs] [Bug 1744519] log aio_error return codes in posix_fs_health_check In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23284 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:27:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:27:06 +0000 Subject: [Bugs] [Bug 1744519] log aio_error return codes in posix_fs_health_check In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23284 (posix: log aio_error return codes in posix_fs_health_check) posted (#1) for review on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 10:30:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:30:41 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kkeithle at redhat.com Flags| |needinfo?(kkeithle at redhat.c | |om) --- Comment #13 from hari gowtham --- (In reply to M. Scherer from comment #12) > Ok, so I will install the packages on the builder we have, and then have it > added to jenkins. > (and while on it, also have 2nd one, just in case) Forgot to mention that this script file is also necessary: https://github.com/Sheetalpamecha/packaging-scripts/blob/master/generic_package.sh Will send a patch to have it in the repo. > > As for running different job running on specific machine, that's indeed > pretty annoying on jenkins. I do not have enough experience with jjb, but > JobTemplate is likely something that would help for that: > https://docs.openstack.org/infra/jenkins-job-builder/definition.html#id2 Will look into it. I'm new to writing jobs for jenkins. > > But afaik, gluster is not dependent on the kernel, so building that with > pbuilder in a chroot should be sufficient no matter what Debian, as long as > it is a up to date one, no ? Yes, gluster is not dependent on kernel, but I'm unaware of using chroot for different debian version . For this Kaleb would be the better person to answer. @kaleb can you please answer this? -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 10:42:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 10:42:48 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #14 from M. Scherer --- Pbuilder do setup chroots, afaik, so that's kinda like mock, if you are maybe more familliar with the Fedora/Centos tooling. Now, maybe there is limitation and they do not work exactly the same, but I would have expected a clean chroot created each time, to build the package. I didn't do debian package since a long time. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 11:12:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:12:02 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 11:13:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:13:13 +0000 Subject: [Bugs] [Bug 1728047] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23285 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 11:13:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:13:14 +0000 Subject: [Bugs] [Bug 1728047] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |POST Resolution|NEXTRELEASE |--- Keywords| |Reopened --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23285 (fuse: add missing GF_FREE to fuse_interrupt) posted (#1) for review on release-7 by Csaba Henk -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 11:13:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:13:14 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Bug 1734423 depends on bug 1728047, which changed state. Bug 1728047 Summary: interrupts leak memory https://bugzilla.redhat.com/show_bug.cgi?id=1728047 What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |POST Resolution|NEXTRELEASE |--- -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 11:46:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:46:33 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 --- Comment #7 from Shwetha K Acharya --- @misc (In reply to M. Scherer from comment #4) > Sure give me a deadline, and I will create the account. I mean, I do not > even need a precise one. > > Would you agree on "We do in 3 months", in which case I create the account > right now (with expiration as set). > > (I need a public ssh key and a username) We have already taken up the task of automating building and packaging. Details can be found at https://bugzilla.redhat.com/show_bug.cgi?id=1727727. Please create the account. Below are the required details: Public ssh key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCifwFXjkLXFwnlTBMXFgTEHAA1Vavzti41B4Yp1RJYtCuJ91s+P5YHc2j4a/wpVquPJboNuv9wtqknmd5SJYBXB11dinNfHfvE+gCN9Osdn64/om9i3pIpQQeY6uvF4MF9yfyx8huEWFZeaOiljvmTZ3//4kzsJHK2yKCmJFhy5Zcg9+WMM2bjfACjlFDIuOG2kqaRM8tGggOQG9iQ/VElWOTxJkHUJaP50PWdwEHHoiCKmipe5xEcSR/6qubaF6VpMfBLmrjmJMqkjVozryVweHBLn3oQfOkJmlErwJox7hLFuk5V4fvVine5xrWKygw/kA2Mpr7Q1zXg5moZHbCP root at localhost.localdomain User name: sacharya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 11:48:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 11:48:24 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Sunil Kumar Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |MODIFIED CC| |sheggodu at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 12:01:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:01:02 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 --- Comment #8 from M. Scherer --- As said in the comment #2 and comment #4, what is the deadline for the account closure ? If I do not get a answer, then I will just decide on "3 month after the creation" and then deploy. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 12:05:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:05:46 +0000 Subject: [Bugs] [Bug 1744548] New: Setting cluster.heal-timeout requires volume restart Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Bug ID: 1744548 Summary: Setting cluster.heal-timeout requires volume restart Product: GlusterFS Version: mainline Hardware: x86_64 OS: Linux Status: NEW Component: selfheal Keywords: Triaged Severity: low Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org, glenk1973 at hotmail.com, ravishankar at redhat.com Depends On: 1743988 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743988 +++ Description of problem: Setting the `cluster.heal-timeout` requires a volume restart to take effect. 
Version-Release number of selected component (if applicable): 6.5 How reproducible: Every time Steps to Reproduce: 1. Provision a 3-peer replica volume (I used three docker containers). 2. Set `cluster.favorite-child-policy` to `mtime`. 3. Mount the volume on one of the containers (say `gluster-0`, serving as a server and a client). 4. Stop the self-heal daemon. 5. Set `cluster.entry-self-heal`, `cluster.data-self-heal` and `cluster.metadata-self-heal` to off. 6. Set `cluster.quorum-type` to none. 7. Write "first write" to file `test.txt` on the mounted volume. 8. Kill the brick process `gluster-2`. 9. Write "second write" to `test.txt`. 10. Force start the volume (`gluster volume start force`) 11. Kill brick processes `gluster-0` and `gluster-1`. 12. Write "third write" to `test.txt`. 13. Force start the volume. 14. Verify that "split-brain" appears in the output of `gluster volume heal info` command. 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. 17. Issue `gluster volume heal info` command after 70 seconds. 18. Verify that the output at step 17 does not contain "split-brain". 19. Verify that the content of `test.txt` is "third write". Actual results: The output at step 17 contains "split-brain". Expected results: The output at step 17 should _not_ contain "split-brain". Additional info: According to what Ravishankar N said on Slack (https://gluster.slack.com/archives/CH9M2KF60/p1566346818102000), changing volume options such as `cluster.heal-timeout` should not require a process restart. If I add a `gluster volume start force` command immediately after step 16 above, then I get the Expected results. --- Additional comment from Glen K on 2019-08-21 06:04:23 UTC --- I should add that `cluster.quorum-type` is set to `none` for the test. --- Additional comment from Ravishankar N on 2019-08-21 09:56:54 UTC --- Okay, so after some investigation, I don't think this is an issue. When you change the heal-timeout, it does get propagated to the self-heal daemon. But since the default value is 600 seconds, the threads that do the heal only wake up after that time. Once it wakes up, subsequent runs do seem to honour the new heal-timeout value. On a glusterfs 6.5 setup: #gluster v create testvol replica 2 127.0.0.2:/home/ravi/bricks/brick{1..2} force #gluster v set testvol client-log-level DEBUG #gluster v start testvol #gluster v set testvol heal-timeout 5 #tail -f /var/log/glusterfs/glustershd.log|grep finished You don't see anything in the log yet about the crawls. But once you manually launch heal, the threads are woken up and further crawls happen every 5 seconds. #gluster v heal testvol Now in glustershd.log: [2019-08-21 09:55:02.024160] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:02.024271] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:08.023252] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:08.023358] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:14.024438] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. 
[2019-08-21 09:55:14.024546] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. Glen, could you check if that works for you? i.e. after setting the heal-timeout, manually launch heal via `gluster v heal testvol`. --- Additional comment from Glen K on 2019-08-21 18:15:39 UTC --- In my steps above, I set the heal-timeout while the self-heal daemon is stopped: ... 4. Stop the self-heal daemon. ... 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. ... I would expect that the configuration would certainly take effect after a restart of the self-heal daemon. Yes, launching heal manually causes the heal to happen right away, but the purpose of the test is to verify the heal happens automatically. From a user perspective, the current behaviour of the heal-timeout setting appears to be at odds with the "configuration changes take effect without restart" feature; I think it is reasonable to request that changing the heal-timeout setting results in the thread sleeps being reset to the new setting. --- Additional comment from Ravishankar N on 2019-08-22 07:11:53 UTC --- (In reply to Glen K from comment #3) > > I would expect that the configuration would certainly take effect after a > restart of the self-heal daemon. In step-4 and 16, I assume you toggled `cluster.self-heal-daemon` off and on respectively. This actually does not kill the shd process per se and just disables/enables the heal crawls. In 6.5, a volume start force does restart shd so changing the order of the tests should do the trick, i.e. 13. Set `cluster.heal-timeout` to `60`. 14. Force start the volume. 15. Verify that "split-brain" appears in the output of `gluster volume heal info` command. > Yes, launching heal manually causes the heal to happen right away, but the > purpose of the test is to verify the heal happens automatically. From a user > perspective, the current behaviour of the heal-timeout setting appears to be > at odds with the "configuration changes take effect without restart" > feature; I think it is reasonable to request that changing the heal-timeout > setting results in the thread sleeps being reset to the new setting. Fair enough, I'll attempt a fix on master, let us see how the review goes. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 [Bug 1743988] Setting cluster.heal-timeout requires volume restart -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 12:05:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:05:46 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1744548 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 [Bug 1744548] Setting cluster.heal-timeout requires volume restart -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 12:05:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:05:54 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 Shwetha K Acharya changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hgowtham at redhat.com Flags| |needinfo?(hgowtham at redhat.c | |om) --- Comment #9 from Shwetha K Acharya --- Hari, Can you please address the above query? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 12:06:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:06:34 +0000 Subject: [Bugs] [Bug 1744548] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 12:15:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:15:14 +0000 Subject: [Bugs] [Bug 1744548] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23288 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 12:15:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 12:15:15 +0000 Subject: [Bugs] [Bug 1744548] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23288 (afr: wake up index healer threads) posted (#1) for review on master by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 13:01:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:01:23 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(hgowtham at redhat.c | |om) | --- Comment #10 from hari gowtham --- We are trying to finish it within this sprint (each sprint is for 3 weeks). So we will assume that we should be done in a month with the automation. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 13:03:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:03:06 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #15 from M. Scherer --- I did push the installation and I would like to defer the gnupg integration for now, as it likely requires a bit more discussion (like, how do we distribute the keys, etc, do we rotate it). And for the pbuilder cache, I would need to know the exact matrix of distribution we want to build and how. That part seems not too hard: https://wiki.debian.org/PbuilderTricks#How_to_build_for_different_distributions And if we aim to build on unstable, we also may need to do some work to keep the chroot updated (same for stable in fact). -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 13:05:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:05:37 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 --- Comment #11 from M. Scherer --- ok so 3 months is enougn (cause i also do not want to push unrealisitic deadline or more pressure, plus shit happen), I will add the account as soon as the previous ansible run finish. And if that's not enough, we can of course keep it open longer, just to be clear. But after jenkins issue last month, and the old compromise last time, we can't let stuff open too long if they are not going to clean themself. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 13:14:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:14:29 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kkeithle at redhat.c | |om) | --- Comment #16 from Kaleb KEITHLEY --- yes, pbuilder is a chroot tool, similar to mock. Each time you build you get a clean chroot. We are currently building for stretch/9, buster/10, and bullseye/unstable/11. AFAIK the buildroot should be updated periodically for all of them; bullseye/unstable should probably be updated more frequently than the others. I don't know anything about pbuilder apart from what I mentioned above, and specifically I don't know anything about how to use pbuilder to build for different distributions on a single machine. I've been using separate stretch, buster, and bullseye installs on dedicated boxes to build the packages for that release of Debian. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 13:25:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:25:49 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #17 from Kaleb KEITHLEY --- (In reply to M. 
Scherer from comment #15) > I did push the installation and I would like to defer the gnupg integration > for now, as it likely requires a bit more discussion (like, how do we > distribute the keys, etc, do we rotate it). > > And for the pbuilder cache, I would need to know the exact matrix of > distribution we want to build and how. That part seems not too hard: > https://wiki.debian.org/ > PbuilderTricks#How_to_build_for_different_distributions > > And if we aim to build on unstable, we also may need to do some work to keep > the chroot updated (same for stable in fact). The keys that we've been using were generated on an internal machine and distributed to the build machines, which are all internal as well. We were using a new, different key for every major version through 4.1, but some people complained about that, so for 5.x, 6.x, and now 7.x we have been using the same key. As 4.1 is about to reach EOL that essentially means we are only using a single key now for all the packages we build. AFAIK people expect the packages to be signed. And best practices suggests to me that they _must_ be signed. Given that 7.0rc0 is now out and packages will be signed with the current key, that suggests to me that we must keep using that key for the life of 7.x. We can certainly create a new key for 8.x, when that rolls around. And yes, we need a secure way to get the private key onto the jenkins build machines somehow. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 13:57:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 13:57:49 +0000 Subject: [Bugs] [Bug 1727727] Build+Packaging Automation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1727727 --- Comment #18 from hari gowtham --- (In reply to hari gowtham from comment #13) > (In reply to M. Scherer from comment #12) > > Ok, so I will install the packages on the builder we have, and then have it > > added to jenkins. > > (and while on it, also have 2nd one, just in case) > > Forgot to mention that this script file is also necessary: > https://github.com/Sheetalpamecha/packaging-scripts/blob/master/ > generic_package.sh > Will send a patch to have it in the repo. The above mentioned file is sent as a patch at: https://review.gluster.org/#/c/build-jobs/+/23289 > > > > > As for running different job running on specific machine, that's indeed > > pretty annoying on jenkins. I do not have enough experience with jjb, but > > JobTemplate is likely something that would help for that: > > https://docs.openstack.org/infra/jenkins-job-builder/definition.html#id2 > > Will look into it. I'm new to writing jobs for jenkins. > > > > > But afaik, gluster is not dependent on the kernel, so building that with > > pbuilder in a chroot should be sufficient no matter what Debian, as long as > > it is a up to date one, no ? > > Yes, gluster is not dependent on kernel, but I'm unaware of using chroot > for different debian version . > For this Kaleb would be the better person to answer. > @kaleb can you please answer this? -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 22 14:26:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 14:26:44 +0000 Subject: [Bugs] [Bug 1711950] Account in download.gluster.org to upload the build packages In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711950 --- Comment #12 from M. Scherer --- I created the user, tell me if it doesn't work. The server is download.rht.gluster.org (not download.gluster, who is a proxy). -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 22 16:34:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 16:34:22 +0000 Subject: [Bugs] [Bug 1744671] New: Smoke is failing for the changeset Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744671 Bug ID: 1744671 Summary: Smoke is failing for the changeset Product: GlusterFS Version: 6 Status: NEW Component: project-infrastructure Assignee: bugs at gluster.org Reporter: sheggodu at redhat.com CC: bugs at gluster.org, gluster-infra at gluster.org Target Milestone: --- Classification: Community Description of problem: Smoke job is failing for https://review.gluster.org/#/c/glusterfs/+/23284/ . Recheck is also not working properly. Please fix the issue. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 22 17:10:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 22 Aug 2019 17:10:47 +0000 Subject: [Bugs] [Bug 1739884] glusterfsd process crashes with SIGSEGV In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739884 --- Comment #25 from Chad Feller --- Hi, I checked all four core dumps, and they each have the same value for priv: [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/sbin/glusterfsd -s gluster00 --volfile-id gv0.gluster00.export-brick0-srv'. Program terminated with signal 11, Segmentation fault. #0 socket_is_connected (this=0x7fb40c0ef460) at socket.c:2619 2619 if (priv->use_ssl) { (gdb) p priv $1 = (socket_private_t *) 0x0 (gdb) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 03:58:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 03:58:27 +0000 Subject: [Bugs] [Bug 1711945] create account on download.gluster.org In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1711945 spamecha at redhat.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Flags| |needinfo?(mscherer at redhat.c | |om) --- Comment #2 from spamecha at redhat.com --- Hi Michael Please create the account for me as well. Below are the required details: Public ssh key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHKKTwASBKVg4nN3p1vUj87906qFi8KQb/gTmt7ITPDg1GAvVhMJhbC4pT58/k9YjDf2Ez07VZ7fTYs9hqWHF4ZsJ2rbO2MPaHl4Fnfb8MP+Wq33juiznKRZU9+TRTFt83rDoRjDFwzhfGt6zdBPam6Etu0mR55OvWg8XM35wbdW0OP/pjIdQdjVoDp+YdpaX43lCr3M80NsbjAxk7xcPTrpqAK90qpVw1C5mqwHNeqJIGK/enADhaDaMhBPoNpWK1cy5xMnJcBbYXjrUZ4yqmhzJ48yUQiHYzlZZkx4JirbdZzE7FfRZt88crec9KTp1a/GLznP3L0dFA59SWAMKV root at shep-mac User name: spamecha -- You are receiving this mail because: You are on the CC list for the bug. 
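Coming back to the socket_is_connected() crash in bug 1739884 above: with priv confirmed to be NULL in all four cores, the immediate failure is a plain NULL-pointer dereference at "if (priv->use_ssl)". A simplified illustration of the crash site and of the kind of defensive check that would turn it into an error return instead of a SIGSEGV (the types below are reduced stand-ins, not the real rpc_transport_t/socket_private_t definitions):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        bool use_ssl;
        bool connected;
    } socket_private_t;        /* stand-in only */

    typedef struct {
        socket_private_t *private;
    } transport_t;             /* stand-in only */

    static bool socket_is_connected(transport_t *this)
    {
        socket_private_t *priv = this->private;

        /* Without this check, priv == NULL (as seen in the core dumps) makes
         * the dereference below fault exactly like socket.c:2619. */
        if (priv == NULL)
            return false;

        if (priv->use_ssl) {
            /* the real function consults the SSL state here; omitted */
        }
        return priv->connected;
    }

    int main(void)
    {
        transport_t t = { .private = NULL };
        return socket_is_connected(&t) ? 0 : 1;
    }

Whether a guard like this is the right fix or merely papers over a teardown race is of course for the analysis in the bug to decide.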
From bugzilla at redhat.com Fri Aug 23 04:12:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 04:12:38 +0000 Subject: [Bugs] [Bug 1744519] log aio_error return codes in posix_fs_health_check In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744519 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-23 04:12:38 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23284 (posix: log aio_error return codes in posix_fs_health_check) merged (#2) on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 04:59:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 04:59:48 +0000 Subject: [Bugs] [Bug 1744874] New: interrupts leak memory Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 Bug ID: 1744874 Summary: interrupts leak memory Product: GlusterFS Version: 7 Status: NEW Component: fuse Keywords: Reopened Assignee: bugs at gluster.org Reporter: nbalacha at redhat.com CC: bugs at gluster.org, csaba at redhat.com Depends On: 1728047 Blocks: 1734423 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1728047 +++ Description of problem: When the glusterfs fuse client gets an INTERRUPT message (ie. a process gets SIGINT while in a syscall to the filesystem), not all data allocated by the handler code is freed. Version-Release number of selected component (if applicable): >= 6.0 The issue appears with the introduction of the interrupt handling framework. How reproducible: Always Steps to Reproduce: 1. Compile the test helper of the tests/features/interrupt.t test, open_and_sleep.c: $ gcc -o open_and_sleep tests/features/open_and_sleep.c 2. Mount a glusterfs volume 3. Run the command used in tests/features/interrupt.t in a loop against some file in the mount: $ while :; do ./open_and_sleep | { sleep 0.1; xargs -n1 kill -INT; } 3. Take statedumps at regular intervals and check gf_fuse_mt_iov_base memusage: # grep -A5 gf_fuse_mt_iov_base Actual results: Values of size and num_alloc fields monotonously grow with time across statedumps. Expected results: Values of size and num_alloc fields stay low across statedumps. --- Additional comment from Csaba Henk on 2019-07-08 22:11:10 UTC --- Command in reproduction step 3. is incomplete. 
It should be: $ while :; do ./open_and_sleep | { sleep 0.1; xargs -n1 kill -INT; }; done Improved version which also displays a counter: $ i=1; while :; do echo -en "\r$i "; ./open_and_sleep | { sleep 0.1; xargs -n1 kill -INT; }; i=$(($i+1)); done --- Additional comment from Worker Ant on 2019-07-09 09:10:00 UTC --- REVIEW: https://review.gluster.org/23016 (fuse: add missing GF_FREE to fuse_interrupt) posted (#1) for review on master by Csaba Henk --- Additional comment from Worker Ant on 2019-07-25 16:46:43 UTC --- REVIEW: https://review.gluster.org/23016 (fuse: add missing GF_FREE to fuse_interrupt) merged (#4) on master by Amar Tumballi --- Additional comment from Worker Ant on 2019-08-22 11:13:14 UTC --- REVIEW: https://review.gluster.org/23285 (fuse: add missing GF_FREE to fuse_interrupt) posted (#1) for review on release-7 by Csaba Henk Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 [Bug 1728047] interrupts leak memory https://bugzilla.redhat.com/show_bug.cgi?id=1734423 [Bug 1734423] interrupts leak memory -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 04:59:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 04:59:48 +0000 Subject: [Bugs] [Bug 1728047] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1744874 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 [Bug 1744874] interrupts leak memory -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 04:59:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 04:59:48 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1744874 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 [Bug 1744874] interrupts leak memory -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 05:00:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 05:00:11 +0000 Subject: [Bugs] [Bug 1744874] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Reopened | -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 05:03:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 05:03:14 +0000 Subject: [Bugs] [Bug 1728047] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 --- Comment #5 from Worker Ant --- REVISION POSTED: https://review.gluster.org/23285 (fuse: add missing GF_FREE to fuse_interrupt) posted (#2) for review on release-7 by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. 
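For anyone reproducing the leak without a glusterfs source tree handy: the helper in the loop above only needs to print its own pid (so that xargs -n1 kill -INT knows what to signal), open a file on the glusterfs mount, and then wait until the signal arrives. Something along these lines is sufficient (an approximation for illustration, not the actual tests/features/open_and_sleep.c):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        /* path of a file on the glusterfs mount; "testfile" is just a placeholder */
        const char *path = argc > 1 ? argv[1] : "testfile";

        /* Print the pid first and flush, so the shell pipeline can deliver
         * SIGINT while this process still has the file open. */
        printf("%d\n", (int)getpid());
        fflush(stdout);

        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        sleep(600);   /* killed by SIGINT long before this expires */

        close(fd);
        return 0;
    }

Statedumps to watch gf_fuse_mt_iov_base can then be taken at intervals while the loop runs (typically by sending SIGUSR1 to the fuse client process and grepping the dump files, by default under /var/run/gluster).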
From bugzilla at redhat.com Fri Aug 23 05:03:15 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 05:03:15 +0000 Subject: [Bugs] [Bug 1728047] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1728047 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID|Gluster.org Gerrit 23285 | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 05:03:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 05:03:16 +0000 Subject: [Bugs] [Bug 1744874] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23285 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 05:03:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 05:03:17 +0000 Subject: [Bugs] [Bug 1744874] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23285 (fuse: add missing GF_FREE to fuse_interrupt) posted (#2) for review on release-7 by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 06:12:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 06:12:45 +0000 Subject: [Bugs] [Bug 1744883] New: GlusterFS problem dataloss Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Bug ID: 1744883 Summary: GlusterFS problem dataloss Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Severity: urgent Assignee: bugs at gluster.org Reporter: nicola.battista89 at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Created attachment 1607200 --> https://bugzilla.redhat.com/attachment.cgi?id=1607200&action=edit strace output Description of problem: Greetings, MariaDB Columnstore uses GFS bricks as an persistant storage for data. We store tables data in so-called segment files. Here[1] is the overview how we use GFS for redundancy. In your case there are some errors with those segment files that live on GFS. Here are examples of the errors I've seen on strace of the reading process. 86814 stat("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", 0x7ff488ffdc80) = -1 ENOENT (No such file or directory) 86814 open("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", O_RDONLY|O_NOATIME) = -1 ENOENT (No such file or directory) I've attached the strace output itself and you can filter on the segment file name /000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf to find the relevant errors. 1. 
https://mariadb.com/resources/blog/mariadb-columnstore-data-redundancy-a-look-under-the-hood/ Version-Release number of selected component (if applicable): GlusterFS version 6.X Link ticket MariaBD : https://jira.mariadb.org/browse/MCOL-3392 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 08:36:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 08:36:33 +0000 Subject: [Bugs] [Bug 1732770] fix truncate lock to cover the write in tuncate clean In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732770 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 10:14:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 10:14:39 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nbalacha at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 10:40:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 10:40:10 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nbalacha at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 12:32:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 12:32:00 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nbalacha at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 13:18:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 13:18:13 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Amar Tumballi changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |atumball at redhat.com --- Comment #3 from Amar Tumballi --- I too have seen this, last week, but Kshithij, can you try restarting glusterd? It worked fine after restart. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 13:22:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 13:22:41 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 --- Comment #4 from Kshithij Iyer --- (In reply to Amar Tumballi from comment #3) > I too have seen this, last week, but Kshithij, can you try restarting > glusterd? It worked fine after restart. I tried restarting as well, it didn't help Amar. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 13:52:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 13:52:36 +0000 Subject: [Bugs] [Bug 1745026] New: endless heal gluster volume; incrementing number of files to heal when all peers in volume are up Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745026 Bug ID: 1745026 Summary: endless heal gluster volume; incrementing number of files to heal when all peers in volume are up Product: GlusterFS Version: 4.1 Hardware: x86_64 OS: Linux Status: NEW Component: fuse Severity: high Assignee: bugs at gluster.org Reporter: tvanberlo at vangenechten.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: files that need healing increment while gluster is healing.(Number of entries goes up when a heal is already started) Version-Release number of selected component (if applicable): glusterfs.x86_64 6.4-1.el7 installed glusterfs-api.x86_64 6.4-1.el7 installed glusterfs-cli.x86_64 6.4-1.el7 installed glusterfs-client-xlators.x86_64 6.4-1.el7 installed glusterfs-events.x86_64 6.4-1.el7 installed glusterfs-fuse.x86_64 6.4-1.el7 installed glusterfs-geo-replication.x86_64 6.4-1.el7 installed glusterfs-libs.x86_64 6.4-1.el7 installed glusterfs-rdma.x86_64 6.4-1.el7 installed glusterfs-server.x86_64 6.4-1.el7 installed libvirt-daemon-driver-storage-gluster.x86_64 4.5.0-10.el7_6.12 installed python2-gluster.x86_64 6.4-1.el7 installed vdsm-gluster.x86_64 4.30.24-1.el7 installed How reproducible: In our gluster cluster ( replica 3 - 1 arbiter) it is sufficient to reboot a node of the cluster. When the node is back online and the heal is started, more files are added to the 'files that need healing' list. $(gluster volume heal ${volumeName} info| grep entries) Steps to Reproduce: 1. Reboot node in gluster cluster 2. check with 'gluster peer status' on all nodes, if all nodes are connected (if not, stop firewalld, wait until every node is connected, start firewalld. 3. wait 10 minutes or trigger heal manually: 'gluster volume heal ${volumenName}' Actual results: the list of files that need healing grow. 'gluster volume heal ${volumeName} info| grep entries' Expected results: The list of files should decrease continuously, because the gluster fuse should write to all members of the gluster cluster. Additional info: Fix: To fix this situation we execute the following steps on our ovirt cluster: The gluster volume should be remounted, depending on how the storage domain is used in ovirt this is done differently. * data volume: volume where all the vms are running on => 1 by 1 put every host/hypervisor in maintenance mode and activate the host again. 
(This will unmount and remount the data volume on that host) * engine volume: volume where hosted engine is running - on every host not running the engine, find the systemd scope the engine mount is running with, and restart it - migrate the engine to another host and execute the steps on the hypervisor the engine migrated from - commands to use for finding the correct scope and restarting it: ``` [root at compute0103 ~]# volname='engine' [root at compute0103 ~]# systemctl list-units|grep rhev| grep scope| grep ${volname} run-17819.scope loaded active running /usr/bin/mount -t glusterfs -o backup-volfile-servers=compute0103.priv.domain.com:compute0104.priv.domain.com compute0102.priv.domain.com:/engine /rhev/data-center/mnt/glusterSD/compute0102.priv.domain.com:_engine [root at compute0103 ~]# systemctl restart run-17819.scope ``` ``` [root at compute0103 ~]# gluster volume info data Volume Name: data Type: Replicate Volume ID: 404ec6b1-731c-4e65-a07f-4ca646054eb4 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: compute0102.priv.vangenechten.com:/gluster_bricks/data/data Brick2: compute0103.priv.vangenechten.com:/gluster_bricks/data/data Brick3: compute0104.priv.vangenechten.com:/gluster_bricks/data/data (arbiter) Options Reconfigured: server.event-threads: 4 client.event-threads: 4 features.read-only: off features.barrier: disable performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.low-prio-threads: 32 network.remote-dio: enable cluster.eager-lock: enable cluster.quorum-type: auto cluster.server-quorum-type: server cluster.data-self-heal-algorithm: full cluster.locking-scheme: granular cluster.shd-max-threads: 8 cluster.shd-wait-qlength: 10000 features.shard: on user.cifs: off cluster.choose-local: off storage.owner-uid: 36 storage.owner-gid: 36 network.ping-timeout: 30 performance.strict-o-direct: on transport.address-family: inet performance.client-io-threads: on nfs.disable: on disperse.shd-wait-qlength: 1024 storage.build-pgfid: on ``` -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 23 15:50:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 15:50:38 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 23 15:50:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 23 Aug 2019 15:50:41 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 errata-xmlrpc changed: What |Removed |Added ---------------------------------------------------------------------------- Status|MODIFIED |ON_QA -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Sat Aug 24 04:29:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 24 Aug 2019 04:29:19 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1658 from Worker Ant --- REVIEW: https://review.gluster.org/23259 (cluster/afr - Unused variables) merged (#5) on master by Ravishankar N -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sun Aug 25 05:19:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 25 Aug 2019 05:19:29 +0000 Subject: [Bugs] [Bug 1423442] group files to set volume options should have comments In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1423442 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-25 05:19:29 --- Comment #6 from Worker Ant --- REVIEW: https://review.gluster.org/23277 (cli - group files to set volume options supports comments) merged (#5) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sun Aug 25 05:20:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sun, 25 Aug 2019 05:20:10 +0000 Subject: [Bugs] [Bug 1514683] Removal of bricks in volume isn't prevented if remaining brick doesn't contain all the files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1514683 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-25 05:20:10 --- Comment #5 from Worker Ant --- REVIEW: https://review.gluster.org/23171 (glusterd: Add warning and abort in case of failures in migration during remove-brick commit) merged (#8) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 03:21:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 03:21:58 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Priority|unspecified |urgent CC| |amukherj at redhat.com Assignee|bugs at gluster.org |srakonde at redhat.com Flags| |needinfo?(kiyer at redhat.com) --- Comment #5 from Atin Mukherjee --- So does this mean it happens every time we try to start glusterd? If so can you please pass the setup (ping offline) ? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 04:54:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 04:54:03 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Kshithij Iyer changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kiyer at redhat.com) | --- Comment #6 from Kshithij Iyer --- (In reply to Atin Mukherjee from comment #5) > So does this mean it happens every time we try to start glusterd? Yes! This happens every time. > If so can > you please pass the setup (ping offline) ? Have shared the details with you offline. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:07:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:07:51 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23295 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:07:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:07:51 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1659 from Worker Ant --- REVIEW: https://review.gluster.org/23295 (glusterd/ganesha: fixing resource leak in tear_down_cluster()) posted (#1) for review on master by jiffin tony Thottan -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:17:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:17:42 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 Karthik U S changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(zz.sh.cynthia at gma | |il.com) --- Comment #15 from Karthik U S --- IMO the app should not get stuck (until your application keep on retrying to access this file) if we hit this problem as we error out saying ENOENT (correct me if it is not the case in your environment). This problem could have happened most probably due to the scenario explained in comment #9. In that case we do not have any heal pending markers for this file. If we hit this issue, currently there is no way to detect this other than checking the file manually and deleting the file/xattr. I will try to handle this in the code itself so that there won't be any manual effort required for this. Since this happens on the reboot/startup scenarios, how often the restarts are done in your environment and how often you are seeing this problem? -- You are receiving this mail because: You are on the CC list for the bug. 
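Regarding the manual check mentioned in comment #15 above (bug 1741783): inspecting a file "manually" usually means looking at its trusted.afr.* pending xattrs directly on each brick. The sketch below is a hypothetical read-only inspection helper (run as root against the brick path, not the fuse mount); it is not part of gluster and does not decide for you whether any file or xattr may be deleted.

```
/* Hypothetical helper: dump the AFR pending xattrs and gfid of a file as
 * stored on a brick, for the manual inspection step described above.     */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file-path-on-brick>\n", argv[0]);
        return 1;
    }

    char names[4096];
    ssize_t len = listxattr(argv[1], names, sizeof(names));
    if (len < 0) {
        perror("listxattr");
        return 1;
    }

    for (char *name = names; name < names + len; name += strlen(name) + 1) {
        /* only the AFR pending markers and the gfid are interesting here */
        if (strncmp(name, "trusted.afr.", 12) != 0 &&
            strcmp(name, "trusted.gfid") != 0)
            continue;

        unsigned char value[64];
        ssize_t vlen = getxattr(argv[1], name, value, sizeof(value));

        printf("%s =", name);
        for (ssize_t i = 0; vlen > 0 && i < vlen; i++)
            printf(" %02x", value[i]);
        printf("\n");
    }
    return 0;
}
```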
From bugzilla at redhat.com Mon Aug 26 06:27:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:27:50 +0000 Subject: [Bugs] [Bug 1733425] Setting volume option when one of the glusterd is stopped in the cluster, post glusterd restart seeing couldn't find vol info in glusterd logs and shd, brick process offline In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1733425 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-26 06:27:50 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23042 (glusterd: stop stale bricks during handshaking in brick mux mode) merged (#6) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:31:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:31:13 +0000 Subject: [Bugs] [Bug 1745421] New: ./tests/bugs/glusterd/bug-1595320.t is failing Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Bug ID: 1745421 Summary: ./tests/bugs/glusterd/bug-1595320.t is failing Product: GlusterFS Version: 6 Status: NEW Component: glusterd Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Depends On: 1743200 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743200 +++ Description of problem: Sometime ./tests/bugs/glusterd/bug-1595320.t is failing at the time of counting brick process after sending a kill signal to brick process. Version-Release number of selected component (if applicable): How reproducible: Always Steps to Reproduce: 1.Run ./tests/bugs/glusterd/bug-1595320.t in a loop on softserve vm 2. 3. Actual results: ./tests/bugs/glusterd/bug-1595320.t is failing Expected results: ./tests/bugs/glusterd/bug-1595320.t should not fail Additional info: --- Additional comment from Worker Ant on 2019-08-19 10:35:52 UTC --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) posted (#1) for review on master by MOHIT AGRAWAL --- Additional comment from Worker Ant on 2019-08-20 01:26:30 UTC --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) merged (#1) on master by MOHIT AGRAWAL Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:31:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:31:13 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1745421 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 26 06:31:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:31:35 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:32:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:32:04 +0000 Subject: [Bugs] [Bug 1743219] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743219 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-26 06:32:04 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23268 (rpc: glusterd start is failed and throwing an error Address already in use) merged (#1) on release-6 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:32:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:32:05 +0000 Subject: [Bugs] [Bug 1743218] glusterd start is failed and throwing an error Address already in use In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743218 Bug 1743218 depends on bug 1743219, which changed state. Bug 1743219 Summary: glusterd start is failed and throwing an error Address already in use https://bugzilla.redhat.com/show_bug.cgi?id=1743219 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:33:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:33:48 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23296 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:33:49 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:33:49 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23296 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) posted (#1) for review on release-6 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 26 06:33:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:33:57 +0000 Subject: [Bugs] [Bug 1745422] New: ./tests/bugs/glusterd/bug-1595320.t is failing Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 Bug ID: 1745422 Summary: ./tests/bugs/glusterd/bug-1595320.t is failing Product: GlusterFS Version: 7 Status: NEW Component: glusterd Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bugs at gluster.org Depends On: 1743200 Blocks: 1745421 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743200 +++ Description of problem: Sometime ./tests/bugs/glusterd/bug-1595320.t is failing at the time of counting brick process after sending a kill signal to brick process. Version-Release number of selected component (if applicable): How reproducible: Always Steps to Reproduce: 1.Run ./tests/bugs/glusterd/bug-1595320.t in a loop on softserve vm 2. 3. Actual results: ./tests/bugs/glusterd/bug-1595320.t is failing Expected results: ./tests/bugs/glusterd/bug-1595320.t should not fail Additional info: --- Additional comment from Worker Ant on 2019-08-19 10:35:52 UTC --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) posted (#1) for review on master by MOHIT AGRAWAL --- Additional comment from Worker Ant on 2019-08-20 01:26:30 UTC --- REVIEW: https://review.gluster.org/23266 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) merged (#1) on master by MOHIT AGRAWAL Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing https://bugzilla.redhat.com/show_bug.cgi?id=1745421 [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:33:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:33:57 +0000 Subject: [Bugs] [Bug 1743200] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743200 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1745422 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:33:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:33:57 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1745422 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 26 06:34:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:34:13 +0000 Subject: [Bugs] [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 06:36:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:36:50 +0000 Subject: [Bugs] [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23297 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 06:36:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 06:36:51 +0000 Subject: [Bugs] [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23297 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) posted (#1) for review on release-7 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 07:17:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 07:17:22 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23299 -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Mon Aug 26 07:17:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 07:17:24 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1660 from Worker Ant --- REVIEW: https://review.gluster.org/23299 (glusterd: Unused value coverity fix) posted (#1) for review on master by Sanju Rakonde -- You are receiving this mail because: You are the assignee for the bug. 
From bugzilla at redhat.com Mon Aug 26 07:20:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 07:20:45 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-26 07:20:45 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23296 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) merged (#1) on release-6 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 07:51:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 07:51:08 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 zhou lin changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(zz.sh.cynthia at gma | |il.com) | --- Comment #16 from zhou lin --- this issue does not appear often, only twice until now. the case is hard reboot all 3 storage nodes repeatedly, fore each storage nodes, once per minute. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 08:22:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 08:22:58 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nbalacha at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 08:31:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 08:31:44 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nbalacha at redhat.c | |om) --- Comment #11 from Upasana --- Private+Shared = RAM used of glusterfsd -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 08:50:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 08:50:44 +0000 Subject: [Bugs] [Bug 1745422] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745422 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-26 08:50:44 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23297 (glusterd: ./tests/bugs/glusterd/bug-1595320.t is failing) merged (#1) on release-7 by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 26 08:50:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 08:50:45 +0000 Subject: [Bugs] [Bug 1745421] ./tests/bugs/glusterd/bug-1595320.t is failing In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745421 Bug 1745421 depends on bug 1745422, which changed state. Bug 1745422 Summary: ./tests/bugs/glusterd/bug-1595320.t is failing https://bugzilla.redhat.com/show_bug.cgi?id=1745422 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 09:03:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 09:03:43 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nbalacha at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 09:11:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 09:11:51 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #17 from zhou lin --- it would be a good news for glusterfs users if this problem could be solved by gluster source code, since scanning all volume for such files would be of very expensive cost! i am looking forward for this patch, thanks! -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 09:21:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 09:21:53 +0000 Subject: [Bugs] [Bug 1729108] Memory leak in glusterfsd process In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1729108 Upasana changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 09:23:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 09:23:50 +0000 Subject: [Bugs] [Bug 1741783] volume heal info show nothing, while visiting from mount point blame "no such entry" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741783 --- Comment #18 from Karthik U S --- Will update here once the patch is ready. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Mon Aug 26 10:39:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 10:39:27 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 Karthik U S changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #7 from Karthik U S --- Hi Cynthia, Appreciate your efforts on finding the root cause for this issue. Yes you are right. In __afr_selfheal_entry_prepare() it is not setting the bricks which are needing heal as healed_sinks in this case. I created this locally by setting the required xattrs and creating the gfid entries manually on the backend. Will work on the fix for this. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Mon Aug 26 12:20:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Mon, 26 Aug 2019 12:20:20 +0000 Subject: [Bugs] [Bug 789278] Issues reported by Coverity static analysis tool In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=789278 --- Comment #1661 from Worker Ant --- REVIEW: https://review.gluster.org/23299 (glusterd: Unused value coverity fix) merged (#3) on master by Sanju Rakonde -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 02:26:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 02:26:51 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 --- Comment #8 from zhou lin --- thanks! looking forward for your fix patch :)! -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 05:42:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 05:42:22 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 Hunang Shujun changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |shujun.huang at nokia-sbell.co | |m --- Comment #9 from Hunang Shujun --- the healed_sinks is empty is because afr_selfheal_find_direction do not find any "sink". In the function, only the node who accuse by source node can be decided as sink, other accuse node will not be identified as sink. The rule is valid or not? Any reason? for (i = 0; i < priv->child_count; i++) { if (!sources[i])---> the accuse info will not be taken into consider when the node is not source continue; if (self_accused[i]) continue; for (j = 0; j < priv->child_count; j++) { if (matrix[i][j]) sinks[j] = 1; } } -- You are receiving this mail because: You are on the CC list for the bug. 
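To make the loop quoted in the previous comment easier to follow, here is a small standalone toy model of the same sink-selection rule, using a hypothetical 3-brick pending matrix. It is not gluster code; it only illustrates why sinks[] can stay empty when the bricks that accuse each other are not themselves sources.

```
/* Toy model of the sink-selection rule discussed above (bug 1740968).
 * matrix[i][j] != 0 means brick i blames brick j via AFR pending xattrs.
 * Only bricks that are sources and not self-accused get to mark sinks.
 * Hypothetical 3-brick data; not the real afr_private_t structures.      */
#include <stdio.h>

#define CHILD_COUNT 3

int main(void)
{
    int matrix[CHILD_COUNT][CHILD_COUNT] = {
        {0, 0, 0},   /* brick 0 (a source) blames nobody                  */
        {0, 0, 1},   /* brick 1 blames brick 2, but it is not a source    */
        {0, 1, 0},   /* brick 2 blames brick 1, but it is not a source    */
    };
    int sources[CHILD_COUNT]      = {1, 0, 0};
    int self_accused[CHILD_COUNT] = {0, 0, 0};
    int sinks[CHILD_COUNT]        = {0, 0, 0};

    for (int i = 0; i < CHILD_COUNT; i++) {
        if (!sources[i] || self_accused[i])
            continue;                   /* only clean sources may vote    */
        for (int j = 0; j < CHILD_COUNT; j++) {
            if (matrix[i][j])
                sinks[j] = 1;           /* j is blamed by a clean source  */
        }
    }

    for (int j = 0; j < CHILD_COUNT; j++)
        printf("brick %d: sink=%d\n", j, sinks[j]);   /* all zeros here   */
    return 0;
}
```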
From bugzilla at redhat.com Tue Aug 27 06:22:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 06:22:06 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com Component|glusterd |core --- Comment #1 from Atin Mukherjee --- Could you explain the problem a bit more in details along with providing the volume configuration (gluster v info output). I'm moving this bug to core component. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 06:22:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 06:22:20 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nicola.battista89 | |@gmail.com) -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 06:59:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 06:59:32 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 --- Comment #10 from Karthik U S --- (In reply to Hunang Shujun from comment #9) > the healed_sinks is empty is because afr_selfheal_find_direction do not find > any "sink". In the function, only the node who accuse by source node can be > decided as sink, other accuse node will not be identified as sink. The rule > is valid or not? Any reason? > for (i = 0; i < priv->child_count; i++) { > if (!sources[i])---> the accuse info will not be taken into > consider when the node is not source > continue; > if (self_accused[i]) > continue; > for (j = 0; j < priv->child_count; j++) { > if (matrix[i][j]) > sinks[j] = 1; > } > } This is a valid code. Here we consider only those bricks which are not blamed by any of the non-accused bricks as sinks. Then in __afr_selfheal_entry_prepare() we will intersect the locked_on and sinks to populate the healed_sinks. After this __afr_selfheal_entry_finalize_source() will be called which attempts to mark all the bricks which are not source as healed_sinks. sources_count = AFR_COUNT(sources, priv->child_count); if ((AFR_CMP(locked_on, healed_sinks, priv->child_count) == 0) || !sources_count || afr_does_witness_exist(this, witness)) { -------> These condition does not hold true in this case so it fails to mark the non-sources as sinks memset(sources, 0, sizeof(*sources) * priv->child_count); afr_mark_active_sinks(this, sources, locked_on, healed_sinks); return -1; } source = afr_choose_source_by_policy(priv, sources, AFR_ENTRY_TRANSACTION); return source; We need to handle this case separately where we have source set but there is no brick marked as sink. Since this is happening for entry heal we can not directly consider all the other bricks as sinks, which might lead to data loss. So the best way would be to do conservative merge here. 
I will check whether this happens for data & metadata heal case as well (ideally it should not) and then send a patch to fix this. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 07:18:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:18:47 +0000 Subject: [Bugs] [Bug 1740968] glustershd can not decide heald_sinks, and skip repair, so some entries lingering in volume heal info In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740968 --- Comment #11 from Hunang Shujun --- I am appreciate for your detail explaination. :) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 07:22:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:22:14 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org Component|glusterfs |core Version|ocs-3.11 |4.1 Assignee|atumball at redhat.com |bugs at gluster.org Product|Red Hat Gluster Storage |GlusterFS QA Contact|bmekala at redhat.com | --- Comment #5 from Nithya Balachandran --- 3.12.15 is an upstream build. Updating the BZ accordingly. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 07:22:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:22:56 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 --- Comment #6 from Nithya Balachandran --- (In reply to liuruit from comment #4) > firstly, yum install glusterfs 3.12.2, and create the volume. > then upgrade to 3.12.15, and expand the volume. > > volume vol_xxx-posix > type storage/posix > option shared-brick-count 1 > option volume-id c47089b2-96c2-4ec2-9dfb-988d1e593cdc > option directory > /var/lib/heketi/mounts/vg_929e45b20519c80a714d7645061e354f/ > brick_5bd825a22e9511d539d24226a3d937a7/brick > end-volume > > df -h|grep brick_5bd825a22e9511d539d24226a3d937a7 > /dev/mapper/vg_929e45b20519c80a714d7645061e354f- > brick_5bd825a22e9511d539d24226a3d937a7 10G 8.0G 2.0G 81% > /var/lib/heketi/mounts/vg_929e45b20519c80a714d7645061e354f/ > brick_5bd825a22e9511d539d24226a3d937a7 > > ------ > volume vol_xxx-posix > type storage/posix > option shared-brick-count 1 > option volume-id c47089b2-96c2-4ec2-9dfb-988d1e593cdc > option directory > /var/lib/heketi/mounts/vg_13de35a047bf8fd839f8b5b6c5aa7b20/ > brick_df9b0d0b41cd17848212a9e2215eba8a/brick > end-volume > > df -h|grep brick_df9b0d0b41cd17848212a9e2215eba8a > /dev/mapper/vg_13de35a047bf8fd839f8b5b6c5aa7b20- > brick_df9b0d0b41cd17848212a9e2215eba8a 40G 33G 7.6G 82% > /var/lib/heketi/mounts/vg_13de35a047bf8fd839f8b5b6c5aa7b20/ > brick_df9b0d0b41cd17848212a9e2215eba8a Please check the bricks on all 3 nodes. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
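On the shared-brick-count values shown in comment #6 above: that option tells the brick's posix layer how many bricks share the same backing filesystem, and the statvfs numbers are scaled down by it so the shared space is not counted once per brick. The standalone sketch below only illustrates that scaling; the real handling lives inside the posix xlator, and the path and count arguments here are placeholders.

```
/* Rough illustration of the shared-brick-count scaling discussed above
 * (bug 1744950). Sketch only; not the gluster posix_statfs() code.      */
#include <stdio.h>
#include <stdlib.h>
#include <sys/statvfs.h>

int main(int argc, char *argv[])
{
    const char *brick_path = (argc > 1) ? argv[1] : "/";
    unsigned long shared = (argc > 2) ? strtoul(argv[2], NULL, 10) : 1;
    struct statvfs buf;

    if (statvfs(brick_path, &buf) != 0) {
        perror("statvfs");
        return EXIT_FAILURE;
    }
    if (shared == 0)
        shared = 1;                          /* guard against bad input */

    unsigned long long total =
        (unsigned long long)buf.f_blocks * buf.f_frsize;
    printf("raw filesystem size : %llu bytes\n", total);
    printf("reported brick size : %llu bytes (divided by shared-brick-count=%lu)\n",
           total / shared, shared);
    return 0;
}
```

With shared-brick-count 1 on every node (as in the grep output later in this thread) no scaling happens, which is why mismatched totals usually point at the volfile values or the brick filesystems themselves.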
From bugzilla at redhat.com Tue Aug 27 07:27:26 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:27:26 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nicola battista changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nicola.battista89 | |@gmail.com) | --- Comment #2 from Nicola battista --- Hi, Sure this is the output : [root at cstore-pm01 ~]# glusterfs --version glusterfs 6.5 gluster> volume status Status of volume: dbroot1 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 172.16.31.5:/usr/local/mariadb/column store/gluster/brick1 49152 0 Y 12001 Brick 172.16.31.6:/usr/local/mariadb/column store/gluster/brick1 49152 0 Y 11632 Brick 172.16.31.7:/usr/local/mariadb/column store/gluster/brick1 49152 0 Y 11640 Self-heal Daemon on localhost N/A N/A Y 12021 Self-heal Daemon on 172.16.31.6 N/A N/A Y 11663 Self-heal Daemon on 172.16.31.7 N/A N/A Y 11673 Task Status of Volume dbroot1 ------------------------------------------------------------------------------ There are no active volume tasks Status of volume: dbroot2 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 172.16.31.5:/usr/local/mariadb/column store/gluster/brick2 49153 0 Y 12000 Brick 172.16.31.6:/usr/local/mariadb/column store/gluster/brick2 49153 0 Y 11633 Brick 172.16.31.7:/usr/local/mariadb/column store/gluster/brick2 49153 0 Y 11651 Self-heal Daemon on localhost N/A N/A Y 12021 Self-heal Daemon on 172.16.31.6 N/A N/A Y 11663 Self-heal Daemon on 172.16.31.7 N/A N/A Y 11673 Task Status of Volume dbroot2 ------------------------------------------------------------------------------ There are no active volume tasks Status of volume: dbroot3 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick 172.16.31.5:/usr/local/mariadb/column store/gluster/brick3 49154 0 Y 12002 Brick 172.16.31.6:/usr/local/mariadb/column store/gluster/brick3 49154 0 Y 11648 Brick 172.16.31.7:/usr/local/mariadb/column store/gluster/brick3 49154 0 Y 11662 Self-heal Daemon on localhost N/A N/A Y 12021 Self-heal Daemon on 172.16.31.6 N/A N/A Y 11663 Self-heal Daemon on 172.16.31.7 N/A N/A Y 11673 Task Status of Volume dbroot3 ------------------------------------------------------------------------------ There are no active volume tasks gluster> volume info all Volume Name: dbroot1 Type: Replicate Volume ID: ecf4fd04-2e96-47d9-8a40-4f84a48657fb Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick1 Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick1 Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick1 Options Reconfigured: transport.address-family: inet nfs.disable: on performance.client-io-threads: off Volume Name: dbroot2 Type: Replicate Volume ID: f2b49f9f-3a91-4ac4-8eb3-4a327d0dbc61 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick2 Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick2 Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick2 Options Reconfigured: 
transport.address-family: inet nfs.disable: on performance.client-io-threads: off Volume Name: dbroot3 Type: Replicate Volume ID: 73b96917-c842-4fc2-8bca-099735c4aa6a Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: 172.16.31.5:/usr/local/mariadb/columnstore/gluster/brick3 Brick2: 172.16.31.6:/usr/local/mariadb/columnstore/gluster/brick3 Brick3: 172.16.31.7:/usr/local/mariadb/columnstore/gluster/brick3 Options Reconfigured: transport.address-family: inet nfs.disable: on performance.client-io-threads: off Thanks, Regards Nicola Battista -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 07:42:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:42:06 +0000 Subject: [Bugs] [Bug 1463191] gfapi: discard glfs object when volume is deleted In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1463191 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 07:47:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:47:33 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(david.spisla at iter | |nity.com) --- Comment #6 from Vishal Pandey --- I have mentioned some of the reasons why this feature might not be needed afetr some discussions with Karthik. If there is anything else anyone would like to add or else can we close the issue ? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 07:48:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:48:19 +0000 Subject: [Bugs] [Bug 1524058] gluster peer command stops working with unhelpful error messages when DNS doens't work In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1524058 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nh2-redhatbugzill | |a at deditus.de) --- Comment #5 from Vishal Pandey --- @nh2 Can you please try to reproduct this issue again on the latest releases ? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 07:51:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:51:29 +0000 Subject: [Bugs] [Bug 1574298] on glusterd initial process, brick started always a error with "EPOLLERR - disconnecting now" In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1574298 Vishal Pandey changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(george.lian at nokia | |.com) --- Comment #10 from Vishal Pandey --- George, Can you try and reproduce this on the latest upstream version ? -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 07:54:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 07:54:31 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(liuruit at gmail.com | |) --- Comment #7 from Nithya Balachandran --- (In reply to Nithya Balachandran from comment #6) > (In reply to liuruit from comment #4) > > firstly, yum install glusterfs 3.12.2, and create the volume. > > then upgrade to 3.12.15, and expand the volume. > > > > volume vol_xxx-posix > > type storage/posix > > option shared-brick-count 1 > > option volume-id c47089b2-96c2-4ec2-9dfb-988d1e593cdc > > option directory Please check the value of shared-brick-count for the bricks on all 3 nodes. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 08:20:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:20:53 +0000 Subject: [Bugs] [Bug 1745914] New: ESTALE change in fuse breaks get_real_filename implementation Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745914 Bug ID: 1745914 Summary: ESTALE change in fuse breaks get_real_filename implementation Product: GlusterFS Version: 4.1 Status: NEW Component: posix Assignee: bugs at gluster.org Reporter: amukherj at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community This bug was initially created as a copy of Bug #1722977 I am copying this bug because: Description of problem: The change of ENOENT to ESTALE in the fuse bridge in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf has broken the get_real_filename implementation over fuse: get_real_filename is implemented as a virtual extended attribute to help Samba implement the case-insensitive but case preserving SMB protocol more efficiently. It is implemented as a getxattr call on the parent directory with the virtual key of "get_real_filename:" by looking for a spelling with different case for the provided file/dir name () and returning this correct spelling as a result if the entry is found. Originally (05aaec645a6262d431486eb5ac7cd702646cfcfb), the implementation used the ENOENT errno to return the authoritative answer that does not exist in any case folding. Now this implementation is actually a violation or misuse of the defined API for the getxattr call which returns ENOENT for the case that the dir that the call is made against does not exist and ENOATTR (or the synonym ENODATA) for the case that the xattr does not exist. This was not a problem until the gluster fuse-bridge was changed to do map ENOENT to ESTALE in 59629f1da9dca670d5dcc6425f7f89b3e96b46bf, after which we the getxattr call for get_real_filename returned an ESTALE instead of ENOENT breaking the expectation in Samba. (It is an independent problem that ESTALE should not leak out to user space but is intended to trigger retries between fuse and gluster. My theory is that the leaking happens because of the wrong use of ESTALE here: the parent directory exists in this case, and there is nothing stale....) But nevertheless, the semantics seem to be incorrect here and should be changed. Version-Release number of selected component (if applicable): master and version 6 How reproducible: Always. 
Steps to Reproduce: On a gluster fuse mount, run `getfattr -n glusterfs.get_real_filename:file-that-does-not-exist /path/to/fuse/mount/some-subdir`. Actual results: This shows the ESTALE error. Expected results: It shows ENONET. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 08:23:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:23:01 +0000 Subject: [Bugs] [Bug 1745914] ESTALE change in fuse breaks get_real_filename implementation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745914 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23305 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 08:23:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:23:02 +0000 Subject: [Bugs] [Bug 1745914] ESTALE change in fuse breaks get_real_filename implementation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745914 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23305 ([RFC] change get_real_filename implementation to use ENOATTR instead of ENOENT) posted (#1) for review on release-7 by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 08:28:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:28:14 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 liuruit at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(liuruit at gmail.com | |) | --- Comment #8 from liuruit at gmail.com --- grep shared -r . ./vol_xxx.host1.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f-brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option shared-brick-count 1 ./vol_xxx.host2.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 0 ./vol_xxx.host3.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9-brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option shared-brick-count 0 ./vol_xxx.host1.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8-brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option shared-brick-count 0 ./vol_xxx.host2.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20-brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option shared-brick-count 1 ./vol_xxx.host3.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 0 All brick on the 3 host have same value. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
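For context on the get_real_filename discussion in bug 1745914 above: the sketch below shows how a client such as Samba might probe the virtual "glusterfs.get_real_filename:<name>" xattr on a parent directory and classify the resulting errno values. It is a minimal illustration, not Samba's vfs_glusterfs code; the helper name and buffer sizes are assumptions, and ENOATTR is mapped to ENODATA as on Linux.

/*
 * Minimal sketch (not Samba's actual code) of probing the virtual
 * "glusterfs.get_real_filename:<name>" xattr on a parent directory and
 * classifying the errno values discussed in bug 1745914 above.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/xattr.h>

#ifndef ENOATTR
#define ENOATTR ENODATA /* Linux spells "no such xattr" as ENODATA */
#endif

/* Returns 0 and fills 'out' with the case-corrected name, or -errno. */
int lookup_real_filename(const char *parent_dir, const char *name,
                         char *out, size_t outlen)
{
    char key[512];
    snprintf(key, sizeof(key), "glusterfs.get_real_filename:%s", name);

    ssize_t len = getxattr(parent_dir, key, out, outlen - 1);
    if (len >= 0) {
        out[len] = '\0';        /* case-preserving spelling found */
        return 0;
    }

    if (errno == ENOATTR)       /* authoritative: no entry in any case */
        return -ENOENT;
    if (errno == ENOENT)        /* the parent directory itself is missing */
        return -ENOENT;
    if (errno == ESTALE)        /* what the report above observes leaking out */
        return -ESTALE;
    return -errno;
}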
From bugzilla at redhat.com Tue Aug 27 08:28:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:28:42 +0000 Subject: [Bugs] [Bug 1743215] glusterd-utils: 0-management: xfs_info exited with non-zero exit status [Permission denied] In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743215 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com Flags| |needinfo?(johannbg at gmail.co | |m) --- Comment #1 from Atin Mukherjee --- I do not believe that this is a glusterfs bug. glusterd_add_inode_size_to_dict () would pass the device (the underlying device on which brick is mounted) to the xfs_prog. If runner framework is unable to execute the xfs_prog call successfully it will throw up the errno and in this case it's permission denied. Could you point out your brick paths from gluster volume info output and then run a df command to find the device of the brick paths and then execute xfs_prog to see what it throws up? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 08:33:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:33:09 +0000 Subject: [Bugs] [Bug 1745916] New: glusterfs client process memory leak after enable tls on community version 6.5 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745916 Bug ID: 1745916 Summary: glusterfs client process memory leak after enable tls on community version 6.5 Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: rpc Severity: medium Assignee: bugs at gluster.org Reporter: zz.sh.cynthia at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: after enable ssl, glusterfs process memory leak detected Version-Release number of selected component (if applicable): glusterfs 6.5 How reproducible: Steps to Reproduce: 1.enable tls 2.do io on volume with tls enabled 3.found glusterfs client process memory increase steadily Actual results: Expected results: Additional info: I find that the following patch create new SSL_CTX for each transport, but when I check the code, I am not clear that In function socket_server_event_handler, ?ret = socket_init(new_trans);? create new SSL_CTX for new_trans, but why after that, new_priv->ssl_ctx = priv->ssl_ctx; this will overwrite the newly allocated ssl_ctx in new_priv, and may cause potential memory leak, i think. Could you please brief on my confusion, many thanks! SHA-1: 06fa261207f0f0625c52fa977b96e5875e9a91e0 * socket/ssl: fix crl handling Problem: Just setting the path to the CRL directory in socket_init() wasn't working. Solution: Need to use special API to retrieve and set X509_VERIFY_PARAM and set the CRL checking flags explicitly. Also, setting the CRL checking flags is a big pain, since the connection is declared as failed if any CRL isn't found in the designated file or directory. A comment has been added to the code appropriately. Change-Id: I8a8ed2ddaf4b5eb974387d2f7b1a85c1ca39fe79 fixes: bz#1687326 Signed-off-by: Milind Changire -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
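To make the reporter's question in bug 1745916 above concrete, the following isolated sketch shows the leak pattern being described: a per-transport SSL_CTX allocated during socket_init() would be orphaned if it is later overwritten with the listener's context and never freed. This is not the GlusterFS rpc-transport/socket code; the struct and function names are hypothetical, and it assumes OpenSSL 1.1.0 or later.

/*
 * Isolated illustration of the pattern described above, with hypothetical
 * names. If nothing frees the freshly allocated context before the
 * assignment, every accepted connection leaks one SSL_CTX.
 */
#include <openssl/ssl.h>

struct transport_priv {
    SSL_CTX *ssl_ctx;
};

void on_accept(struct transport_priv *listener, struct transport_priv *child)
{
    /* analogous to socket_init(new_trans): a fresh context per transport */
    child->ssl_ctx = SSL_CTX_new(TLS_method());

    /* Without this free, the assignment below would orphan the fresh
     * context -- the potential leak the reporter asks about. */
    SSL_CTX_free(child->ssl_ctx);

    /* analogous to `new_priv->ssl_ctx = priv->ssl_ctx;` */
    child->ssl_ctx = listener->ssl_ctx;
}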
From bugzilla at redhat.com Tue Aug 27 08:58:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 08:58:34 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 --- Comment #9 from Nithya Balachandran --- (In reply to liuruit from comment #8) > grep shared -r . > ./vol_xxx.host1.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f- > brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option > shared-brick-count 1 > ./vol_xxx.host2.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b- > brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option > shared-brick-count 0 > ./vol_xxx.host3.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9- > brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option > shared-brick-count 0 > ./vol_xxx.host1.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8- > brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option > shared-brick-count 0 > ./vol_xxx.host2.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20- > brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option > shared-brick-count 1 > ./vol_xxx.host3.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe- > brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option > shared-brick-count 0 > > All brick on the 3 host have same value. These look like the values from a single host. Is that correct? You need to run the grep on every node (host1, host2 and host3) Please provide the values for all the nodes - you should see 1 entry for each brick on each node. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 09:09:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 09:09:27 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 --- Comment #10 from liuruit at gmail.com --- Another volume info. gluster volume info vol_320b6dab471a7b810d92ff03e9ef05c6 Volume Name: vol_320b6dab471a7b810d92ff03e9ef05c6 Type: Distributed-Replicate Volume ID: c47089b2-96c2-4ec2-9dfb-988d1e593cdc Status: Started Snapshot Count: 0 Number of Bricks: 2 x 3 = 6 Transport-type: tcp Bricks: Brick1: 10.10.2.20:/var/lib/heketi/mounts/vg_929e45b20519c80a714d7645061e354f/brick_5bd825a22e9511d539d24226a3d937a7/brick Brick2: 10.10.2.22:/var/lib/heketi/mounts/vg_d42ee5516f065e5f10b223bbb0a00d9b/brick_6078cfee3d8e48b50586b539fdfe8d61/brick Brick3: 10.10.2.21:/var/lib/heketi/mounts/vg_62960212c5851a4f597ee9ccfd6ae6d9/brick_edbd921fc1f3a9431eaa14eb8afff4d3/brick Brick4: 10.10.2.19:/var/lib/heketi/mounts/vg_58768cbf62201deef23eb06ab4161ca8/brick_fd4e796278c127f6a7b0d70d5689a24e/brick Brick5: 10.10.2.20:/var/lib/heketi/mounts/vg_13de35a047bf8fd839f8b5b6c5aa7b20/brick_df9b0d0b41cd17848212a9e2215eba8a/brick Brick6: 10.10.2.22:/var/lib/heketi/mounts/vg_19d11e2d0689d918b6affd2acfb2bcfe/brick_ebb2523fa96dbfe301c74e16428b04a0/brick Options Reconfigured: nfs.disable: on 10.10.2.19: grep shared -r . 
./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f-brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.21.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9-brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.19.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8-brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option shared-brick-count 1 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20-brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 0 10.10.2.20: grep shared -r . ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f-brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option shared-brick-count 1 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.21.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9-brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.19.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8-brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20-brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option shared-brick-count 1 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 0 10.10.2.21: grep shared -r . 
./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f-brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.21.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9-brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option shared-brick-count 1 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.19.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8-brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20-brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 0 10.10.2.22: ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_929e45b20519c80a714d7645061e354f-brick_5bd825a22e9511d539d24226a3d937a7-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 2 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.21.var-lib-heketi-mounts-vg_62960212c5851a4f597ee9ccfd6ae6d9-brick_edbd921fc1f3a9431eaa14eb8afff4d3-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.19.var-lib-heketi-mounts-vg_58768cbf62201deef23eb06ab4161ca8-brick_fd4e796278c127f6a7b0d70d5689a24e-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.20.var-lib-heketi-mounts-vg_13de35a047bf8fd839f8b5b6c5aa7b20-brick_df9b0d0b41cd17848212a9e2215eba8a-brick.vol: option shared-brick-count 0 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 2 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 09:48:20 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 09:48:20 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Component|core |glusterd --- Comment #11 from Nithya Balachandran --- Does this volume have the same problem as the other one? If yes, the problem is with the volfiles for the bricks on 10.10.2.22: ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_d42ee5516f065e5f10b223bbb0a00d9b-brick_6078cfee3d8e48b50586b539fdfe8d61-brick.vol: option shared-brick-count 2 ./vol_320b6dab471a7b810d92ff03e9ef05c6.10.51.2.22.var-lib-heketi-mounts-vg_19d11e2d0689d918b6affd2acfb2bcfe-brick_ebb2523fa96dbfe301c74e16428b04a0-brick.vol: option shared-brick-count 2 Both these have a shared-brick-count value of 2 which causes gluster to internally halve the available disk size for these bricks. 
As they are on different replica sets and the lowest disk space value of of the bricks is taken for the disk space of the replica set, this means the value of the disk space is halved for the entire volume. This is the same problem reported in https://bugzilla.redhat.com/show_bug.cgi?id=1517260. To recover, please do the following: 1. Restart glusterd on each node 2. For each volume, run the following command from any one gluster node: gluster v set cluster.min-free-disk 11% This should regenerate the volfiles with the correct values. Recheck the shared-brick-count values after doing these steps - the values should be 0 or 1. The df values should also be correct. Moving this to the glusterd component. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 10:04:29 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 10:04:29 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 david.spisla at iternity.com changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(david.spisla at iter | |nity.com) | --- Comment #7 from david.spisla at iternity.com --- I followed the discussion and I agree to the arguments. You can close the issue from my point of view -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 10:50:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 10:50:55 +0000 Subject: [Bugs] [Bug 1745965] New: glusterd fails to start dumping core via SIGABRT Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Bug ID: 1745965 Summary: glusterd fails to start dumping core via SIGABRT Product: GlusterFS Version: mainline OS: Linux Status: NEW Component: glusterd Severity: medium Assignee: bugs at gluster.org Reporter: anoopcs at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: glusterd fails to come up either using systemd control or direct invoking dumping core via SIGABRT. Version-Release number of selected component (if applicable): master How reproducible: Always Steps to Reproduce: 1. Install GlusterFS nightly rpms from https://ci.centos.org/artifacts/gluster/nightly/master.repo 2. Try to being up glusterd # glusterd --debug Actual results: glusterd process exits dumping core with SIGABRT Expected results: glusterd does not crash and process is alive. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 10:51:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 10:51:37 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start dumping core via SIGABRT In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 --- Comment #1 from Anoop C S --- $ sudo gdb /usr/sbin/glusterd core.12898 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. 
This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: ... Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done. done. warning: core file may not match specified executable file. [New LWP 12898] [New LWP 12899] [New LWP 12900] [New LWP 12901] [New LWP 12902] [New LWP 12903] [New LWP 12904] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `glusterd --debug'. Program terminated with signal 6, Aborted. #0 0x00007fbd7ad852c7 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); (gdb) bt #0 0x00007fbd7ad852c7 in __GI_raise (sig=sig at entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55 #1 0x00007fbd7ad869b8 in __GI_abort () at abort.c:90 #2 0x00007fbd7adc7e17 in __libc_message (do_abort=do_abort at entry=2, fmt=fmt at entry=0x7fbd7aed8492 "*** %s ***: %s terminated\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:196 #3 0x00007fbd7ae66b67 in __GI___fortify_fail (msg=msg at entry=0x7fbd7aed8438 "buffer overflow detected") at fortify_fail.c:30 #4 0x00007fbd7ae64ce2 in __GI___chk_fail () at chk_fail.c:28 #5 0x00007fbd7ae643fb in ___vsnprintf_chk (s=, maxlen=, flags=, slen=, format=, args=args at entry=0x7ffefca6cdf8) at vsnprintf_chk.c:37 #6 0x00007fbd7ae64318 in ___snprintf_chk (s=s at entry=0x7ffefca6d130 "", maxlen=maxlen at entry=4096, flags=flags at entry=1, slen=slen at entry=3776, format=format at entry=0x7fbd709a8eab "%s") at snprintf_chk.c:35 #7 0x00007fbd70866029 in snprintf (__fmt=0x7fbd709a8eab "%s", __n=4096, __s=0x7ffefca6d130 "") at /usr/include/bits/stdio2.h:64 #8 init (this=0x557ef9f3b510) at glusterd.c:1450 #9 0x00007fbd7c740ed1 in __xlator_init (xl=0x557ef9f3b510) at xlator.c:597 #10 xlator_init (xl=xl at entry=0x557ef9f3b510) at xlator.c:623 #11 0x00007fbd7c77dbd9 in glusterfs_graph_init (graph=graph at entry=0x557ef9f37140) at graph.c:422 #12 0x00007fbd7c77e245 in glusterfs_graph_activate (graph=graph at entry=0x557ef9f37140, ctx=ctx at entry=0x557ef9ef2010) at graph.c:776 #13 0x0000557ef8287182 in glusterfs_process_volfp (ctx=ctx at entry=0x557ef9ef2010, fp=fp at entry=0x557ef9f36bb0) at glusterfsd.c:2728 #14 0x0000557ef828733d in glusterfs_volumes_init (ctx=ctx at entry=0x557ef9ef2010) at glusterfsd.c:2800 #15 0x0000557ef8282a3a in main (argc=2, argv=) at glusterfsd.c:2962 (gdb) f 8 #8 init (this=0x557ef9f3b510) at glusterd.c:1450 1450 len = snprintf(logdir, PATH_MAX, "%s", DEFAULT_LOG_FILE_DIRECTORY); (gdb) l 1396 1391 0, 1392 }; 1393 char rundir[PATH_MAX] = { 1394 0, 1395 }; 1396 char logdir[VALID_GLUSTERD_PATHMAX] = { 1397 0, 1398 }; 1399 char cmd_log_filename[PATH_MAX] = { 1400 0, (gdb) f 3 #3 0x00007fbd7ae66b67 in __GI___fortify_fail (msg=msg at entry=0x7fbd7aed8438 "buffer overflow detected") at fortify_fail.c:30 30 __libc_message (2, "*** %s ***: %s terminated\n", -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
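The abort in the backtrace above comes from glibc's fortify checks: snprintf() is told the destination can hold PATH_MAX (4096) bytes while the compiler knows the buffer is only 3776 bytes (the slen value in frame #6), so __snprintf_chk() calls __chk_fail() and the process receives SIGABRT. The standalone sketch below reproduces the pattern and shows the fix of passing the buffer's real size; the constant and log path are stand-ins, not the actual glusterd definitions.

/*
 * Standalone reproduction of the class of crash in the backtrace above.
 * The constant below mirrors slen from frame #6; it is not the real
 * VALID_GLUSTERD_PATHMAX definition.
 */
#include <limits.h>
#include <stdio.h>

#define SMALLER_THAN_PATH_MAX 3776

int main(void)
{
    char logdir[SMALLER_THAN_PATH_MAX] = {0};

    /* Buggy form: lies about the buffer size; aborts under -D_FORTIFY_SOURCE=2
     * even though the actual string is short:
     *     snprintf(logdir, PATH_MAX, "%s", "/var/log/glusterfs");
     * Fixed form: bound the write by the buffer's real size. */
    int len = snprintf(logdir, sizeof(logdir), "%s", "/var/log/glusterfs");

    printf("wrote %d bytes: %s\n", len, logdir);
    return 0;
}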
From bugzilla at redhat.com Tue Aug 27 10:52:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 10:52:44 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Anoop C S changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|glusterd fails to start |glusterd fails to start due |dumping core via SIGABRT |to SIGABRT dumping core -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 10:59:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 10:59:27 +0000 Subject: [Bugs] [Bug 1745967] New: File size was not truncated for all files when tried with rebalance in progress. Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745967 Bug ID: 1745967 Summary: File size was not truncated for all files when tried with rebalance in progress. Product: GlusterFS Version: mainline Status: NEW Component: distribute Keywords: Triaged, ZStream Severity: medium Assignee: bugs at gluster.org Reporter: nbalacha at redhat.com CC: bugs at gluster.org, nbalacha at redhat.com, nchilaka at redhat.com, rhs-bugs at redhat.com, sankarshan at redhat.com, saraut at redhat.com, spalai at redhat.com, storage-qa-internal at redhat.com, tdesala at redhat.com Depends On: 1638333 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1638333 +++ Description of problem: After triggering the rebalance process, simultaneously, truncate command was passed on the mount point, rebalance completed successfully and truncate command did not throw any error, but on doing "# ll ." it was noticed that file size of many files was not truncated to zero as per the truncate command. Version-Release number of selected component (if applicable): 3.12.2-22 How reproducible: 2/2 Steps to Reproduce: 1. Create a distributed-replicated volume (e.g. 3*3) 2. Start and mount the volume on client node. 3. Add brick to the volume using # gluster v add-brick volname replica 3 brick10 brick11 brick12 4. From the client node create files on the mount point e.g. # for i in {1..8000}; do dd if=/dev/urandom of=file_$i bs=1M count=1; done 5. Trigger rebalance. 6. While rebalance is still in progress, start truncating the files from the mount point e.g. # for i in {1..8000}; do truncate -s 0 file_$i; done 7. Wait for the migration to complete. 8. Now from the mount point check the size of all the files. Actual results: File size for many files was not truncated to zero. Expected results: All the files should have size zero. Server rebalance log snippet: ============================ [2018-10-11 10:05:39.996406] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7860)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file. [2018-10-11 10:05:40.003945] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7865)} would result in dst node (cloud6-replicate-3:37983752) having lower disk space than the source node (cloud6-replicate-2:37999680).Skipping file. 
[2018-10-11 10:05:40.009722] I [dht-rebalance.c:1516:dht_migrate_file] 0-cloud6-dht: /file_7923: attempting to move from cloud6-replicate-2 to cloud6-replicate-3 [2018-10-11 10:05:40.015101] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7860. [2018-10-11 10:05:40.021613] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7865. [2018-10-11 10:05:40.026830] W [MSGID: 109023] [dht-rebalance.c:962:__dht_check_free_space] 0-cloud6-dht: data movement of file {blocks:2048 name:(/file_7905)} would result in dst node (cloud6-replicate-3:37985800) having lower disk space than the source node (cloud6-replicate-2:38001728).Skipping file. [2018-10-11 10:05:40.039200] I [MSGID: 109126] [dht-rebalance.c:2825:gf_defrag_migrate_single_file] 0-cloud6-dht: File migration skipped for /file_7905. Mount point "# ll ." command output snippet: =========================================== -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2266 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2267 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2268 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2269 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_227 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2270 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2271 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2272 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2273 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2274 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2275 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2276 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2277 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2278 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279 -rw-r--r--. 1 root root 1048576 Oct 11 14:52 file_2279 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_228 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2280 -rw-r--r--. 1 root root 0 Oct 11 15:34 file_2281 --- Additional comment from Nithya Balachandran on 2018-11-19 04:34:20 UTC --- This needs to be fixed. No analysis done yet so no RCA. Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1638333 [Bug 1638333] File size was not truncated for all files when tried with rebalance in progress. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 11:08:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:08:43 +0000 Subject: [Bugs] [Bug 1745967] File size was not truncated for all files when tried with rebalance in progress. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745967 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED --- Comment #1 from Nithya Balachandran --- RCA: If a file was truncated during a migration, no error is returned by __dht_rebalance_migrate_data. Now, if the ia_size of the src file is less than the number of bytes written to the destination, we abort the data migration and error out. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
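A simplified sketch of the check described in the RCA above (not the actual dht-rebalance.c change): after the data copy, the source file is examined again, and if its size is now smaller than the number of bytes already written to the destination, the file was truncated mid-migration and the migration of that file is aborted instead of committed. All names below are illustrative.

/*
 * Simplified sketch of the post-copy truncation check described in the RCA
 * above; names are stand-ins for gluster's internal types.
 */
#include <errno.h>
#include <stdint.h>

struct stat_sketch {
    uint64_t ia_size; /* stand-in for the size field of gluster's struct iatt */
};

int verify_src_not_truncated(const struct stat_sketch *src_after_copy,
                             uint64_t bytes_written_to_dst)
{
    if (src_after_copy->ia_size < bytes_written_to_dst) {
        /* source shrank mid-copy: abort this file's migration with an error */
        return -EIO;
    }
    return 0; /* safe to finish the migration */
}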
From bugzilla at redhat.com Tue Aug 27 11:10:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:10:51 +0000 Subject: [Bugs] [Bug 1745967] File size was not truncated for all files when tried with rebalance in progress. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745967 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords|Triaged, ZStream | Assignee|bugs at gluster.org |nbalacha at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 11:26:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:26:46 +0000 Subject: [Bugs] [Bug 1745967] File size was not truncated for all files when tried with rebalance in progress. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745967 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23308 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 11:26:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:26:47 +0000 Subject: [Bugs] [Bug 1745967] File size was not truncated for all files when tried with rebalance in progress. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745967 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23308 (cluster/dht: Handle file truncates during migration) posted (#1) for review on master by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 11:27:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:27:10 +0000 Subject: [Bugs] [Bug 1745914] ESTALE change in fuse breaks get_real_filename implementation In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745914 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Version|4.1 |7 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 11:31:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:31:12 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED CC| |nbalacha at redhat.com Assignee|bugs at gluster.org |nbalacha at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 11:32:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:32:54 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23309 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 11:32:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 11:32:55 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23309 (glusterd: Fixed incorrect size argument) posted (#1) for review on master by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 12:02:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:02:42 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bugs at gluster.org, | |nbalacha at redhat.com Component|distribute |distribute Version|unspecified |mainline Assignee|spalai at redhat.com |bugs at gluster.org Product|Red Hat Gluster Storage |GlusterFS QA Contact|tdesala at redhat.com | --- Comment #4 from Nithya Balachandran --- Olaf, Gluster 3.12.15 is an upstream release so it looks like you are using the upstream bits. I am modifying the BZ accordingly. Regards, Nithya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 12:06:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:06:04 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST Depends On| |1745965 --- Comment #7 from Atin Mukherjee --- Fix posted https://review.gluster.org/#/c/glusterfs/+/23309 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 [Bug 1745965] glusterd fails to start due to SIGABRT dumping core -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 12:06:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:06:04 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1744420 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 12:11:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:11:44 +0000 Subject: [Bugs] [Bug 1672480] Bugs Test Module tests failing on s390x In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1672480 abhays changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo? |needinfo?(atumball at redhat.c | |om) --- Comment #68 from abhays --- @Amar Could you please share the details of the environment and the log file, you encountered this error on? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 12:23:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:23:51 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Comment #2 is|1 |0 private| | Status|NEW |ASSIGNED Flags| |needinfo?(olaf.buitelaar at gm | |ail.com) --- Comment #5 from Nithya Balachandran --- Olaf, Please provide the information Susant has asked for as well as the gluster volume info, the gluster mount and brick logs. Regards, Nithya -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 12:39:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 12:39:46 +0000 Subject: [Bugs] [Bug 1558507] Gluster allows renaming of folders, which contain WORMed/Retain or WORMed files In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1558507 Karthik U S changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |CLOSED Resolution|--- |WONTFIX Last Closed| |2019-08-27 12:39:46 --- Comment #8 from Karthik U S --- Thanks for the ack David. Closing this bug as per comment #7. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 13:01:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:01:39 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 Olaf Buitelaar changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(olaf.buitelaar at gm | |ail.com) | --- Comment #6 from Olaf Buitelaar --- Dear Susant, Sorry i completely missed your comment. I don't really think it's possible to update the IO pattern of the App's. It's essentially VM's (via KVM, managed by oVirt) The vm's are running various apps via docker, probably the most IO intensive are Mariadb databases. I've recently updated to gluster 6.4 and are currently running gluster 6.5, and the issue still exists. however i did notice a few things; on VM's where this issue did occur before, i ran sparsify (https://ovirt.org/develop/release-management/features/virt/virt-sparsify.html) after that i've not observed the issue since. On 2 VM's the issue still exists, these VM's do have snapshots, and didn't allow me to run sparsify. They currently only seem to pause when running xfs_fsr. Except once, while a brick-replace was in action (as we're migrating to a new environment). @Nithya I've attached the logs earlier send in the mailing list. If you do require new logs let me know which exactly you're interested in, since it shouldn't be hard to reproduce the issue here. I don't think the mount logs will add much, since the VM's use libgfapi Thanks Olaf -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:02:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:02:47 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #7 from Olaf Buitelaar --- Created attachment 1608565 --> https://bugzilla.redhat.com/attachment.cgi?id=1608565&action=edit ansible job to clear T file, showing shard's affected -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:03:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:03:06 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #8 from Olaf Buitelaar --- Created attachment 1608566 --> https://bugzilla.redhat.com/attachment.cgi?id=1608566&action=edit ansible job to clear T file, showing shard's affected -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 13:04:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:04:01 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #9 from Olaf Buitelaar --- Created attachment 1608567 --> https://bugzilla.redhat.com/attachment.cgi?id=1608567&action=edit qemu log -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:04:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:04:18 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #10 from Olaf Buitelaar --- Created attachment 1608568 --> https://bugzilla.redhat.com/attachment.cgi?id=1608568&action=edit qemu log -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:04:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:04:38 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #11 from Olaf Buitelaar --- Created attachment 1608569 --> https://bugzilla.redhat.com/attachment.cgi?id=1608569&action=edit qemu log -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:05:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:05:07 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #12 from Olaf Buitelaar --- Created attachment 1608570 --> https://bugzilla.redhat.com/attachment.cgi?id=1608570&action=edit qemu log -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:05:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:05:52 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #13 from Olaf Buitelaar --- Created attachment 1608571 --> https://bugzilla.redhat.com/attachment.cgi?id=1608571&action=edit volume info -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 13:06:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:06:17 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #14 from Olaf Buitelaar --- Created attachment 1608572 --> https://bugzilla.redhat.com/attachment.cgi?id=1608572&action=edit volume info -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:06:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:06:46 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #15 from Olaf Buitelaar --- Created attachment 1608573 --> https://bugzilla.redhat.com/attachment.cgi?id=1608573&action=edit volume info -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:07:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:07:12 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #16 from Olaf Buitelaar --- Created attachment 1608574 --> https://bugzilla.redhat.com/attachment.cgi?id=1608574&action=edit volume info -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:07:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:07:54 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #17 from Olaf Buitelaar --- Created attachment 1608575 --> https://bugzilla.redhat.com/attachment.cgi?id=1608575&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:08:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:08:32 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #18 from Olaf Buitelaar --- Created attachment 1608576 --> https://bugzilla.redhat.com/attachment.cgi?id=1608576&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 13:09:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:09:09 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #19 from Olaf Buitelaar --- Created attachment 1608577 --> https://bugzilla.redhat.com/attachment.cgi?id=1608577&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:09:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:09:34 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #20 from Olaf Buitelaar --- Created attachment 1608578 --> https://bugzilla.redhat.com/attachment.cgi?id=1608578&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:09:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:09:58 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #21 from Olaf Buitelaar --- Created attachment 1608579 --> https://bugzilla.redhat.com/attachment.cgi?id=1608579&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:10:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:10:36 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #22 from Olaf Buitelaar --- Created attachment 1608580 --> https://bugzilla.redhat.com/attachment.cgi?id=1608580&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:11:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:11:02 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #23 from Olaf Buitelaar --- Created attachment 1608581 --> https://bugzilla.redhat.com/attachment.cgi?id=1608581&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 13:11:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:11:54 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #24 from Olaf Buitelaar --- Created attachment 1608582 --> https://bugzilla.redhat.com/attachment.cgi?id=1608582&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:12:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:12:19 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #25 from Olaf Buitelaar --- Created attachment 1608583 --> https://bugzilla.redhat.com/attachment.cgi?id=1608583&action=edit gluster logs -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 13:19:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 13:19:34 +0000 Subject: [Bugs] [Bug 1183054] rpmlint throws couple of errors for RPM spec file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1183054 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-27 13:19:34 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23039 (build: fix rpmlint warnings in specfile) merged (#5) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 14:11:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 14:11:48 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #15 from Alex --- Well after 2 weeks without the exporter it seems the memory is staying stable on all 3 nodes. So this would indicate there is something not getting cleaned up by gluster after the specific requests from the prometheus exporter? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 14:14:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 14:14:12 +0000 Subject: [Bugs] [Bug 1744874] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744874 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-27 14:14:12 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23285 (fuse: add missing GF_FREE to fuse_interrupt) merged (#3) on release-7 by N Balachandran -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 14:14:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 14:14:13 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Bug 1734423 depends on bug 1744874, which changed state. Bug 1744874 Summary: interrupts leak memory https://bugzilla.redhat.com/show_bug.cgi?id=1744874 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 15:09:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 15:09:16 +0000 Subject: [Bugs] [Bug 1746067] New: packaging: rdma on s390x, unnecessary ldconfig scriptlets Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746067 Bug ID: 1746067 Summary: packaging: rdma on s390x, unnecessary ldconfig scriptlets Product: GlusterFS Version: 7 Status: NEW Component: packaging Assignee: bugs at gluster.org Reporter: kkeithle at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community This bug was initially created as a copy of Bug #1686875 I am copying this bug because: Description of problem: rdma on s390x since f27, rhel7 since 2016 unnecessary ldconfig in scriptlets reported by fedora Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 15:20:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 15:20:54 +0000 Subject: [Bugs] [Bug 1746067] packaging: rdma on s390x, unnecessary ldconfig scriptlets In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746067 Kaleb KEITHLEY changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |NOTABUG Last Closed| |2019-08-27 15:20:54 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 15:30:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 15:30:19 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-27 15:30:19 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23309 (glusterd: Fixed incorrect size argument) merged (#2) on master by Atin Mukherjee -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 15:30:19 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 15:30:19 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Bug 1744420 depends on bug 1745965, which changed state. 
Bug 1745965 Summary: glusterd fails to start due to SIGABRT dumping core https://bugzilla.redhat.com/show_bug.cgi?id=1745965 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 15:56:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 15:56:12 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Roman changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |roman.nozdrin at mariadb.com --- Comment #3 from Roman --- Greetings, I'm from Mariadb ColumnStore development team. CS is a database engine that extensively works with data stored on GFS bricks. In the case described by Nicolla we are facing errors when trying to access files that exists according with ls output. Nicolla will add ls output to prove the files are visible from OS perspective. However when we try to open files programmaticaly using VFS these function call fails with ENOENT. 86814 stat("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", 0x7ff488ffdc80) = -1 ENOENT (No such file or directory) 86814 open("/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf", O_RDONLY|O_NOATIME) = -1 ENOENT (No such file or directory) Hope this makes the issue clear. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 17:28:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:28:17 +0000 Subject: [Bugs] [Bug 1746118] New: capture stat failure error while setting the gfid Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 Bug ID: 1746118 Summary: capture stat failure error while setting the gfid Product: GlusterFS Version: mainline Status: NEW Component: posix Assignee: bugs at gluster.org Reporter: rabhat at redhat.com CC: bugs at gluster.org Depends On: 1736481, 1736482 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1736482 +++ +++ This bug was initially created as a clone of Bug #1736481 +++ Description of problem: For create operation, after the entry is created, posix xlator tries to set the gfid for that entry. While doing that, there are several places where setting gfid can fail. While the failure is handled in all the cases, for one of the failure cases, the errno is not captured. Capturing this might help in debugging. int posix_gfid_set(xlator_t *this, const char *path, loc_t *loc, dict_t *xattr_req, pid_t pid, int *op_errno) { uuid_t uuid_req; uuid_t uuid_curr; int ret = 0; ssize_t size = 0; struct stat stat = { 0, }; *op_errno = 0; if (!xattr_req) { if (pid != GF_SERVER_PID_TRASH) { gf_msg(this->name, GF_LOG_ERROR, EINVAL, P_MSG_INVALID_ARGUMENT, "xattr_req is null"); *op_errno = EINVAL; ret = -1; } goto out; } if (sys_lstat(path, &stat) != 0) { ret = -1; gf_msg(this->name, GF_LOG_ERROR, errno, P_MSG_LSTAT_FAILED, "lstat on %s failed", path); goto out; } HERE, errno is not captured. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. 
Actual results: Expected results: Additional info: --- Additional comment from Worker Ant on 2019-08-01 19:47:09 UTC --- REVIEW: https://review.gluster.org/23144 (storage/posix: set the op_errno to proper errno during gfid set) posted (#1) for review on master by Raghavendra Bhat --- Additional comment from Worker Ant on 2019-08-04 07:09:48 UTC --- REVIEW: https://review.gluster.org/23144 (storage/posix: set the op_errno to proper errno during gfid set) merged (#2) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 [Bug 1736481] capture stat failure error while setting the gfid https://bugzilla.redhat.com/show_bug.cgi?id=1736482 [Bug 1736482] capture stat failure error while setting the gfid -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 17:28:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:28:17 +0000 Subject: [Bugs] [Bug 1736482] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736482 Raghavendra Bhat changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746118 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 [Bug 1746118] capture stat failure error while setting the gfid -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 17:28:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:28:17 +0000 Subject: [Bugs] [Bug 1736481] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1736481 Raghavendra Bhat changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746118 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 [Bug 1746118] capture stat failure error while setting the gfid -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 17:28:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:28:31 +0000 Subject: [Bugs] [Bug 1746118] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 Raghavendra Bhat changed: What |Removed |Added ---------------------------------------------------------------------------- Version|mainline |6 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 17:31:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:31:37 +0000 Subject: [Bugs] [Bug 1746118] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23311 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
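As an aside on the posix_gfid_set() excerpt quoted in bug 1746118 above, here is a minimal stand-alone sketch of the pattern the fix needs: capture errno into the caller's op_errno immediately after the failed lstat(), before any later call can clobber it. This is an illustration only, not the merged GlusterFS patch; the function name and path below are made up for the example.

/* Capture errno from a failed lstat() into an out-parameter right away. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static int
gfid_set_sketch(const char *path, int *op_errno)
{
    struct stat st = {0};

    *op_errno = 0;
    if (lstat(path, &st) != 0) {
        *op_errno = errno; /* the assignment the bug report says is missing */
        fprintf(stderr, "lstat on %s failed: %s\n", path, strerror(*op_errno));
        return -1;
    }
    return 0;
}

int
main(void)
{
    int op_errno = 0;

    if (gfid_set_sketch("/no/such/path", &op_errno) != 0)
        printf("caller sees op_errno=%d (%s)\n", op_errno, strerror(op_errno));
    return 0;
}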
From bugzilla at redhat.com Tue Aug 27 17:31:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 17:31:38 +0000 Subject: [Bugs] [Bug 1746118] capture stat failure error while setting the gfid In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746118 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23311 (storage/posix: set the op_errno to proper errno during gfid set) posted (#1) for review on release-6 by Raghavendra Bhat -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:06:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:06:39 +0000 Subject: [Bugs] [Bug 1746138] New: ctime: If atime is updated via utimensat syscall ctime is not getting updated Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Bug ID: 1746138 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: ctime Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org, nchilaka at redhat.com Depends On: 1738786 Blocks: 1743627 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1738786 +++ Description of problem: When atime|mtime is updated via utime family of syscalls, ctime is not updated. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above Additional info: --- Additional comment from Worker Ant on 2019-08-08 07:43:19 UTC --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on master by Kotresh HR --- Additional comment from Kotresh HR on 2019-08-20 10:49:58 UTC --- Discussion at the patch which is worth mentioning here Kinglong Mee: @Kotresh HR, I cannot reproduce this problem as you description, Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above I test at glusterfs mount and nfs mount, all get right result as, # sh test.sh File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2019-08-08 18:39:16.099467595 +0800 Change: 2019-08-08 18:39:16.100847925 +0800 Birth: - File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2020-01-01 12:00:00.000000000 +0800 Change: 2019-08-08 18:39:17.126800759 +0800 Birth: - --------------- Sorry, this is not happening with your patch[1]. Because we don't update ctime to mtime which was previously being done. 
But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. [root at f281 glusterfs]# stat /mastermnt/file3 File: /mastermnt/file3 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 2eh/46d Inode: 13563962387061186202 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-08 17:44:31.341319550 +0530 Modify: 2019-08-08 17:44:31.341319550 +0530 Change: 2019-08-08 17:44:31.342550008 +0530 <<<< ctime is different Birth: - [root at f281 So I was trying to fix this issue. And other thing, ideally updating atime|mtime should update ctime with current time, which was not happening in "posix_update_utime_in_mdata" but was happening as part of "posix_set_ctime" in posix_setattr. ----------------------------- |But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. Yes, you are right. when file is created, all times should be same. With the patch[1], those times are different. For nfs, a create of a file, nfs client sends a create rpc, and a setattr(set time of server). Ganesha.nfsd gets the CLOCK_REALTIME for mtime/atime, and the utime xlator gets the realtime for ctime, so that, we cannot gets all times same when creating file. I think we should let utime xlator gets the realtime for all times(ctime/atime/mtime), ganesha.nfsd does not do that. |With this patch, it is clean. I am inclined to take this patch in if this solves the original nfs problem you reported. Could you please test that out and let me know? With this patch, the nfs problem of bad ctime is not exist now. [1] https://review.gluster.org/#/c/23154/ --- Additional comment from Worker Ant on 2019-08-20 16:54:08 UTC --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) merged (#7) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1743627 [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:06:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:06:39 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746138 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 18:06:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:06:39 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1746138 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:11:21 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:11:21 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23312 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:11:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:11:22 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23312 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:13:39 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:13:39 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23313 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:13:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:13:41 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 --- Comment #5 from Worker Ant --- REVIEW: https://review.gluster.org/23313 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#1) for review on release-6 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 18:13:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:13:46 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:14:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:14:34 +0000 Subject: [Bugs] [Bug 1746140] New: geo-rep: Changelog archive file format is incorrect Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746140 Bug ID: 1746140 Summary: geo-rep: Changelog archive file format is incorrect Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: geo-replication Severity: medium Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org Depends On: 1741890 Blocks: 1743634 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1741890 +++ Description of problem: The created changelog archive file didn't have corresponding year and month. It created as "archive_%Y%m.tar" on python2 only systems. [root at rhs-gp-srv7 xsync]# ls -l total 664564 -rw-r--r--. 1 root root 680509440 Aug 15 16:51 archive_%Y%m.tar [root at rhs-gp-srv7 xsync]# Version-Release number of selected component (if applicable): mainline How reproducible: Always on python2 only machine (centos7) Steps to Reproduce: 1. Create geo-rep session on python2 only machine 2. ls -l /var/lib/misc/gluster/gsyncd///.processed/ Actual results: changelog archive file format is incorrect. Not substituted with corresponding year and month Expected results: changelog archive file name should have correct year and month Additional info: --- Additional comment from Worker Ant on 2019-08-16 10:59:26 UTC --- REVIEW: https://review.gluster.org/23248 (geo-rep: Fix the name of changelog archive file) posted (#1) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-08-22 10:03:57 UTC --- REVIEW: https://review.gluster.org/23248 (geo-rep: Fix the name of changelog archive file) merged (#4) on master by Aravinda VK Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 [Bug 1741890] geo-rep: Changelog archive file format is incorrect https://bugzilla.redhat.com/show_bug.cgi?id=1743634 [Bug 1743634] geo-rep: Changelog archive file format is incorrect -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:14:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:14:34 +0000 Subject: [Bugs] [Bug 1741890] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741890 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746140 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746140 [Bug 1746140] geo-rep: Changelog archive file format is incorrect -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 18:14:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:14:34 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1746140 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746140 [Bug 1746140] geo-rep: Changelog archive file format is incorrect -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:24:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:24:33 +0000 Subject: [Bugs] [Bug 1746142] New: ctime: If atime is updated via utimensat syscall ctime is not getting updated Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 Bug ID: 1746142 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated Product: GlusterFS Version: 7 Hardware: x86_64 OS: Linux Status: NEW Component: ctime Severity: high Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: bugs at gluster.org, nchilaka at redhat.com Depends On: 1738786 Blocks: 1743627, 1746138 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1738786 +++ Description of problem: When atime|mtime is updated via utime family of syscalls, ctime is not updated. Version-Release number of selected component (if applicable): mainline How reproducible: Always Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above Additional info: --- Additional comment from Worker Ant on 2019-08-08 07:43:19 UTC --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on master by Kotresh HR --- Additional comment from Kotresh HR on 2019-08-20 10:49:58 UTC --- Discussion at the patch which is worth mentioning here Kinglong Mee: @Kotresh HR, I cannot reproduce this problem as you description, Steps to Reproduce: touch /mnt/file1 stat /mnt/file1 sleep 1; touch -m -d "2020-01-01 12:00:00" /mnt/file1 stat /mnt/file1 Actual results: ctime is same between two stats above Expected results: ctime should be changed between two stats above I test at glusterfs mount and nfs mount, all get right result as, # sh test.sh File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2019-08-08 18:39:16.099467595 +0800 Change: 2019-08-08 18:39:16.100847925 +0800 Birth: - File: file1 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 30h/48d Inode: 13578024673158818984 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2019-08-08 18:39:16.099467595 +0800 Modify: 2020-01-01 12:00:00.000000000 +0800 Change: 2019-08-08 18:39:17.126800759 +0800 Birth: - --------------- Sorry, this is not happening with your patch[1]. Because we don't update ctime to mtime which was previously being done. But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. 
So when file is created, all times should be same. But now it's not. [root at f281 glusterfs]# stat /mastermnt/file3 File: /mastermnt/file3 Size: 0 Blocks: 0 IO Block: 131072 regular empty file Device: 2eh/46d Inode: 13563962387061186202 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Context: system_u:object_r:fusefs_t:s0 Access: 2019-08-08 17:44:31.341319550 +0530 Modify: 2019-08-08 17:44:31.341319550 +0530 Change: 2019-08-08 17:44:31.342550008 +0530 <<<< ctime is different Birth: - [root at f281 So I was trying to fix this issue. And other thing, ideally updating atime|mtime should update ctime with current time, which was not happening in "posix_update_utime_in_mdata" but was happening as part of "posix_set_ctime" in posix_setattr. ----------------------------- |But with your patch [1], I think all setattrs even some internal setattrs are updating ctime. So when file is created, all times should be same. But now it's not. Yes, you are right. when file is created, all times should be same. With the patch[1], those times are different. For nfs, a create of a file, nfs client sends a create rpc, and a setattr(set time of server). Ganesha.nfsd gets the CLOCK_REALTIME for mtime/atime, and the utime xlator gets the realtime for ctime, so that, we cannot gets all times same when creating file. I think we should let utime xlator gets the realtime for all times(ctime/atime/mtime), ganesha.nfsd does not do that. |With this patch, it is clean. I am inclined to take this patch in if this solves the original nfs problem you reported. Could you please test that out and let me know? With this patch, the nfs problem of bad ctime is not exist now. [1] https://review.gluster.org/#/c/23154/ --- Additional comment from Worker Ant on 2019-08-20 16:54:08 UTC --- REVIEW: https://review.gluster.org/23177 (ctime: Fix ctime issue with utime family of syscalls) merged (#7) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1743627 [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1746138 [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:24:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:24:33 +0000 Subject: [Bugs] [Bug 1738786] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738786 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746142 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Tue Aug 27 18:24:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:24:33 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1746142 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:24:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:24:33 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Depends On| |1746142 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:26:36 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:26:36 +0000 Subject: [Bugs] [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:27:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:27:11 +0000 Subject: [Bugs] [Bug 1746145] New: CentOs 6 GlusterFS client creates files with time 01/01/1970 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Bug ID: 1746145 Summary: CentOs 6 GlusterFS client creates files with time 01/01/1970 Product: GlusterFS Version: 7 Hardware: x86_64 OS: Linux Status: NEW Component: ctime Severity: high Priority: medium Assignee: bugs at gluster.org Reporter: khiremat at redhat.com CC: alexis.fernandez at altafonte.com, atumball at redhat.com, baoboadev at gmail.com, bugs at gluster.org, khiremat at redhat.com, rkavunga at redhat.com Depends On: 1726175, 1743652 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1743652 +++ +++ This bug was initially created as a clone of Bug #1726175 +++ Description of problem: CentOs 6 gluster client with glusterfs volume mounted creates files with time creation "01/01/1970". Files created by user apache, or with user root with vim, or nano, are created with bad date. But if create with touch, the date is correct. Version-Release number of selected component (if applicable): glusterfs-fuse-6.3-1.el6.x86_64 How reproducible: Create file in mountpoint with vim, or nano. Steps to Reproduce: 1. yum install centos-release-gluster6 2. yum install glusterfs-client 3. 
mount -t glusterfs IP:/remotevol /mnt/localdir 4. cd /mnt/localdir 5. vim asdasdad 6. :wq! 7. ls -lah asdasdad Actual results: -rw-r--r-- 1 root root 0 ene 1 1970 test Expected results: -rw-r--r-- 1 root root 0 jul 1 2019 test --- Additional comment from baoboa on 2019-07-02 15:23:52 UTC --- same behavior for a centos6 client server: glusterfs-server.x86_64 6.3-1.el7 @centos-gluster6 client: glusterfs-fuse.x86_64 6.3-1.el6 @centos-gluster6 kernel version: 2.6.32-573.3.1.el6.x86_64 mount -t glusterfs server:myvol /mnt/myvol touch /mnt/myvol/test -> correct time -rw-r--r-- 1 root root 12 Jul 1 11:59 test vi /mnt/myvol/test2 -> wrong time (1970) -rw-r--r-- 1 root root 7 Dec 18 1970 test2 REM: this not the case for a centos7 client, the creation time is correct recover correct time if ctime is deactivated "gluster volume set myvol features.ctime off" ls /mnt/myvol/ -rw-r--r-- 1 root root 7 Jul 2 17:16 test2 -rw-r--r-- 1 root root 12 Jul 1 11:59 test https://review.gluster.org/#/c/glusterfs/+/22651/ this review look related to this bug/regression --- Additional comment from on 2019-07-16 08:38:30 UTC --- Thanks baoboa, I can confirm the value "gluster volume set myvol features.ctime off" fix the issue with the date. Thanks. --- Additional comment from Worker Ant on 2019-08-20 10:41:20 UTC --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#1) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-08-20 11:45:45 UTC --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#2) for review on master by Kotresh HR --- Additional comment from Worker Ant on 2019-08-22 05:35:53 UTC --- REVIEW: https://review.gluster.org/23274 (ctime: Fix incorrect realtime passed to frame->root->ctime) merged (#3) on master by Amar Tumballi Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 https://bugzilla.redhat.com/show_bug.cgi?id=1743652 [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:27:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:27:11 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746145 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 -- You are receiving this mail because: You are on the CC list for the bug. 
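The steps in bug 1746145/1726175 above use vim to show the 01/01/1970 creation time. A rough stand-alone check of the same symptom, without involving an editor, is sketched below: create a file, write to it, set its times to "now" the way utime()-style callers do, and flag an mtime that lands near the epoch. The mount path is an assumption based on the reproduction steps.

/* Create a file, set its times to the current time, and flag epoch mtimes. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(void)
{
    const char *path = "/mnt/localdir/epoch-check"; /* assumed gluster mount */
    struct stat st;

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0 || write(fd, "test\n", 5) != 5)
        return 1;
    futimens(fd, NULL);  /* set atime/mtime to "now", like the utime family does */
    close(fd);

    if (stat(path, &st) != 0)
        return 1;
    /* Anything before ~2001 on a file written just now is clearly wrong. */
    printf("mtime=%ld -> %s\n", (long)st.st_mtime,
           st.st_mtime < 1000000000 ? "looks like the 1970 bug" : "looks sane");
    return 0;
}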
From bugzilla at redhat.com Tue Aug 27 18:27:11 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:27:11 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1746145 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:27:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:27:13 +0000 Subject: [Bugs] [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23314 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:27:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:27:14 +0000 Subject: [Bugs] [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23314 (ctime: Fix ctime issue with utime family of syscalls) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Tue Aug 27 18:29:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:29:32 +0000 Subject: [Bugs] [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23315 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Tue Aug 27 18:29:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:29:33 +0000 Subject: [Bugs] [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23315 (ctime: Fix incorrect realtime passed to frame->root->ctime) posted (#1) for review on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
From bugzilla at redhat.com Tue Aug 27 18:31:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Tue, 27 Aug 2019 18:31:09 +0000 Subject: [Bugs] [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |khiremat at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 02:17:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 02:17:16 +0000 Subject: [Bugs] [Bug 1744420] glusterd crashing with core dump on the latest nightly builds. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744420 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 02:17:16 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 02:45:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 02:45:56 +0000 Subject: [Bugs] [Bug 1745965] glusterd fails to start due to SIGABRT dumping core In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745965 --- Comment #4 from Nithya Balachandran --- RCA: rpm builds use the following flags: $ rpm --showrc | grep stack-protector -13: __global_compiler_flags -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches %{_hardened_cflags} %{_annotated_cflags} Thanks to Nithya for mentioning the presence of the -fstack-protector flag as a probable cause of the crash, which led me to check the default rpm build macros. -D_FORTIFY_SOURCE=2 checks for buffer overruns and aborts the process if it finds any. From the coredump: #8 init (this=0x557ef9f3b510) at glusterd.c:1450 1450 len = snprintf(logdir, PATH_MAX, "%s", DEFAULT_LOG_FILE_DIRECTORY); But the buffer is declared as char logdir[VALID_GLUSTERD_PATHMAX] = {0,}; with #define VALID_GLUSTERD_PATHMAX (PATH_MAX - (256 + 64)), so this can cause a buffer overrun. More info at: https://stackoverflow.com/questions/13517526/difference-between-gcc-d-fortify-source-1-and-d-fortify-source-2 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 02:53:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 02:53:06 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |srakonde at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 02:53:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 02:53:43 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick.
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 --- Comment #12 from Nithya Balachandran --- Please note 3.12 is EOL so I have used version 4.1 in the BZ. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 03:43:56 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 03:43:56 +0000 Subject: [Bugs] [Bug 1746228] New: systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746228 Bug ID: 1746228 Summary: systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes Product: GlusterFS Version: mainline Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Severity: medium Assignee: bugs at gluster.org Reporter: moagrawa at redhat.com CC: bmekala at redhat.com, bugs at gluster.org, rhs-bugs at redhat.com, sankarshan at redhat.com, storage-qa-internal at redhat.com, vbellur at redhat.com Depends On: 1746027 Target Milestone: --- Classification: Community Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1746027 [Bug 1746027] systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 03:44:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 03:44:31 +0000 Subject: [Bugs] [Bug 1746228] systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746228 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 03:47:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 03:47:51 +0000 Subject: [Bugs] [Bug 1746228] systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746228 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23316 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 03:47:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 03:47:52 +0000 Subject: [Bugs] [Bug 1746228] systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746228 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23316 (glusterd: glusterd service is getting timed out on scaled setup) posted (#1) for review on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. 
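To make the RCA for the glusterd SIGABRT (bug 1745965, a few messages up) concrete: with the quoted RPM flags (-O2 plus -Wp,-D_FORTIFY_SOURCE=2), glibc's fortified snprintf aborts as soon as the size argument is larger than the real destination buffer, even if the formatted string itself would have fit. A stand-alone sketch is below; the "/var/log/glusterfs" literal is only a stand-in for DEFAULT_LOG_FILE_DIRECTORY, and this is not the actual glusterd source or the merged fix.

/* Build with roughly the quoted RPM flags, e.g. gcc -O2 -D_FORTIFY_SOURCE=2,
 * and this aborts at runtime with "buffer overflow detected". */
#include <limits.h>
#include <stdio.h>

#define VALID_GLUSTERD_PATHMAX (PATH_MAX - (256 + 64)) /* as in the quoted code */

int
main(void)
{
    char logdir[VALID_GLUSTERD_PATHMAX] = {0};

    /* Wrong: tells snprintf the buffer is PATH_MAX bytes long. The fortified
     * check fires here, so the printf below is never reached. */
    int len = snprintf(logdir, PATH_MAX, "%s", "/var/log/glusterfs");

    /* A fix along these lines bounds the write by the real buffer instead:
     * len = snprintf(logdir, sizeof(logdir), "%s", ...); */
    printf("wrote %d bytes into %s\n", len, logdir);
    return 0;
}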
From bugzilla at redhat.com Wed Aug 28 04:16:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 04:16:13 +0000 Subject: [Bugs] [Bug 1734027] glusterd 6.4 memory leaks 2-3 GB per 24h (OOM) In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734027 --- Comment #16 from Atin Mukherjee --- We observed a leak in 'volume status all detail' which was fixed through https://bugzilla.redhat.com/show_bug.cgi?id=1694610 and the fix went into 6.1. It's surprising that you're still seeing this leak in latest glusterfs-6 series. We'll try to reproduce this in house and get back. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 04:24:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 04:24:24 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |amukherj at redhat.com Component|glusterd |replicate Assignee|ksubrahm at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 04:26:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 04:26:34 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |CLOSED CC| |amukherj at redhat.com Resolution|--- |CURRENTRELEASE Last Closed| |2019-08-28 04:26:34 --- Comment #13 from Atin Mukherjee --- We'd not need to keep this bug open given the root cause has been provided and it has been confirmed that this is same as BZ 1517260. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 04:50:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 04:50:53 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Atin Mukherjee changed: What |Removed |Added ---------------------------------------------------------------------------- Status|CLOSED |NEW Resolution|CURRENTRELEASE |--- Keywords| |Reopened --- Comment #14 from Atin Mukherjee --- Based on the discussion with Nithya, the fix went into 3.12.7, however this is reported against 3.12.15. So we need to cross check if there's any other code path where this issue still exists where the same hasn't been fixed in the latest releases. -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Wed Aug 28 05:27:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 05:27:35 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #10 from Sergey Pleshkov --- (In reply to Ravishankar N from comment #8) > Looks like the discrepancy is due to the no. of files (738 to be specific) > amongst the bricks. The directories and symlinks and their checksums match > on all 3 bricks. The only fix I can think of is to find out (manually) which > are the files that differ in size and forcefully trigger a heal on them. You > could go through "Hack: How to trigger heal on *any* file/directory" section > of my blog-post > https://ravispeaks.wordpress.com/2019/05/14/gluster-afr-the-complete-guide- > part-3/ Hello. Is there any proven way to compare files/folders on two nodes of a gluster volume to find the files that differ? I tried using the "rsync -rin" command, but it turned out to be ineffective for this comparison (it selects all files). -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 05:47:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 05:47:17 +0000 Subject: [Bugs] [Bug 1746142] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746142 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 05:47:17 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23314 (ctime: Fix ctime issue with utime family of syscalls) merged (#1) on release-7 by Kotresh HR -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 05:47:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 05:47:17 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Bug 1743627 depends on bug 1746142, which changed state. Bug 1746142 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1746142 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 05:47:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 05:47:18 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Bug 1746138 depends on bug 1746142, which changed state.
Bug 1746142 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1746142 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 05:50:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 05:50:07 +0000 Subject: [Bugs] [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 05:50:07 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23315 (ctime: Fix incorrect realtime passed to frame->root->ctime) merged (#2) on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 06:24:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 06:24:52 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kshithij Iyer changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(khiremat at redhat.c | |om) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 06:39:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 06:39:43 +0000 Subject: [Bugs] [Bug 1741044] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741044 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 06:39:43 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23226 (afr: restore timestamp of parent dir during entry-heal) merged (#2) on release-6 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 06:39:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 06:39:43 +0000 Subject: [Bugs] [Bug 1741041] atime/mtime is not restored after healing for entry self heals In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741041 Bug 1741041 depends on bug 1741044, which changed state. Bug 1741044 Summary: atime/mtime is not restored after healing for entry self heals https://bugzilla.redhat.com/show_bug.cgi?id=1741044 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. 
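On Sergey's question in bug 1741899 above (how to find which files differ between two nodes): one low-tech option is to dump a sorted "size, relative path" listing of the brick on each node and diff the two listings; files that exist on only one node or differ in size then stand out immediately. A minimal walker is sketched below; the default brick path is only an example, and gluster's internal .glusterfs directory is skipped so it does not pollute the comparison.

/* Print "size <TAB> path-relative-to-brick" for every regular file under a
 * brick. Run on each node, sort both outputs, then diff them. */
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

static size_t prefix_len;

static int
print_entry(const char *path, const struct stat *sb, int typeflag,
            struct FTW *ftwbuf)
{
    (void)ftwbuf;
    if (typeflag == FTW_F && strstr(path, "/.glusterfs/") == NULL)
        printf("%lld\t%s\n", (long long)sb->st_size, path + prefix_len);
    return 0;
}

int
main(int argc, char **argv)
{
    const char *brick = argc > 1 ? argv[1] : "/data/brick1"; /* example path */

    prefix_len = strlen(brick);
    return nftw(brick, print_entry, 32, FTW_PHYS) != 0;
}

Checksumming the remaining suspects (or the manual heal trigger from Ravi's blog post) would still be needed for files whose sizes happen to match.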
From bugzilla at redhat.com Wed Aug 28 06:54:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 06:54:59 +0000 Subject: [Bugs] [Bug 1375431] [RFE] enable sharding and strict-o-direct with virt profile - /var/lib/glusterd/groups/virt In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1375431 Krutika Dhananjay changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(kdhananj at redhat.c | |om) | --- Comment #17 from Krutika Dhananjay --- (In reply to Nir Soffer from comment #16) > Krutika, according to comment 10, remote-dio and enabling strict-o-direct > should be part of the virt group, but this bug was closed without adding > them. > > So it looks like this bug was closed without implementing the requested > feature. > > We seems to have issues like this: > https://bugzilla.redhat.com/show_bug.cgi?id=1737256#c10 > > Because strict-o-direct is not part of the virt group. > > Should we file a new RFE for including it in the virt group? Sorry about the late response. I was focusing all my attention on some cu cases past few days. Yeah, I think it's a valid point, given the amount of confusion around it. Could you file the bz and share the bug-id with me? I'll send a patch after that. -Krutika -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 06:57:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 06:57:03 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #4 from Nicola battista --- Hi, The file exist : [root at cstore-pm01 ~]# ls -lthr /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf -rw-r--r-- 2 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf Thanks, best regards Nicola Battista -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 07:00:37 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 07:00:37 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |nbalacha at redhat.com Flags| |needinfo?(nicola.battista89 | |@gmail.com) --- Comment #5 from Nithya Balachandran --- Please provide the following: On the volume on which you see this issue (dbroot2 based on the above comment and the volume info provided earlier): 1. The ls -l /000.dir/000.dir/015.dir/064.dir/008.dir/ from the the gluster mount point 2. the ls -l output for the same directory on each brick of the volume Do you see any error messages in the gluster client mount log when you perform the stat? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
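Following up on Nithya's questions and on the strace excerpt Roman posted earlier in this thread (stat() and open() both returning ENOENT for a path that ls lists), a tiny stand-alone check that repeats those two calls and prints errno for each can be pointed at the path on the fuse mount and at the same path on each brick. The default path below is the one from the strace; O_NOATIME matches the flags shown there.

/* Repeat the two failing calls from the strace and report errno. */
#define _GNU_SOURCE              /* for O_NOATIME */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1]
        : "/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf";
    struct stat st;

    if (stat(path, &st) != 0)
        fprintf(stderr, "stat(%s): %s\n", path, strerror(errno));
    else
        printf("stat ok, size=%lld\n", (long long)st.st_size);

    int fd = open(path, O_RDONLY | O_NOATIME);
    if (fd < 0)
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
    else
        close(fd);
    return 0;
}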
From bugzilla at redhat.com Wed Aug 28 07:32:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 07:32:00 +0000 Subject: [Bugs] [Bug 1746320] New: SHORT-WRITE error leads to crash Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746320 Bug ID: 1746320 Summary: SHORT-WRITE error leads to crash Product: GlusterFS Version: mainline Status: NEW Component: error-gen Assignee: bugs at gluster.org Reporter: pkarampu at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: In error-gen xlator when we use short-write error, it leads to crash. Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. Edit fuse volume file to include error-gen xlator between write-behind and dht 2. Make the following changes to mount volfile. ... volume patchy-utime type features/utime option noatime on subvolumes patchy-dht end-volume volume patchy-error-gen type debug/error-gen option failure 50 option enable WRITE subvolumes patchy-utime end-volume volume patchy-write-behind type performance/write-behind subvolumes patchy-error-gen end-volume ... 3. Mount the volume and run the following command: for i in {1..10}; do dd if=/dev/null of=$i bs=1M count=10 conv=fsync; done Thread 9 "glfs_iotwr000" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f3c44cbd700 (LWP 32368)] 0x00007f3c5722ef83 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6 (gdb) bt #0 0x00007f3c5722ef83 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6 #1 0x00007f3c457f4754 in ec_iov_copy_to (dst=0x7f3c569c0000, vector=0x7f3c3c002720, count=2, offset=0, size=117) at ec-helpers.c:118 #2 0x00007f3c458265e6 in ec_writev_prepare_buffers (ec=0x7f3c400663d0, fop=0x7f3c3000d930) at ec-inode-write.c:1849 #3 0x00007f3c45826ce0 in ec_writev_start (fop=0x7f3c3000d930) at ec-inode-write.c:2027 #4 0x00007f3c45827985 in ec_manager_writev (fop=0x7f3c3000d930, state=3) at ec-inode-write.c:2176 #5 0x00007f3c457fd311 in __ec_manager (fop=0x7f3c3000d930, error=0) at ec-common.c:2945 #6 0x00007f3c457fd41d in ec_manager (fop=0x7f3c3000d930, error=0) at ec-common.c:2963 #7 0x00007f3c458282d0 in ec_writev (frame=0x7f3c3001d830, this=0x7f3c4001c4a0, target=18446744073709551615, fop_flags=128, func=0x7f3c578c869e , data=0x0, fd=0x7f3c3c004340, vector=0x7f3c3c006510, count=2, offset=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at ec-inode-write.c:2368 #8 0x00007f3c457f1d35 in ec_gf_writev (frame=0x7f3c3001d830, this=0x7f3c4001c4a0, fd=0x7f3c3c004340, vector=0x7f3c3c006510, count=2, offset=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at ec.c:1341 #9 0x00007f3c578e02f1 in default_writev (frame=0x7f3c3001d830, this=0x7f3c4001e6c0, fd=0x7f3c3c004340, vector=0x7f3c3c006510, count=2, off=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at defaults.c:2550 #10 0x00007f3c469e7400 in gf_utime_writev (frame=0x7f3c300088e0, this=0x7f3c40020690, fd=0x7f3c3c004340, vector=0x7f3c3c006510, count=2, off=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at utime-autogen-fops.c:81 --Type for more, q to quit, c to continue without paging-- #11 0x00007f3c456e117d in error_gen_writev (frame=0x7f3c300088e0, this=0x7f3c400223d0, fd=0x7f3c3c004340, vector=0x7f3c44cbb760, count=2, off=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at error-gen.c:771 #12 0x00007f3c456c1e3a in wb_fulfill_head (wb_inode=0x7f3c30002ce0, head=0x7f3c3000be40) at write-behind.c:1159 #13 0x00007f3c456c208d in 
wb_fulfill (wb_inode=0x7f3c30002ce0, liabilities=0x7f3c44cbb8b0) at write-behind.c:1216 #14 0x00007f3c456c3ba9 in wb_process_queue (wb_inode=0x7f3c30002ce0) at write-behind.c:1784 #15 0x00007f3c456c4675 in wb_writev (frame=0x7f3c3000fd90, this=0x7f3c40024320, fd=0x7f3c3c004340, vector=0x7f3c3c006580, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at write-behind.c:1893 #16 0x00007f3c456ac01f in ra_writev (frame=0x7f3c300041f0, this=0x7f3c40025fd0, fd=0x7f3c3c004340, vector=0x7f3c3c006580, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at read-ahead.c:650 #17 0x00007f3c45697161 in rda_writev (frame=0x7f3c30011c60, this=0x7f3c40027ba0, fd=0x7f3c3c004340, vector=0x7f3c3c006580, count=1, off=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at readdir-ahead.c:786 #18 0x00007f3c4567dcbb in ioc_writev (frame=0x7f3c3000b130, this=0x7f3c40029c70, fd=0x7f3c3c004340, vector=0x7f3c3c006580, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at io-cache.c:1305 #19 0x00007f3c578d4798 in default_writev_resume (frame=0x7f3c3001d830, this=0x7f3c4002b850, fd=0x7f3c3c004340, vector=0x7f3c3c006580, count=1, off=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at defaults.c:1831 --Type for more, q to quit, c to continue without paging-- #20 0x00007f3c5782533b in call_resume_wind (stub=0x7f3c3001b6f0) at call-stub.c:2085 #21 0x00007f3c5783710c in call_resume (stub=0x7f3c3001b6f0) at call-stub.c:2555 #22 0x00007f3c45665362 in open_and_resume (this=0x7f3c4002b850, fd=0x7f3c3c004340, stub=0x7f3c3001b6f0) at open-behind.c:480 #23 0x00007f3c45666a5c in ob_writev (frame=0x7f3c3001d830, this=0x7f3c4002b850, fd=0x7f3c3c004340, iov=0x7f3c3c001e30, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at open-behind.c:678 #24 0x00007f3c4565545d in qr_writev (frame=0x7f3c300088e0, this=0x7f3c4002d440, fd=0x7f3c3c004340, iov=0x7f3c3c001e30, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at quick-read.c:849 #25 0x00007f3c4563601e in mdc_writev (frame=0x7f3c300066d0, this=0x7f3c4002f010, fd=0x7f3c3c004340, vector=0x7f3c3c001e30, count=1, offset=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at md-cache.c:2082 #26 0x00007f3c578d4798 in default_writev_resume (frame=0x7f3c3c006df0, this=0x7f3c40030c00, fd=0x7f3c3c004340, vector=0x7f3c3c001e30, count=1, off=1966080, flags=32769, iobref=0x7f3c3c0068d0, xdata=0x0) at defaults.c:1831 #27 0x00007f3c5782533b in call_resume_wind (stub=0x7f3c3c009950) at call-stub.c:2085 #28 0x00007f3c5783710c in call_resume (stub=0x7f3c3c009950) at call-stub.c:2555 #29 0x00007f3c4561b372 in iot_worker (data=0x7f3c40040a30) at io-threads.c:232 #30 0x00007f3c5757e5a2 in start_thread () from /lib64/libpthread.so.0 #31 0x00007f3c571cb023 in clone () from /lib64/libc.so.6 (gdb) fr 11 #11 0x00007f3c456e117d in error_gen_writev (frame=0x7f3c300088e0, this=0x7f3c400223d0, fd=0x7f3c3c004340, vector=0x7f3c44cbb760, count=2, off=1703936, flags=32769, iobref=0x7f3c3c00b420, xdata=0x0) at error-gen.c:771 771 STACK_WIND_TAIL(frame, FIRST_CHILD(this), FIRST_CHILD(this)->fops->writev, (gdb) p count $1 = 2 Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
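[Editorial note] A note on the crash reported above: the backtrace faults in __memmove_avx_unaligned_erms() underneath ec_iov_copy_to() while copying a writev vector with count=2, and the title of the fix posted for review ("debug/error-gen: Set count correctly for short-writes") suggests the injected short write leaves count larger than the number of iovec entries that are still valid. The stand-alone sketch below is my own illustration of that general hazard; it is not the GlusterFS source and not the actual patch.

#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <sys/uio.h>

/* Copy `size` bytes starting at `offset` out of an iovec array into a
 * flat buffer. If `count` overstates the entries that were actually
 * populated (for example after a simulated short write that shrank the
 * vector without fixing up count), vector[i].iov_base/iov_len may be
 * garbage and the memcpy below reads past valid memory. */
static size_t
iov_copy_to(char *dst, const struct iovec *vector, int count,
            size_t offset, size_t size)
{
    size_t copied = 0;

    for (int i = 0; i < count && copied < size; i++) {
        if (offset >= vector[i].iov_len) {   /* skip entries before offset */
            offset -= vector[i].iov_len;
            continue;
        }
        size_t avail = vector[i].iov_len - offset;
        size_t len = avail < size - copied ? avail : size - copied;

        memcpy(dst + copied, (char *)vector[i].iov_base + offset, len);
        copied += len;
        offset = 0;
    }
    return copied;
}

int main(void)
{
    char a[] = "hello ", b[] = "world";
    struct iovec vec[2] = {
        { .iov_base = a, .iov_len = 6 },
        { .iov_base = b, .iov_len = 5 },
    };
    char out[32] = { 0 };

    /* Correct usage: count matches the populated entries. If a short
     * write dropped vec[1] but count stayed at 2 with a stale iov_base,
     * the copy would fault much like the backtrace above. */
    iov_copy_to(out, vec, 2, 0, 11);
    puts(out);
    return 0;
}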
From bugzilla at redhat.com Wed Aug 28 07:34:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 07:34:58 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nicola battista changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nicola.battista89 | |@gmail.com) | --- Comment #6 from Nicola battista --- Hi, [root at cstore-pm03 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir total 2097420 -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf [root at cstore-pm01 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/ total 2097420 -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf [root at cstore-pm02 ~]# ls -l /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/ total 2097420 -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 07:37:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 07:37:51 +0000 Subject: [Bugs] [Bug 1746320] SHORT-WRITE error leads to crash In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746320 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23318 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 07:37:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 07:37:53 +0000 Subject: [Bugs] [Bug 1746320] SHORT-WRITE error leads to crash In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746320 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23318 (debug/error-gen: Set count correctly for short-writes) posted (#1) for review on master by Pranith Kumar Karampuri -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 08:05:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:05:52 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nicola.battista89 | |@gmail.com) --- Comment #7 from Nithya Balachandran --- (In reply to Nicola battista from comment #6) > Hi, > > [root at cstore-pm03 ~]# ls -l > /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064. > dir/008.dir > total 2097420 > -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf > > [root at cstore-pm01 ~]# ls -l > /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064. 
> dir/008.dir/ > total 2097420 > -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf > > [root at cstore-pm02 ~]# ls -l > /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064. > dir/008.dir/ > total 2097420 > -rw-r--r-- 2 root root 2147753984 May 30 16:55 FILE002.cdf Hi, What does ls -l return from the client mount point? Please also provide the xattrs set on this file on each brick. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 08:31:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:31:57 +0000 Subject: [Bugs] [Bug 1743782] Windows client fails to copy large file to GlusterFS volume share with fruit and streams_xattr VFS modules via Samba In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743782 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 08:31:57 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23279 (performance/md-cache: Do not skip caching of null character xattr values) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 08:34:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:17 +0000 Subject: [Bugs] [Bug 1740525] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 hari gowtham changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |hgowtham at redhat.com --- Comment #2 from hari gowtham --- The bug for mainline is: https://github.com/gluster/glusterfs/issues/699 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 08:34:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:28 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23319 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 08:34:30 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:30 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #747 from Worker Ant --- REVIEW: https://review.gluster.org/23319 ([WIP][RFC]dht-common.c: remove some strcat and temp buffers) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
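[Editorial note] On the request in comment #7 of bug 1744883 above ("provide the xattrs set on this file on each brick"): the normal tool is getfattr, as spelled out later in the thread, but for completeness this is roughly what that dump reads under the hood. The small stand-alone sketch below is my own, not part of GlusterFS; it fetches the 16-byte trusted.gfid xattr directly from a brick path taken from this thread, and reading trusted.* xattrs needs root.

#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    /* Brick-side path from this bug report; adjust for other bricks. */
    const char *path =
        "/usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/"
        "015.dir/064.dir/008.dir/FILE002.cdf";
    unsigned char gfid[16];

    ssize_t n = lgetxattr(path, "trusted.gfid", gfid, sizeof(gfid));
    if (n != (ssize_t)sizeof(gfid)) {
        perror("lgetxattr(trusted.gfid)");
        return 1;
    }
    for (int i = 0; i < 16; i++)        /* print as hex, like getfattr -e hex */
        printf("%02x", gfid[i]);
    printf("\n");
    return 0;
}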
From bugzilla at redhat.com Wed Aug 28 08:34:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:57 +0000 Subject: [Bugs] [Bug 1726175] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1726175 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-28 08:34:57 --- Comment #6 from Worker Ant --- REVIEW: https://review.gluster.org/23313 (ctime: Fix incorrect realtime passed to frame->root->ctime) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 08:34:57 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:57 +0000 Subject: [Bugs] [Bug 1743652] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743652 Bug 1743652 depends on bug 1726175, which changed state. Bug 1726175 Summary: CentOs 6 GlusterFS client creates files with time 01/01/1970 https://bugzilla.redhat.com/show_bug.cgi?id=1726175 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 08:34:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:34:58 +0000 Subject: [Bugs] [Bug 1746145] CentOs 6 GlusterFS client creates files with time 01/01/1970 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746145 Bug 1746145 depends on bug 1726175, which changed state. Bug 1726175 Summary: CentOs 6 GlusterFS client creates files with time 01/01/1970 https://bugzilla.redhat.com/show_bug.cgi?id=1726175 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 08:35:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 08:35:23 +0000 Subject: [Bugs] [Bug 1740525] event: rename event_XXX with gf_ prefixed to avoid crash when apps linked libevent at the same time In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740525 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23219 (event: rename event_XXX with gf_ prefixed) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
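[Editorial note] Regarding bug 1726175 above (files created with time 01/01/1970) and the fix merged for it, "ctime: Fix incorrect realtime passed to frame->root->ctime": a 1970 date is simply what you see when a zero value, or a non-wall-clock value such as a monotonic timestamp, gets interpreted as seconds since the epoch. The snippet below is a minimal illustration of the symptom only, written for this note; it is not the actual patch.

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec real, mono;
    clock_gettime(CLOCK_REALTIME, &real);    /* wall clock: seconds since epoch */
    clock_gettime(CLOCK_MONOTONIC, &mono);   /* roughly seconds since boot */

    time_t zero = 0;
    time_t not_wall_clock = mono.tv_sec;
    time_t now = real.tv_sec;

    printf("zero value     -> %s", ctime(&zero));            /* Jan  1 1970 */
    printf("monotonic secs -> %s", ctime(&not_wall_clock));  /* still early 1970 */
    printf("realtime       -> %s", ctime(&now));             /* current date */
    return 0;
}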
From bugzilla at redhat.com Wed Aug 28 09:05:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 09:05:45 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nicola battista changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nicola.battista89 | |@gmail.com) | --- Comment #8 from Nicola battista --- Hi, [root at cstore-pm03 ~]# df -h Filesystem Size Used Avail Use% Mounted on devtmpfs 16G 0 16G 0% /dev tmpfs 16G 11M 16G 1% /dev/shm tmpfs 16G 9.0M 16G 1% /run tmpfs 16G 0 16G 0% /sys/fs/cgroup /dev/mapper/ol-root 46G 6.1G 40G 14% / /dev/mapper/glusterfs_dbroot1-brick1 400G 224G 177G 56% /usr/local/mariadb/columnstore/gluster/brick1 /dev/mapper/glusterfs_dbroot2-brick2 400G 96G 304G 24% /usr/local/mariadb/columnstore/gluster/brick2 /dev/mapper/glusterfs_dbroot3-brick3 400G 92G 309G 23% /usr/local/mariadb/columnstore/gluster/brick3 /dev/sda1 497M 232M 266M 47% /boot tmpfs 3.2G 0 3.2G 0% /run/user/0 172.16.31.7:/dbroot3 400G 96G 305G 24% /usr/local/mariadb/columnstore/data3 172.16.31.7:/dbroot2 400G 100G 300G 25% /usr/local/mariadb/columnstore/data2 172.16.31.7:/dbroot1 400G 228G 173G 57% /usr/local/mariadb/columnstore/data1 [root at cstore-pm03 ~]# ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf -rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf [root at cstore-pm02 ~]# ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf -rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf [root at cstore-pm02 ~]# ls -lhtr /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf -rw-r--r-- 1 root root 2.1G May 30 16:55 /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf Could you explain this step : Please also provide the xattrs set on this file on each brick. I've execute xattr -l /usr/local/mariadb/columnstore/data2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf but not have output. Regards. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 09:07:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 09:07:46 +0000 Subject: [Bugs] [Bug 1744950] glusterfs wrong size with total sum of brick. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744950 Sanju changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Flags| |needinfo?(liuruit at gmail.com | |) --- Comment #15 from Sanju --- As the user is saying he expanded the cluster, I checked the add-brick code path and I don't see any problem here: excerpt from glusterd_op_perform_add_bricks : ret = glusterd_resolve_brick (brickinfo); if (ret) goto out; if (!gf_uuid_compare (brickinfo->uuid, MY_UUID)) { ret = sys_statvfs (brickinfo->path, &brickstat); if (ret) { gf_msg (this->name, GF_LOG_ERROR, errno, GD_MSG_STATVFS_FAILED, "Failed to fetch disk utilization " "from the brick (%s:%s). Please check the health of " "the brick. 
Error code was %s", brickinfo->hostname, brickinfo->path, strerror (errno)); goto out; } brickinfo->statfs_fsid = brickstat.f_fsid; } Did you upgrade from some older gluster release to 3.12.15? If you have upgraded from the older version, you are hitting https://bugzilla.redhat.com/show_bug.cgi?id=1632889. This bug got fixed in release-6 and backported to release-4 and release-5 branches. Thanks, Sanju -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Wed Aug 28 09:44:48 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 09:44:48 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nithya Balachandran changed: What |Removed |Added ---------------------------------------------------------------------------- Flags| |needinfo?(nicola.battista89 | |@gmail.com) --- Comment #9 from Nithya Balachandran --- Hi, How are you accessing the volume? I'm assuming you are using a fuse mount. I would need to see the ls -l output for the same directory from that client fuse mount. The ls -l you have provided is from directly on the bricks. We now need to compare that information with the view the client sees. As for the xattrs, please use the command getfattr -e hex -m . -d for the dir on each brick. If you are on #gluster on IRC, it might be easier to sync up there. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 09:48:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 09:48:31 +0000 Subject: [Bugs] [Bug 1746368] New: Use rwlock to protect inode table Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746368 Bug ID: 1746368 Summary: Use rwlock to protect inode table Product: GlusterFS Version: mainline Hardware: All OS: Linux Status: NEW Component: locks Severity: medium Assignee: bugs at gluster.org Reporter: gechangwei at live.cn CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Presently, inode table is protected by mutex which effects glusterfs concurrency. Using rwlock to protect inode would be better,. Version-Release number of selected component (if applicable): How reproducible: Not a bug, no need to reproduce. Steps to Reproduce: 1. 2. 3. A lock usage optimization, not applicable for below items. Actual results: Expected results: Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 11:44:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 11:44:58 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nicola battista changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nicola.battista89 | |@gmail.com) | --- Comment #10 from Nicola battista --- Hi, I'm using the fuse mount. 
fstab mount : 172.16.31.5:/dbroot3 /usr/local/mariadb/columnstore/data3 glusterfs defaults,direct-io-mode=enable 00 172.16.31.5:/dbroot2 /usr/local/mariadb/columnstore/data2 glusterfs defaults,direct-io-mode=enable 00 172.16.31.5:/dbroot1 /usr/local/mariadb/columnstore/data1 glusterfs defaults,direct-io-mode=enable 00 The path /usr/local/mariadb/columnstore/data2/ is mount client of the brick. [root at cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2 getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick2 trusted.afr.dbroot2-client-1=0x000000000000000000000000 trusted.afr.dbroot2-client-2=0x000000000000000000000000 trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.mdata=0x010000000000000000000000005c94a223000000002de177a2000000005c94a223000000002de177a2000000005c2e0f3f0000000007d401b3 trusted.glusterfs.volume-id=0xf2b49f9f3a914ac48eb34a327d0dbc61 [root at cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick1 getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick1 trusted.afr.dbroot1-client-1=0x000000000000000000000000 trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.mdata=0x010000000000000000000000005c94a1f600000000200d67fc000000005c94a1f600000000200d67fc000000005c2e049d00000000101aff76 trusted.glusterfs.volume-id=0xecf4fd042e9647d98a404f84a48657fb [root at cstore-pm01 ~]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick3 getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick3 trusted.afr.dbroot3-client-1=0x000000000000000000000000 trusted.afr.dbroot3-client-2=0x000000000000000000000000 trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.volume-id=0x73b96917c8424fc28bca099735c4aa6a -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 13:48:05 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 13:48:05 +0000 Subject: [Bugs] [Bug 1743215] glusterd-utils: 0-management: xfs_info exited with non-zero exit status [Permission denied] In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743215 J?hann B. Gu?mundsson changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(johannbg at gmail.co | |m) | --- Comment #2 from J?hann B. Gu?mundsson --- There is no such thing as xfs_prog in Fedora so presumably you just meant xfs_info which triggers this error if not you probably need to ping Eric. 
# gluster volume info Volume Name: virt01 Type: Replicate Volume ID: Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: :/srv/glusterfs/images Brick2: :/srv/glusterfs/images Brick3: :/srv/glusterfs/images Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet server.allow-insecure: on The glusterfs mount point is the same on all hosts localhost:/virt01 7.3T 149G 7.2T 2% /var/lib/libvirt/images If I run it against the mapped device # xfs_info /dev/mapper/ht_gluster01-lv_gluster01 xfs_info: /dev/mapper/ht_gluster01-lv_gluster01 contains a mounted filesystem fatal error -- couldn't initialize XFS library If I run it against the mapped mount point for the device ( /srv/glusterfs/ ) # xfs_info /srv/glusterfs/ meta-data=/dev/mapper/ht_gluster01-lv_gluster01 isize=512 agcount=32, agsize=61047296 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 data = bsize=4096 blocks=1953513472, imaxpct=5 = sunit=32 swidth=320 blks naming =version 2 bsize=8192 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=521728, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 If I run it against the Brick path which is a directory called images and resides under /srv/glusterfs/ xfs mount point ( /srv/glusterfs/images ) xfs_info /srv/glusterfs/images /srv/glusterfs/images: Not a XFS mount point. Note that none of the error msg xfs_info provided are permissions error -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 16:01:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 16:01:07 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #14 from Amgad --- Thanks Hari: >we have backported the patches for this bug to every active release branch. What does exactly mean? does it mean the bug is in 6.3-1 now for instance? or 5.5-1? Regards, Amgad -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 16:43:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 16:43:09 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23324 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 16:43:10 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 16:43:10 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #748 from Worker Ant --- REVIEW: https://review.gluster.org/23324 ([WIP]posix*.c: remove unneeded strlen() calls) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
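[Editorial note] Going back to bug 1746368 above (use a rwlock instead of a mutex to protect the inode table): the idea is that read-mostly operations such as lookups can then run concurrently, with only mutating paths taking the lock exclusively. A minimal sketch of that pattern with POSIX rwlocks follows; the names are hypothetical and this is not the actual GlusterFS inode-table API.

#include <pthread.h>

struct table {
    pthread_rwlock_t lock;
    /* ... hash buckets, LRU list, counters ... */
};

static void table_init(struct table *t)
{
    pthread_rwlock_init(&t->lock, NULL);
}

/* Lookup path: many threads may hold the read lock at the same time. */
static void *table_find(struct table *t /*, key */)
{
    void *item = NULL;
    pthread_rwlock_rdlock(&t->lock);
    /* item = hash_lookup(t, key); */
    pthread_rwlock_unlock(&t->lock);
    return item;
}

/* Insert/unref path: exclusive access while the table is mutated. */
static void table_insert(struct table *t /*, item */)
{
    pthread_rwlock_wrlock(&t->lock);
    /* hash_insert(t, item); link into LRU; */
    pthread_rwlock_unlock(&t->lock);
}

int main(void)
{
    struct table t;
    table_init(&t);
    table_insert(&t);
    (void)table_find(&t);
    pthread_rwlock_destroy(&t.lock);
    return 0;
}

Whether this actually helps depends on the read/write mix: rwlocks cost more than a plain mutex when most operations end up taking the write lock, so the change would need to be backed by measurements.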
From bugzilla at redhat.com Wed Aug 28 19:55:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 19:55:02 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #9 from Amgad --- Hi Amar /GlusterFS team I was busy addressing other development issues - back to the this IPv6 one. In this problem, the volume is created thru heketi and failed at the "glusterd-volume-ops.c" file when "glusterd_compare_addrinfo" is called. In a different test (system is configured with pure IPv6), where volumes were generated using gluster CLI, the volumes are created at different servers, but "glustershd" failed to come up with the following errors: [2019-08-28 19:11:36.645541] I [MSGID: 100030] [glusterfsd.c:2847:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 6.5 (args: /usr/sbin/glusterfs -s 2001:db8:1234::8 --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/3a1e3977fd7318f2.socket --xlator-option *replicate*.node-uuid=8e2b40a7-098c-4f0a-b323-2e764bd315f3 --process-name glustershd --client-pid=-6) [2019-08-28 19:11:36.646207] I [glusterfsd.c:2556:daemonize] 0-glusterfs: Pid of current running process is 26375 [2019-08-28 19:11:36.655872] I [socket.c:902:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9 [2019-08-28 19:11:36.656708] E [MSGID: 101075] [common-utils.c:508:gf_resolve_ip6] 0-resolver: getaddrinfo failed (family:2) (Address family for hostname not supported) [2019-08-28 19:11:36.656730] E [name.c:258:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 2001:db8:1234::8 [2019-08-28 19:11:36.658459] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-08-28 19:11:36.658744] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: 2001:db8:1234::8 [2019-08-28 19:11:36.658766] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers [2019-08-28 19:11:36.658832] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-08-28 19:11:36.659376] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libgfrpc.so.0(+0xf1d3) [0x7f61e883a1d3] -->/usr/sbin/glusterfs(+0x12fef) [0x5653bb1c9fef] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x5653bb1c201b] ) 0-: received signum (1), shutting down It indcates that function "gf_resolve_ip6" in the "common-utils.c" is failing becuase of (family:2) -- since the IP is IPv6, the family should be 10, not 2 and thus it failed as family:2 not supported. same for "af_inet_client_get_remote_sockaddr". Any suggestion what could be passing the family as "2" (IPv4) rather than "10" (IPv6)? Regards, Amgad -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Wed Aug 28 21:46:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Wed, 28 Aug 2019 21:46:03 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #10 from Amgad --- GlusterFS team: Can someone check urgently if "hints.ai_family" in these function calls is set to "AF_INET6" and not "AF_UNSPEC" to force version? Regards, Amgad -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 00:21:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 00:21:07 +0000 Subject: [Bugs] [Bug 1746615] New: SSL Volumes Fail Intermittently in 6.5 Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746615 Bug ID: 1746615 Summary: SSL Volumes Fail Intermittently in 6.5 Product: GlusterFS Version: 6 Hardware: x86_64 OS: Linux Status: NEW Component: glusterd Assignee: bugs at gluster.org Reporter: billycole at mail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Volumes fail to mount properly with client/server.ssl enabled on volumes. This seems to apply to multiple volume types, though have only tested it with distributed and dispersed. The mount command succeeds, but accessing the volume gives several intermittent "Transport endpoint is not connected" errors. This results in odd behavior such as `ls` returning nothing, then erroring, then occasionally returning a result. Similarly, when issuing `df` commands in succession on the mount, it will start reporting the full drive size, then slowly "shrink" until it starts to throw "transport endpoint is not connected" errors. [test at ip-10-10-30-220 ~]$ df -h /gscratch Filesystem Size Used Avail Use% Mounted on ip-10-10-31-10.ec2.internal:/scratch 44T 496G 44T 2% /gscratch [test at ip-10-10-30-220 ~]$ df -h /gscratch Filesystem Size Used Avail Use% Mounted on ip-10-10-31-10.ec2.internal:/scratch 44T 496G 44T 2% /gscratch [test at ip-10-10-30-220 ~]$ df -h /gscratch Filesystem Size Used Avail Use% Mounted on ip-10-10-31-10.ec2.internal:/scratch 44T 496G 44T 2% /gscratch [test at ip-10-10-30-220 ~]$ df -h /gscratch Filesystem Size Used Avail Use% Mounted on ip-10-10-31-10.ec2.internal:/scratch 44T 496G 44T 2% /gscratch [test at ip-10-10-30-220 ~]$ df -h /gscratch Errors. It almost seems as if the connection is established and then immediately killed after an attempt to push data over it, and waiting a few seconds causes the connections to re-establish. Disabling the "client.ssl" and "server.ssl" settings on the volume cause these errors to go away. Version-Release number of selected component (if applicable): glusterfs 6.5 How reproducible: It seems to be consistent on the cluster that I have. Steps to Reproduce: 1. Follow docs here on setting up certs: https://docs.gluster.org/en/latest/Administrator%20Guide/SSL/ 2. Create new volume, enable client ssl and server ssl. Start volume. 3. Mount volume on client. 4. Try to create a new file on the mount, ls the drive, or issue the df command. Actual results: Intermittent transport errors. Expected results: The drive should be mountable. Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
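[Editorial note] Back to the gf_resolve_ip6 failure in bug 1739320 above (comments #9 and #10): the log shows getaddrinfo() being handed family 2 (AF_INET) for the IPv6 literal 2001:db8:1234::8, and that combination is exactly what produces "Address family for hostname not supported". The small stand-alone sketch below (not GlusterFS code) shows the effect of hints.ai_family on that lookup.

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    const char *host = "2001:db8:1234::8";      /* IPv6 literal from the log */
    int families[] = { AF_INET, AF_INET6, AF_UNSPEC };
    const char *names[] = { "AF_INET", "AF_INET6", "AF_UNSPEC" };

    for (int i = 0; i < 3; i++) {
        struct addrinfo hints, *res = NULL;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = families[i];
        hints.ai_socktype = SOCK_STREAM;

        int rc = getaddrinfo(host, NULL, &hints, &res);
        printf("%-9s -> %s\n", names[i],
               rc ? gai_strerror(rc) : "resolved");
        if (rc == 0)
            freeaddrinfo(res);
    }
    return 0;
}

The logged "family:2" therefore means the hints are effectively being filled with AF_INET somewhere on the resolution path; forcing AF_INET6 (or at least AF_UNSPEC) as suggested in comment #10 avoids this particular error, though where that family value comes from still needs to be traced in the code.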
From bugzilla at redhat.com Thu Aug 29 03:57:54 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 03:57:54 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #11 from Ravishankar N --- (In reply to Sergey Pleshkov from comment #10) > Hello > > Is there any proven way to compare files / folders on two nodes of a glaster > to find different files? > I tried using the "rsync -rin" command but it turned out to be ineffective > for comparison (selects all files in general) To compare just the directory structure (to find what files are missing), maybe you could run `diff <(ssh root at lsy-gl-01 ls -R /diskForTestData/tst) <(ssh root at lsy-gl-02 ls -R /diskForTestData/tst)` etc. after setting up password-less ssh. You would need to ignore the contents of .glusterfs though. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 04:00:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 04:00:50 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Prashant Dhange changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |pdhange at redhat.com --- Comment #7 from Prashant Dhange --- *** Bug 1737674 has been marked as a duplicate of this bug. *** -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 04:08:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 04:08:14 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Prashant Dhange changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |bkunal at redhat.com Flags| |needinfo?(bkunal at redhat.com | |) -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 04:11:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 04:11:43 +0000 Subject: [Bugs] [Bug 1746368] Use rwlock to protect inode table In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746368 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23325 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 04:11:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 04:11:44 +0000 Subject: [Bugs] [Bug 1746368] Use rwlock to protect inode table In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746368 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23325 (libglusterfs: use rwlock to protect inode table) posted (#1) for review on master by None -- You are receiving this mail because: You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 04:15:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 04:15:00 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Prashant Dhange changed: What |Removed |Added ---------------------------------------------------------------------------- Group| |redhat, private -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 05:54:06 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 05:54:06 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ravishankar at redhat.com Flags| |needinfo?(nicola.battista89 | |@gmail.com) --- Comment #11 from Ravishankar N --- Hi Nicola, a) Can you provide the gluster fuse mount log of the node which is used by the application in comment #3? If this is something you can reproduce at will with your application, please provide debug-level log. Steps: 1 `gluster volume set dbroot2 client-log-level DEBUG` 2 Run the application and note the timestamp (UTC) at which it gets ENOENT 3 Provide the fuse mount log (something like /var/log/glusterfs/usr-local-mariadb-columnstore-data2.log) 4 Also tell us the time noted in step-2 to make it easier to look for issues in the log of step-3. 5.`gluster volume set dbroot2 client-log-level INFO` <===== Restores it back to the default log level. b) The getfattr output we need is that of the file in question from all bricks of the volume. comment#10 seems to give the output on the brick root of different volumes. What we need is: `getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf` from 3 bricks of dbroot2. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 06:11:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 06:11:44 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #8 from Sergey Pleshkov --- Created attachment 1609209 --> https://bugzilla.redhat.com/attachment.cgi?id=1609209&action=edit statedump1 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 06:29:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 06:29:28 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #9 from Sergey Pleshkov --- Hello Yesterday I ran tests on a problem client. These were the find and chmod commands on gluster share. Actually, the process of the glusterfs continiously eats RAM on them and does not free it away. 
On another client that uses glusterfs version 3.12.2 (from RHEL7 repo), I also encountered a similar situation - glusterfs process eats RAM and it is also not free it ( but it is eaten very slowly) On other clients that access the same gluster volume, when performing tests with find and chmod command, RAM is also eaten up, but freed when the tests are turned off. I collected a few state dumps from a problem client and put it in the cloud. https://cloud.hostco.ru/s/w9MY6jj5Hpj2qoa In the near future I plan to upgrade glusterfs on client to version 6.5 and set lru-limit (don't know what i can do about this problem). Do you have any advise about it ? Script to reproduce problem: #!/bin/sh a=0 while [ $a -lt 36000 ] do find $gluster_mount_point -type f > /dev/null sleep 1 a=`expr $a + 1` done -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 07:14:08 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:14:08 +0000 Subject: [Bugs] [Bug 1745026] endless heal gluster volume; incrementing number of files to heal when all peers in volume are up In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1745026 --- Comment #1 from tvanberlo at vangenechten.com --- What I forgot to mention and the reason I opened this as a bug report for gluster fuse: When this happens we tested on a gluster mount to see where the files were written to, and on only 2 of the 3 members the data (metadata for arbiter) was written. The file was not found on 1 member of the gluster cluster. When we remounted our gluster mounts(like described in the opening post), the healing finished and no other files were added to the 'files to heal list'. Yesterday a gluster member node was rebooted and today I repeated the test to see where the data is written to. Now all data is written to all members but the heal is still in progress. How long does it take for a gluster mount to notice a reappearing member and to write to a node that was down? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 07:31:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:31:14 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 Nicola battista changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(nicola.battista89 | |@gmail.com) | --- Comment #12 from Nicola battista --- Created attachment 1609285 --> https://bugzilla.redhat.com/attachment.cgi?id=1609285&action=edit GlusterFS log dbroot1 debug mode. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 07:31:18 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:31:18 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Bipin Kunal changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(bkunal at redhat.com | |) | -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 29 07:31:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:31:33 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #13 from Nicola battista --- Created attachment 1609286 --> https://bugzilla.redhat.com/attachment.cgi?id=1609286&action=edit GlusterFS log dbroot2 debug mode. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 07:31:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:31:51 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #14 from Nicola battista --- Created attachment 1609287 --> https://bugzilla.redhat.com/attachment.cgi?id=1609287&action=edit GlusterFS log dbroot3 debug mode. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 07:37:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:37:33 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #15 from Nicola battista --- Hi, I've attached the log for each dbroot1. timestamp of the logs : cstore-pm01 File : GlusterFS log dbroot1 debug mode. start [2019-08-29 07:22:02.082937] finish [[2019-08-29 07:22:24.471092]] cstore-pm02 File : GlusterFS log dbroot2 debug mode. start [2019-08-29 07:17:20.287038] finish [2019-08-29 07:18:03.043808] [root at cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7 trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466 [root at cstore-pm02 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7 trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466 [root at cstore-pm03 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7 trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466 Do you need another information? 
Thanks Regards Nicola Battista -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 07:41:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 07:41:44 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #16 from Nicola battista --- [root at cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x48acd62fe5c34ae69ce4ce5cb23067c7 trusted.gfid2path.f3904b12d67725d3=0x66653035366232622d616437622d343934392d393463312d6433353031306338633966632f46494c453030322e636466 [root at cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0x6d6349a71d6e43a2a603e65c768d99ab trusted.gfid2path.1b3027e4fc5fdfd2=0x66373865643264662d633632652d343761652d383634342d3764653933336131613430392f46494c453030302e636466 [root at cstore-pm01 glusterfs]# getfattr -e hex -m . -d /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf getfattr: Removing leading '/' from absolute path names # file: usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf trusted.afr.dirty=0x000000000000000000000000 trusted.gfid=0xfb097efd00044c0caf6eea79a40aab6a trusted.gfid2path.9af95bfcaac3386a=0x33373264663634382d393137642d343738332d393736652d3463336334666430366563352f46494c453030312e636466 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 09:36:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 09:36:59 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23327 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 09:36:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 09:36:59 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #749 from Worker Ant --- REVIEW: https://review.gluster.org/23327 (build: Fix libglusterd Makefile target) posted (#1) for review on master by Anoop C S -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
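[Editorial note] A small aside on reading the getfattr output in comments #15 and #16 above: with -e hex every value is printed hex-encoded, and the trusted.gfid2path.* value is just the readable "<parent-gfid>/<basename>" string for the file; the brick2 value above decodes to fe056b2b-ad7b-4949-94c1-d35010c8c9fc/FILE002.cdf, i.e. the parent directory's gfid plus the basename. Below is a throwaway decoder written for this note, not a GlusterFS tool.

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s 0x<hex-value-from-getfattr>\n", argv[0]);
        return 1;
    }
    const char *hex = argv[1];
    if (strncmp(hex, "0x", 2) == 0)     /* skip the 0x prefix getfattr prints */
        hex += 2;

    for (size_t i = 0; hex[i] != '\0' && hex[i + 1] != '\0'; i += 2) {
        unsigned int byte;
        if (sscanf(hex + i, "%2x", &byte) != 1)   /* two hex digits -> one byte */
            break;
        putchar(byte);
    }
    putchar('\n');
    return 0;
}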
From bugzilla at redhat.com Thu Aug 29 10:02:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 10:02:13 +0000 Subject: [Bugs] [Bug 1734423] interrupts leak memory In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1734423 Prashant Dhange changed: What |Removed |Added ---------------------------------------------------------------------------- Group|redhat, private | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 10:09:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 10:09:44 +0000 Subject: [Bugs] [Bug 1746810] New: markdown files containing 404 links Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746810 Bug ID: 1746810 Summary: markdown files containing 404 links Product: GlusterFS Version: mainline Hardware: All OS: All Status: NEW Component: doc Keywords: EasyFix Severity: medium Assignee: bugs at gluster.org Reporter: kiyer at redhat.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Description of problem: Seeing a lot of 404 links in the markdown files available across the project. $find . -name \*.md -exec markdown-link-check {} -v -q \; FILE: ./extras/cliutils/README.md [?] https://github.com/aravindavk/glusterfs-restapi ? Status: 404 ERROR: dead links found! FILE: ./doc/developer-guide/commit-guidelines.md [?] mailto:name@example.com ? Status: 400 ERROR: dead links found! FILE: ./doc/developer-guide/gfapi-symbol-versions.md FILE: ./doc/developer-guide/xlator-classification.md [?] TBD ? Status: 400 Error: ENOENT: no such file or directory, access '/home/kiyer/upstream/glusterfs/doc/developer-guide/TBD' ERROR: dead links found! FILE: ./doc/developer-guide/translator-development.md [?] http://www.gluster.org/community/documentation/index.php/Translators ? Status: 404 ERROR: dead links found! FILE: ./doc/developer-guide/options-to-contribute.md [?] ./conding-standard.md ? Status: 400 Error: ENOENT: no such file or directory, access '/home/kiyer/upstream/glusterfs/doc/developer-guide/conding-standard.md' ERROR: dead links found! FILE: ./doc/developer-guide/Using-Gluster-Test-Framework.md FILE: ./doc/developer-guide/README.md [?] ./bd-xlator.md ? Status: 400 Error: ENOENT: no such file or directory, access '/home/kiyer/upstream/glusterfs/doc/developer-guide/bd-xlator.md' [?] ./coredump-analysis.md ? Status: 400 Error: ENOENT: no such file or directory, access '/home/kiyer/upstream/glusterfs/doc/developer-guide/coredump-analysis.md' ERROR: dead links found! FILE: ./doc/features/ganesha-ha.md FILE: ./doc/debugging/statedump.md [?] https://github.com/gluster/glusterfs/blob/master/doc/data-structures/mem-pool.md ? Status: 404 ERROR: dead links found! FILE: ./doc/debugging/gfid-to-path.md [?] https://gist.github.com/semiosis/4392640 ? Status: 404 ERROR: dead links found! FILE: ./doc/README.md [?] http://docs.gluster.org/en/latest/Upgrade-Guide/README/ ? Status: 404 ERROR: dead links found! FILE: ./geo-replication/syncdaemon/README.md [?] http://python.net/crew/theller/ctypes/ ? Status: 0 Error: ETIMEDOUT ERROR: dead links found! Version-Release number of selected component (if applicable): Whatever is the latest upstream version How reproducible: Always Steps to Reproduce: 1. Install markdown-link-checker using the below command: # npm install --save markdown-link-check 2. Git clone glusterfs and run markdown_link check using the below command: $find . 
-name \*.md -exec markdown-link-check {} -v -q \; Actual results: doc contain 404 links. Expected results: doc shouldn't contain 404 links. Additional info: -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 10:55:35 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 10:55:35 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #17 from Ravishankar N --- The xattrs don't seem to indicate any pending heals from AFR point of view. The mount logs also do not contain any information about lookup/stat/open failing (either for the file name FILE000.cdf/its gfid or even in general). Given that you are able to access the file using the fuse mount as per comment#8, I'm not sure this is a bug in gluster. Is there a chance of races in the application where a thread tries to access the file before a creat() from another thread? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 11:22:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 11:22:50 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #12 from Sergey Pleshkov --- Hello Once again, I executed the command to replace the brick for lsy-gl-03 (as a simple way to repair a identity of files on lsy-gl-03): (gluster volume replace-brick TST lsy-gl-03:/diskForData/tst lsy-gl-03:/diskForTestData/tst-fix commit force) it must synchronize all files from live nodes (lsy-gl-01, lsy-gl-03), as i know. But as a result, I again got a discrepancy between the actual sizes on the disk (df -h) [root at LSY-GL-02 host]# df -h Filesystem Size Used Avail Use% Mounted on LSY-GL-02:/TST 500G 115G 385G 23% /mnt/tst /dev/sdc1 500G 110G 390G 22% /diskForTestData [root at LSY-GL-03 host]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sdc1 500G 107G 394G 22% /diskForTestData LSY-GL-03:/TST 500G 115G 385G 23% /mnt/tst I finded diff files by exec a diff command (/diskForTestData/tst is symlink to /diskForTestData/tst-fix): [root at LSY-GL-02 ~]# diff <(ls -Ra /diskForTestData/tst/lsy-tst/) <(ssh host at lsy-gl-03 sudo ls -Ra /diskForTestData/tst/lsy-tst) 1c1 < /diskForTestData/tst/lsy-tst/: --- > /diskForTestData/tst/lsy-tst: 357638a357639,357643 > 00b0d046-1e1c-4088-bb67-527513bd432d.1 > 00b0d046-1e1c-4088-bb67-527513bd432d.2 > 00b0d046-1e1c-4088-bb67-527513bd432d.3 > 00b0d046-1e1c-4088-bb67-527513bd432d.4 > 00b0d046-1e1c-4088-bb67-527513bd432d.5 357644a357650,357652 > 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.1 > 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.2 > 0339fa08-fb52-4f9f-bbc1-998a88bad3a9.3 357652a357661,357663 ..... Also finded a reason, what arequal-checksum command shows a lot more regular files on lsy-gl-03 - it is folder /diskForTestData/tst/lsy-tst/.shard and files in it. 
But on lsy-gl-03 it have size like 70gb, but on lsy-lg-01,02 - 58gb [root at LSY-GL-03 .shard]# du -sh /diskForTestData/tst/lsy-tst/.shard/ 70G /diskForTestData/tst/lsy-tst/.shard/ [root at LSY-GL-02 host]# du -sh /diskForTestData/tst/lsy-tst/.shard/ 58G /diskForTestData/tst/lsy-tst/.shard/ Also I have folder /diskForTestData/tst/.shard with identical files (hardlinks, i think) What should I do with this situation ? Copy .shard files from lsy-l-03 on lsy-gl-02,01 ? Heal status count zero [root at LSY-GL-03 tst]# gluster volume heal TST info Brick lsy-gl-01:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-02:/diskForTestData/tst Status: Connected Number of entries: 0 Brick lsy-gl-03:/diskForTestData/tst-fix Status: Connected Number of entries: 0 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 12:49:41 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 12:49:41 +0000 Subject: [Bugs] [Bug 1703322] Need to document about fips-mode-rchecksum in gluster-7 release notes. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1703322 Rinku changed: What |Removed |Added ---------------------------------------------------------------------------- Version|mainline |7 Flags|needinfo?(rkothiya at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 13:15:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 13:15:07 +0000 Subject: [Bugs] [Bug 1703322] Need to document about fips-mode-rchecksum in gluster-7 release notes. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1703322 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23330 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 13:15:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 13:15:09 +0000 Subject: [Bugs] [Bug 1703322] Need to document about fips-mode-rchecksum in gluster-7 release notes. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1703322 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |POST --- Comment #4 from Worker Ant --- REVIEW: https://review.gluster.org/23330 (doc: documented about fips-mode-rchecksum) posted (#1) for review on release-7 by Rinku Kothiya -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Thu Aug 29 14:01:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 14:01:34 +0000 Subject: [Bugs] [Bug 1746228] systemctl start glusterd is getting timed out on the scaled setup with 2000 volumes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746228 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-29 14:01:34 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23316 (glusterd: glusterd service is getting timed out on scaled setup) merged (#1) on master by MOHIT AGRAWAL -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Thu Aug 29 14:58:03 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 14:58:03 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23331 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 14:58:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 14:58:04 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #750 from Worker Ant --- REVIEW: https://review.gluster.org/23331 (logging.c: check for log level before checking for args.) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 15:37:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 15:37:40 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #18 from Roman --- That's uncomforting news. No, CS doesn't even use this call and at this point of data files existance they are read only. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Thu Aug 29 15:45:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Thu, 29 Aug 2019 15:45:13 +0000 Subject: [Bugs] [Bug 1744883] GlusterFS problem dataloss In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744883 --- Comment #19 from Nicola battista --- Hi all, Maybe this can you help : ############ CSTORE PM01 ######## [root at cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 40 35 00 00 00 00 00 |. ....... at 5.....| 00001010 00 20 66 00 00 00 00 00 00 60 97 00 00 00 00 00 |. 
f......`......| 00001020 00 a0 c8 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 20 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. +......`\.....| 00001040 00 80 8d 01 00 00 00 00 00 80 bf 01 00 00 00 00 |................| [root at cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 20 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |. ..............| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 e0 66 00 00 00 00 00 00 40 98 00 00 00 00 00 |..f...... at ......| 00001020 00 60 c9 00 00 00 00 00 00 a0 fa 00 00 00 00 00 |.`..............| 00001030 00 c0 2b 01 00 00 00 00 00 40 5d 01 00 00 00 00 |..+......@].....| 00001040 00 60 8e 01 00 00 00 00 00 20 be 01 00 00 00 00 |.`....... ......| [root at cstore-pm01 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 a0 66 00 00 00 00 00 00 00 98 00 00 00 00 00 |..f.............| 00001020 00 00 c9 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 40 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. at +......`\.....| 00001040 00 c0 8d 01 00 00 00 00 00 00 bf 01 00 00 00 00 |................| ############ CSTORE PM02 ######## [root at cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 20 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |. ..............| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 e0 66 00 00 00 00 00 00 40 98 00 00 00 00 00 |..f...... at ......| 00001020 00 60 c9 00 00 00 00 00 00 a0 fa 00 00 00 00 00 |.`..............| 00001030 00 c0 2b 01 00 00 00 00 00 40 5d 01 00 00 00 00 |..+......@].....| 00001040 00 60 8e 01 00 00 00 00 00 20 be 01 00 00 00 00 |.`....... ......| [root at cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 40 35 00 00 00 00 00 |. ....... at 5.....| 00001010 00 20 66 00 00 00 00 00 00 60 97 00 00 00 00 00 |. f......`......| 00001020 00 a0 c8 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 20 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. 
+......`\.....| 00001040 00 80 8d 01 00 00 00 00 00 80 bf 01 00 00 00 00 |................| [root at cstore-pm02 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 a0 66 00 00 00 00 00 00 00 98 00 00 00 00 00 |..f.............| 00001020 00 00 c9 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 40 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. at +......`\.....| 00001040 00 c0 8d 01 00 00 00 00 00 00 bf 01 00 00 00 00 |................| ############ CSTORE PM03 ######## [root at cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick1/000.dir/000.dir/015.dir/064.dir/008.dir/FILE000.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 20 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |. ..............| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 e0 66 00 00 00 00 00 00 40 98 00 00 00 00 00 |..f...... at ......| 00001020 00 60 c9 00 00 00 00 00 00 a0 fa 00 00 00 00 00 |.`..............| 00001030 00 c0 2b 01 00 00 00 00 00 40 5d 01 00 00 00 00 |..+......@].....| 00001040 00 60 8e 01 00 00 00 00 00 20 be 01 00 00 00 00 |.`....... ......| [root at cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick2/000.dir/000.dir/015.dir/064.dir/008.dir/FILE002.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 40 35 00 00 00 00 00 |. ....... at 5.....| 00001010 00 20 66 00 00 00 00 00 00 60 97 00 00 00 00 00 |. f......`......| 00001020 00 a0 c8 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 20 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. +......`\.....| 00001040 00 80 8d 01 00 00 00 00 00 80 bf 01 00 00 00 00 |................| [root at cstore-pm03 ~]# hexdump -C /usr/local/mariadb/columnstore/gluster/brick3/000.dir/000.dir/015.dir/064.dir/008.dir/FILE001.cdf | head 00000000 8e 77 d0 84 a3 19 c1 fd 02 00 00 00 00 00 00 00 |.w..............| 00000010 01 00 00 00 00 00 00 00 00 20 04 00 00 00 00 00 |......... ......| 00000020 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00001000 00 20 04 00 00 00 00 00 00 60 35 00 00 00 00 00 |. .......`5.....| 00001010 00 a0 66 00 00 00 00 00 00 00 98 00 00 00 00 00 |..f.............| 00001020 00 00 c9 00 00 00 00 00 00 e0 f9 00 00 00 00 00 |................| 00001030 00 40 2b 01 00 00 00 00 00 60 5c 01 00 00 00 00 |. at +......`\.....| 00001040 00 c0 8d 01 00 00 00 00 00 00 bf 01 00 00 00 00 |................| Thanks Regards Nicola Battista -- You are receiving this mail because: You are on the CC list for the bug. 
You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 03:15:12 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 03:15:12 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #11 from Amgad --- *** I verified that "ai_family" passed in "af_inet_client_get_remote_sockaddr" to "gf_resolve_ip6" [rpc/rpc-transport/socket/src/name.c], is for IPv4 "2" and not IPv6 (should be "10"): af_inet_client_get_remote_sockaddr(rpc_transport_t *this, struct sockaddr *sockaddr, socklen_t *sockaddr_len) ....... /* TODO: gf_resolve is a blocking call. kick in some non blocking dns techniques */ ret = gf_resolve_ip6(remote_host, remote_port, sockaddr->sa_family, &this->dnscache, &addr_info); gf_log(this->name, GF_LOG_ERROR, "CSTO-DEBUG: Family Address is %d", sockaddr->sa_family); ==> my added debug msg AND *** in [libglusterfs/src/common-utils.c] where "gf_resolve_ip6" is defined where the IPv6 is passed to "getaddrinfo" as host name, and it failed because the ai_family is not right: int32_t gf_resolve_ip6(const char *hostname, uint16_t port, int family, void **dnscache, struct addrinfo **addr_info) { ... if ((ret = getaddrinfo(hostname, port_str, &hints, &cache->first)) != 0) { gf_msg("resolver", GF_LOG_ERROR, 0, LG_MSG_GETADDRINFO_FAILED, "getaddrinfo failed (family:%d) (%s)", family, gai_strerror(ret)); gf_msg("resolver", GF_LOG_ERROR, 0, LG_MSG_GETADDRINFO_FAILED, ==> my added debug msg "CSTO-DEBUG: getaddrinfo failed (hostname:%s) (%s)", hostname, gai_strerror(ret)); ......... /var/log/glusterfs/glustershd.log output: ..... [2019-08-30 01:03:51.871225] E [MSGID: 101075] [common-utils.c:512:gf_resolve_ip6] 0-resolver: CSTO-DEBUG: getaddrinfo failed (hostname:2001:db8:1234::8) (Address family for hostname not supported) [2019-08-30 01:03:51.871239] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-glusterfs: CSTO-DEBUG: Family Address is 2 ==> [2019-08-30 01:03:51.871249] E [name.c:260:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution failed on host 2001:db8:1234::8 ........ That's why failed DNS resolution and caused glustershd not to come up. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 03:46:38 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 03:46:38 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 Mohit Agrawal changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs at gluster.org |moagrawa at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 04:22:43 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 04:22:43 +0000 Subject: [Bugs] [Bug 1387404] geo-rep: gsync-sync-gfid binary installed in /usr/share/... 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1387404 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-30 04:22:43 --- Comment #3 from Worker Ant --- REVIEW: https://review.gluster.org/23059 (build: move arch-dependent files from /usr/share to /usr/libexec) merged (#4) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 04:25:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 04:25:40 +0000 Subject: [Bugs] [Bug 1744548] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-30 04:25:40 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23288 (afr: wake up index healer threads) merged (#4) on master by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 04:38:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 04:38:25 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #751 from Worker Ant --- REVIEW: https://review.gluster.org/23327 (build: Fix libglusterd Makefile target) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 04:40:34 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 04:40:34 +0000 Subject: [Bugs] [Bug 1737778] ocf resource agent for volumes don't work in non-standard environment In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1737778 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-30 04:40:34 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23165 (peer_map parameter and fix in state detection when no brick is running on peer) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 04:42:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 04:42:02 +0000 Subject: [Bugs] [Bug 1746320] SHORT-WRITE error leads to crash In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746320 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-30 04:42:02 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23318 (debug/error-gen: Set count correctly for short-writes) merged (#2) on master by Amar Tumballi -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
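The manual hexdump comparison in comment #19 of bug 1744883 above can be generalized with checksums, so that the whole extent files (not just their first blocks) are compared across the three PM nodes. A sketch, assuming the hosts and brick paths quoted in that comment and that the files are not being written while the check runs:

    # compare full-file checksums of the same ColumnStore extent files on every node
    for h in cstore-pm01 cstore-pm02 cstore-pm03; do
        echo "== $h =="
        ssh root@$h "md5sum /usr/local/mariadb/columnstore/gluster/brick*/000.dir/000.dir/015.dir/064.dir/008.dir/FILE00?.cdf"
    done

If the per-brick checksums match across the nodes, the replicas agree on content, which is consistent with the AFR xattrs showing no pending heals in comment #17.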
From bugzilla at redhat.com Fri Aug 30 05:04:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:16 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23332 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 05:04:17 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:17 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #5 from Worker Ant --- REVIEW: https://review.gluster.org/23332 (afr: wake up index healer threads) posted (#1) for review on release-6 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 05:04:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:25 +0000 Subject: [Bugs] [Bug 1747301] New: Setting cluster.heal-timeout requires volume restart Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 Bug ID: 1747301 Summary: Setting cluster.heal-timeout requires volume restart Product: GlusterFS Version: 7 Hardware: x86_64 OS: Linux Status: NEW Component: selfheal Keywords: Triaged Severity: low Assignee: bugs at gluster.org Reporter: ravishankar at redhat.com CC: bugs at gluster.org, glenk1973 at hotmail.com, ravishankar at redhat.com Depends On: 1743988, 1744548 Target Milestone: --- Classification: Community +++ This bug was initially created as a clone of Bug #1744548 +++ +++ This bug was initially created as a clone of Bug #1743988 +++ Description of problem: Setting the `cluster.heal-timeout` requires a volume restart to take effect. Version-Release number of selected component (if applicable): 6.5 How reproducible: Every time Steps to Reproduce: 1. Provision a 3-peer replica volume (I used three docker containers). 2. Set `cluster.favorite-child-policy` to `mtime`. 3. Mount the volume on one of the containers (say `gluster-0`, serving as a server and a client). 4. Stop the self-heal daemon. 5. Set `cluster.entry-self-heal`, `cluster.data-self-heal` and `cluster.metadata-self-heal` to off. 6. Set `cluster.quorum-type` to none. 7. Write "first write" to file `test.txt` on the mounted volume. 8. Kill the brick process `gluster-2`. 9. Write "second write" to `test.txt`. 10. Force start the volume (`gluster volume start force`) 11. Kill brick processes `gluster-0` and `gluster-1`. 12. Write "third write" to `test.txt`. 13. Force start the volume. 14. Verify that "split-brain" appears in the output of `gluster volume heal info` command. 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. 17. Issue `gluster volume heal info` command after 70 seconds. 18. Verify that the output at step 17 does not contain "split-brain". 19. Verify that the content of `test.txt` is "third write". Actual results: The output at step 17 contains "split-brain". Expected results: The output at step 17 should _not_ contain "split-brain". 
Additional info: According to what Ravishankar N said on Slack (https://gluster.slack.com/archives/CH9M2KF60/p1566346818102000), changing volume options such as `cluster.heal-timeout` should not require a process restart. If I add a `gluster volume start force` command immediately after step 16 above, then I get the Expected results. --- Additional comment from Glen K on 2019-08-21 06:04:23 UTC --- I should add that `cluster.quorum-type` is set to `none` for the test. --- Additional comment from Ravishankar N on 2019-08-21 09:56:54 UTC --- Okay, so after some investigation, I don't think this is an issue. When you change the heal-timeout, it does get propagated to the self-heal daemon. But since the default value is 600 seconds, the threads that do the heal only wake up after that time. Once it wakes up, subsequent runs do seem to honour the new heal-timeout value. On a glusterfs 6.5 setup: #gluster v create testvol replica 2 127.0.0.2:/home/ravi/bricks/brick{1..2} force #gluster v set testvol client-log-level DEBUG #gluster v start testvol #gluster v set testvol heal-timeout 5 #tail -f /var/log/glusterfs/glustershd.log|grep finished You don't see anything in the log yet about the crawls. But once you manually launch heal, the threads are woken up and further crawls happen every 5 seconds. #gluster v heal testvol Now in glustershd.log: [2019-08-21 09:55:02.024160] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:02.024271] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:08.023252] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:08.023358] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. [2019-08-21 09:55:14.024438] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-1. [2019-08-21 09:55:14.024546] D [MSGID: 0] [afr-self-heald.c:843:afr_shd_index_healer] 0-testvol-replicate-0: finished index sweep on subvol testvol-client-0. Glen, could you check if that works for you? i.e. after setting the heal-timeout, manually launch heal via `gluster v heal testvol`. --- Additional comment from Glen K on 2019-08-21 18:15:39 UTC --- In my steps above, I set the heal-timeout while the self-heal daemon is stopped: ... 4. Stop the self-heal daemon. ... 15. Set `cluster.heal-timeout` to `60`. 16. Start the self-heal daemon. ... I would expect that the configuration would certainly take effect after a restart of the self-heal daemon. Yes, launching heal manually causes the heal to happen right away, but the purpose of the test is to verify the heal happens automatically. From a user perspective, the current behaviour of the heal-timeout setting appears to be at odds with the "configuration changes take effect without restart" feature; I think it is reasonable to request that changing the heal-timeout setting results in the thread sleeps being reset to the new setting. --- Additional comment from Ravishankar N on 2019-08-22 07:11:53 UTC --- (In reply to Glen K from comment #3) > > I would expect that the configuration would certainly take effect after a > restart of the self-heal daemon. 
In step-4 and 16, I assume you toggled `cluster.self-heal-daemon` off and on respectively. This actually does not kill the shd process per se and just disables/enables the heal crawls. In 6.5, a volume start force does restart shd so changing the order of the tests should do the trick, i.e. 13. Set `cluster.heal-timeout` to `60`. 14. Force start the volume. 15. Verify that "split-brain" appears in the output of `gluster volume heal info` command. > Yes, launching heal manually causes the heal to happen right away, but the > purpose of the test is to verify the heal happens automatically. From a user > perspective, the current behaviour of the heal-timeout setting appears to be > at odds with the "configuration changes take effect without restart" > feature; I think it is reasonable to request that changing the heal-timeout > setting results in the thread sleeps being reset to the new setting. Fair enough, I'll attempt a fix on master, let us see how the review goes. --- Additional comment from Worker Ant on 2019-08-22 12:15:15 UTC --- REVIEW: https://review.gluster.org/23288 (afr: wake up index healer threads) posted (#1) for review on master by Ravishankar N --- Additional comment from Worker Ant on 2019-08-30 04:25:40 UTC --- REVIEW: https://review.gluster.org/23288 (afr: wake up index healer threads) merged (#4) on master by Ravishankar N Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 [Bug 1743988] Setting cluster.heal-timeout requires volume restart https://bugzilla.redhat.com/show_bug.cgi?id=1744548 [Bug 1744548] Setting cluster.heal-timeout requires volume restart -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 05:04:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:25 +0000 Subject: [Bugs] [Bug 1743988] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743988 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1747301 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 [Bug 1747301] Setting cluster.heal-timeout requires volume restart -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 05:04:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:25 +0000 Subject: [Bugs] [Bug 1744548] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1744548 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Blocks| |1747301 Referenced Bugs: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 [Bug 1747301] Setting cluster.heal-timeout requires volume restart -- You are receiving this mail because: You are on the CC list for the bug. 
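For anyone retesting the heal-timeout change once the release-6/release-7 backports above land, the verification flow Ravishankar describes in the cloned bug can be scripted directly. A sketch, assuming a volume named testvol and the default glustershd log location:

    gluster volume set testvol cluster.heal-timeout 60
    gluster volume set testvol diagnostics.client-log-level DEBUG
    tail -f /var/log/glusterfs/glustershd.log | grep 'finished index sweep'

Before the 'afr: wake up index healer threads' fix, the new interval only took effect after the currently sleeping crawl woke up (or after a manual 'gluster volume heal testvol'); with the fix, the healer threads are woken when the option changes, so the index sweeps should start appearing in the log at the new interval without a volume restart.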
From bugzilla at redhat.com Fri Aug 30 05:04:42 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:04:42 +0000 Subject: [Bugs] [Bug 1747301] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED Assignee|bugs at gluster.org |ravishankar at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 05:06:27 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:06:27 +0000 Subject: [Bugs] [Bug 1747301] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23333 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 05:06:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:06:28 +0000 Subject: [Bugs] [Bug 1747301] Setting cluster.heal-timeout requires volume restart In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1747301 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |POST --- Comment #1 from Worker Ant --- REVIEW: https://review.gluster.org/23333 (afr: wake up index healer threads) posted (#1) for review on release-7 by Ravishankar N -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 05:36:01 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 05:36:01 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 Ravishankar N changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kdhananj at redhat.com --- Comment #13 from Ravishankar N --- Adding the Sharding maintainer Krutika to the bug for any possible advice on comment#12. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 07:12:13 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 07:12:13 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kotresh HR changed: What |Removed |Added ---------------------------------------------------------------------------- Flags|needinfo?(khiremat at redhat.c | |om) | -- You are receiving this mail because: You are on the CC list for the bug. 
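Returning to bug 1741899 while waiting for input from the sharding maintainer: a quick way to check whether the self-heal daemon itself hit problems during the replace-brick is to scan its log and the heal counters on each node. A sketch, assuming the default log path and the volume name TST from the thread ('info summary' exists on recent releases; plain 'info' works everywhere):

    # warnings/errors logged by the self-heal daemon
    grep -E ' (E|W) \[' /var/log/glusterfs/glustershd.log | tail -n 50
    # per-brick heal counters
    gluster volume heal TST info summary

If the log shows no errors and the counters stay at zero, that at least rules out heals that failed outright.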
From bugzilla at redhat.com Fri Aug 30 07:19:55 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 07:19:55 +0000 Subject: [Bugs] [Bug 1743634] geo-rep: Changelog archive file format is incorrect In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743634 Kshithij Iyer changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ON_QA |VERIFIED -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 07:24:22 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 07:24:22 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Link ID| |Gluster.org Gerrit 23335 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 07:24:23 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 07:24:23 +0000 Subject: [Bugs] [Bug 1193929] GlusterFS can be improved In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1193929 --- Comment #752 from Worker Ant --- REVIEW: https://review.gluster.org/23335 (glusterd-store.c: remove of dead code) posted (#1) for review on master by Yaniv Kaul -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 07:37:14 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 07:37:14 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #14 from Krutika Dhananjay --- (In reply to Ravishankar N from comment #13) > Adding the Sharding maintainer Krutika to the bug for any possible advice on > comment#12. Copying the shards across bricks from the backend is not a good idea. A parallel operation on the file while the copy is going on can lead to inconsistencies. Ravi, Seems like the main issue is replication inconsistency after a replace-brick. Any heal-related errors in the logs? I see cluster.favorite-child-policy set in volume-info. Would it be an issue here? (As an aside, network.ping-timeout is set to 5s and that's really low. @Sergey, you should probably set to a higher value, say 30s or more) -Krutika -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 08:59:40 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 08:59:40 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. 
In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #12 from Mohit Agrawal --- Hi, To enable ipv6 for gluster processes you need to change "transport.address-family" in /etc/glusterfs/glusterd.vol and restart glusterd The issue has been fixed in upstream from the below patch https://review.gluster.org/#/c/glusterfs/+/21948/ By default transport address family is inet and the value is commented in file /etc/glusterfs/glusterd.vol. To enable ipv6 please change the value to inet6 and uncomment the line as below cat /etc/glusterfs/glusterd.vol volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option transport.socket.listen-port 24007 option ping-timeout 0 option event-threads 1 # option lock-timer 180 option transport.address-family inet6 # option base-port 49152 option max-port 60999 end-volume Thanks, Mohit Agrawal -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 09:02:16 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:02:16 +0000 Subject: [Bugs] [Bug 1738878] FUSE client's memory leak In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1738878 --- Comment #10 from Sergey Pleshkov --- Hello Yesterday I upgraded the client to version 6.5 and set the lru-limit - the problem with the continuous occupation of RAM was solved by this workaround. Gathered a couple of state dumps if anybody want to see them. https://cloud.hostco.ru/s/w9MY6jj5Hpj2qoa But ran into another problem after this update Server software version 5.5, client version 6.5 - every time I write a file to a mounted shared folder, I see this error in the logs (with or without lru-limit option) [2019-08-30 08:31:04.763118] E [fuse-bridge.c:220:check_and_dump_fuse_W] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f361d877a3b] (--> /usr/lib64/glusterfs/6.5/xlator/mount/fuse.so(+0x81d1)[0x7f3614c261d1] (--> /usr/lib64/glusterfs/6.5/xlator/mount/fuse.so(+0x8aaa)[0x7f3614c26aaa] (--> /lib64/libpthread.so.0(+0x7dd5)[0x7f361c6b5dd5] (--> /lib64/libc.so.6(clone+0x6d)[0x7f361bf7dead] ))))) 0-glusterfs-fuse: writing to fuse device failed: No such file or directory Files are created in shared folder, on other clients I see them, the contents can be updated. I need to open another bug on this issue? -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 09:23:28 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:23:28 +0000 Subject: [Bugs] [Bug 1732961] shard file with different gfid in different subvolume causing VM to pause on stale file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1732961 --- Comment #26 from Olaf Buitelaar --- Created attachment 1609814 --> https://bugzilla.redhat.com/attachment.cgi?id=1609814&action=edit another instance of stale file log's with v6.5 Just adding another sample, of the stale file error. 
probably the entry you're interested in is at; lease-08.dc01.adsolutions-data-gfs-bricks-bricka-ovirt-data.log with line; [2019-08-30 03:48:16.689485] E [MSGID: 113002] [posix-entry-ops.c:323:posix_lookup] 1-ovirt-data-posix: buf->ia_gfid is null for /data0/gfs/bricks/brick1/ovirt-data/.shard/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682 [No data available] [2019-08-30 03:48:16.689544] E [MSGID: 115050] [server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-ovirt-data-server: 20674155: LOOKUP /.shard/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682 (be318638-e8a0-4c6d-977d-7a937aa84806/8a27b91a-ff02-42dc-bd4c-caa019424de8.1682), client: CTX_ID:e00800f6-01d4-4d32-b374-1cd8f82dae57-GRAPH_ID:0-PID:18060-HOST:lease-07.dc01.adsolutions-PC_NAME:ovirt-data-client-9-RECON_NO:-0, error-xlator: ovirt-data-posix [No data available] this error seems different than the other encounters, which usually logged something like; [2019-07-17 01:21:52.768672] I [MSGID: 113030] [posix-entry-ops.c:1146:posix_unlink] 0-ovirt-kube-posix: open-fd-key-status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756 [2019-07-17 01:21:52.768713] I [MSGID: 113031] [posix-entry-ops.c:1053:posix_skip_non_linkto_unlink] 0-posix: linkto_xattr status: 0 for /data/gfs/bricks/brick1/ovirt-kube/.shard/a38d64bc-a28b-4ee1-a0bb-f919e7a1022c.111756 -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 09:24:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:24:44 +0000 Subject: [Bugs] [Bug 1746138] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1746138 Worker Ant changed: What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE Last Closed| |2019-08-30 09:24:44 --- Comment #2 from Worker Ant --- REVIEW: https://review.gluster.org/23312 (ctime: Fix ctime issue with utime family of syscalls) merged (#2) on release-6 by hari gowtham -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 09:24:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:24:44 +0000 Subject: [Bugs] [Bug 1743627] ctime: If atime is updated via utimensat syscall ctime is not getting updated In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743627 Bug 1743627 depends on bug 1746138, which changed state. Bug 1746138 Summary: ctime: If atime is updated via utimensat syscall ctime is not getting updated https://bugzilla.redhat.com/show_bug.cgi?id=1746138 What |Removed |Added ---------------------------------------------------------------------------- Status|POST |CLOSED Resolution|--- |NEXTRELEASE -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 09:32:25 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:32:25 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #15 from Ravishankar N --- cluster.favorite-child-policy should not cause any problems w.r.t missing files. 
Perhaps Sergey can check for errors in glustershd.log. FWIW, I did try out a replace brick with the volume options being the same as this one (and having files > shard size) and the heals were successful. This was on glusterfs-5.5. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 09:47:02 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:47:02 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #16 from Sergey Pleshkov --- Errors for what period? Since the last brick replacement (August 28) on lsy-gl-03 - there are no errors in the file glfsheal-TST.log, only informational messages Prior to this, all bricks were replaced sequentially (transfer over a separate disk on the node) - also no heal errros in logs glfsheal-TST.log on all nodes. After that, a problem was seen with the size of the raw data. In file glustershd.log from lsy-gl-03 - exist error messages, but not about TST volume -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 09:50:44 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 09:50:44 +0000 Subject: [Bugs] [Bug 1741899] the volume of occupied space in the bricks of gluster volume (3 nodes replica) differs on nodes and the healing does not fix it In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1741899 --- Comment #17 from Sergey Pleshkov --- On lsy-gl-01,02 in file glustershd.log many info messages about selfeal operations when replace brick process works -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 10:21:07 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 10:21:07 +0000 Subject: [Bugs] [Bug 1376757] Data corruption in write ordering of rebalance and application writes In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1376757 Karthik U S changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |CLOSED Resolution|--- |WONTFIX Last Closed| |2019-08-30 10:21:07 --- Comment #12 from Karthik U S --- This issue is being tracked by https://github.com/gluster/glusterfs/issues/308. Since there is no active work going on this closing this for now. Feel free to reopen this or open a new bug when needed. -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 10:57:58 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 10:57:58 +0000 Subject: [Bugs] [Bug 1716979] Multiple disconnect events being propagated for the same child In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1716979 --- Comment #15 from hari gowtham --- (In reply to Amgad from comment #14) > Thanks Hari: > > >we have backported the patches for this bug to every active release branch. > > What does exactly mean? does it mean the bug is in 6.3-1 now for instance? > or 5.5-1? > > Regards, > Amgad The bug was root caused to be found on master and other branches. 
So the fixes have to sent to those branches as well. -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 11:25:33 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 11:25:33 +0000 Subject: [Bugs] [Bug 1747414] New: EIO error on check_and_dump_fuse_W call Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1747414 Bug ID: 1747414 Summary: EIO error on check_and_dump_fuse_W call Product: GlusterFS Version: 5 Hardware: x86_64 OS: Linux Status: NEW Component: libglusterfsclient Assignee: bugs at gluster.org Reporter: stam.bardis at gmail.com CC: bugs at gluster.org Target Milestone: --- Classification: Community Created attachment 1609823 --> https://bugzilla.redhat.com/attachment.cgi?id=1609823&action=edit Fuse client logs Description of problem: After massively create/rename/delete file operations on a volume we are getting EIO failure (fuse-bridge.c:219:check_and_dump_fuse_W). Version-Release number of selected component (if applicable): gluster --version glusterfs 5.6 cat /proc/version Linux version 4.4.179-1.el7.elrepo.x86_64 (mockbuild at Build64R7) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC) ) #1 SMP Sat Apr 27 08:29:04 EDT 2019 Volume Name: oam Type: Replicate Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Options Reconfigured: cluster.consistent-metadata: on performance.lazy-open: on performance.strict-o-direct: off performance.open-behind: on performance.quick-read: on performance.io-cache: on performance.readdir-ahead: on performance.write-behind: on performance.read-ahead: on performance.stat-prefetch: on diagnostics.brick-log-level: INFO diagnostics.client-log-level: ERROR diagnostics.brick-sys-log-level: INFO cluster.server-quorum-type: server cluster.quorum-type: auto transport.address-family: inet nfs.disable: on performance.client-io-threads: off cluster.server-quorum-ratio: 51% How reproducible: Script that does the following operations in a massive way: create-rename-delete a file on a single thread. Actual results: Debug logs from FUSE client attached -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Fri Aug 30 13:43:24 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 13:43:24 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #13 from Amgad --- Hi Mohit: Our "/etc/glusterfs/glusterd.vol" is set with IPv6 - so this is not the case see below: Regards, Amgad volume management type mgmt/glusterd option working-directory /var/lib/glusterd option transport-type socket,rdma option transport.socket.keepalive-time 10 option transport.socket.keepalive-interval 2 option transport.socket.read-fail-log off option transport.socket.listen-port 24007 option transport.rdma.listen-port 24008 option transport.address-family inet6 option transport.socket.bind-address 2001:db8:1234::8 option transport.tcp.bind-address 2001:db8:1234::8 option transport.rdma.bind-address 2001:db8:1234::8 option ping-timeout 0 option event-threads 1 option transport.listen-backlog 1024 # option base-port 49152 end-volume -- You are receiving this mail because: You are on the CC list for the bug. 
From bugzilla at redhat.com Fri Aug 30 14:03:31 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 14:03:31 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #14 from Mohit Agrawal --- Hi, Can you please share complete dump of /var/log/gluster directory. Thanks, Mohit Agrawal -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 20:58:32 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 20:58:32 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #15 from Amgad --- I'm attaching the tar file. I just reverted the private version with my debugging statements to 6.5-1. Keep in mind this was upgraded back and forth several times, so the logs have different versions, but the latest is 6.5-1 -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Fri Aug 30 21:00:59 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Fri, 30 Aug 2019 21:00:59 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #16 from Amgad --- Created attachment 1609995 --> https://bugzilla.redhat.com/attachment.cgi?id=1609995&action=edit /var/log/glusterfs tar file -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sat Aug 31 00:14:50 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 00:14:50 +0000 Subject: [Bugs] [Bug 1740413] Gluster volume bricks crashes when running a security scan on glusterfs ports In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1740413 Calvin Dunigan changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |cdunigan at axway.com --- Comment #6 from Calvin Dunigan --- We have encountered the same issue on gluster 6.4. Does anyone know whether this affects older versions (e.g. 3.X)? -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 03:06:04 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 03:06:04 +0000 Subject: [Bugs] [Bug 1739320] The result (hostname) of getnameinfo for all bricks (ipv6 addresses) are the same, while they are not. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1739320 --- Comment #17 from Mohit Agrawal --- Hi, As per currently shared logs it seems now you are facing a different issue, issue related to "DNS resolution failed" is resolved already. It seems earlier correct transport-type was not mentioned in volfile so brick was not coming up(throwing an error Address family not supported) but now brick is failing because brick is not able to connect with glusterd because glusterd is not up. >>>>>>>>>>>>>>>>>>>>>> ..... ..... 
[2019-08-30 01:03:41.480435] W [socket.c:721:__socket_rwv] 0-glusterfs: readv on 2001:db8:1234::8:24007 failed (No data available) [2019-08-30 01:03:41.480554] I [glusterfsd-mgmt.c:2443:mgmt_rpc_notify] 0-glusterfsd-mgmt: disconnected from remote-host: ceph-cs-01.storage.bcmt.cluster.local [2019-08-30 01:03:41.480573] I [glusterfsd-mgmt.c:2463:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers ..... .... >>>>>>>>>>>>>>>>>>>>>>>>>> I am seeing similar messages in other brick logs file also. glusterd is not coming up because it is throwing an error "Address is already in use". >>>>>>>>>>>>>>>>> [2019-08-30 01:03:43.493787] I [socket.c:904:__socket_server_bind] 0-socket.management: closing (AF_UNIX) reuse check socket 10 [2019-08-30 01:03:43.499501] I [MSGID: 106513] [glusterd-store.c:2394:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 60000 [2019-08-30 01:03:43.503539] I [MSGID: 106544] [glusterd.c:152:glusterd_uuid_init] 0-management: retrieved UUID: 8e2b40a7-098c-4f0a-b323-2e764bd315f3 [2019-08-30 01:03:43.855699] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2019-08-30 01:03:43.860181] I [MSGID: 106498] [glusterd-handler.c:3687:glusterd_friend_add_from_peerinfo] 0-management: connect returned 0 [2019-08-30 01:03:43.860245] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout [2019-08-30 01:03:43.860284] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2019-08-30 01:03:43.966588] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-management: CSTO-DEBUG: Family Address is 10 [2019-08-30 01:03:43.967196] I [rpc-clnt.c:1005:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600 [2019-08-30 01:03:43.969757] E [name.c:256:af_inet_client_get_remote_sockaddr] 0-management: CSTO-DEBUG: Family Address is 10 [2019-08-30 01:03:44.681604] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to failed: Address already in use [2019-08-30 01:03:44.681645] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use [2019-08-30 01:03:45.681776] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to failed: Address already in use [2019-08-30 01:03:45.681883] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use [2019-08-30 01:03:46.681992] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to failed: Address already in use [2019-08-30 01:03:46.682027] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use [2019-08-30 01:03:47.682249] E [socket.c:925:__socket_server_bind] 0-socket.management: Port is already in use [2019-08-30 01:03:47.682191] E [socket.c:923:__socket_server_bind] 0-socket.management: binding to failed: Address already in use [2019-08-30 01:03:43.967187] W [MSGID: 106061] [glusterd-handler.c:3490:glusterd_transport_inet_options_build] 0-glusterd: Failed to get tcp-user-timeout [2019-08-30 01:03:48.598585] W [glusterfsd.c:1570:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f4af4b0fdd5] -->glusterd(glusterfs_sigwaiter+0xe5) [0x5584ef3131b5] -->glusterd(cleanup_and_exit+0x6b) [0x5584ef31301b] ) 0-: received signum (15), shutting down >>>>>>>>>>>>>>>>>>>>>> We have fixed the same in release-6 recently https://review.gluster.org/#/c/glusterfs/+/23268/ Kindly apply this patch or install the build after merged this patch. 
Regards, Mohit Agrawal -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sat Aug 31 17:33:09 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:33:09 +0000 Subject: [Bugs] [Bug 1356824] glusterfs-3.7.1-16 problem with size difference in listing dir and file In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1356824 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|sankarshan at redhat.com |ykaul at redhat.com -- You are receiving this mail because: You are on the CC list for the bug. From bugzilla at redhat.com Sat Aug 31 17:55:45 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:45 +0000 Subject: [Bugs] [Bug 1620580] Deleted a volume and created a new volume with similar but not the same name. The kubernetes pod still keeps on running and doesn't crash. Still possible to write to gluster mount In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1620580 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 17:55:46 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:46 +0000 Subject: [Bugs] [Bug 1627060] ./tests/features/trash.t test case failing on s390x In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1627060 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 17:55:47 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:47 +0000 Subject: [Bugs] [Bug 1636297] Make it easy to build / host a project which just builds glusterfs translator In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1636297 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 17:55:51 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:51 +0000 Subject: [Bugs] [Bug 1672480] Bugs Test Module tests failing on s390x In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1672480 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. 
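Following up on the "Address already in use" analysis in comment #17 of bug 1739320 above: before rebuilding with the referenced patch, it is worth confirming what is actually holding the management port on the affected node. A small sketch with standard tools (24007 is the listen-port from the glusterd.vol quoted earlier):

    # which process, if any, is bound to the glusterd management port?
    ss -tlnp | grep 24007
    # fallback if ss is not available
    netstat -tlnp | grep 24007
    # is a previous glusterd instance still running?
    pgrep -a glusterd

If an old glusterd is still listed, stopping it before restarting the service avoids the bind failure seen in the log.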
From bugzilla at redhat.com Sat Aug 31 17:55:52 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:52 +0000 Subject: [Bugs] [Bug 1683317] ./tests/bugs/glusterfs/bug-866459.t failing on s390x In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1683317 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 17:55:53 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:55:53 +0000 Subject: [Bugs] [Bug 1689097] gfapi: provide an option for changing statedump path in glfs-api. In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1689097 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are the assignee for the bug. From bugzilla at redhat.com Sat Aug 31 17:56:00 2019 From: bugzilla at redhat.com (bugzilla at redhat.com) Date: Sat, 31 Aug 2019 17:56:00 +0000 Subject: [Bugs] [Bug 1743094] glusterfs build fails on centos7 In-Reply-To: References: Message-ID: https://bugzilla.redhat.com/show_bug.cgi?id=1743094 PnT Account Manager changed: What |Removed |Added ---------------------------------------------------------------------------- Assignee|atumball at redhat.com |bugs at gluster.org -- You are receiving this mail because: You are on the CC list for the bug. You are the assignee for the bug.