From jahernan at redhat.com Wed Jan 2 07:03:03 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Wed, 2 Jan 2019 08:03:03 +0100 Subject: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench In-Reply-To: References: Message-ID: On Mon, Dec 24, 2018 at 11:30 AM Sankarshan Mukhopadhyay < sankarshan.mukhopadhyay at gmail.com> wrote: > [pulling the conclusions up to enable better in-line] > > > Conclusions: > > > > We should never have a volume with caching-related xlators disabled. The > price we pay for it is too high. We need to make them work consistently and > aggressively to avoid as many requests as we can. > > Are there current issues in terms of behavior which are known/observed > when these are enabled? > > > We need to analyze client/server xlators deeper to see if we can avoid > some delays. However optimizing something that is already at the > microsecond level can be very hard. > > That is true - are there any significant gains which can be accrued by > putting efforts here or, should this be a lower priority? > I would say that for volumes based on spinning disks this is not a high priority, but if we want to provide good performance for NVME storage, this is something that needs to be done. On NVME, reads and writes can be served in few tens of microseconds, so adding 100 us in the network layer could easily mean a performance reduction of 70% or more. > > We need to determine what causes the fluctuations in brick side and > avoid them. > > This scenario is very similar to a smallfile/metadata workload, so this > is probably one important cause of its bad performance. > > What kind of instrumentation is required to enable the determination? > > On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez > wrote: > > > > Hi, > > > > I've done some tracing of the latency that network layer introduces in > gluster. I've made the analysis as part of the pgbench performance issue > (in particulat the initialization and scaling phase), so I decided to look > at READV for this particular workload, but I think the results can be > extrapolated to other operations that also have small latency (cached data > from FS for example). > > > > Note that measuring latencies introduces some latency. It consists in a > call to clock_get_time() for each probe point, so the real latency will be > a bit lower, but still proportional to these numbers. > > > > [snip] > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
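To make the probe points mentioned above concrete: each probe is a single clock_gettime()-style call, so the instrumentation adds a small, roughly constant overhead per sample, which is why the real latencies are a bit lower than the reported ones but still proportional to them. The NVME arithmetic works the same way: a read served in ~30 us that picks up an extra ~100 us in the network layer completes in ~130 us, roughly a quarter of the original per-request rate, consistent with the "70% or more" reduction mentioned above. Below is only a minimal illustration of such a probe; the probe_ns() helper is made up for this example and is not the instrumentation actually used for the analysis.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Illustrative probe helper (not the instrumentation used above): one
 * clock_gettime() call per probe point. */
static inline uint64_t probe_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

int main(void)
{
    uint64_t t0 = probe_ns();
    /* ... operation being measured, e.g. dispatching a READV ... */
    uint64_t t1 = probe_ns();
    printf("latency: %llu ns\n", (unsigned long long)(t1 - t0));
    return 0;
}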
URL: From rgowdapp at redhat.com Wed Jan 2 08:00:20 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Wed, 2 Jan 2019 13:30:20 +0530 Subject: [Gluster-devel] [Gluster-users] On making ctime generator enabled by default in stack In-Reply-To: References: Message-ID: On Mon, Nov 12, 2018 at 10:48 AM Amar Tumballi wrote: > > > On Mon, Nov 12, 2018 at 10:39 AM Vijay Bellur wrote: > >> >> >> On Sun, Nov 11, 2018 at 8:25 PM Raghavendra Gowdappa >> wrote: >> >>> >>> >>> On Sun, Nov 11, 2018 at 11:41 PM Vijay Bellur >>> wrote: >>> >>>> >>>> >>>> On Mon, Nov 5, 2018 at 8:31 PM Raghavendra Gowdappa < >>>> rgowdapp at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Nov 6, 2018 at 9:58 AM Vijay Bellur >>>>> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Nov 5, 2018 at 7:56 PM Raghavendra Gowdappa < >>>>>> rgowdapp at redhat.com> wrote: >>>>>> >>>>>>> All, >>>>>>> >>>>>>> There is a patch [1] from Kotresh, which makes ctime generator as >>>>>>> default in stack. Currently ctime generator is being recommended only for >>>>>>> usecases where ctime is important (like for Elasticsearch). However, a >>>>>>> reliable (c)(m)time can fix many consistency issues within glusterfs stack >>>>>>> too. These are issues with caching layers having stale (meta)data >>>>>>> [2][3][4]. Basically just like applications, components within glusterfs >>>>>>> stack too need a time to find out which among racing ops (like write, stat, >>>>>>> etc) has latest (meta)data. >>>>>>> >>>>>>> Also note that a consistent (c)(m)time is not an optional feature, >>>>>>> but instead forms the core of the infrastructure. So, I am proposing to >>>>>>> merge this patch. If you've any objections, please voice out before Nov 13, >>>>>>> 2018 (a week from today). >>>>>>> >>>>>>> As to the existing known issues/limitations with ctime generator, my >>>>>>> conversations with Kotresh, revealed following: >>>>>>> * Potential performance degradation (we don't yet have data to >>>>>>> conclusively prove it, preliminary basic tests from Kotresh didn't indicate >>>>>>> a significant perf drop). >>>>>>> >>>>>> >>>>>> Do we have this data captured somewhere? If not, would it be possible >>>>>> to share that data here? >>>>>> >>>>> >>>>> I misquoted Kotresh. He had measured impact of gfid2path and said both >>>>> features might've similar impact as major perf cost is related to storing >>>>> xattrs on backend fs. I am in the process of getting a fresh set of >>>>> numbers. Will post those numbers when available. >>>>> >>>>> >>>> >>>> I observe that the patch under discussion has been merged now [1]. A >>>> quick search did not yield me any performance data. Do we have the >>>> performance numbers posted somewhere? >>>> >>> >>> No. Perf benchmarking is a task pending on me. >>> >> >> When can we expect this task to be complete? >> >> In any case, I don't think it is ideal for us to merge a patch without >> completing our due diligence on it. How do we want to handle this scenario >> since the patch is already merged? >> >> We could: >> >> 1. Revert the patch now >> 2. Review the performance data and revert the patch if performance >> characterization indicates a significant dip. It would be preferable to >> complete this activity before we branch off for the next release. >> > > I am for option 2. Considering the branch out for next release is another > 2 months, and no one is expected to use the 'release' off a master branch > yet, it makes sense to give that buffer time to get this activity completed. 
> Its unlikely I'll have time for carrying out perf benchmark. Hence I've posted a revert here: https://review.gluster.org/#/c/glusterfs/+/21975/ > Regards, > Amar > > 3. Think of some other option? >> >> Thanks, >> Vijay >> >> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nigelb at redhat.com Thu Jan 3 14:41:57 2019 From: nigelb at redhat.com (Nigel Babu) Date: Thu, 3 Jan 2019 20:11:57 +0530 Subject: [Gluster-devel] Tests for the GCS stack using the k8s framework Message-ID: Hello, Deepshikha and I have been working on understanding and using the k8s framework for testing the GCS stack. With the help of the folks from sig-storage, we've managed to write a sample test that needs to be run against an already setup k8s gluster with GCS installed on top[1]. This is a temporary location for the tests and we'll move these into gluster-csi-driver repo[2] once some of the dependency issues[3] are sorted out. The upstream storage tests are being split out into a test suite[4] that can be consumed out of tree by folks like us who are implementing a CSI driver interface. When that happens, we should be able to continuously validate against the standards set for the storage interface. [1]: https://github.com/nigelbabu/gcs-test/ [2]: https://github.com/gluster/gluster-csi-driver/ [3]: https://github.com/gluster/gluster-csi-driver/issues/131 [4]: https://github.com/kubernetes/kubernetes/tree/master/test/e2e/storage/testsuites -- nigelb -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenkins at build.gluster.org Mon Jan 7 01:45:03 2019 From: jenkins at build.gluster.org (jenkins at build.gluster.org) Date: Mon, 7 Jan 2019 01:45:03 +0000 (UTC) Subject: [Gluster-devel] Weekly Untriaged Bugs Message-ID: <1605456600.113.1546825503928.JavaMail.jenkins@jenkins-el7.rht.gluster.org> [...truncated 6 lines...] https://bugzilla.redhat.com/1660404 / core: Conditional freeing of string after returning from dict_set_dynstr function https://bugzilla.redhat.com/1657645 / core: [Glusterfs-server-5.1] Gluster storage domain creation fails on MountError https://bugzilla.redhat.com/1658108 / disperse: [disperse] Dump respective itables in EC to statedumps. https://bugzilla.redhat.com/1658472 / disperse: Mountpoint not accessible for few seconds when bricks are brought down to max redundancy after reset brick https://bugzilla.redhat.com/1663337 / doc: Gluster documentation on quorum-reads option is incorrect https://bugzilla.redhat.com/1659334 / fuse: FUSE mount seems to be hung and not accessible https://bugzilla.redhat.com/1663205 / fuse: List dictionary is too slow https://bugzilla.redhat.com/1659824 / fuse: Unable to mount gluster fs on glusterfs client: Transport endpoint is not connected https://bugzilla.redhat.com/1657743 / fuse: Very high memory usage (25GB) on Gluster FUSE mountpoint https://bugzilla.redhat.com/1663583 / geo-replication: Geo-replication fails to open logfile "/var/log/glusterfs/cli.log" on slave. 
https://bugzilla.redhat.com/1662178 / glusterd: Compilation fails for xlators/mgmt/glusterd/src with error "undefined reference to `dlclose'" https://bugzilla.redhat.com/1663247 / glusterd: remove static memory allocations from code https://bugzilla.redhat.com/1663519 / gluster-smb: Memory leak when smb.conf has "store dos attributes = yes" https://bugzilla.redhat.com/1657607 / posix: Convert nr_files to gf_atomic in posix_private structure https://bugzilla.redhat.com/1659371 / posix: posix_janitor_thread_proc has bug that can't go into the janitor_walker if change the system time forward and change back https://bugzilla.redhat.com/1659374 / posix: posix_janitor_thread_proc has bug that can't go into the janitor_walker if change the system time forward and change back https://bugzilla.redhat.com/1659378 / posix: posix_janitor_thread_proc has bug that can't go into the janitor_walker if change the system time forward and change back https://bugzilla.redhat.com/1657860 / project-infrastructure: Archives for ci-results mailinglist are getting wiped (with each mail?) https://bugzilla.redhat.com/1659934 / project-infrastructure: Cannot unsubscribe the review.gluster.org https://bugzilla.redhat.com/1659394 / project-infrastructure: Maintainer permissions on gluster-mixins project for Ankush https://bugzilla.redhat.com/1661895 / replicate: [disperse] Dump respective itables in EC to statedumps. https://bugzilla.redhat.com/1662557 / replicate: glusterfs process crashes, causing "Transport endpoint not connected". https://bugzilla.redhat.com/1658742 / rpc: Inconsistent type for 'remote-port' parameter [...truncated 2 lines...] -------------- next part -------------- A non-text attachment was scrubbed... Name: build.log Type: application/octet-stream Size: 3119 bytes Desc: not available URL: From atumball at redhat.com Mon Jan 7 03:34:47 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 7 Jan 2019 09:04:47 +0530 Subject: [Gluster-devel] Gluster Maintainer's meeting: 7th Jan, 2019 - Agenda Message-ID: Meeting date: 2019-01-07 18:30 IST, 13:00 UTC, 08:00 EDTBJ Link - Bridge: https://bluejeans.com/217609845 Attendance Agenda - Welcome 2019: Discuss about goals : - https://hackmd.io/OiQId65pStuBa_BPPazcmA - Progress with GCS - Scale testing showing GD2 can scale to 1000s of PVs (each is a gluster volume, in RWX mode) - new CSI for gluster-block showing good scale numbers, which is reaching higher than current 1k RWO PV per cluster, but need to iron out few things. (https://github.com/gluster/gluster-csi-driver/pull/105) - Performance focus: - Any update? What are the patch in progress? - How to measure the perf of a patch, is there any hardware? - Static Analyzers: - glusterfs: - coverity - 63 open - clang-scan - 32 open (with many false-positives). - gluster-block: - coverity: 1 open (66 last week) - GlusterFS-6: - Any priority review needed? - What are the critical areas need focus? - How to make glusto automated tests become blocker for the release? - Upgrade tests, need to start early. - Schedule as called out in the mail NOTE: Working backwards on the schedule, here?s what we have: - Announcement: Week of Mar 4th, 2019 - GA tagging: Mar-01-2019 - RC1: On demand before GA - RC0: Feb-04-2019 - Late features cut-off: Week of Jan-21st, 2018 - Branching (feature cutoff date): Jan-14-2018 (~45 days prior to branching) - Feature/scope proposal for the release (end date): Dec-12-2018 - Round Table? 
================= Feel free to add your topic into : https://hackmd.io/yTC-un5XT6KUB9V37LG6OQ?edit -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkavunga at redhat.com Tue Jan 8 06:53:04 2019 From: rkavunga at redhat.com (RAFI KC) Date: Tue, 8 Jan 2019 12:23:04 +0530 Subject: [Gluster-devel] Implementing multiplexing for self heal client. In-Reply-To: References: <8050438a-10a3-4329-ff58-6eae863c62cd@redhat.com> Message-ID: <63459166-fd5b-5ff0-f1f8-9a966c02f27a@redhat.com> I have completed the patches and pushed for reviews. Please feel free to raise your review concerns/suggestions. https://review.gluster.org/#/c/glusterfs/+/21868 https://review.gluster.org/#/c/glusterfs/+/21907 https://review.gluster.org/#/c/glusterfs/+/21960 https://review.gluster.org/#/c/glusterfs/+/21989/ Regards Rafi KC On 12/24/18 3:58 PM, RAFI KC wrote: > > On 12/21/18 6:56 PM, Sankarshan Mukhopadhyay wrote: >> On Fri, Dec 21, 2018 at 6:30 PM RAFI KC wrote: >>> Hi All, >>> >>> What is the problem? >>> As of now self-heal client is running as one daemon per node, this >>> means >>> even if there are multiple volumes, there will only be one self-heal >>> daemon. So to take effect of each configuration changes in the cluster, >>> the self-heal has to be reconfigured. But it doesn't have ability to >>> dynamically reconfigure. Which means when you have lot of volumes in >>> the >>> cluster, every management operation that involves configurations >>> changes >>> like volume start/stop, add/remove brick etc will result in self-heal >>> daemon restart. If such operation is executed more often, it is not >>> only >>> slow down self-heal for a volume, but also increases the slef-heal logs >>> substantially. >> What is the value of the number of volumes when you write "lot of >> volumes"? 1000 volumes, more etc > > Yes, more than 1000 volumes. It also depends on how often you execute > glusterd management operations (mentioned above). Each time self heal > daemon is restarted, it prints the entire graph. This graph traces in > the log will contribute the majority it's size. > > >> >>> >>> How to fix it? >>> >>> We are planning to follow a similar procedure as attach/detach graphs >>> dynamically which is similar to brick multiplex. The detailed steps is >>> as below, >>> >>> >>> >>> >>> 1) First step is to make shd per volume daemon, to generate/reconfigure >>> volfiles per volume basis . >>> >>> ??? 1.1) This will help to attach the volfiles easily to existing >>> shd daemon >>> >>> ??? 1.2) This will help to send notification to shd daemon as each >>> volinfo keeps the daemon object >>> >>> ??? 1.3) reconfiguring a particular subvolume is easier as we can check >>> the topology better >>> >>> ??? 1.4) With this change the volfiles will be moved to workdir/vols/ >>> directory. >>> >>> 2) Writing new rpc requests like attach/detach_client_graph function to >>> support clients attach/detach >>> >>> ??? 2.1) Also functions like graph reconfigure, mgmt_getspec_cbk has to >>> be modified >>> >>> 3) Safely detaching a subvolume when there are pending frames to >>> unwind. >>> >>> ??? 3.1) We can mark the client disconnected and make all the frames to >>> unwind with ENOTCONN >>> >>> ??? 
3.2) We can wait all the i/o to unwind until the new updated subvol >>> attaches >>> >>> 4) Handle scenarios like glusterd restart, node reboot, etc >>> >>> >>> >>> At the moment we are not planning to limit the number of heal subvolmes >>> per process as, because with the current approach also for every volume >>> heal was doing from a single process. We have not heared any major >>> complains on this? >> Is the plan to not ever limit or, have a throttle set to a default >> high(er) value? How would system resources be impacted if the proposed >> design is implemented? > > The plan is to implement in a way that it can support more than one > multiplexed self-heal daemon. The throttling function as of now > returns the same process to multiplex, but it can be easily modified > to create a new process. > > This multiplexing logic won't utilize any additional resources that it > currently does. > > > Rafi KC > > >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Tue Jan 8 13:33:13 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Tue, 8 Jan 2019 19:03:13 +0530 Subject: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/ In-Reply-To: References: Message-ID: Shyam, what is your take on this? An upstream user has tried it out and reported that it seems to fix the issue , however cpu utilization doubles. Regards, Nithya On Fri, 28 Dec 2018 at 09:17, Amar Tumballi wrote: > I feel its good to backport considering glusterfs-6.0 is another 2 months > away. > > On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran > wrote: > >> Hi, >> >> Can we backport this to release-5 ? We have several reports of high >> memory usage in fuse clients from users and this is likely to help. >> >> Regards, >> Nithya >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Tue Jan 8 14:33:58 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Tue, 8 Jan 2019 09:33:58 -0500 Subject: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/ In-Reply-To: References: Message-ID: On 1/8/19 8:33 AM, Nithya Balachandran wrote: > Shyam, what is your take on this? > An upstream user has tried it out and reported that it seems to fix the > issue , however cpu utilization doubles. We usually do not backport big fixes unless they are critical. My first answer would be, can't this wait for rel-6 which is up next? The change has gone through a good review overall, so from a review thoroughness perspective it looks good. The change has a test case to ensure that the limits are honored, so again a plus. Also, it is a switch, so in the worst case moving back to unlimited should be possible with little adverse effects in case the fix has issues. It hence, comes down to how confident are we that the change is not disruptive to an existing branch? If we can answer this with resonable confidence we can backport it and release it with the next 5.x update release. 
> > Regards, > Nithya > > On Fri, 28 Dec 2018 at 09:17, Amar Tumballi > wrote: > > I feel its good to backport considering glusterfs-6.0 is another 2 > months away. > > On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran > > wrote: > > Hi, > > Can we backport this to release-5 ? We have several reports of > high memory usage in fuse clients from users and this is likely > to help. > > Regards, > Nithya > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > > -- > Amar Tumballi (amarts) > > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > From atumball at redhat.com Wed Jan 9 02:57:03 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 9 Jan 2019 08:27:03 +0530 Subject: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/ In-Reply-To: References: Message-ID: On Tue, Jan 8, 2019 at 8:04 PM Shyam Ranganathan wrote: > On 1/8/19 8:33 AM, Nithya Balachandran wrote: > > Shyam, what is your take on this? > > An upstream user has tried it out and reported that it seems to fix the > > issue , however cpu utilization doubles. > > We usually do not backport big fixes unless they are critical. My first > answer would be, can't this wait for rel-6 which is up next? > > Considering it may take some more time to get adoption, doing a backport may surely benefit users, IMO. > The change has gone through a good review overall, so from a review > thoroughness perspective it looks good. > > The change has a test case to ensure that the limits are honored, so > again a plus. > > Also, it is a switch, so in the worst case moving back to unlimited > should be possible with little adverse effects in case the fix has issues. > > It hence, comes down to how confident are we that the change is not > disruptive to an existing branch? If we can answer this with resonable > confidence we can backport it and release it with the next 5.x update > release. > > Considering the code which the patch changes has changed very little over last few years, I feel it is totally safe to do the backport. Don't see any possible surprises. Will send a patch today on release-5 branch. -Amar > > > > Regards, > > Nithya > > > > On Fri, 28 Dec 2018 at 09:17, Amar Tumballi > > wrote: > > > > I feel its good to backport considering glusterfs-6.0 is another 2 > > months away. > > > > On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran > > > wrote: > > > > Hi, > > > > Can we backport this to release-5 ? We have several reports of > > high memory usage in fuse clients from users and this is likely > > to help. > > > > Regards, > > Nithya > > _______________________________________________ > > Gluster-devel mailing list > > Gluster-devel at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > > > > > > -- > > Amar Tumballi (amarts) > > > > > > _______________________________________________ > > Gluster-devel mailing list > > Gluster-devel at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From atumball at redhat.com Wed Jan 9 03:05:19 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 9 Jan 2019 08:35:19 +0530 Subject: [Gluster-devel] Gluster Maintainer's meeting: 7th Jan, 2019 - Meeting minutes In-Reply-To: References: Message-ID: Meeting date: 2019-01-07 18:30 IST, 13:00 UTC, 08:00 EDT BJ Link - Bridge: https://bluejeans.com/217609845 - Watch: https://bluejeans.com/s/sGFpa Attendance Agenda - Welcome 2019: New goals / Discuss: - https://hackmd.io/OiQId65pStuBa_BPPazcmA - Give it a week and take it to mailing list, discuss and agree upon - [Nigel] Some of the above points are threads of its own. May need separate thread. - Progress with GCS - Email about GCS in community. - RWX: - Scale testing showing GD2 can scale to 1000s of PVs (each is a gluster volume) - Bricks with LVM - Some delete issues seen, specially with LV command scale. Patch sent. - Create rate: 500 PVs / 12mins - More details by end of the week, including delete numbers. - RWO: - new CSI for gluster-block showing good scale numbers, which is reaching higher than current 1k RWO PV per cluster, but need to iron out few things. (https://github.com/gluster/gluster-csi-driver/pull/105 ) - 280 pods in 3 hosts, 1-1 Pod->PV ratio: leaner graph. - 1080 PVs with 1-12 ratio on 3 machines - Working on 3000+ PVC on just 3 hosts, will update by another 2 days. - Poornima is coming up with steps and details about the PR/version used etc. - Static Analyzers: - glusterfs: - Coverity - 63 open - https://scan.coverity.com/projects/gluster-glusterfs?tab=overview - clang-scan - 32 open - https://build.gluster.org/job/clang-scan/lastCompletedBuild/clangScanBuildBugs/ - gluster-block: - https://scan.coverity.com/projects/gluster-gluster-block?tab=overview - coverity: 1 open (66 last week) - GlusterFS-6: - Any priority review needed? - Fencing patches - Reducing threads (GH Issue: 475) - glfs-api statx patches [merged] - What are the critical areas need focus? - Asan Build ? Currently not green - Some java errors, machine offline. Need to look into this. - How to make glusto automated tests become blocker for the release? - Upgrade tests, need to start early. - Schedule as called out in the mail NOTE: Working backwards on the schedule, here?s what we have: - Announcement: Week of Mar 4th, 2019 - GA tagging: Mar-01-2019 - RC1: On demand before GA - RC0: Feb-04-2019 - Late features cut-off: Week of Jan-21st, 2018 - Branching (feature cutoff date): Jan-14-2018 (~45 days prior to branching) - Feature/scope proposal for the release (end date): Dec-12-2018 - Round Table? - [Sunny] Meetup in BLR this weekend. Please do come (at least those who are in BLR) - [Susant] Softserve has 4hrs timeout, which can?t get full regression cycle. Can we get at least 2 more hours added, so full regression can be run. ------- On Mon, Jan 7, 2019 at 9:04 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > > Meeting date: 2019-01-07 18:30 IST, 13:00 UTC, 08:00 EDTBJ Link > > - Bridge: https://bluejeans.com/217609845 > > Attendance > Agenda > > - > > Welcome 2019: Discuss about goals : > - https://hackmd.io/OiQId65pStuBa_BPPazcmA > - > > Progress with GCS > - Scale testing showing GD2 can scale to 1000s of PVs (each is a > gluster volume, in RWX mode) > - new CSI for gluster-block showing good scale numbers, which is > reaching higher than current 1k RWO PV per cluster, but need to iron out > few things. (https://github.com/gluster/gluster-csi-driver/pull/105) > - > > Performance focus: > - Any update? 
What are the patch in progress? > - How to measure the perf of a patch, is there any hardware? > - > > Static Analyzers: > - glusterfs: > - coverity - 63 open > - clang-scan - 32 open (with many false-positives). > - gluster-block: > - coverity: 1 open (66 last week) > - > > GlusterFS-6: > - Any priority review needed? > - What are the critical areas need focus? > - How to make glusto automated tests become blocker for the release? > - Upgrade tests, need to start early. > - Schedule as called out in the mail > > NOTE: Working backwards on the schedule, here?s what we have: > - Announcement: Week of Mar 4th, 2019 > - GA tagging: Mar-01-2019 > - RC1: On demand before GA > - RC0: Feb-04-2019 > - Late features cut-off: Week of Jan-21st, 2018 > - Branching (feature cutoff date): Jan-14-2018 (~45 days prior > to branching) > - Feature/scope proposal for the release (end date): Dec-12-2018 > - > > Round Table? > > ================= > > Feel free to add your topic into : > https://hackmd.io/yTC-un5XT6KUB9V37LG6OQ?edit > > > -- > Amar Tumballi (amarts) > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Wed Jan 9 06:23:12 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Wed, 9 Jan 2019 11:53:12 +0530 Subject: [Gluster-devel] https://review.gluster.org/#/c/glusterfs/+/19778/ In-Reply-To: References: Message-ID: On Wed, 9 Jan 2019 at 08:28, Amar Tumballi Suryanarayan wrote: > > > On Tue, Jan 8, 2019 at 8:04 PM Shyam Ranganathan > wrote: > >> On 1/8/19 8:33 AM, Nithya Balachandran wrote: >> > Shyam, what is your take on this? >> > An upstream user has tried it out and reported that it seems to fix the >> > issue , however cpu utilization doubles. >> >> We usually do not backport big fixes unless they are critical. My first >> answer would be, can't this wait for rel-6 which is up next? >> >> Considering it may take some more time to get adoption, doing a backport > may surely benefit users, IMO. > > I agree. This is a pain point for several users and I would like to have folks be able to try this out earlier and provide feedback. The change has gone through a good review overall, so from a review >> thoroughness perspective it looks good. >> >> The change has a test case to ensure that the limits are honored, so >> again a plus. >> >> Also, it is a switch, so in the worst case moving back to unlimited >> should be possible with little adverse effects in case the fix has issues. >> >> It hence, comes down to how confident are we that the change is not >> disruptive to an existing branch? If we can answer this with resonable >> confidence we can backport it and release it with the next 5.x update >> release. >> >> > Considering the code which the patch changes has changed very little over > last few years, I feel it is > totally safe to do the backport. Don't see any possible surprises. Will > send a patch today on release-5 branch. > > -Amar > > > >> > >> > Regards, >> > Nithya >> > >> > On Fri, 28 Dec 2018 at 09:17, Amar Tumballi > > > wrote: >> > >> > I feel its good to backport considering glusterfs-6.0 is another 2 >> > months away. >> > >> > On Fri, Dec 28, 2018 at 8:19 AM Nithya Balachandran >> > > wrote: >> > >> > Hi, >> > >> > Can we backport this to release-5 ? We have several reports of >> > high memory usage in fuse clients from users and this is likely >> > to help. 
>> > >> > Regards, >> > Nithya >> > _______________________________________________ >> > Gluster-devel mailing list >> > Gluster-devel at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-devel >> > >> > >> > >> > -- >> > Amar Tumballi (amarts) >> > >> > >> > _______________________________________________ >> > Gluster-devel mailing list >> > Gluster-devel at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-devel >> > >> > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstrunk at redhat.com Wed Jan 9 16:21:04 2019 From: jstrunk at redhat.com (John Strunk) Date: Wed, 9 Jan 2019 11:21:04 -0500 Subject: [Gluster-devel] Weekly GCS architecture call Message-ID: We have a weekly 1 hour call to discuss architecture topics related to GCS. This call has been ongoing for several months as an internal meeting. With the new year, we are expanding the invitation for the community to join and hear/contribute to the discussions. Meeting info: - Time/Date: Thursdays at 15:00 UTC (hint: `date -d "15:00 UTC"`) - Location: Bluejeans - https://bluejeans.com/600091070 - Minutes/Agenda/Info: https://hackmd.io/sj9ik9SCTYm81YcQDOOrtw This week's main topic will be a roundtable discussion to highlight the set of remaining tasks for a GCS 1.0 release. We are targeting the 1.0 release for the end of January / early February. See you tomorrow. -John -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Wed Jan 9 18:54:13 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Wed, 9 Jan 2019 13:54:13 -0500 Subject: [Gluster-devel] Regression health for release-5.next and release-6 Message-ID: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> Hi, As part of branching preparation next week for release-6, please find test failures and respective test links here [1]. The top tests that are failing/dumping-core are as below and need attention, - ec/bug-1236065.t - glusterd/add-brick-and-validate-replicated-volume-options.t - readdir-ahead/bug-1390050.t - glusterd/brick-mux-validation.t - bug-1432542-mpx-restart-crash.t Others of interest, - replicate/bug-1341650.t Please file a bug if needed against the test case and report the same here, in case a problem is already addressed, then do send back the patch details that addresses this issue as a response to this mail. Thanks, Shyam [1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view From manu at netbsd.org Thu Jan 10 02:40:34 2019 From: manu at netbsd.org (Emmanuel Dreyfus) Date: Thu, 10 Jan 2019 03:40:34 +0100 Subject: [Gluster-devel] FUSE directory filehandle Message-ID: <1o1643u.1ah4ac9mi3e0M%manu@netbsd.org> Hello This is not strictly a GlusterFS question since I came to it porting LTFS to NetBSD, however I would like to make sure I will not break GlusterFS by fixing NetBSD FUSE implementation for LTFS. Current NetBSD FUSE implementation sends the filehandle in any FUSE requests for an open node, regardless of its type (directory or file). I discovered that libfuse low level code manages filehandle differently for opendir/readdir/syncdir/releasedir than for other operations. As a result, when a getattr is done on a directory, setting the filehandle obtained from opendir can cause a crash in libfuse. 
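To make the failure mode concrete, below is a minimal sketch of the same class of problem using a hypothetical high-level libfuse (3.x) getattr handler; the myfs_* names are invented for this example and this is not LTFS, libfuse or glusterfs source. The handler trusts a non-zero fi->fh to be the handle returned by open(); if the kernel side also forwards the opendir() handle for directories, the cast points at an object of a different type and the dereference can crash.

#define FUSE_USE_VERSION 31
#include <fuse.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical per-open file object of an imaginary filesystem. */
struct myfs_file {
    struct stat cached_stat;
};

/* getattr handler that assumes a non-zero fi->fh always came from open().
 * If a directory's opendir() handle is forwarded here as well, the cast
 * below reinterprets a directory handle as a file object. */
static int myfs_getattr(const char *path, struct stat *st,
                        struct fuse_file_info *fi)
{
    (void)path;
    if (fi != NULL && fi->fh != 0) {
        struct myfs_file *f = (struct myfs_file *)(uintptr_t)fi->fh;
        *st = f->cached_stat;   /* wrong object type => undefined behaviour */
        return 0;
    }
    /* Path-based fallback when no filehandle is supplied (stubbed). */
    memset(st, 0, sizeof(*st));
    st->st_mode = S_IFDIR | 0755;
    return 0;
}

With a null (FUSE_UNKNOWN_FH) filehandle for getattr on directories, only the path-based branch runs.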
The fix for NetBSD FUSE implementation is to avoid setting the filehandle for the following FUSE operations on directories: getattr, setattr, poll, getlk, setlk, setlkw, read, write (only the first two ones are likely to be actually used, though) Does anyone forsee a possible problem for GlusterFS with such a behavior? In other words, will it be fine to always have a FUSE_UNKNOWN_FH (aka null) filehandle for getattr/setattr on directories? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz manu at netbsd.org From amukherj at redhat.com Thu Jan 10 10:25:27 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Thu, 10 Jan 2019 15:55:27 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> Message-ID: Mohit, Sanju - request you to investigate the failures related to glusterd and brick-mux and report back to the list. On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan wrote: > Hi, > > As part of branching preparation next week for release-6, please find > test failures and respective test links here [1]. > > The top tests that are failing/dumping-core are as below and need > attention, > - ec/bug-1236065.t > - glusterd/add-brick-and-validate-replicated-volume-options.t > - readdir-ahead/bug-1390050.t > - glusterd/brick-mux-validation.t > - bug-1432542-mpx-restart-crash.t > > Others of interest, > - replicate/bug-1341650.t > > Please file a bug if needed against the test case and report the same > here, in case a problem is already addressed, then do send back the > patch details that addresses this issue as a response to this mail. > > Thanks, > Shyam > > [1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From moagrawa at redhat.com Thu Jan 10 11:20:27 2019 From: moagrawa at redhat.com (Mohit Agrawal) Date: Thu, 10 Jan 2019 16:50:27 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> Message-ID: I think we should consider regression-builds after merged the patch ( https://review.gluster.org/#/c/glusterfs/+/21990/) as we know this patch introduced some delay. Thanks, Mohit Agrawal On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee wrote: > Mohit, Sanju - request you to investigate the failures related to glusterd > and brick-mux and report back to the list. > > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan > wrote: > >> Hi, >> >> As part of branching preparation next week for release-6, please find >> test failures and respective test links here [1]. >> >> The top tests that are failing/dumping-core are as below and need >> attention, >> - ec/bug-1236065.t >> - glusterd/add-brick-and-validate-replicated-volume-options.t >> - readdir-ahead/bug-1390050.t >> - glusterd/brick-mux-validation.t >> - bug-1432542-mpx-restart-crash.t >> >> Others of interest, >> - replicate/bug-1341650.t >> >> Please file a bug if needed against the test case and report the same >> here, in case a problem is already addressed, then do send back the >> patch details that addresses this issue as a response to this mail. 
>> >> Thanks, >> Shyam >> >> [1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Thu Jan 10 11:56:37 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 10 Jan 2019 17:26:37 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> Message-ID: That is a good point Mohit, but do we know how many of these tests failed because of 'timeout' ? If most of these are due to timeout, then yes, it may be a valid point. -Amar On Thu, Jan 10, 2019 at 4:51 PM Mohit Agrawal wrote: > I think we should consider regression-builds after merged the patch ( > https://review.gluster.org/#/c/glusterfs/+/21990/) > as we know this patch introduced some delay. > > Thanks, > Mohit Agrawal > > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee > wrote: > >> Mohit, Sanju - request you to investigate the failures related to >> glusterd and brick-mux and report back to the list. >> >> On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >> wrote: >> >>> Hi, >>> >>> As part of branching preparation next week for release-6, please find >>> test failures and respective test links here [1]. >>> >>> The top tests that are failing/dumping-core are as below and need >>> attention, >>> - ec/bug-1236065.t >>> - glusterd/add-brick-and-validate-replicated-volume-options.t >>> - readdir-ahead/bug-1390050.t >>> - glusterd/brick-mux-validation.t >>> - bug-1432542-mpx-restart-crash.t >>> >>> Others of interest, >>> - replicate/bug-1341650.t >>> >>> Please file a bug if needed against the test case and report the same >>> here, in case a problem is already addressed, then do send back the >>> patch details that addresses this issue as a response to this mail. >>> >>> Thanks, >>> Shyam >>> >>> [1] Regression failures: https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> >>> _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Fri Jan 11 03:29:06 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Fri, 11 Jan 2019 08:59:06 +0530 Subject: [Gluster-devel] GCS 0.5 release Message-ID: Today, we are announcing the availability of GCS (Gluster Container Storage) 0.5. Highlights and updates since v0.4: - GCS environment updated to kube 1.13 - CSI deployment moved to 1.0 - Integrated Anthill deployment - Kube & etcd metrics added to prometheus - Tuning of etcd to increase stability - GD2 bug fixes from scale testing effort. 
Included components: - Glusterd2: https://github.com/gluster/glusterd2 - Gluster CSI driver: https://github.com/gluster/gluster-csi-driver - Gluster-prometheus: https://github.com/gluster/gluster-prometheus - Anthill - https://github.com/gluster/anthill/ - Gluster-Mixins - https://github.com/gluster/gluster-mixins/ For more details on the specific content of this release please refer [3]. If you are interested in contributing, please see [4] or contact the gluster-devel mailing list. We?re always interested in any bugs that you find, pull requests for new features and your feedback. Regards, Team GCS [1] https://github.com/gluster/gcs/releases [2] https://github.com/gluster/gcs/tree/master/deploy [3] https://waffle.io/gluster/gcs?label=GCS%2F0.5 - search for ?Done? lane [4] https://github.com/gluster/gcs -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Fri Jan 11 08:37:36 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Fri, 11 Jan 2019 14:07:36 +0530 Subject: [Gluster-devel] FUSE directory filehandle In-Reply-To: <1o1643u.1ah4ac9mi3e0M%manu@netbsd.org> References: <1o1643u.1ah4ac9mi3e0M%manu@netbsd.org> Message-ID: On Thu, Jan 10, 2019 at 8:17 AM Emmanuel Dreyfus wrote: > Hello > > This is not strictly a GlusterFS question since I came to it porting > LTFS to NetBSD, however I would like to make sure I will not break > GlusterFS by fixing NetBSD FUSE implementation for LTFS. > > Current NetBSD FUSE implementation sends the filehandle in any FUSE > requests for an open node, regardless of its type (directory or file). > > I discovered that libfuse low level code manages filehandle differently > for opendir/readdir/syncdir/releasedir than for other operations. As a > result, when a getattr is done on a directory, setting the filehandle > obtained from opendir can cause a crash in libfuse. > > The fix for NetBSD FUSE implementation is to avoid setting the > filehandle for the following FUSE operations on directories: getattr, > setattr, poll, getlk, setlk, setlkw, read, write (only the first two > ones are likely to be actually used, though) > > Does anyone forsee a possible problem for GlusterFS with such a > behavior? In other words, will it be fine to always have a > FUSE_UNKNOWN_FH (aka null) filehandle for getattr/setattr on > directories? > > Below is the code snippet from fuse_getattr(). #if FUSE_KERNEL_MINOR_VERSION >= 9 priv = this->private; if (priv->proto_minor >= 9 && fgi->getattr_flags & FUSE_GETATTR_FH) state->fd = fd_ref((fd_t *)(uintptr_t)fgi->fh); #endif Which means, it may crash if we get fd as NULL, when FUSE_GETATTR_FH is set. > > -- > Emmanuel Dreyfus > http://hcpnet.free.fr/pubz > manu at netbsd.org > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Fri Jan 11 14:39:22 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Fri, 11 Jan 2019 20:09:22 +0530 Subject: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench In-Reply-To: References: Message-ID: Here is the update of the progress till now: * The client profile attached till now shows the tuple creation is dominated by writes and fstats. Note that fstats are side-effects of writes as writes invalidate attributes of the file from kernel attribute cache. 
* The rest of the init phase (which is marked by msgs "setting primary key" and "vaccuum") is dominated by reads. Next bigger set of operations are writes followed by fstats. So, only writes, reads and fstats are the operations we need to optimize to reduce the init time latency. As mentioned in my previous mail, I did following tunings: * Enabled only write-behind, md-cache and open-behind. - write-behind was configured with a cache-size/window-size of 20MB - open-behind was configured with read-after-open yes - md-cache was loaded as a child of write-behind in xlator graph. As a parent of write-behind, writes responses of writes cached in write-behind would invalidate stats. But when loaded as a child of write-behind this problem won't be there. Note that in both cases fstat would pass through write-behind (In the former case due to no stats in md-cache). However in the latter case fstats can be served by md-cache. - md-cache used to aggressively invalidate inodes. For the purpose of this test, I just commented out inode-invalidate code in md-cache. We need to fine tune the invalidation invocation logic. - set group-metadata-cache to on. But turned off upcall notifications. Note that since this workload basically accesses all its data through single mount point. So, there is no shared files across mounts and hence its safe to turn off invalidations. * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781 With the above set of tunings I could reduce the init time of scale 8000 from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30% Since the workload is dominated by reads, we think a good read-cache where reads to regions just written are served from cache would greatly improve the performance. Since kernel page-cache already provides that functionality along with read-ahead (which is more intelligent and serves more read patterns than supported by Glusterfs read-ahead), we wanted to try that. But, Manoj found a bug where reads followed by writes are not served from page cache [5]. I am currently waiting for the resolution of this bug. As an alternative, I can modify io-cache to serve reads from the data just written. But, the change involves its challenges and hence would like to get a resolution on [5] (either positive or negative) before proceeding with modifications to io-cache. As to the rpc latency, Krutika had long back identified that reading a single rpc message involves atleast 4 reads to socket. These many number of reads were done to identify the structure of the message on the go. The reason we wanted to discover the rpc message was to identify the part of the rpc message containing read or write payload and make sure that payload is directly read into a buffer different than the one containing rest of the rpc message. This strategy will make sure payloads are not copied again when buffers are moved across caches (read-ahead, io-cache etc) and also the rest of the rpc message can be freed even though the payload outlives the rpc message (when payloads are cached). However, we can experiment an approach where we can either do away with zero-copy requirement or let the entire buffer containing rpc message and payload to live in the cache. >From my observations and discussions with Manoj and Xavi, this workload is very sensitive to latency (than to concurrency). So, I am hopeful the above approaches will give positive results. 
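For context, below is a rough sketch of how a single ONC-RPC record is pulled off a TCP socket when the whole fragment (header plus payload) lands in one buffer; the helper names are invented and this is not glusterfs' rpc/socket-transport code. The additional socket reads described above come from decoding enough of the header incrementally so that a READ/WRITE payload can be placed into a separate buffer for zero-copy handling across the caching layers.

#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>

/* Read exactly len bytes or fail. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n <= 0)
            return -1;              /* error or EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Returns a malloc'd buffer holding one RPC fragment (header and payload
 * together), or NULL on failure. Assumes a single-fragment record. */
static char *read_rpc_fragment(int sock, size_t *out_len)
{
    uint32_t marker;
    if (read_full(sock, &marker, sizeof(marker)) < 0)
        return NULL;
    marker = ntohl(marker);
    uint32_t size = marker & 0x7fffffffu;   /* low 31 bits: fragment size */
                                            /* high bit: last-fragment flag */
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (read_full(sock, buf, size) < 0) {
        free(buf);
        return NULL;
    }
    *out_len = size;
    return buf;
}

Keeping the payload in the same buffer as the rest of the rpc message (the second alternative mentioned above) would allow reads this simple, at the cost of holding the whole fragment in memory for as long as the cached payload lives.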
[5] https://bugzilla.redhat.com/show_bug.cgi?id=1664934 regards, Raghavendra On Fri, Dec 28, 2018 at 12:44 PM Raghavendra Gowdappa wrote: > > > On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa > wrote: > >> >> >> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay < >> sankarshan.mukhopadhyay at gmail.com> wrote: >> >>> [pulling the conclusions up to enable better in-line] >>> >>> > Conclusions: >>> > >>> > We should never have a volume with caching-related xlators disabled. >>> The price we pay for it is too high. We need to make them work consistently >>> and aggressively to avoid as many requests as we can. >>> >>> Are there current issues in terms of behavior which are known/observed >>> when these are enabled? >>> >> >> We did have issues with pgbench in past. But they've have been fixed. >> Please refer to bz [1] for details. On 5.1, it runs successfully with all >> caching related xlators enabled. Having said that the only performance >> xlators which gave improved performance were open-behind and write-behind >> [2] (write-behind had some issues, which will be fixed by [3] and we'll >> have to measure performance again with fix to [3]). >> > > One quick update. Enabling write-behind and md-cache with fix for [3] > reduced the total time taken for pgbench init phase roughly by 20%-25% > (from 12.5 min to 9.75 min for a scale of 100). Though this is still a huge > time (around 12hrs for a db of scale 8000). I'll follow up with a detailed > report once my experiments are complete. Currently trying to optimize the > read path. > > >> For some reason, read-side caching didn't improve transactions per >> second. I am working on this problem currently. Note that these bugs >> measure transaction phase of pgbench, but what xavi measured in his mail is >> init phase. Nevertheless, evaluation of read caching (metadata/data) will >> still be relevant for init phase too. >> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691 >> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4 >> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >> >> >>> > We need to analyze client/server xlators deeper to see if we can avoid >>> some delays. However optimizing something that is already at the >>> microsecond level can be very hard. >>> >>> That is true - are there any significant gains which can be accrued by >>> putting efforts here or, should this be a lower priority? >>> >> >> The problem identified by xavi is also the one we (Manoj, Krutika, me and >> Milind) had encountered in the past [4]. The solution we used was to have >> multiple rpc connections between single brick and client. The solution >> indeed fixed the bottleneck. So, there is definitely work involved here - >> either to fix the single connection model or go with multiple connection >> model. Its preferred to improve single connection and resort to multiple >> connections only if bottlenecks in single connection are not fixable. >> Personally I think this is high priority along with having appropriate >> client side caching. >> >> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52 >> >> >>> > We need to determine what causes the fluctuations in brick side and >>> avoid them. >>> > This scenario is very similar to a smallfile/metadata workload, so >>> this is probably one important cause of its bad performance. >>> >>> What kind of instrumentation is required to enable the determination? 
>>> >>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez >>> wrote: >>> > >>> > Hi, >>> > >>> > I've done some tracing of the latency that network layer introduces in >>> gluster. I've made the analysis as part of the pgbench performance issue >>> (in particulat the initialization and scaling phase), so I decided to look >>> at READV for this particular workload, but I think the results can be >>> extrapolated to other operations that also have small latency (cached data >>> from FS for example). >>> > >>> > Note that measuring latencies introduces some latency. It consists in >>> a call to clock_get_time() for each probe point, so the real latency will >>> be a bit lower, but still proportional to these numbers. >>> > >>> >>> [snip] >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pgbench-init-client-profile.tgz Type: application/x-compressed-tar Size: 8962 bytes Desc: not available URL: From srangana at redhat.com Fri Jan 11 15:50:09 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Fri, 11 Jan 2019 10:50:09 -0500 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> Message-ID: <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> We can check health on master post the patch as stated by Mohit below. Release-5 is causing some concerns as we need to tag the release yesterday, but we have the following 2 tests failing or coredumping pretty regularly, need attention on these. ec/bug-1236065.t glusterd/add-brick-and-validate-replicated-volume-options.t Shyam On 1/10/19 6:20 AM, Mohit Agrawal wrote: > I think we should consider regression-builds after merged the patch > (https://review.gluster.org/#/c/glusterfs/+/21990/)? > as we know this patch introduced some delay. > > Thanks, > Mohit Agrawal > > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee > wrote: > > Mohit, Sanju - request you to investigate the failures related to > glusterd and brick-mux and report back to the list. > > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan > > wrote: > > Hi, > > As part of branching preparation next week for release-6, please > find > test failures and respective test links here [1]. > > The top tests that are failing/dumping-core are as below and > need attention, > - ec/bug-1236065.t > - glusterd/add-brick-and-validate-replicated-volume-options.t > - readdir-ahead/bug-1390050.t > - glusterd/brick-mux-validation.t > - bug-1432542-mpx-restart-crash.t > > Others of interest, > - replicate/bug-1341650.t > > Please file a bug if needed against the test case and report the > same > here, in case a problem is already addressed, then do send back the > patch details that addresses this issue as a response to this mail. 
> > Thanks, > Shyam > > [1] Regression failures: > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > From moagrawa at redhat.com Sat Jan 12 12:59:56 2019 From: moagrawa at redhat.com (Mohit Agrawal) Date: Sat, 12 Jan 2019 18:29:56 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> Message-ID: For "add-brick-and-validate-replicated-volume-options.t" I have posted a patch: https://review.gluster.org/22015. For the test case "ec/bug-1236065.t" I think the issue needs to be checked by the EC team. On the brick side, it is showing the logs below:
>>>>>>>>>>>>>>>>>
on wire in the future [Invalid argument]
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and [2019-01-12 12:25:25.902992]
[2019-01-12 12:25:25.903553] W [MSGID: 114031] [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: remote operation failed [Bad file descriptor]
[2019-01-12 12:25:25.903998] W [MSGID: 122040] [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get size and version : FOP : 'FXATTROP' failed on gfid d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error]
[2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error)
>>>>>>>>>>>>>>>>>>>
Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, child=, loc=0x7f83777fdbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, child=, loc=0x7f8376ffcbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, child=, loc=0x7f83767fbbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): #0 0x00007f83bb70d945 
in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, child=, loc=0x7f8375ffabb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, child=, loc=0x7f83757f9bb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, child=, loc=0x7f8374ff8bb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 
#1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, child=, loc=0x7f8367ffebb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) at event-epoll.c:846 #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at glusterfsd.c:2848 >>>>>>>>>>>>>>>>>>>>>>>>>>. Thanks, Mohit Agrawal On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan We can check health on master post the patch as stated by Mohit below. > > Release-5 is causing some concerns as we need to tag the release > yesterday, but we have the following 2 tests failing or coredumping > pretty regularly, need attention on these. > > ec/bug-1236065.t > glusterd/add-brick-and-validate-replicated-volume-options.t > > Shyam > On 1/10/19 6:20 AM, Mohit Agrawal wrote: > > I think we should consider regression-builds after merged the patch > > (https://review.gluster.org/#/c/glusterfs/+/21990/) > > as we know this patch introduced some delay. > > > > Thanks, > > Mohit Agrawal > > > > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee > > wrote: > > > > Mohit, Sanju - request you to investigate the failures related to > > glusterd and brick-mux and report back to the list. > > > > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan > > > wrote: > > > > Hi, > > > > As part of branching preparation next week for release-6, please > > find > > test failures and respective test links here [1]. > > > > The top tests that are failing/dumping-core are as below and > > need attention, > > - ec/bug-1236065.t > > - glusterd/add-brick-and-validate-replicated-volume-options.t > > - readdir-ahead/bug-1390050.t > > - glusterd/brick-mux-validation.t > > - bug-1432542-mpx-restart-crash.t > > > > Others of interest, > > - replicate/bug-1341650.t > > > > Please file a bug if needed against the test case and report the > > same > > here, in case a problem is already addressed, then do send back > the > > patch details that addresses this issue as a response to this > mail. > > > > Thanks, > > Shyam > > > > [1] Regression failures: > > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view > > _______________________________________________ > > Gluster-devel mailing list > > Gluster-devel at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-devel > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From moagrawa at redhat.com Sat Jan 12 13:16:20 2019 From: moagrawa at redhat.com (Mohit Agrawal) Date: Sat, 12 Jan 2019 18:46:20 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> Message-ID: Previous logs related to client not bricks, below are the brick logs [2019-01-12 12:25:25.893485]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and [2019-01-12 12:25:25.899532] [2019-01-12 12:25:25.903375] E [MSGID: 113001] [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] 8-patchy-posix: fgetxattr failed on gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] [2019-01-12 12:25:25.903468] E [MSGID: 115073] [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, error-xlator: patchy-posix [Bad file descriptor] Thanks, Mohit Agrawal On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal wrote: > > For specific to "add-brick-and-validate-replicated-volume-options.t" i > have posted a patch https://review.gluster.org/22015. > For test case "ec/bug-1236065.t" I think the issue needs to be check by ec > team > > On the brick side, it is showing below logs > > >>>>>>>>>>>>>>>>> > > on wire in the future [Invalid argument] > The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key > 'trusted.ec.dirty' would not be sent on wire in the future [Invalid > argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and > [2019-01-12 12:25:25.902992] > [2019-01-12 12:25:25.903553] W [MSGID: 114031] > [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: > remote operation failed [Bad file descriptor] > [2019-01-12 12:25:25.903998] W [MSGID: 122040] > [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get > size and version : FOP : 'FXATTROP' failed on gfid > d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] > [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] > 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) > > >>>>>>>>>>>>>>>>>>> > > Test case is getting timed out because "volume heal $V0 full" command is > stuck, look's like shd is getting stuck at getxattr > > >>>>>>>>>>>>>>. 
> > Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, > child=, loc=0x7f83777fdbb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, > entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc at entry=0x7f83777fdde0, > pid=pid at entry=-6, data=data at entry=0x7f83a8030880, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, > child=, loc=0x7f8376ffcbb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, > entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc at entry=0x7f8376ffcde0, > pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, > child=, loc=0x7f83767fbbb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, > entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc at entry=0x7f83767fbde0, > pid=pid at entry=-6, data=data at entry=0x7f83a8030960, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from 
/usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, > child=, loc=0x7f8375ffabb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, > entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc at entry=0x7f8375ffade0, > pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, > child=, loc=0x7f83757f9bb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, > entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc at entry=0x7f83757f9de0, > pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, > child=, loc=0x7f8374ff8bb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, > entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc at entry=0x7f8374ff8de0, > pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at > 
ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): > #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, > loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 > "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) > at syncop.c:1680 > #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, > child=, loc=0x7f8367ffebb0, full=) at > ec-heald.c:161 > #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, > entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at > ec-heald.c:294 > #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc at entry=0x7f8367ffede0, > pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, fn=fn at entry=0x7f83add03140 > ) at syncop-utils.c:125 > #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, > inode=) at ec-heald.c:311 > #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at > ec-heald.c:372 > #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 > #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 > Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): > #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 > #1 0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) > at event-epoll.c:846 > #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at > glusterfsd.c:2848 > > > >>>>>>>>>>>>>>>>>>>>>>>>>>. > > Thanks, > Mohit Agrawal > > On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan >> We can check health on master post the patch as stated by Mohit below. >> >> Release-5 is causing some concerns as we need to tag the release >> yesterday, but we have the following 2 tests failing or coredumping >> pretty regularly, need attention on these. >> >> ec/bug-1236065.t >> glusterd/add-brick-and-validate-replicated-volume-options.t >> >> Shyam >> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >> > I think we should consider regression-builds after merged the patch >> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >> > as we know this patch introduced some delay. >> > >> > Thanks, >> > Mohit Agrawal >> > >> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee > > > wrote: >> > >> > Mohit, Sanju - request you to investigate the failures related to >> > glusterd and brick-mux and report back to the list. >> > >> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >> > > wrote: >> > >> > Hi, >> > >> > As part of branching preparation next week for release-6, please >> > find >> > test failures and respective test links here [1]. >> > >> > The top tests that are failing/dumping-core are as below and >> > need attention, >> > - ec/bug-1236065.t >> > - glusterd/add-brick-and-validate-replicated-volume-options.t >> > - readdir-ahead/bug-1390050.t >> > - glusterd/brick-mux-validation.t >> > - bug-1432542-mpx-restart-crash.t >> > >> > Others of interest, >> > - replicate/bug-1341650.t >> > >> > Please file a bug if needed against the test case and report the >> > same >> > here, in case a problem is already addressed, then do send back >> the >> > patch details that addresses this issue as a response to this >> mail. 
>> > >> > Thanks, >> > Shyam >> > >> > [1] Regression failures: >> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >> > _______________________________________________ >> > Gluster-devel mailing list >> > Gluster-devel at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-devel >> > >> > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenkins at build.gluster.org Mon Jan 14 01:45:02 2019 From: jenkins at build.gluster.org (jenkins at build.gluster.org) Date: Mon, 14 Jan 2019 01:45:02 +0000 (UTC) Subject: [Gluster-devel] Weekly Untriaged Bugs Message-ID: <172507852.9.1547430303459.JavaMail.jenkins@jenkins-el7.rht.gluster.org> [...truncated 6 lines...] https://bugzilla.redhat.com/1660404 / core: Conditional freeing of string after returning from dict_set_dynstr function https://bugzilla.redhat.com/1665145 / core: Writes on Gluster 5 volumes fail with EIO when "cluster.consistent-metadata" is set https://bugzilla.redhat.com/1663337 / doc: Gluster documentation on quorum-reads option is incorrect https://bugzilla.redhat.com/1663205 / fuse: List dictionary is too slow https://bugzilla.redhat.com/1664524 / geo-replication: Non-root geo-replication session goes to faulty state, when the session is started https://bugzilla.redhat.com/1662178 / glusterd: Compilation fails for xlators/mgmt/glusterd/src with error "undefined reference to `dlclose'" https://bugzilla.redhat.com/1663247 / glusterd: remove static memory allocations from code https://bugzilla.redhat.com/1663519 / gluster-smb: Memory leak when smb.conf has "store dos attributes = yes" https://bugzilla.redhat.com/1665361 / project-infrastructure: Alerts for offline nodes https://bugzilla.redhat.com/1659934 / project-infrastructure: Cannot unsubscribe the review.gluster.org https://bugzilla.redhat.com/1663780 / project-infrastructure: On docs.gluster.org, we should convert spaces in folder or file names to 301 redirects to hypens https://bugzilla.redhat.com/1665677 / rdma: volume create and transport change with rdma failed https://bugzilla.redhat.com/1664215 / read-ahead: Toggling readdir-ahead translator off causes some clients to umount some of its volumes https://bugzilla.redhat.com/1661895 / replicate: [disperse] Dump respective itables in EC to statedumps. https://bugzilla.redhat.com/1662557 / replicate: glusterfs process crashes, causing "Transport endpoint not connected". https://bugzilla.redhat.com/1664398 / tests: ./tests/00-geo-rep/00-georep-verify-setup.t does not work with ./run-tests-in-vagrant.sh [...truncated 2 lines...] -------------- next part -------------- A non-text attachment was scrubbed... Name: build.log Type: application/octet-stream Size: 2220 bytes Desc: not available URL: From aspandey at redhat.com Mon Jan 14 10:06:22 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Mon, 14 Jan 2019 05:06:22 -0500 (EST) Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> Message-ID: <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> I downloaded logs of regression runs 1077 and 1073 and tried to investigate it. In both regression ec/bug-1236065.t is hanging on TEST 70 which is trying to get the online brick count I can see that in mount/bricks and glusterd logs it has not move forward after this test. 
glusterd.log - [2019-01-06 16:27:51.346408]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count ++++++++++ [2019-01-06 16:27:51.645014] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) [0x7f4c37fe06c3] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) [0x7f4c37fd9b3a] -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type [Invalid argument] [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer type [Invalid argument] [2019-01-06 16:27:51.649335] E [MSGID: 
101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-06 16:27:51.932871] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy It is just taking lot of time to get the status at this point. It looks like there could be some issue with connection or the handing of volume status when some bricks are down. --- Ashish ----- Original Message ----- From: "Mohit Agrawal" To: "Shyam Ranganathan" Cc: "Gluster Devel" Sent: Saturday, January 12, 2019 6:46:20 PM Subject: Re: [Gluster-devel] Regression health for release-5.next and release-6 Previous logs related to client not bricks, below are the brick logs [2019-01-12 12:25:25.893485]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and [2019-01-12 12:25:25.899532] [2019-01-12 12:25:25.903375] E [MSGID: 113001] [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] 8-patchy-posix: fgetxattr failed on gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] [2019-01-12 12:25:25.903468] E [MSGID: 115073] [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, error-xlator: patchy-posix [Bad file descriptor] Thanks, Mohit Agrawal On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal < moagrawa at redhat.com > wrote: For specific to "add-brick-and-validate-replicated-volume-options.t" i have posted a patch https://review.gluster.org/22015 . For test case "ec/bug-1236065.t" I think the issue needs to be check by ec team On the brick side, it is showing below logs >>>>>>>>>>>>>>>>> on wire in the future [Invalid argument] The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and [2019-01-12 12:25:25.902992] [2019-01-12 12:25:25.903553] W [MSGID: 114031] [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: remote operation failed [Bad file descriptor] [2019-01-12 12:25:25.903998] W [MSGID: 122040] [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get size and version : FOP : 'FXATTROP' failed on gfid d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >>>>>>>>>>>>>>>>>>> Test case is getting timed out because "volume heal $V0 full" command is stuck, look's like shd is getting stuck at getxattr >>>>>>>>>>>>>>. 
Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, child=, loc=0x7f83777fdbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, child=, loc=0x7f8376ffcbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, child=, loc=0x7f83767fbbb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): #0 0x00007f83bb70d945 
in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, child=, loc=0x7f8375ffabb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, child=, loc=0x7f83757f9bb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, child=, loc=0x7f8374ff8bb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 
#1 0x00007f83bc910e5b in syncop_getxattr (subvol=, loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) at syncop.c:1680 #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, child=, loc=0x7f8367ffebb0, full=) at ec-heald.c:161 #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at ec-heald.c:294 #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, inode=) at ec-heald.c:311 #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at ec-heald.c:372 #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 #1 0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) at event-epoll.c:846 #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at glusterfsd.c:2848 >>>>>>>>>>>>>>>>>>>>>>>>>>. Thanks, Mohit Agrawal On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan < srangana at redhat.com wrote:
We can check health on master post the patch as stated by Mohit below. Release-5 is causing some concerns as we need to tag the release yesterday, but we have the following 2 tests failing or coredumping pretty regularly, need attention on these. ec/bug-1236065.t glusterd/add-brick-and-validate-replicated-volume-options.t Shyam On 1/10/19 6:20 AM, Mohit Agrawal wrote: > I think we should consider regression-builds after merged the patch > ( https://review.gluster.org/#/c/glusterfs/+/21990/ ) > as we know this patch introduced some delay. > > Thanks, > Mohit Agrawal > > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee < amukherj at redhat.com > > wrote: > > Mohit, Sanju - request you to investigate the failures related to > glusterd and brick-mux and report back to the list. > > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan > < srangana at redhat.com > wrote: > > Hi, > > As part of branching preparation next week for release-6, please > find > test failures and respective test links here [1]. > > The top tests that are failing/dumping-core are as below and > need attention, > - ec/bug-1236065.t > - glusterd/add-brick-and-validate-replicated-volume-options.t > - readdir-ahead/bug-1390050.t > - glusterd/brick-mux-validation.t > - bug-1432542-mpx-restart-crash.t > > Others of interest, > - replicate/bug-1341650.t > > Please file a bug if needed against the test case and report the > same > here, in case a problem is already addressed, then do send back the > patch details that addresses this issue as a response to this mail. > > Thanks, > Shyam > > [1] Regression failures: > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > >
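One more data point that would help with the connection/status theory above: a statedump of glusterd taken while the status command is hanging should show whether glusterd is still waiting on a reply from a brick or is stuck somewhere else. A rough sketch of how to grab one (this assumes the default statedump directory /var/run/gluster, which the regression setup may override, and the usual complete=0 marker for pending frames; <pid> and <timestamp> are placeholders for the new dump file):

# While 'gluster volume status' is hanging on the node:
kill -USR1 $(pidof glusterd)
# The newest glusterdump file is the fresh dump:
ls -lt /var/run/gluster/glusterdump.*
# Count call frames that have not completed yet, if any:
grep -c 'complete=0' /var/run/gluster/glusterdump.<pid>.dump.<timestamp>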
_______________________________________________ Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From jahernan at redhat.com Tue Jan 15 08:35:20 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Tue, 15 Jan 2019 09:35:20 +0100 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> Message-ID: On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey wrote: > > I downloaded logs of regression runs 1077 and 1073 and tried to > investigate it. > In both regression ec/bug-1236065.t is hanging on TEST 70 which is trying > to get the online brick count > > I can see that in mount/bricks and glusterd logs it has not move forward > after this test. > glusterd.log - > > [2019-01-06 16:27:51.346408]:++++++++++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count > ++++++++++ > [2019-01-06 16:27:51.645014] I [MSGID: 106499] > [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume patchy > [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) > [0x7f4c37fe06c3] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) > [0x7f4c37fd9b3a] > -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) > [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string > type [Invalid argument] > [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] > 
(-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] > (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) > [0x7f4c38095a32] > -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) > [0x7f4c37fdd4ac] > -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) > [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has > integer type [Invalid argument] > [2019-01-06 16:27:51.649335] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-01-06 16:27:51.932871] I [MSGID: 106499] > [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume patchy > > It is just taking lot of time to get the status at this point. > It looks like there could be some issue with connection or the handing of > volume status when some bricks are down. > The 'online_brick_count' check uses 'gluster volume status' to get some information, and it does that several times (currently 7). Looking at cmd_history.log, I see that after the 'online_brick_count' at line 70, only one 'gluster volume status' has completed. Apparently the second 'gluster volume status' is hung. 
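To narrow down which of those invocations hangs and to capture its state before the test times out, something like the script below could be run next to the test. This is only a rough sketch, not the test framework's own helper: it assumes the test's usual 'patchy' volume and uses an arbitrary 120 second ceiling per call.

#!/bin/bash
# Run the same status query the check depends on, with a deadline, and dump
# a backtrace of the CLI process if it is still running when the deadline expires.
VOL=patchy
for i in $(seq 1 7); do
    gluster volume status "$VOL" > /dev/null 2>&1 &
    cli=$!
    for _ in $(seq 1 120); do
        kill -0 "$cli" 2> /dev/null || break
        sleep 1
    done
    if kill -0 "$cli" 2> /dev/null; then
        echo "iteration $i: gluster CLI (pid $cli) still running after 120s"
        gdb -p "$cli" -batch -ex "thread apply all bt" > "/tmp/cli-bt.$i.txt" 2>&1
        kill "$cli"
    fi
done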
In cli.log I see that the second 'gluster volume status' seems to have started, but not finished: Normal run: [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev [2019-01-08 16:36:43.808182] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-01-08 16:36:43.808287] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-01-08 16:36:43.808432] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_process+0x1e4) [0x40db50] -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7fefe569456 9] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument] [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument] [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0 Bad run: [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev [2019-01-08 16:36:44.147364] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-01-08 16:36:44.147477] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-01-08 16:36:44.147583] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler In glusterd.log it seems as if it hasn't received any status request. It looks like the cli has not even connected to glusterd. 
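If someone manages to catch it live, it should be easy to confirm whether the CLI ever attempts the connection at all. A quick sketch (assuming the default glusterd port 24007 and that the pid of the hung gluster CLI is known; adjust both for the regression environment):

# Does the hung CLI hold, or keep retrying, a socket towards glusterd?
ss -tnp | grep 24007
# Or watch the socket/connect calls directly while reproducing the hang:
strace -f -e trace=socket,connect -p <pid-of-hung-gluster-cli>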
Xavi > --- > Ashish > > > > ------------------------------ > *From: *"Mohit Agrawal" > *To: *"Shyam Ranganathan" > *Cc: *"Gluster Devel" > *Sent: *Saturday, January 12, 2019 6:46:20 PM > *Subject: *Re: [Gluster-devel] Regression health for release-5.next > and release-6 > > Previous logs related to client not bricks, below are the brick logs > > [2019-01-12 12:25:25.893485]:++++++++++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o > 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ > The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key > 'trusted.ec.size' would not be sent on wire in the future [Invalid > argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and > [2019-01-12 12:25:25.899532] > [2019-01-12 12:25:25.903375] E [MSGID: 113001] > [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] > 8-patchy-posix: fgetxattr failed on > gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: > Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] > [2019-01-12 12:25:25.903468] E [MSGID: 115073] > [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: > FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: > CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, > error-xlator: patchy-posix [Bad file descriptor] > > > Thanks, > Mohit Agrawal > > On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal wrote: > >> >> For specific to "add-brick-and-validate-replicated-volume-options.t" i >> have posted a patch https://review.gluster.org/22015. >> For test case "ec/bug-1236065.t" I think the issue needs to be check by >> ec team >> >> On the brick side, it is showing below logs >> >> >>>>>>>>>>>>>>>>> >> >> on wire in the future [Invalid argument] >> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key >> 'trusted.ec.dirty' would not be sent on wire in the future [Invalid >> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and >> [2019-01-12 12:25:25.902992] >> [2019-01-12 12:25:25.903553] W [MSGID: 114031] >> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: >> remote operation failed [Bad file descriptor] >> [2019-01-12 12:25:25.903998] W [MSGID: 122040] >> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get >> size and version : FOP : 'FXATTROP' failed on gfid >> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] >> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] >> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >> >> >>>>>>>>>>>>>>>>>>> >> >> Test case is getting timed out because "volume heal $V0 full" command is >> stuck, look's like shd is getting stuck at getxattr >> >> >>>>>>>>>>>>>>. 
>> >> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, >> child=, loc=0x7f83777fdbb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, >> entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc at entry=0x7f83777fdde0, >> pid=pid at entry=-6, data=data at entry=0x7f83a8030880, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, >> child=, loc=0x7f8376ffcbb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, >> entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc at entry=0x7f8376ffcde0, >> pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, >> child=, loc=0x7f83767fbbb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, >> entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc at entry=0x7f83767fbde0, >> pid=pid at entry=-6, data=data at entry=0x7f83a8030960, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at >> ec-heald.c:372 >> #7 
0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, >> child=, loc=0x7f8375ffabb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, >> entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc at entry=0x7f8375ffade0, >> pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, >> child=, loc=0x7f83757f9bb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, >> entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc at entry=0x7f83757f9de0, >> pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, >> child=, loc=0x7f8374ff8bb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, >> entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc at entry=0x7f8374ff8de0, >> pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at 
entry=0x7f83a8030ab0, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): >> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >> /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, xdata_out=xdata_out at entry=0x0) >> at syncop.c:1680 >> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, >> child=, loc=0x7f8367ffebb0, full=) at >> ec-heald.c:161 >> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, >> entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at >> ec-heald.c:294 >> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc at entry=0x7f8367ffede0, >> pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, fn=fn at entry=0x7f83add03140 >> ) at syncop-utils.c:125 >> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, >> inode=) at ec-heald.c:311 >> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at >> ec-heald.c:372 >> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): >> #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 >> #1 0x00007f83bc92eff8 in event_dispatch_epoll >> (event_pool=0x55af0a6dd560) at event-epoll.c:846 >> #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at >> glusterfsd.c:2848 >> >> >> >>>>>>>>>>>>>>>>>>>>>>>>>>. >> >> Thanks, >> Mohit Agrawal >> >> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan > >>> We can check health on master post the patch as stated by Mohit below. >>> >>> Release-5 is causing some concerns as we need to tag the release >>> yesterday, but we have the following 2 tests failing or coredumping >>> pretty regularly, need attention on these. >>> >>> ec/bug-1236065.t >>> glusterd/add-brick-and-validate-replicated-volume-options.t >>> >>> Shyam >>> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >>> > I think we should consider regression-builds after merged the patch >>> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >>> > as we know this patch introduced some delay. >>> > >>> > Thanks, >>> > Mohit Agrawal >>> > >>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee >> > > wrote: >>> > >>> > Mohit, Sanju - request you to investigate the failures related to >>> > glusterd and brick-mux and report back to the list. >>> > >>> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >>> > > wrote: >>> > >>> > Hi, >>> > >>> > As part of branching preparation next week for release-6, >>> please >>> > find >>> > test failures and respective test links here [1]. 
>>> > >>> > The top tests that are failing/dumping-core are as below and >>> > need attention, >>> > - ec/bug-1236065.t >>> > - glusterd/add-brick-and-validate-replicated-volume-options.t >>> > - readdir-ahead/bug-1390050.t >>> > - glusterd/brick-mux-validation.t >>> > - bug-1432542-mpx-restart-crash.t >>> > >>> > Others of interest, >>> > - replicate/bug-1341650.t >>> > >>> > Please file a bug if needed against the test case and report >>> the >>> > same >>> > here, in case a problem is already addressed, then do send >>> back the >>> > patch details that addresses this issue as a response to this >>> mail. >>> > >>> > Thanks, >>> > Shyam >>> > >>> > [1] Regression failures: >>> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>> > _______________________________________________ >>> > Gluster-devel mailing list >>> > Gluster-devel at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-devel >>> > >>> > >>> >> > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From atin.mukherjee83 at gmail.com Tue Jan 15 08:42:20 2019 From: atin.mukherjee83 at gmail.com (Atin Mukherjee) Date: Tue, 15 Jan 2019 14:12:20 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> Message-ID: Interesting. I?ll do a deep dive at it sometime this week. On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez wrote: > On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey > wrote: > >> >> I downloaded logs of regression runs 1077 and 1073 and tried to >> investigate it. >> In both regression ec/bug-1236065.t is hanging on TEST 70 which is >> trying to get the online brick count >> >> I can see that in mount/bricks and glusterd logs it has not move forward >> after this test. 
>> glusterd.log - >> >> [2019-01-06 16:27:51.346408]:++++++++++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count >> ++++++++++ >> [2019-01-06 16:27:51.645014] I [MSGID: 106499] >> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >> Received status volume req for volume patchy >> [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) >> [0x7f4c37fe06c3] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) >> [0x7f4c37fd9b3a] >> -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) >> [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string >> type [Invalid argument] >> [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> 
[0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.649335] E [MSGID: 101191] >> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler >> [2019-01-06 16:27:51.932871] I [MSGID: 106499] >> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >> Received status volume req for volume patchy >> >> It is just taking lot of time to get the status at this point. >> It looks like there could be some issue with connection or the handing of >> volume status when some bricks are down. >> > > The 'online_brick_count' check uses 'gluster volume status' to get some > information, and it does that several times (currently 7). Looking at > cmd_history.log, I see that after the 'online_brick_count' at line 70, only > one 'gluster volume status' has completed. Apparently the second 'gluster > volume status' is hung. > > In cli.log I see that the second 'gluster volume status' seems to have > started, but not finished: > > Normal run: > > [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running > gluster with version 6dev > [2019-01-08 16:36:43.808182] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 0 > [2019-01-08 16:36:43.808287] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-08 16:36:43.808432] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] > (-->gluster(cli_cmd_process+0x1e4) [0x40db50] > -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] > -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) > [0x7fefe569456 > 9] ) 0-dict: key cmd, unsigned integer type asked, has integer type > [Invalid argument] > [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] > (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] > -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] > -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f > efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer > type [Invalid argument] > [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0 > > > Bad run: > > [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running > gluster with version 6dev > [2019-01-08 16:36:44.147364] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 0 > [2019-01-08 16:36:44.147477] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-08 16:36:44.147583] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > > > In glusterd.log it seems as if it hasn't received any status request. It > looks like the cli has not even connected to glusterd. 
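
To make the failure mode described above concrete, here is a minimal sketch of a
check of this kind. It is NOT the actual helper from the test framework's shared
scripts; the --xml parsing and the per-volume loop are assumptions for
illustration only. The point it shows is that each invocation is a separate
CLI round trip to glusterd, so a single hung 'gluster volume status' stalls
the whole test step:

    #!/bin/bash
    # Illustrative approximation only -- not the real online_brick_count helper.
    # Counts bricks reported online by 'gluster volume status', invoking the
    # CLI once per volume, which is why one hung invocation blocks the test.

    CLI="gluster --mode=script"

    online_brick_count_sketch () {
        local count=0
        local vol
        for vol in $($CLI volume list); do
            # Each iteration is an independent cli -> glusterd request.
            count=$((count + $($CLI --xml volume status "$vol" \
                                  | grep -c '<status>1</status>')))
        done
        echo "$count"
    }

    online_brick_count_sketch
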
> > Xavi > > >> --- >> Ashish >> >> >> >> ------------------------------ >> *From: *"Mohit Agrawal" >> *To: *"Shyam Ranganathan" >> *Cc: *"Gluster Devel" >> *Sent: *Saturday, January 12, 2019 6:46:20 PM >> *Subject: *Re: [Gluster-devel] Regression health for release-5.next >> and release-6 >> >> Previous logs related to client not bricks, below are the brick logs >> >> [2019-01-12 12:25:25.893485]:++++++++++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o >> 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ >> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key >> 'trusted.ec.size' would not be sent on wire in the future [Invalid >> argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and >> [2019-01-12 12:25:25.899532] >> [2019-01-12 12:25:25.903375] E [MSGID: 113001] >> [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] >> 8-patchy-posix: fgetxattr failed on >> gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: >> Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] >> [2019-01-12 12:25:25.903468] E [MSGID: 115073] >> [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: >> FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: >> CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, >> error-xlator: patchy-posix [Bad file descriptor] >> >> >> Thanks, >> Mohit Agrawal >> >> On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal >> wrote: >> >>> >>> For specific to "add-brick-and-validate-replicated-volume-options.t" i >>> have posted a patch https://review.gluster.org/22015. >>> For test case "ec/bug-1236065.t" I think the issue needs to be check by >>> ec team >>> >>> On the brick side, it is showing below logs >>> >>> >>>>>>>>>>>>>>>>> >>> >>> on wire in the future [Invalid argument] >>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>> key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid >>> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and >>> [2019-01-12 12:25:25.902992] >>> [2019-01-12 12:25:25.903553] W [MSGID: 114031] >>> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: >>> remote operation failed [Bad file descriptor] >>> [2019-01-12 12:25:25.903998] W [MSGID: 122040] >>> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get >>> size and version : FOP : 'FXATTROP' failed on gfid >>> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] >>> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] >>> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >>> >>> >>>>>>>>>>>>>>>>>>> >>> >>> Test case is getting timed out because "volume heal $V0 full" command is >>> stuck, look's like shd is getting stuck at getxattr >>> >>> >>>>>>>>>>>>>>. 
>>> >>> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, >>> child=, loc=0x7f83777fdbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, >>> entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, >>> loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, >>> child=, loc=0x7f8376ffcbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, >>> entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, >>> loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, >>> child=, loc=0x7f83767fbbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, >>> entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, >>> loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in 
ec_shd_full_healer (data=0x7f83a8030960) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, >>> child=, loc=0x7f8375ffabb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, >>> entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, >>> loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, >>> child=, loc=0x7f83757f9bb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, >>> entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, >>> loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, >>> child=, loc=0x7f8374ff8bb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, >>> entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, >>> loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, >>> 
fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, >>> child=, loc=0x7f8367ffebb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, >>> entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, >>> loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): >>> #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc92eff8 in event_dispatch_epoll >>> (event_pool=0x55af0a6dd560) at event-epoll.c:846 >>> #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at >>> glusterfsd.c:2848 >>> >>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>. >>> >>> Thanks, >>> Mohit Agrawal >>> >>> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan >> >>>> We can check health on master post the patch as stated by Mohit below. >>>> >>>> Release-5 is causing some concerns as we need to tag the release >>>> yesterday, but we have the following 2 tests failing or coredumping >>>> pretty regularly, need attention on these. >>>> >>>> ec/bug-1236065.t >>>> glusterd/add-brick-and-validate-replicated-volume-options.t >>>> >>>> Shyam >>>> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >>>> > I think we should consider regression-builds after merged the patch >>>> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >>>> > as we know this patch introduced some delay. >>>> > >>>> > Thanks, >>>> > Mohit Agrawal >>>> > >>>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee >>> > > wrote: >>>> > >>>> > Mohit, Sanju - request you to investigate the failures related to >>>> > glusterd and brick-mux and report back to the list. >>>> > >>>> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >>>> > > wrote: >>>> > >>>> > Hi, >>>> > >>>> > As part of branching preparation next week for release-6, >>>> please >>>> > find >>>> > test failures and respective test links here [1]. 
>>>> > >>>> > The top tests that are failing/dumping-core are as below and >>>> > need attention, >>>> > - ec/bug-1236065.t >>>> > - glusterd/add-brick-and-validate-replicated-volume-options.t >>>> > - readdir-ahead/bug-1390050.t >>>> > - glusterd/brick-mux-validation.t >>>> > - bug-1432542-mpx-restart-crash.t >>>> > >>>> > Others of interest, >>>> > - replicate/bug-1341650.t >>>> > >>>> > Please file a bug if needed against the test case and report >>>> the >>>> > same >>>> > here, in case a problem is already addressed, then do send >>>> back the >>>> > patch details that addresses this issue as a response to this >>>> mail. >>>> > >>>> > Thanks, >>>> > Shyam >>>> > >>>> > [1] Regression failures: >>>> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>>> > _______________________________________________ >>>> > Gluster-devel mailing list >>>> > Gluster-devel at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> > >>>> > >>>> >>> >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -- --Atin -------------- next part -------------- An HTML attachment was scrubbed... URL: From ykaul at redhat.com Tue Jan 15 10:15:51 2019 From: ykaul at redhat.com (Yaniv Kaul) Date: Tue, 15 Jan 2019 12:15:51 +0200 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Jan 15, 2019 at 10:35 AM Xavi Hernandez wrote: > On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey > wrote: > >> >> I downloaded logs of regression runs 1077 and 1073 and tried to >> investigate it. >> In both regression ec/bug-1236065.t is hanging on TEST 70 which is >> trying to get the online brick count >> >> I can see that in mount/bricks and glusterd logs it has not move forward >> after this test. 
>> glusterd.log - >> >> [2019-01-06 16:27:51.346408]:++++++++++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count >> ++++++++++ >> [2019-01-06 16:27:51.645014] I [MSGID: 106499] >> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >> Received status volume req for volume patchy >> [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) >> [0x7f4c37fe06c3] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) >> [0x7f4c37fd9b3a] >> -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) >> [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string >> type [Invalid argument] >> [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> [0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] >> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >> [0x7f4c38095a32] >> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >> 
[0x7f4c37fdd4ac] >> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >> [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has >> integer type [Invalid argument] >> [2019-01-06 16:27:51.649335] E [MSGID: 101191] >> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler >> [2019-01-06 16:27:51.932871] I [MSGID: 106499] >> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >> Received status volume req for volume patchy >> >> It is just taking lot of time to get the status at this point. >> It looks like there could be some issue with connection or the handing of >> volume status when some bricks are down. >> > > The 'online_brick_count' check uses 'gluster volume status' to get some > information, and it does that several times (currently 7). Looking at > cmd_history.log, I see that after the 'online_brick_count' at line 70, only > one 'gluster volume status' has completed. Apparently the second 'gluster > volume status' is hung. > > In cli.log I see that the second 'gluster volume status' seems to have > started, but not finished: > > Normal run: > > [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running > gluster with version 6dev > [2019-01-08 16:36:43.808182] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 0 > [2019-01-08 16:36:43.808287] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-08 16:36:43.808432] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] > (-->gluster(cli_cmd_process+0x1e4) [0x40db50] > -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] > -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) > [0x7fefe569456 > 9] ) 0-dict: key cmd, unsigned integer type asked, has integer type > [Invalid argument] > [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] > (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] > -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] > -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f > efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer > type [Invalid argument] > > While most likely unrelated to this specific issue, we should clean up all those issues. They are adding noise to debugging the real issue(s) and pollute our logs. Y. > [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0 > > > Bad run: > > [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running > gluster with version 6dev > [2019-01-08 16:36:44.147364] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 0 > [2019-01-08 16:36:44.147477] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-08 16:36:44.147583] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > > > In glusterd.log it seems as if it hasn't received any status request. It > looks like the cli has not even connected to glusterd. 
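
Given that the second 'gluster volume status' appears to hang before it even
connects to glusterd, a quick way to gather evidence the next time a run gets
stuck is to grab backtraces of any lingering cli processes plus a glusterd
statedump. The following is only a sketch of that idea; the output paths and
the pgrep pattern are arbitrary choices and none of this is part of the
regression framework today:

    #!/bin/bash
    # Sketch only, for debugging a run that is already stuck on a status call:
    # collect backtraces of gluster cli processes plus a glusterd statedump.

    mkdir -p /tmp/hangdump

    # Backtrace of every gluster cli still running (e.g. the hung
    # 'gluster volume status' spawned by the online_brick_count check).
    for pid in $(pgrep -x gluster); do
        gdb -p "$pid" -batch -ex 'thread apply all bt' \
            > "/tmp/hangdump/cli-bt.$pid.txt" 2>&1
    done

    # SIGUSR1 makes glusterd write a statedump (by default under
    # /var/run/gluster), which should show whether the request reached it.
    pkill -USR1 -x glusterd
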
> > Xavi > > >> --- >> Ashish >> >> >> >> ------------------------------ >> *From: *"Mohit Agrawal" >> *To: *"Shyam Ranganathan" >> *Cc: *"Gluster Devel" >> *Sent: *Saturday, January 12, 2019 6:46:20 PM >> *Subject: *Re: [Gluster-devel] Regression health for release-5.next >> and release-6 >> >> Previous logs related to client not bricks, below are the brick logs >> >> [2019-01-12 12:25:25.893485]:++++++++++ >> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o >> 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ >> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key >> 'trusted.ec.size' would not be sent on wire in the future [Invalid >> argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and >> [2019-01-12 12:25:25.899532] >> [2019-01-12 12:25:25.903375] E [MSGID: 113001] >> [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] >> 8-patchy-posix: fgetxattr failed on >> gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: >> Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] >> [2019-01-12 12:25:25.903468] E [MSGID: 115073] >> [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: >> FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: >> CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, >> error-xlator: patchy-posix [Bad file descriptor] >> >> >> Thanks, >> Mohit Agrawal >> >> On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal >> wrote: >> >>> >>> For specific to "add-brick-and-validate-replicated-volume-options.t" i >>> have posted a patch https://review.gluster.org/22015. >>> For test case "ec/bug-1236065.t" I think the issue needs to be check by >>> ec team >>> >>> On the brick side, it is showing below logs >>> >>> >>>>>>>>>>>>>>>>> >>> >>> on wire in the future [Invalid argument] >>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>> key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid >>> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and >>> [2019-01-12 12:25:25.902992] >>> [2019-01-12 12:25:25.903553] W [MSGID: 114031] >>> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: >>> remote operation failed [Bad file descriptor] >>> [2019-01-12 12:25:25.903998] W [MSGID: 122040] >>> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get >>> size and version : FOP : 'FXATTROP' failed on gfid >>> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] >>> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] >>> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >>> >>> >>>>>>>>>>>>>>>>>>> >>> >>> Test case is getting timed out because "volume heal $V0 full" command is >>> stuck, look's like shd is getting stuck at getxattr >>> >>> >>>>>>>>>>>>>>. 
>>> >>> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, >>> child=, loc=0x7f83777fdbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, >>> entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, >>> loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, >>> child=, loc=0x7f8376ffcbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, >>> entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, >>> loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, >>> child=, loc=0x7f83767fbbb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, >>> entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, >>> loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in 
ec_shd_full_healer (data=0x7f83a8030960) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, >>> child=, loc=0x7f8375ffabb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, >>> entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, >>> loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, >>> child=, loc=0x7f83757f9bb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, >>> entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, >>> loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, >>> child=, loc=0x7f8374ff8bb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, >>> entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, >>> loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, >>> 
fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): >>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>> /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, >>> child=, loc=0x7f8367ffebb0, full=) at >>> ec-heald.c:161 >>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, >>> entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at >>> ec-heald.c:294 >>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, >>> loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, >>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, >>> inode=) at ec-heald.c:311 >>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at >>> ec-heald.c:372 >>> #7 0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0 >>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): >>> #0 0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0 >>> #1 0x00007f83bc92eff8 in event_dispatch_epoll >>> (event_pool=0x55af0a6dd560) at event-epoll.c:846 >>> #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at >>> glusterfsd.c:2848 >>> >>> >>> >>>>>>>>>>>>>>>>>>>>>>>>>>. >>> >>> Thanks, >>> Mohit Agrawal >>> >>> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan >> >>>> We can check health on master post the patch as stated by Mohit below. >>>> >>>> Release-5 is causing some concerns as we need to tag the release >>>> yesterday, but we have the following 2 tests failing or coredumping >>>> pretty regularly, need attention on these. >>>> >>>> ec/bug-1236065.t >>>> glusterd/add-brick-and-validate-replicated-volume-options.t >>>> >>>> Shyam >>>> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >>>> > I think we should consider regression-builds after merged the patch >>>> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >>>> > as we know this patch introduced some delay. >>>> > >>>> > Thanks, >>>> > Mohit Agrawal >>>> > >>>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee >>> > > wrote: >>>> > >>>> > Mohit, Sanju - request you to investigate the failures related to >>>> > glusterd and brick-mux and report back to the list. >>>> > >>>> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >>>> > > wrote: >>>> > >>>> > Hi, >>>> > >>>> > As part of branching preparation next week for release-6, >>>> please >>>> > find >>>> > test failures and respective test links here [1]. 
>>>> > >>>> > The top tests that are failing/dumping-core are as below and >>>> > need attention, >>>> > - ec/bug-1236065.t >>>> > - glusterd/add-brick-and-validate-replicated-volume-options.t >>>> > - readdir-ahead/bug-1390050.t >>>> > - glusterd/brick-mux-validation.t >>>> > - bug-1432542-mpx-restart-crash.t >>>> > >>>> > Others of interest, >>>> > - replicate/bug-1341650.t >>>> > >>>> > Please file a bug if needed against the test case and report >>>> the >>>> > same >>>> > here, in case a problem is already addressed, then do send >>>> back the >>>> > patch details that addresses this issue as a response to this >>>> mail. >>>> > >>>> > Thanks, >>>> > Shyam >>>> > >>>> > [1] Regression failures: >>>> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>>> > _______________________________________________ >>>> > Gluster-devel mailing list >>>> > Gluster-devel at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> > >>>> > >>>> >>> >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Thu Jan 17 04:28:50 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Thu, 17 Jan 2019 09:58:50 +0530 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Jan 15, 2019 at 2:13 PM Atin Mukherjee wrote: > Interesting. I?ll do a deep dive at it sometime this week. > > On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez wrote: > >> On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey >> wrote: >> >>> >>> I downloaded logs of regression runs 1077 and 1073 and tried to >>> investigate it. >>> In both regression ec/bug-1236065.t is hanging on TEST 70 which is >>> trying to get the online brick count >>> >>> I can see that in mount/bricks and glusterd logs it has not move forward >>> after this test. 
>>> glusterd.log - >>> >>> [2019-01-06 16:27:51.346408]:++++++++++ >>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count >>> ++++++++++ >>> [2019-01-06 16:27:51.645014] I [MSGID: 106499] >>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >>> Received status volume req for volume patchy >>> [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) >>> [0x7f4c37fe06c3] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) >>> [0x7f4c37fd9b3a] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) >>> [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string >>> type [Invalid argument] >>> [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] >>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>> [0x7f4c38095a32] >>> 
-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>> [0x7f4c37fdd4ac] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>> [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has >>> integer type [Invalid argument] >>> [2019-01-06 16:27:51.649335] E [MSGID: 101191] >>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>> handler >>> [2019-01-06 16:27:51.932871] I [MSGID: 106499] >>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >>> Received status volume req for volume patchy >>> >>> It is just taking lot of time to get the status at this point. >>> It looks like there could be some issue with connection or the handing >>> of volume status when some bricks are down. >>> >> >> The 'online_brick_count' check uses 'gluster volume status' to get some >> information, and it does that several times (currently 7). Looking at >> cmd_history.log, I see that after the 'online_brick_count' at line 70, only >> one 'gluster volume status' has completed. Apparently the second 'gluster >> volume status' is hung. >> >> In cli.log I see that the second 'gluster volume status' seems to have >> started, but not finished: >> >> Normal run: >> >> [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6dev >> [2019-01-08 16:36:43.808182] I [MSGID: 101190] >> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 0 >> [2019-01-08 16:36:43.808287] I [MSGID: 101190] >> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 1 >> [2019-01-08 16:36:43.808432] E [MSGID: 101191] >> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler >> [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] >> (-->gluster(cli_cmd_process+0x1e4) [0x40db50] >> -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] >> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) >> [0x7fefe569456 >> 9] ) 0-dict: key cmd, unsigned integer type asked, has integer type >> [Invalid argument] >> [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] >> (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] >> -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] >> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f >> efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer >> type [Invalid argument] >> [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0 >> >> >> Bad run: >> >> [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running >> gluster with version 6dev >> [2019-01-08 16:36:44.147364] I [MSGID: 101190] >> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 0 >> [2019-01-08 16:36:44.147477] I [MSGID: 101190] >> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >> with index 1 >> [2019-01-08 16:36:44.147583] E [MSGID: 101191] >> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >> handler >> >> >> In glusterd.log it seems as if it hasn't received any status request. It >> looks like the cli has not even connected to glusterd. >> > Downloaded the logs for the recent failure from https://build.gluster.org/job/regression-test-with-multiplex/1092/ and based on the log scanning this is what I see: 1. The test executes with out any issues till line no 74 i.e. 
"TEST $CLI volume start $V0 force" and cli.log along with cmd_history.log confirm the same: cli.log ==== [2019-01-16 16:28:46.871877]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 73 gluster --mode=script --wignore volume start patchy force ++++++++++ [2019-01-16 16:28:46.980780] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev [2019-01-16 16:28:47.185996] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0 [2019-01-16 16:28:47.186113] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-01-16 16:28:47.186234] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler [2019-01-16 16:28:49.223376] I [cli-rpc-ops.c:1448:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume <=== successfully processed the callback [2019-01-16 16:28:49.223668] I [input.c:31:cli_batch] 0-: Exiting with: 0 cmd_history.log ============ [2019-01-16 16:28:49.220491] : volume start patchy force : SUCCESS However, in both cli and cmd_history log files these are the last set of logs I see which indicates either the test script is completely paused. There's no possibility I see that cli receiving this command and dropping it completely as otherwise we should have atleast seen the "Started running gluster with version 6dev" and "Exiting with" log entries. I could manage to reproduce this once locally in my system and then when I ran command from another prompt, volume status and all other gluster basic commands go through. I also inspected the processes and I don't see any suspect of processes being hung. So the mystery continues and we need to see why the test script is not all moving forward. >> Xavi >> >> >>> --- >>> Ashish >>> >>> >>> >>> ------------------------------ >>> *From: *"Mohit Agrawal" >>> *To: *"Shyam Ranganathan" >>> *Cc: *"Gluster Devel" >>> *Sent: *Saturday, January 12, 2019 6:46:20 PM >>> *Subject: *Re: [Gluster-devel] Regression health for release-5.next >>> and release-6 >>> >>> Previous logs related to client not bricks, below are the brick logs >>> >>> [2019-01-12 12:25:25.893485]:++++++++++ >>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o >>> 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ >>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>> key 'trusted.ec.size' would not be sent on wire in the future [Invalid >>> argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and >>> [2019-01-12 12:25:25.899532] >>> [2019-01-12 12:25:25.903375] E [MSGID: 113001] >>> [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] >>> 8-patchy-posix: fgetxattr failed on >>> gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: >>> Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] >>> [2019-01-12 12:25:25.903468] E [MSGID: 115073] >>> [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: >>> FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: >>> CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, >>> error-xlator: patchy-posix [Bad file descriptor] >>> >>> >>> Thanks, >>> Mohit Agrawal >>> >>> On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal >>> wrote: >>> >>>> >>>> For specific to "add-brick-and-validate-replicated-volume-options.t" i >>>> have posted a patch https://review.gluster.org/22015. 
>>>> For test case "ec/bug-1236065.t" I think the issue needs to be check by >>>> ec team >>>> >>>> On the brick side, it is showing below logs >>>> >>>> >>>>>>>>>>>>>>>>> >>>> >>>> on wire in the future [Invalid argument] >>>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>>> key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid >>>> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and >>>> [2019-01-12 12:25:25.902992] >>>> [2019-01-12 12:25:25.903553] W [MSGID: 114031] >>>> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: >>>> remote operation failed [Bad file descriptor] >>>> [2019-01-12 12:25:25.903998] W [MSGID: 122040] >>>> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get >>>> size and version : FOP : 'FXATTROP' failed on gfid >>>> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] >>>> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] >>>> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >>>> >>>> >>>>>>>>>>>>>>>>>>> >>>> >>>> Test case is getting timed out because "volume heal $V0 full" command >>>> is stuck, look's like shd is getting stuck at getxattr >>>> >>>> >>>>>>>>>>>>>>. >>>> >>>> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, >>>> child=, loc=0x7f83777fdbb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, >>>> entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, >>>> loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, >>>> child=, loc=0x7f8376ffcbb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, >>>> entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, >>>> loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, >>>> 
inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, >>>> child=, loc=0x7f83767fbbb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, >>>> entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, >>>> loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, >>>> child=, loc=0x7f8375ffabb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, >>>> entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, >>>> loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, >>>> child=, loc=0x7f83757f9bb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, >>>> entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in 
syncop_ftw (subvol=0x7f83a8017eb0, >>>> loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, >>>> child=, loc=0x7f8374ff8bb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, >>>> entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, >>>> loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): >>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, >>>> child=, loc=0x7f8367ffebb0, full=) at >>>> ec-heald.c:161 >>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, >>>> entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at >>>> ec-heald.c:294 >>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, >>>> loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, >>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, >>>> inode=) at ec-heald.c:311 >>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at >>>> ec-heald.c:372 >>>> #7 0x00007f83bb709e25 in start_thread () from >>>> /usr/lib64/libpthread.so.0 >>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): >>>> #0 0x00007f83bb70af57 in pthread_join () from >>>> /usr/lib64/libpthread.so.0 >>>> #1 0x00007f83bc92eff8 in event_dispatch_epoll >>>> (event_pool=0x55af0a6dd560) at event-epoll.c:846 >>>> #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at >>>> glusterfsd.c:2848 >>>> >>>> >>>> >>>>>>>>>>>>>>>>>>>>>>>>>>. 
>>>> >>>> Thanks, >>>> Mohit Agrawal >>>> >>>> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan >>> wrote: >>>> >>>>> We can check health on master post the patch as stated by Mohit below. >>>>> >>>>> Release-5 is causing some concerns as we need to tag the release >>>>> yesterday, but we have the following 2 tests failing or coredumping >>>>> pretty regularly, need attention on these. >>>>> >>>>> ec/bug-1236065.t >>>>> glusterd/add-brick-and-validate-replicated-volume-options.t >>>>> >>>>> Shyam >>>>> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >>>>> > I think we should consider regression-builds after merged the patch >>>>> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >>>>> > as we know this patch introduced some delay. >>>>> > >>>>> > Thanks, >>>>> > Mohit Agrawal >>>>> > >>>>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee >>>> > > wrote: >>>>> > >>>>> > Mohit, Sanju - request you to investigate the failures related to >>>>> > glusterd and brick-mux and report back to the list. >>>>> > >>>>> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >>>>> > > wrote: >>>>> > >>>>> > Hi, >>>>> > >>>>> > As part of branching preparation next week for release-6, >>>>> please >>>>> > find >>>>> > test failures and respective test links here [1]. >>>>> > >>>>> > The top tests that are failing/dumping-core are as below and >>>>> > need attention, >>>>> > - ec/bug-1236065.t >>>>> > - glusterd/add-brick-and-validate-replicated-volume-options.t >>>>> > - readdir-ahead/bug-1390050.t >>>>> > - glusterd/brick-mux-validation.t >>>>> > - bug-1432542-mpx-restart-crash.t >>>>> > >>>>> > Others of interest, >>>>> > - replicate/bug-1341650.t >>>>> > >>>>> > Please file a bug if needed against the test case and report >>>>> the >>>>> > same >>>>> > here, in case a problem is already addressed, then do send >>>>> back the >>>>> > patch details that addresses this issue as a response to >>>>> this mail. >>>>> > >>>>> > Thanks, >>>>> > Shyam >>>>> > >>>>> > [1] Regression failures: >>>>> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>>>> > _______________________________________________ >>>>> > Gluster-devel mailing list >>>>> > Gluster-devel at gluster.org >>>>> > https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>> > >>>>> > >>>>> >>>> >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > -- > --Atin > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jahernan at redhat.com Thu Jan 17 08:21:47 2019 From: jahernan at redhat.com (Xavi Hernandez) Date: Thu, 17 Jan 2019 09:21:47 +0100 Subject: [Gluster-devel] Regression health for release-5.next and release-6 In-Reply-To: References: <89f54d02-8c78-a507-416a-c1ca1d7b4be2@redhat.com> <65f1c892-a3c4-2401-4827-dbe0277875b2@redhat.com> <2134165472.57578088.1547460382588.JavaMail.zimbra@redhat.com> Message-ID: On Thu, Jan 17, 2019 at 5:29 AM Atin Mukherjee wrote: > > > On Tue, Jan 15, 2019 at 2:13 PM Atin Mukherjee > wrote: > >> Interesting. I?ll do a deep dive at it sometime this week. >> >> On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez wrote: >> >>> On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey >>> wrote: >>> >>>> >>>> I downloaded logs of regression runs 1077 and 1073 and tried to >>>> investigate it. >>>> In both regression ec/bug-1236065.t is hanging on TEST 70 which is >>>> trying to get the online brick count >>>> >>>> I can see that in mount/bricks and glusterd logs it has not move >>>> forward after this test. >>>> glusterd.log - >>>> >>>> [2019-01-06 16:27:51.346408]:++++++++++ >>>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count >>>> ++++++++++ >>>> [2019-01-06 16:27:51.645014] I [MSGID: 106499] >>>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >>>> Received status volume req for volume patchy >>>> [2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) >>>> [0x7f4c37fe06c3] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) >>>> [0x7f4c37fd9b3a] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) >>>> [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string >>>> type [Invalid argument] >>>> [2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has 
>>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] >>>> (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) >>>> [0x7f4c38095a32] >>>> -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) >>>> [0x7f4c37fdd4ac] >>>> -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) >>>> [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has >>>> integer type [Invalid argument] >>>> [2019-01-06 16:27:51.649335] E [MSGID: 101191] >>>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>>> handler >>>> [2019-01-06 16:27:51.932871] I [MSGID: 106499] >>>> [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: >>>> Received status volume req for volume patchy >>>> >>>> It is just taking lot of time to get the status at this point. >>>> It looks like there could be some issue with connection or the handing >>>> of volume status when some bricks are down. >>>> >>> >>> The 'online_brick_count' check uses 'gluster volume status' to get some >>> information, and it does that several times (currently 7). Looking at >>> cmd_history.log, I see that after the 'online_brick_count' at line 70, only >>> one 'gluster volume status' has completed. Apparently the second 'gluster >>> volume status' is hung. 
>>> >>> In cli.log I see that the second 'gluster volume status' seems to have >>> started, but not finished: >>> >>> Normal run: >>> >>> [2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6dev >>> [2019-01-08 16:36:43.808182] I [MSGID: 101190] >>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 0 >>> [2019-01-08 16:36:43.808287] I [MSGID: 101190] >>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 1 >>> [2019-01-08 16:36:43.808432] E [MSGID: 101191] >>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>> handler >>> [2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] >>> (-->gluster(cli_cmd_process+0x1e4) [0x40db50] >>> -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) >>> [0x7fefe569456 >>> 9] ) 0-dict: key cmd, unsigned integer type asked, has integer type >>> [Invalid argument] >>> [2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] >>> (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] >>> -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] >>> -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7f >>> efe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer >>> type [Invalid argument] >>> [2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0 >>> >>> >>> Bad run: >>> >>> [2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running >>> gluster with version 6dev >>> [2019-01-08 16:36:44.147364] I [MSGID: 101190] >>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 0 >>> [2019-01-08 16:36:44.147477] I [MSGID: 101190] >>> [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread >>> with index 1 >>> [2019-01-08 16:36:44.147583] E [MSGID: 101191] >>> [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch >>> handler >>> >>> >>> In glusterd.log it seems as if it hasn't received any status request. It >>> looks like the cli has not even connected to glusterd. >>> >> > Downloaded the logs for the recent failure from > https://build.gluster.org/job/regression-test-with-multiplex/1092/ and > based on the log scanning this is what I see: > > 1. The test executes with out any issues till line no 74 i.e. 
"TEST $CLI > volume start $V0 force" and cli.log along with cmd_history.log confirm the > same: > > cli.log > ==== > [2019-01-16 16:28:46.871877]:++++++++++ > G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 73 gluster --mode=script > --wignore volume start patchy force ++++++++++ > [2019-01-16 16:28:46.980780] I [cli.c:834:main] 0-cli: Started running > gluster with version 6dev > [2019-01-16 16:28:47.185996] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 0 > [2019-01-16 16:28:47.186113] I [MSGID: 101190] > [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-16 16:28:47.186234] E [MSGID: 101191] > [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch > handler > [2019-01-16 16:28:49.223376] I > [cli-rpc-ops.c:1448:gf_cli_start_volume_cbk] 0-cli: Received resp to start > volume <=== successfully processed the callback > [2019-01-16 16:28:49.223668] I [input.c:31:cli_batch] 0-: Exiting with: 0 > > cmd_history.log > ============ > [2019-01-16 16:28:49.220491] : volume start patchy force : SUCCESS > > However, in both cli and cmd_history log files these are the last set of > logs I see which indicates either the test script is completely paused. > There's no possibility I see that cli receiving this command and dropping > it completely as otherwise we should have atleast seen the "Started running > gluster with version 6dev" and "Exiting with" log entries. > > I could manage to reproduce this once locally in my system and then when I > ran command from another prompt, volume status and all other gluster basic > commands go through. I also inspected the processes and I don't see any > suspect of processes being hung. > > So the mystery continues and we need to see why the test script is not all > moving forward. > An additional thing that could be interesting: in all cases I've seen this test to hang, the next test shows an error during cleanup: Aborting. /mnt/nfs/1 could not be deleted, here are the left over items drwxr-xr-x. 2 root root 6 Jan 16 16:41 /d/backends drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/0 drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/1 drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/2 drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/3 drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/0 drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/1 Please correct the problem and try again. This is a bit weird, since this only happens after having removed all these directories with an 'rm -rf', and this command doesn't exit on the first error, so at least some of these directories should have been removed, even is the mount process is hung (all nfs mounts and fuse mounts 1, 2 and 3 are not used by the test). The only explanation I have is that the cleanup function is being executed twice concurrently (probably from two different scripts). The first cleanup is blocked (or is taking a lot of time) removing one of the directories. Meantime the other cleanup has completed and recreated the directories, so when the first one finally finishes, it finds all directories still there, writing the above messages. This would also mean that something is not properly killed between tests. Not sure if that's possible. This could match with your findings, since some commands executed on the second script could "unblock" whatever is blocked in the first one, causing it to progress and show the final error. Could this explain something ? 
> > >>> Xavi >>> >>> >>>> --- >>>> Ashish >>>> >>>> >>>> >>>> ------------------------------ >>>> *From: *"Mohit Agrawal" >>>> *To: *"Shyam Ranganathan" >>>> *Cc: *"Gluster Devel" >>>> *Sent: *Saturday, January 12, 2019 6:46:20 PM >>>> *Subject: *Re: [Gluster-devel] Regression health for release-5.next >>>> and release-6 >>>> >>>> Previous logs related to client not bricks, below are the brick logs >>>> >>>> [2019-01-12 12:25:25.893485]:++++++++++ >>>> G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o >>>> 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++ >>>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>>> key 'trusted.ec.size' would not be sent on wire in the future [Invalid >>>> argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and >>>> [2019-01-12 12:25:25.899532] >>>> [2019-01-12 12:25:25.903375] E [MSGID: 113001] >>>> [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] >>>> 8-patchy-posix: fgetxattr failed on >>>> gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: >>>> Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor] >>>> [2019-01-12 12:25:25.903468] E [MSGID: 115073] >>>> [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: >>>> FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: >>>> CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, >>>> error-xlator: patchy-posix [Bad file descriptor] >>>> >>>> >>>> Thanks, >>>> Mohit Agrawal >>>> >>>> On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal >>>> wrote: >>>> >>>>> >>>>> For specific to "add-brick-and-validate-replicated-volume-options.t" i >>>>> have posted a patch https://review.gluster.org/22015. >>>>> For test case "ec/bug-1236065.t" I think the issue needs to be check >>>>> by ec team >>>>> >>>>> On the brick side, it is showing below logs >>>>> >>>>> >>>>>>>>>>>>>>>>> >>>>> >>>>> on wire in the future [Invalid argument] >>>>> The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: >>>>> key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid >>>>> argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and >>>>> [2019-01-12 12:25:25.902992] >>>>> [2019-01-12 12:25:25.903553] W [MSGID: 114031] >>>>> [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: >>>>> remote operation failed [Bad file descriptor] >>>>> [2019-01-12 12:25:25.903998] W [MSGID: 122040] >>>>> [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get >>>>> size and version : FOP : 'FXATTROP' failed on gfid >>>>> d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error] >>>>> [2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] >>>>> 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error) >>>>> >>>>> >>>>>>>>>>>>>>>>>>> >>>>> >>>>> Test case is getting timed out because "volume heal $V0 full" command >>>>> is stuck, look's like shd is getting stuck at getxattr >>>>> >>>>> >>>>>>>>>>>>>>. 
>>>>> >>>>> Thread 8 (Thread 0x7f83777fe700 (LWP 25552)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f83777fdbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, >>>>> child=, loc=0x7f83777fdbb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, >>>>> entry=, parent=0x7f83777fdde0, data=0x7f83a8030880) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, >>>>> loc=loc at entry=0x7f83777fdde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030880, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030880, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030880) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f8376ffcbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, >>>>> child=, loc=0x7f8376ffcbb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, >>>>> entry=, parent=0x7f8376ffcde0, data=0x7f83a80308f0) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, >>>>> loc=loc at entry=0x7f8376ffcde0, pid=pid at entry=-6, data=data at entry=0x7f83a80308f0, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80308f0, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80308f0) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 6 (Thread 0x7f83767fc700 (LWP 25554)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f83767fbbb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, >>>>> child=, loc=0x7f83767fbbb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, >>>>> entry=, parent=0x7f83767fbde0, data=0x7f83a8030960) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, >>>>> loc=loc at entry=0x7f83767fbde0, pid=pid at entry=-6, data=data at entry=0x7f83a8030960, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 
0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030960, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030960) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f8375ffabb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, >>>>> child=, loc=0x7f8375ffabb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, >>>>> entry=, parent=0x7f8375ffade0, data=0x7f83a80309d0) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, >>>>> loc=loc at entry=0x7f8375ffade0, pid=pid at entry=-6, data=data at entry=0x7f83a80309d0, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a80309d0, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a80309d0) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 4 (Thread 0x7f83757fa700 (LWP 25556)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f83757f9bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, >>>>> child=, loc=0x7f83757f9bb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, >>>>> entry=, parent=0x7f83757f9de0, data=0x7f83a8030a40) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, >>>>> loc=loc at entry=0x7f83757f9de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030a40, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030a40, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030a40) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f8374ff8bb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, >>>>> child=, loc=0x7f8374ff8bb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in 
ec_shd_full_heal (subvol=0x7f83a801b890, >>>>> entry=, parent=0x7f8374ff8de0, data=0x7f83a8030ab0) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, >>>>> loc=loc at entry=0x7f8374ff8de0, pid=pid at entry=-6, data=data at entry=0x7f83a8030ab0, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030ab0, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030ab0) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 2 (Thread 0x7f8367fff700 (LWP 25558)): >>>>> #0 0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc910e5b in syncop_getxattr (subvol=, >>>>> loc=loc at entry=0x7f8367ffebb0, dict=dict at entry=0x0, key=key at entry=0x7f83add06a28 >>>>> "trusted.ec.heal", xdata_in=xdata_in at entry=0x0, >>>>> xdata_out=xdata_out at entry=0x0) at syncop.c:1680 >>>>> #2 0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, >>>>> child=, loc=0x7f8367ffebb0, full=) at >>>>> ec-heald.c:161 >>>>> #3 0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, >>>>> entry=, parent=0x7f8367ffede0, data=0x7f83a8030b20) at >>>>> ec-heald.c:294 >>>>> #4 0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, >>>>> loc=loc at entry=0x7f8367ffede0, pid=pid at entry=-6, data=data at entry=0x7f83a8030b20, >>>>> fn=fn at entry=0x7f83add03140 ) at syncop-utils.c:125 >>>>> #5 0x00007f83add03534 in ec_shd_full_sweep (healer=healer at entry=0x7f83a8030b20, >>>>> inode=) at ec-heald.c:311 >>>>> #6 0x00007f83add0367b in ec_shd_full_healer (data=0x7f83a8030b20) at >>>>> ec-heald.c:372 >>>>> #7 0x00007f83bb709e25 in start_thread () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #8 0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6 >>>>> Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)): >>>>> #0 0x00007f83bb70af57 in pthread_join () from >>>>> /usr/lib64/libpthread.so.0 >>>>> #1 0x00007f83bc92eff8 in event_dispatch_epoll >>>>> (event_pool=0x55af0a6dd560) at event-epoll.c:846 >>>>> #2 0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at >>>>> glusterfsd.c:2848 >>>>> >>>>> >>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>. >>>>> >>>>> Thanks, >>>>> Mohit Agrawal >>>>> >>>>> On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan >>>> wrote: >>>>> >>>>>> We can check health on master post the patch as stated by Mohit below. >>>>>> >>>>>> Release-5 is causing some concerns as we need to tag the release >>>>>> yesterday, but we have the following 2 tests failing or coredumping >>>>>> pretty regularly, need attention on these. >>>>>> >>>>>> ec/bug-1236065.t >>>>>> glusterd/add-brick-and-validate-replicated-volume-options.t >>>>>> >>>>>> Shyam >>>>>> On 1/10/19 6:20 AM, Mohit Agrawal wrote: >>>>>> > I think we should consider regression-builds after merged the patch >>>>>> > (https://review.gluster.org/#/c/glusterfs/+/21990/) >>>>>> > as we know this patch introduced some delay. >>>>>> > >>>>>> > Thanks, >>>>>> > Mohit Agrawal >>>>>> > >>>>>> > On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee >>>>> > > wrote: >>>>>> > >>>>>> > Mohit, Sanju - request you to investigate the failures related >>>>>> to >>>>>> > glusterd and brick-mux and report back to the list. 
>>>>>> > >>>>>> > On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan >>>>>> > > wrote: >>>>>> > >>>>>> > Hi, >>>>>> > >>>>>> > As part of branching preparation next week for release-6, >>>>>> please >>>>>> > find >>>>>> > test failures and respective test links here [1]. >>>>>> > >>>>>> > The top tests that are failing/dumping-core are as below and >>>>>> > need attention, >>>>>> > - ec/bug-1236065.t >>>>>> > - >>>>>> glusterd/add-brick-and-validate-replicated-volume-options.t >>>>>> > - readdir-ahead/bug-1390050.t >>>>>> > - glusterd/brick-mux-validation.t >>>>>> > - bug-1432542-mpx-restart-crash.t >>>>>> > >>>>>> > Others of interest, >>>>>> > - replicate/bug-1341650.t >>>>>> > >>>>>> > Please file a bug if needed against the test case and >>>>>> report the >>>>>> > same >>>>>> > here, in case a problem is already addressed, then do send >>>>>> back the >>>>>> > patch details that addresses this issue as a response to >>>>>> this mail. >>>>>> > >>>>>> > Thanks, >>>>>> > Shyam >>>>>> > >>>>>> > [1] Regression failures: >>>>>> > https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view >>>>>> > _______________________________________________ >>>>>> > Gluster-devel mailing list >>>>>> > Gluster-devel at gluster.org >>>>> > >>>>>> > https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>>> > >>>>>> > >>>>>> >>>>> >>>> _______________________________________________ >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>>> _______________________________________________ >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> _______________________________________________ >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> -- >> --Atin >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Fri Jan 18 20:21:09 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Fri, 18 Jan 2019 15:21:09 -0500 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> Message-ID: <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> On 12/6/18 9:34 AM, Shyam Ranganathan wrote: > On 11/6/18 11:34 AM, Shyam Ranganathan wrote: >> ## Schedule > > We have decided to postpone release-6 by a month, to accommodate for > late enhancements and the drive towards getting what is required for the > GCS project [1] done in core glusterfs. > > This puts the (modified) schedule for Release-6 as below, > > Working backwards on the schedule, here's what we have: > - Announcement: Week of Mar 4th, 2019 > - GA tagging: Mar-01-2019 > - RC1: On demand before GA > - RC0: Feb-04-2019 > - Late features cut-off: Week of Jan-21st, 2018 > - Branching (feature cutoff date): Jan-14-2018 > (~45 days prior to branching) We are slightly past the branching date, I would like to branch early next week, so please respond with a list of patches that need to be part of the release and are still pending a merge, will help address review focus on the same and also help track it down and branch the release. 
Thanks, Shyam From emteeoh at gmail.com Sun Jan 20 18:43:55 2019 From: emteeoh at gmail.com (Richard Betel) Date: Sun, 20 Jan 2019 13:43:55 -0500 Subject: [Gluster-devel] Building Glusterfs-5.3 on armhf Message-ID: I've got some odroid HC2's running debian 9 that I'd like to run gluster on, but I want to run something current, not 3.8! So I'm trying to build 5.3, but I can't get through the ./configure. At first, I forgot to run autogen, so I was using whatever configure I had, and it would error out on sqlite, even though I have the sqlite3 dev libraries installed. Anyhow, I realized my mistake and ran autogen.sh. Now configure dies on libuuid, which is also installed. Before autogen it got well past that check. Here are the last few lines: checking sys/extattr.h usability... no checking sys/extattr.h presence... no checking for sys/extattr.h... no checking openssl/dh.h usability... yes checking openssl/dh.h presence... yes checking for openssl/dh.h... yes checking openssl/ecdh.h usability... yes checking openssl/ecdh.h presence... yes checking for openssl/ecdh.h... yes checking for pow in -lm... yes ./configure: line 13788: syntax error near unexpected token `UUID,' ./configure: line 13788: `PKG_CHECK_MODULES(UUID, uuid,' Here's the config block that fails (with some context): PKG_CHECK_MODULES(UUID, uuid, have_uuid=yes AC_DEFINE(HAVE_LIBUUID, 1, [have libuuid.so]) PKGCONFIG_UUID=uuid, have_uuid=no) if test x$have_uuid = xyes; then HAVE_LIBUUID_TRUE= HAVE_LIBUUID_FALSE='#' else HAVE_LIBUUID_TRUE='#' HAVE_LIBUUID_FALSE= fi I tried putting "echo FOO" before the PKG_CHECK_MODULES and it outputs correctly, so I'm pretty sure the problem isn't a dropped quote or parenthesis. Any suggestions on what to look for to debug this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jenkins at build.gluster.org Mon Jan 21 01:45:03 2019 From: jenkins at build.gluster.org (jenkins at build.gluster.org) Date: Mon, 21 Jan 2019 01:45:03 +0000 (UTC) Subject: [Gluster-devel] Weekly Untriaged Bugs Message-ID: <1672141306.8.1548035103788.JavaMail.jenkins@jenkins-el7.rht.gluster.org> [...truncated 6 lines...]
https://bugzilla.redhat.com/1667168 / arbiter: Thin Arbiter documentation refers commands don't exist "glustercli' https://bugzilla.redhat.com/1665145 / core: Writes on Gluster 5 volumes fail with EIO when "cluster.consistent-metadata" is set https://bugzilla.redhat.com/1663337 / doc: Gluster documentation on quorum-reads option is incorrect https://bugzilla.redhat.com/1663205 / fuse: List dictionary is too slow https://bugzilla.redhat.com/1664524 / geo-replication: Non-root geo-replication session goes to faulty state, when the session is started https://bugzilla.redhat.com/1662178 / glusterd: Compilation fails for xlators/mgmt/glusterd/src with error "undefined reference to `dlclose'" https://bugzilla.redhat.com/1663247 / glusterd: remove static memory allocations from code https://bugzilla.redhat.com/1663519 / gluster-smb: Memory leak when smb.conf has "store dos attributes = yes" https://bugzilla.redhat.com/1666326 / open-behind: reopening bug 1405147: Failed to dispatch handler: glusterfs seems to check for "write permission" instead for "file owner" during open() when writing to a file https://bugzilla.redhat.com/1665361 / project-infrastructure: Alerts for offline nodes https://bugzilla.redhat.com/1663780 / project-infrastructure: On docs.gluster.org, we should convert spaces in folder or file names to 301 redirects to hypens https://bugzilla.redhat.com/1666634 / protocol: nfs client cannot compile files on dispersed volume https://bugzilla.redhat.com/1665677 / rdma: volume create and transport change with rdma failed https://bugzilla.redhat.com/1664215 / read-ahead: Toggling readdir-ahead translator off causes some clients to umount some of its volumes https://bugzilla.redhat.com/1661895 / replicate: [disperse] Dump respective itables in EC to statedumps. https://bugzilla.redhat.com/1662557 / replicate: glusterfs process crashes, causing "Transport endpoint not connected". https://bugzilla.redhat.com/1664398 / tests: ./tests/00-geo-rep/00-georep-verify-setup.t does not work with ./run-tests-in-vagrant.sh [...truncated 2 lines...] -------------- next part -------------- A non-text attachment was scrubbed... Name: build.log Type: application/octet-stream Size: 2423 bytes Desc: not available URL: From ndevos at redhat.com Mon Jan 21 13:49:00 2019 From: ndevos at redhat.com (Niels de Vos) Date: Mon, 21 Jan 2019 14:49:00 +0100 Subject: [Gluster-devel] Building Glusterfs-5.3 on armhf In-Reply-To: References: Message-ID: <20190121134900.GE2361@ndevos-x270> On Sun, Jan 20, 2019 at 01:43:55PM -0500, Richard Betel wrote: > I've got some odroid HC2's running debian 9 that i'd like to run gluster > on, but I want to run something current, not 3.8! So I'm trying to build > 5.3, but I can't get through the./configure. > > At first, I forgot to run autogen, so i was using whatever configure I had, > and it would error out on sqlite, even though I have the sqlite3 dev > libraries installed. Anyhow, I realized my mistake, and ran autogen.sh . > Now configure dies on libuuid which is also installed. before autogen it > got well past it. here's the last few lines: > checking sys/extattr.h usability... no > checking sys/extattr.h presence... no > checking for sys/extattr.h... no > checking openssl/dh.h usability... yes > checking openssl/dh.h presence... yes > checking for openssl/dh.h... yes > checking openssl/ecdh.h usability... yes > checking openssl/ecdh.h presence... yes > checking for openssl/ecdh.h... yes > checking for pow in -lm... 
yes > ./configure: line 13788: syntax error near unexpected token `UUID,' > ./configure: line 13788: `PKG_CHECK_MODULES(UUID, uuid,' > > Here's the config line that fails (with some: > PKG_CHECK_MODULES(UUID, uuid, > have_uuid=yes > AC_DEFINE(HAVE_LIBUUID, 1, [have libuuid.so]) > PKGCONFIG_UUID=uuid, > have_uuid=no) > if test x$have_uuid = xyes; then > HAVE_LIBUUID_TRUE= > HAVE_LIBUUID_FALSE='#' > else > HAVE_LIBUUID_TRUE='#' > HAVE_LIBUUID_FALSE= > fi > > I tried putting "echo FOO" before the PKG_CHECK_MODULES and it outputs > correctly, so I'm pretty sure the problem isn't a dropped quote or > parenthesis. > > Any suggestions on what to look for to debug this? You might be missing the PKG_CHECK_MODULES macro. Can you make sure you have pkg-config installed? Niels From emteeoh at gmail.com Mon Jan 21 21:30:23 2019 From: emteeoh at gmail.com (Richard Betel) Date: Mon, 21 Jan 2019 16:30:23 -0500 Subject: [Gluster-devel] Building Glusterfs-5.3 on armhf In-Reply-To: <20190121134900.GE2361@ndevos-x270> References: <20190121134900.GE2361@ndevos-x270> Message-ID: On Mon, 21 Jan 2019 at 08:49, Niels de Vos wrote: > > > You might be missing the PKG_CHECK_MODULES macro. Can you make sure you > have pkg-config installed? > > Niels > That did it.Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Tue Jan 22 09:46:26 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Tue, 22 Jan 2019 15:16:26 +0530 Subject: [Gluster-devel] Maintainer's meeting: Jan 21st, 2019 Message-ID: BJ Link - Bridge: https://bluejeans.com/217609845 - Watch: https://bluejeans.com/s/PAnE5 Attendance - Nigel Babu, Amar, Nithya, Shyam, Sunny, Milind (joined late). Agenda - GlusterFS - v6.0 - Are we ready for branching? - Can we consider getting https://review.gluster.org/20636 (lock free thread pool) as an option in the code, so we can have it? - Lets try to keep it as an option, and backport it, if not ready by end of this week. - Fencing? - Most probable to make it. - python3 support for glusterfind - https://review.gluster.org/#/c/glusterfs/+/21845/ - Self-heal daemon multiplexing? - Reflink? - Any other performance enhancements? - Infra Updates - Moving to new cloud vendor this week. Expect some flakiness. This is on a timeline we do not control and already quite significantly delayed. - Going to delete old master builds from http://artifacts.ci.centos.org/gluster/nightly/ - Not deleting the release branch artifacts. - Performance regression test bed - Have machines, can we get started with bare minimum tests - All we need is the result to be out in public - Basic tests are present. Some more test failures, so resolving that should be good enough. - Will be picked up after above changes. - Round Table - Have a look at website and suggest what more is required. -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ravishankar at redhat.com Tue Jan 22 10:04:30 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Tue, 22 Jan 2019 15:34:30 +0530 Subject: [Gluster-devel] [Gluster-users] Self/Healing process after node maintenance In-Reply-To: <82F3A23F-7C04-45C4-9685-0904032333A7@gmail.com> References: <82F3A23F-7C04-45C4-9685-0904032333A7@gmail.com> Message-ID: <640b8701-7bb6-9a4e-f713-bd606ffd2f47@redhat.com> On 01/22/2019 02:57 PM, Martin Toth wrote: > Hi all, > > I just want to ensure myself how self-healing process exactly works, because I need to turn one of my nodes down for maintenance. > I have replica 3 setup. Nothing complicated. 3 nodes, 1 volume, 1 brick per node (ZFS pool). All nodes running Qemu VMs and disks of VMs are on Gluster volume. > > I want to turn off node1 for maintenance. If I will migrate all VMs to node2 and node3 and shutdown node1, I suppose everything will be running without downtime. (2 nodes of 3 will be online) Yes it should. Before you `shutdown` a node, kill all the gluster processes on it, i.e. `pkill gluster`. > > My question is if I will start up node1 after maintenance and node1 will be done back online in running state, this will trigger self-healing process on all disk files of all VMs.. will this healing process be only and only on node1? The list of files needing heal on node1 is captured on the other 2 nodes that were up, so the selfheal daemons on those nodes will do the heals. > Can node2 and node3 run VMs without problem while node1 will be healing these files? Yes. You might notice some performance drop if there are a lot of heals happening though. > I want to ensure myself this files (VM disks) will not get 'locked' on node2 and node3 while self-healing will be in process on node1. Heal won't block I/O from clients indefinitely. If both are writing to an overlapping offset, one of them (i.e. either heal or client I/O) will get the lock, do its job and release the lock so that the other can acquire it and continue. HTH, Ravi > > Thanks for clarification in advance. > > BR! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From george.lian at nokia-sbell.com Wed Jan 23 05:53:19 2019 From: george.lian at nokia-sbell.com (Lian, George (NSB - CN/Hangzhou)) Date: Wed, 23 Jan 2019 05:53:19 +0000 Subject: [Gluster-devel] glusterfs coredump Message-ID: <1d67e7634d5c453085fb80f0d7e86ee1@nokia-sbell.com> Hi, GlusterFS experts, We have encountered a coredump of the client process 'glusterfs' recently, and it can be reproduced more easily when the IO load and CPU/memory load are high during stability testing. Our glusterfs release is 3.12.2. I have copied the call trace of the core dump below, and I have some questions; I hope I can get some help from you. 1) Have you encountered a related issue? From the call trace, we can see that the fd variable looks abnormal in its 'refcount' and 'inode' fields. For wb_inode->this, it has the invalid value 0xffffffffffffff00; is the value 0xffffffffffffff00 meaningful in some way? In every coredump that occurred, the value of inode->this was the same, 0xffffffffffffff00. 2) When I checked the source code, I found that in the function wb_enqueue_common, __wb_request_unref is used instead of wb_request_unref, and we can see that although wb_request_unref is defined, it is never used!
Firstly it seems some strange, secondly, in wb_request_unref, there are lock mechanism to avoid race condition, but __wb_request_unref without those mechanism, and we could see there are more occurrence called from __wb_request_unref, will it lead to race issue? [Current thread is 1 (Thread 0x7f54e82a3700 (LWP 6078))] (gdb) bt #0 0x00007f54e623197c in wb_fulfill (wb_inode=0x7f54d4066bd0, liabilities=0x7f54d0824440) at write-behind.c:1155 #1 0x00007f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at write-behind.c:1728 #2 0x00007f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, flags=32770, iobref=0x7f54d021ec20, xdata=0x0) at write-behind.c:1842 #3 0x00007f54e6026fcb in du_writev_resume (ret=0, frame=0x7f54d0002260, opaque=0x7f54d0002260) at disk-usage.c:490 #4 0x00007f54ece07160 in synctask_wrap () at syncop.c:377 #5 0x00007f54eb3a2660 in ?? () from /lib64/libc.so.6 #6 0x0000000000000000 in ?? () (gdb) p wb_inode $6 = (wb_inode_t *) 0x7f54d4066bd0 (gdb) p wb_inode->this $1 = (xlator_t *) 0xffffffffffffff00 (gdb) frame 1 #1 0x00007f54e6233662 in wb_process_queue (wb_inode=0x7f54d4066bd0) at write-behind.c:1728 1728 in write-behind.c (gdb) p wind_failure $2 = 0 (gdb) p *wb_inode $3 = {window_conf = 35840637416824320, window_current = 35840643167805440, transit = 35839681019027968, all = {next = 0xb000, prev = 0x7f54d4066bd000}, todo = {next = 0x7f54deadc0de00, prev = 0x7f54e00489e000}, liability = {next = 0x7f54000000a200, prev = 0xb000}, temptation = {next = 0x7f54d4066bd000, prev = 0x7f54deadc0de00}, wip = {next = 0x7f54e00489e000, prev = 0x7f54000000a200}, gen = 45056, size = 35840591659782144, lock = {spinlock = 0, mutex = {__data = {__lock = 0, __count = 8344798, __owner = 0, __nusers = 8344799, __kind = 41472, __spins = 21504, __elision = 127, __list = { __prev = 0xb000, __next = 0x7f54d4066bd000}}, __size = "\000\000\000\000\336T\177\000\000\000\000\000\337T\177\000\000\242\000\000\000T\177\000\000\260\000\000\000\000\000\000\000\320k\006\324T\177", __align = 35840634501726208}}, this = 0xffffffffffffff00, dontsync = -1} (gdb) frame 2 #2 0x00007f54e6234039 in wb_writev (frame=0x7f54d406d6c0, this=0x7f54e0014b10, fd=0x7f54d8019d70, vector=0x7f54d0018000, count=1, offset=33554431, flags=32770, iobref=0x7f54d021ec20, xdata=0x0) at write-behind.c:1842 1842 in write-behind.c (gdb) p fd $4 = (fd_t *) 0x7f54d8019d70 (gdb) p *fd $5 = {pid = 140002378149040, flags = -670836240, refcount = 32596, inode_list = {next = 0x7f54d8019d80, prev = 0x7f54d8019d80}, inode = 0x0, lock = {spinlock = -536740032, mutex = {__data = { __lock = -536740032, __count = 32596, __owner = -453505333, __nusers = 32596, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "@\377\001\340T\177\000\000\313\016\370\344T\177", '\000' , __align = 140002512207680}}, _ctx = 0xffffffff, xl_count = 0, lk_ctx = 0x0, anonymous = (unknown: 3623984496)} (gdb) Thanks & Best Regards, George -------------- next part -------------- An HTML attachment was scrubbed... URL: From rkavunga at redhat.com Wed Jan 23 10:52:42 2019 From: rkavunga at redhat.com (RAFI KC) Date: Wed, 23 Jan 2019 16:22:42 +0530 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> Message-ID: There are three patches that I'm working for Gluster-6. 
[1] : https://review.gluster.org/#/c/glusterfs/+/22075/ [2] : https://review.gluster.org/#/c/glusterfs/+/21333/ [3] : https://review.gluster.org/#/c/glusterfs/+/21720/ Regards Rafi KC On 1/19/19 1:51 AM, Shyam Ranganathan wrote: > On 12/6/18 9:34 AM, Shyam Ranganathan wrote: >> On 11/6/18 11:34 AM, Shyam Ranganathan wrote: >>> ## Schedule >> We have decided to postpone release-6 by a month, to accommodate for >> late enhancements and the drive towards getting what is required for the >> GCS project [1] done in core glusterfs. >> >> This puts the (modified) schedule for Release-6 as below, >> >> Working backwards on the schedule, here's what we have: >> - Announcement: Week of Mar 4th, 2019 >> - GA tagging: Mar-01-2019 >> - RC1: On demand before GA >> - RC0: Feb-04-2019 >> - Late features cut-off: Week of Jan-21st, 2018 >> - Branching (feature cutoff date): Jan-14-2018 >> (~45 days prior to branching) > We are slightly past the branching date, I would like to branch early > next week, so please respond with a list of patches that need to be part > of the release and are still pending a merge, will help address review > focus on the same and also help track it down and branch the release. > > Thanks, Shyam > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel From aspandey at redhat.com Wed Jan 23 11:03:19 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Wed, 23 Jan 2019 06:03:19 -0500 (EST) Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> Message-ID: <1959334041.59187316.1548241399641.JavaMail.zimbra@redhat.com> Following is the patch I am working and targeting - https://review.gluster.org/#/c/glusterfs/+/21933/ It is under review phase and yet to be merged. -- Ashish ----- Original Message ----- From: "RAFI KC" To: "Shyam Ranganathan" , "GlusterFS Maintainers" , "Gluster Devel" Sent: Wednesday, January 23, 2019 4:22:42 PM Subject: Re: [Gluster-devel] Release 6: Kick off! There are three patches that I'm working for Gluster-6. [1] : https://review.gluster.org/#/c/glusterfs/+/22075/ [2] : https://review.gluster.org/#/c/glusterfs/+/21333/ [3] : https://review.gluster.org/#/c/glusterfs/+/21720/ Regards Rafi KC On 1/19/19 1:51 AM, Shyam Ranganathan wrote: > On 12/6/18 9:34 AM, Shyam Ranganathan wrote: >> On 11/6/18 11:34 AM, Shyam Ranganathan wrote: >>> ## Schedule >> We have decided to postpone release-6 by a month, to accommodate for >> late enhancements and the drive towards getting what is required for the >> GCS project [1] done in core glusterfs. >> >> This puts the (modified) schedule for Release-6 as below, >> >> Working backwards on the schedule, here's what we have: >> - Announcement: Week of Mar 4th, 2019 >> - GA tagging: Mar-01-2019 >> - RC1: On demand before GA >> - RC0: Feb-04-2019 >> - Late features cut-off: Week of Jan-21st, 2018 >> - Branching (feature cutoff date): Jan-14-2018 >> (~45 days prior to branching) > We are slightly past the branching date, I would like to branch early > next week, so please respond with a list of patches that need to be part > of the release and are still pending a merge, will help address review > focus on the same and also help track it down and branch the release. 
> > Thanks, Shyam > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel _______________________________________________ Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From srangana at redhat.com Wed Jan 23 15:12:29 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Wed, 23 Jan 2019 10:12:29 -0500 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> Message-ID: On 1/23/19 5:52 AM, RAFI KC wrote: > There are three patches that I'm working for Gluster-6. > > [1] : https://review.gluster.org/#/c/glusterfs/+/22075/ We discussed mux for shd in the maintainers meeting, and decided that this would be for the next release, as the patchset is not ready (branching is today, if I get the time to get it done). > > [2] : https://review.gluster.org/#/c/glusterfs/+/21333/ Ack! in case this is not in by branching we can backport the same > > [3] : https://review.gluster.org/#/c/glusterfs/+/21720/ Bug fix, can be backported post branching as well, so again ack! Thanks for responding. From srangana at redhat.com Wed Jan 23 15:13:32 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Wed, 23 Jan 2019 10:13:32 -0500 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: <1959334041.59187316.1548241399641.JavaMail.zimbra@redhat.com> References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> <1959334041.59187316.1548241399641.JavaMail.zimbra@redhat.com> Message-ID: <75cb1e52-fa8b-2a4e-67b3-6c9deb6d9909@redhat.com> On 1/23/19 6:03 AM, Ashish Pandey wrote: > > Following is the patch I am working and targeting -? > https://review.gluster.org/#/c/glusterfs/+/21933/ This is a bug fix, and the patch size at the moment is also small in lines changed. Hence, even if it misses branching the fix can be backported. Thanks for the heads up! From skoduri at redhat.com Thu Jan 24 08:23:07 2019 From: skoduri at redhat.com (Soumya Koduri) Date: Thu, 24 Jan 2019 13:53:07 +0530 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: <75cb1e52-fa8b-2a4e-67b3-6c9deb6d9909@redhat.com> References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> <1959334041.59187316.1548241399641.JavaMail.zimbra@redhat.com> <75cb1e52-fa8b-2a4e-67b3-6c9deb6d9909@redhat.com> Message-ID: <9d9f14cd-fa1b-de3d-ac48-7720d02b99b2@redhat.com> Hi Shyam, Sorry for the late response. I just realized that we had two more new APIs glfs_setattr/fsetattr which uses 'struct stat' made public [1]. As mentioned in one of the patchset review comments, since the goal is to move to glfs_stat in release-6, do we need to update these APIs as well to use the new struct? Or shall we retain them in FUTURE for now and address in next minor release? Please suggest. Thanks, Soumya [1] https://review.gluster.org/#/c/glusterfs/+/21734/ On 1/23/19 8:43 PM, Shyam Ranganathan wrote: > On 1/23/19 6:03 AM, Ashish Pandey wrote: >> >> Following is the patch I am working and targeting - >> https://review.gluster.org/#/c/glusterfs/+/21933/ > > This is a bug fix, and the patch size at the moment is also small in > lines changed. Hence, even if it misses branching the fix can be backported. 
> > Thanks for the heads up! > _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > From srangana at redhat.com Thu Jan 24 14:43:27 2019 From: srangana at redhat.com (Shyam Ranganathan) Date: Thu, 24 Jan 2019 09:43:27 -0500 Subject: [Gluster-devel] Release 6: Kick off! In-Reply-To: <9d9f14cd-fa1b-de3d-ac48-7720d02b99b2@redhat.com> References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com> <19f19863-61c4-9f2b-98d2-19a435b8ec78@redhat.com> <1959334041.59187316.1548241399641.JavaMail.zimbra@redhat.com> <75cb1e52-fa8b-2a4e-67b3-6c9deb6d9909@redhat.com> <9d9f14cd-fa1b-de3d-ac48-7720d02b99b2@redhat.com> Message-ID: <86e0b84b-cfd3-54e4-22c7-848faac6f487@redhat.com> On 1/24/19 3:23 AM, Soumya Koduri wrote: > Hi Shyam, > > Sorry for the late response. I just realized that we had two more new > APIs glfs_setattr/fsetattr which uses 'struct stat' made public [1]. As > mentioned in one of the patchset review comments, since the goal is to > move to glfs_stat in release-6, do we need to update these APIs as well > to use the new struct? Or shall we retain them in FUTURE for now and > address in next minor release? Please suggest. So the goal in 6 is to not return stat but glfs_stat in the modified pre/post stat return APIs (instead of making this a 2-step for application consumers). To reach glfs_stat everywhere, we have a few more things to do. I had this patch in my radar, but just like pub_glfs_stat returns stat (hence we made glfs_statx as private), I am seeing this as "fine for now". In the future we only want to return glfs_stat. So for now, we let this API be. The next round of converting stat to glfs_stat would take into account clearing up all such instances. So that all application consumers will need to modify code as required in one shot. Does this answer the concern? and, thanks for bringing this to notice. > > Thanks, > Soumya > > [1] https://review.gluster.org/#/c/glusterfs/+/21734/ > > > On 1/23/19 8:43 PM, Shyam Ranganathan wrote: >> On 1/23/19 6:03 AM, Ashish Pandey wrote: >>> >>> Following is the patch I am working and targeting - >>> https://review.gluster.org/#/c/glusterfs/+/21933/ >> >> This is a bug fix, and the patch size at the moment is also small in >> lines changed. Hence, even if it misses branching the fix can be >> backported. >> >> Thanks for the heads up! >> _______________________________________________ >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> From xhernandez at redhat.com Thu Jan 24 15:47:26 2019 From: xhernandez at redhat.com (Xavi Hernandez) Date: Thu, 24 Jan 2019 16:47:26 +0100 Subject: [Gluster-devel] Performance improvements Message-ID: Hi all, I've just updated a patch [1] that implements a new thread pool based on a wait-free queue provided by userspace-rcu library. The patch also includes an auto scaling mechanism that only keeps running the needed amount of threads for the current workload. This new approach has some advantages: - It's provided globally inside libglusterfs instead of inside an xlator This makes it possible that fuse thread and epoll threads transfer the received request to another thread sooner, wating less CPU and reacting sooner to other incoming requests. - Adding jobs to the queue used by the thread pool only requires an atomic operation This makes the producer side of the queue really fast, almost with no delay. 
- Contention is reduced The producer side has negligible contention thanks to the wait-free enqueue operation based on an atomic access. The consumer side requires a mutex, but the duration is very small and the scaling mechanism makes sure that there are no more threads than needed contending for the mutex. This change disables io-threads, since it replaces part of its functionality. However there are two things that could be needed from io-threads: - Prioritization of fops Currently, io-threads assigns priorities to each fop, so that some fops are handled before than others. - Fair distribution of execution slots between clients Currently, io-threads processes requests from each client in round-robin. These features are not implemented right now. If they are needed, probably the best thing to do would be to keep them inside io-threads, but change its implementation so that it uses the global threads from the thread pool instead of its own threads. If this change proves it's performing better and is merged, I have some more ideas to improve other areas of gluster: - Integrate synctask threads into the new thread pool I think there is some contention in these threads because during some tests I've seen they were consuming most of the CPU. Probably they suffer from the same problem than io-threads, so replacing them could improve things. - Integrate timers into the new thread pool My idea is to create a per-thread timer where code executed in one thread will create timer events in the same thread. This makes it possible to use structures that don't require any mutex to be modified. Since the thread pool is basically executing computing tasks, which are fast, I think it's feasible to implement a timer in the main loop of each worker thread with a resolution of few millisecond, which I think is good enough for gluster needs. - Integrate with userspace-rcu library in QSBR mode This will make it possible to use some RCU-based structures for anything gluster uses (inodes, fd's, ...). These structures have very fast read operations, which should reduce contention and improve performance in many places. - Integrate I/O threads into the thread pool and reduce context switches The idea here is a bit more complex. Basically I would like to have a function that does an I/O on some device (for example reading fuse requests or waiting for epoll events). We could send a request to the thread pool to execute that function, so it would be executed inside one of the working threads. When the I/O terminates (i.e. it has received a request), the idea is that a call to the same function is added to the thread pool, so that another thread could continue waiting for requests, but the current thread will start processing the received request without a context switch. Note that with all these changes, all dedicated threads that we currently have in gluster could be replaced by the features provided by this new thread pool, so these would be the only threads present in gluster. This is specially important when brick-multiplex is used. I've done some simple tests using a replica 3 volume and a diserse 4+2 volume. These tests are executed on a single machine using an HDD for each brick (not the best scenario, but it should be fine for comparison). The machine is quite powerful (dual Intel Xeon Silver 4114 @2.2 GHz, with 128 GiB RAM). These tests have shown that the limiting factor has been the disk in most cases, so it's hard to tell if the change has really improved things. 
There is only one clear exception: self-heal on a dispersed volume completes 12.7% faster. The utilization of CPU has also dropped drastically: Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait Now I'm running some more tests on NVMe to try to see the effects of the change when disk is not limiting performance. I'll update once I've more data. Xavi [1] https://review.gluster.org/c/glusterfs/+/20636 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Fri Jan 25 07:53:12 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 24 Jan 2019 23:53:12 -0800 Subject: [Gluster-devel] Performance improvements In-Reply-To: References: Message-ID: Thank you for the detailed update, Xavi! This looks very interesting. On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez wrote: > Hi all, > > I've just updated a patch [1] that implements a new thread pool based on a > wait-free queue provided by userspace-rcu library. The patch also includes > an auto scaling mechanism that only keeps running the needed amount of > threads for the current workload. > > This new approach has some advantages: > > - It's provided globally inside libglusterfs instead of inside an > xlator > > This makes it possible that fuse thread and epoll threads transfer the > received request to another thread sooner, wating less CPU and reacting > sooner to other incoming requests. > > > - Adding jobs to the queue used by the thread pool only requires an > atomic operation > > This makes the producer side of the queue really fast, almost with no > delay. > > > - Contention is reduced > > The producer side has negligible contention thanks to the wait-free > enqueue operation based on an atomic access. The consumer side requires a > mutex, but the duration is very small and the scaling mechanism makes sure > that there are no more threads than needed contending for the mutex. > > > This change disables io-threads, since it replaces part of its > functionality. However there are two things that could be needed from > io-threads: > > - Prioritization of fops > > Currently, io-threads assigns priorities to each fop, so that some fops > are handled before than others. > > > - Fair distribution of execution slots between clients > > Currently, io-threads processes requests from each client in round-robin. > > > These features are not implemented right now. If they are needed, probably > the best thing to do would be to keep them inside io-threads, but change > its implementation so that it uses the global threads from the thread pool > instead of its own threads. > These features are indeed useful to have and hence modifying the implementation of io-threads to provide this behavior would be welcome. > > > These tests have shown that the limiting factor has been the disk in most > cases, so it's hard to tell if the change has really improved things. There > is only one clear exception: self-heal on a dispersed volume completes > 12.7% faster. The utilization of CPU has also dropped drastically: > > Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait > > New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait > > > Now I'm running some more tests on NVMe to try to see the effects of the > change when disk is not limiting performance. I'll update once I've more > data. > > Will look forward to these numbers. Regards, Vijay -------------- next part -------------- An HTML attachment was scrubbed... 
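A minimal sketch of the producer/consumer pattern discussed in this thread, based on liburcu's wfcqueue (illustrative names only, not the code in https://review.gluster.org/c/glusterfs/+/20636): producers enqueue with a single wait-free operation, while consumers take a short-lived mutex and sleep on a condition variable when idle.

/* Illustrative sketch only -- simplified, without the auto-scaling logic.
 * Link against liburcu (the wfcqueue symbols live in its common/cds
 * library, depending on the liburcu version). */
#include <pthread.h>
#include <stdlib.h>
#include <urcu/wfcqueue.h>

struct job {
        struct cds_wfcq_node node;   /* first member, so we can cast node -> job */
        void (*fn)(void *);
        void *data;
};

static struct cds_wfcq_head job_head;
static struct cds_wfcq_tail job_tail;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pool_cond = PTHREAD_COND_INITIALIZER;

void pool_init(void)
{
        cds_wfcq_init(&job_head, &job_tail);
}

/* Producer side: one wait-free enqueue plus a wakeup. */
void pool_add(void (*fn)(void *), void *data)
{
        struct job *job = calloc(1, sizeof(*job));

        job->fn = fn;
        job->data = data;
        cds_wfcq_node_init(&job->node);
        cds_wfcq_enqueue(&job_head, &job_tail, &job->node);

        pthread_mutex_lock(&pool_lock);
        pthread_cond_signal(&pool_cond);        /* wake one idle worker */
        pthread_mutex_unlock(&pool_lock);
}

/* Consumer side: dequeue under a short-lived mutex, run the job outside it. */
void *pool_worker(void *arg)
{
        struct cds_wfcq_node *node;
        struct job *job;

        for (;;) {
                pthread_mutex_lock(&pool_lock);
                while ((node = __cds_wfcq_dequeue_blocking(&job_head,
                                                           &job_tail)) == NULL)
                        pthread_cond_wait(&pool_cond, &pool_lock);
                pthread_mutex_unlock(&pool_lock);

                job = (struct job *)node;
                job->fn(job->data);
                free(job);
        }
        return NULL;
}

The real patch additionally scales the number of worker threads up and down with the queue load, which is where most of its complexity lies.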
URL: From rgowdapp at redhat.com Sat Jan 26 02:33:06 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Sat, 26 Jan 2019 08:03:06 +0530 Subject: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench In-Reply-To: References: Message-ID: On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa wrote: > Here is the update of the progress till now: > * The client profile attached till now shows the tuple creation is > dominated by writes and fstats. Note that fstats are side-effects of writes > as writes invalidate attributes of the file from kernel attribute cache. > * The rest of the init phase (which is marked by msgs "setting primary > key" and "vaccuum") is dominated by reads. Next bigger set of operations > are writes followed by fstats. > > So, only writes, reads and fstats are the operations we need to optimize > to reduce the init time latency. As mentioned in my previous mail, I did > following tunings: > * Enabled only write-behind, md-cache and open-behind. > - write-behind was configured with a cache-size/window-size of 20MB > - open-behind was configured with read-after-open yes > - md-cache was loaded as a child of write-behind in xlator graph. As a > parent of write-behind, writes responses of writes cached in write-behind > would invalidate stats. But when loaded as a child of write-behind this > problem won't be there. Note that in both cases fstat would pass through > write-behind (In the former case due to no stats in md-cache). However in > the latter case fstats can be served by md-cache. > - md-cache used to aggressively invalidate inodes. For the purpose of > this test, I just commented out inode-invalidate code in md-cache. We need > to fine tune the invalidation invocation logic. > - set group-metadata-cache to on. But turned off upcall notifications. > Note that since this workload basically accesses all its data through > single mount point. So, there is no shared files across mounts and hence > its safe to turn off invalidations. > * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781 > > With the above set of tunings I could reduce the init time of scale 8000 > from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30% > > Since the workload is dominated by reads, we think a good read-cache where > reads to regions just written are served from cache would greatly improve > the performance. Since kernel page-cache already provides that > functionality along with read-ahead (which is more intelligent and serves > more read patterns than supported by Glusterfs read-ahead), we wanted to > try that. But, Manoj found a bug where reads followed by writes are not > served from page cache [5]. I am currently waiting for the resolution of > this bug. As an alternative, I can modify io-cache to serve reads from the > data just written. But, the change involves its challenges and hence would > like to get a resolution on [5] (either positive or negative) before > proceeding with modifications to io-cache. > > As to the rpc latency, Krutika had long back identified that reading a > single rpc message involves atleast 4 reads to socket. These many number of > reads were done to identify the structure of the message on the go. The > reason we wanted to discover the rpc message was to identify the part of > the rpc message containing read or write payload and make sure that payload > is directly read into a buffer different than the one containing rest of > the rpc message. 
This strategy will make sure payloads are not copied again > when buffers are moved across caches (read-ahead, io-cache etc) and also > the rest of the rpc message can be freed even though the payload outlives > the rpc message (when payloads are cached). However, we can experiment an > approach where we can either do away with zero-copy requirement or let the > entire buffer containing rpc message and payload to live in the cache. > > From my observations and discussions with Manoj and Xavi, this workload is > very sensitive to latency (than to concurrency). So, I am hopeful the above > approaches will give positive results. > Me, Manoj and Csaba figured out that invalidations by md-cache and Fuse auto-invalidations were dropping the kernel page-cache (more details on [5]). Changes to stats by writes from same client (local writes) were triggering both these codepaths dropping the cache. Since all the I/O done by this workload goes through the caches of single client, the invalidations are not necessary and I made code changes to fuse-bridge to disable auto-invalidations completely and commented out inode-invalidations in md-cache. Note that this doesn't regress the consistency/coherency of data seen in the caches as its a single client use-case. With these two changes coupled with earlier optimizations (client-io-threads=on, server/client-event-threads=4, md-cache as a child of write-behind in xlator graph, performance.md-cache-timeout=600), pgbench init of scale 8000 on a volume with NVMe backend completed in 54m25s. This is a whopping 94% improvement to the time we started out with (59280s vs 3360s). [root at shakthi4 ~]# gluster volume info Volume Name: nvme-r3 Type: Replicate Volume ID: d1490bcc-bcf1-4e09-91e8-ab01d9781263 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-1 Brick2: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-2 Brick3: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-3 Options Reconfigured: server.event-threads: 4 client.event-threads: 4 diagnostics.client-log-level: INFO performance.md-cache-timeout: 600 performance.io-cache: off performance.read-ahead: off diagnostics.count-fop-hits: on diagnostics.latency-measurement: on transport.address-family: inet nfs.disable: on performance.client-io-threads: on performance.stat-prefetch: on I'll be concentrating on how to disable fuse-auto-invalidations without regressing on the consistency model we've been providing till now. The consistency model Glusterfs has been providing till now is close to open consistency similar to what NFS provides [6][7]. But the initial thoughts are, at least for the pgbench test-case there is no harm in totally disabling fuse-auto-invalidations and md-cache invalidations as this workload totally runs on single mount point and hence invalidations itself are not necessary as all I/O goes through caches and hence caches are in sync with the state of the file on backend. 
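For reference, the option-level part of the tuning above can be applied with something like the following volume-set commands (using the nvme-r3 volume shown above; the md-cache graph reordering, the commented-out md-cache invalidation and the fuse auto-invalidation change are code modifications, not settable options):

  gluster volume set nvme-r3 performance.client-io-threads on
  gluster volume set nvme-r3 client.event-threads 4
  gluster volume set nvme-r3 server.event-threads 4
  gluster volume set nvme-r3 performance.stat-prefetch on      # md-cache
  gluster volume set nvme-r3 performance.md-cache-timeout 600
  gluster volume set nvme-r3 performance.read-ahead off
  gluster volume set nvme-r3 performance.io-cache off
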
[6] http://nfs.sourceforge.net/#faq_a8 [7] https://lists.gluster.org/pipermail/gluster-users/2013-March/012805.html > [5] https://bugzilla.redhat.com/show_bug.cgi?id=1664934 > > regards, > Raghavendra > > On Fri, Dec 28, 2018 at 12:44 PM Raghavendra Gowdappa > wrote: > >> >> >> On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa >> wrote: >> >>> >>> >>> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay < >>> sankarshan.mukhopadhyay at gmail.com> wrote: >>> >>>> [pulling the conclusions up to enable better in-line] >>>> >>>> > Conclusions: >>>> > >>>> > We should never have a volume with caching-related xlators disabled. >>>> The price we pay for it is too high. We need to make them work consistently >>>> and aggressively to avoid as many requests as we can. >>>> >>>> Are there current issues in terms of behavior which are known/observed >>>> when these are enabled? >>>> >>> >>> We did have issues with pgbench in past. But they've have been fixed. >>> Please refer to bz [1] for details. On 5.1, it runs successfully with all >>> caching related xlators enabled. Having said that the only performance >>> xlators which gave improved performance were open-behind and write-behind >>> [2] (write-behind had some issues, which will be fixed by [3] and we'll >>> have to measure performance again with fix to [3]). >>> >> >> One quick update. Enabling write-behind and md-cache with fix for [3] >> reduced the total time taken for pgbench init phase roughly by 20%-25% >> (from 12.5 min to 9.75 min for a scale of 100). Though this is still a huge >> time (around 12hrs for a db of scale 8000). I'll follow up with a detailed >> report once my experiments are complete. Currently trying to optimize the >> read path. >> >> >>> For some reason, read-side caching didn't improve transactions per >>> second. I am working on this problem currently. Note that these bugs >>> measure transaction phase of pgbench, but what xavi measured in his mail is >>> init phase. Nevertheless, evaluation of read caching (metadata/data) will >>> still be relevant for init phase too. >>> >>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691 >>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4 >>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >>> >>> >>>> > We need to analyze client/server xlators deeper to see if we can >>>> avoid some delays. However optimizing something that is already at the >>>> microsecond level can be very hard. >>>> >>>> That is true - are there any significant gains which can be accrued by >>>> putting efforts here or, should this be a lower priority? >>>> >>> >>> The problem identified by xavi is also the one we (Manoj, Krutika, me >>> and Milind) had encountered in the past [4]. The solution we used was to >>> have multiple rpc connections between single brick and client. The solution >>> indeed fixed the bottleneck. So, there is definitely work involved here - >>> either to fix the single connection model or go with multiple connection >>> model. Its preferred to improve single connection and resort to multiple >>> connections only if bottlenecks in single connection are not fixable. >>> Personally I think this is high priority along with having appropriate >>> client side caching. >>> >>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52 >>> >>> >>>> > We need to determine what causes the fluctuations in brick side and >>>> avoid them. 
>>>> > This scenario is very similar to a smallfile/metadata workload, so >>>> this is probably one important cause of its bad performance. >>>> >>>> What kind of instrumentation is required to enable the determination? >>>> >>>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez >>>> wrote: >>>> > >>>> > Hi, >>>> > >>>> > I've done some tracing of the latency that network layer introduces >>>> in gluster. I've made the analysis as part of the pgbench performance issue >>>> (in particulat the initialization and scaling phase), so I decided to look >>>> at READV for this particular workload, but I think the results can be >>>> extrapolated to other operations that also have small latency (cached data >>>> from FS for example). >>>> > >>>> > Note that measuring latencies introduces some latency. It consists in >>>> a call to clock_get_time() for each probe point, so the real latency will >>>> be a bit lower, but still proportional to these numbers. >>>> > >>>> >>>> [snip] >>>> _______________________________________________ >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Sat Jan 26 02:36:24 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Sat, 26 Jan 2019 08:06:24 +0530 Subject: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench In-Reply-To: References: Message-ID: On Sat, Jan 26, 2019 at 8:03 AM Raghavendra Gowdappa wrote: > > > On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa > wrote: > >> Here is the update of the progress till now: >> * The client profile attached till now shows the tuple creation is >> dominated by writes and fstats. Note that fstats are side-effects of writes >> as writes invalidate attributes of the file from kernel attribute cache. >> * The rest of the init phase (which is marked by msgs "setting primary >> key" and "vaccuum") is dominated by reads. Next bigger set of operations >> are writes followed by fstats. >> >> So, only writes, reads and fstats are the operations we need to optimize >> to reduce the init time latency. As mentioned in my previous mail, I did >> following tunings: >> * Enabled only write-behind, md-cache and open-behind. >> - write-behind was configured with a cache-size/window-size of 20MB >> - open-behind was configured with read-after-open yes >> - md-cache was loaded as a child of write-behind in xlator graph. As >> a parent of write-behind, writes responses of writes cached in write-behind >> would invalidate stats. But when loaded as a child of write-behind this >> problem won't be there. Note that in both cases fstat would pass through >> write-behind (In the former case due to no stats in md-cache). However in >> the latter case fstats can be served by md-cache. >> - md-cache used to aggressively invalidate inodes. For the purpose of >> this test, I just commented out inode-invalidate code in md-cache. We need >> to fine tune the invalidation invocation logic. >> - set group-metadata-cache to on. But turned off upcall >> notifications. Note that since this workload basically accesses all its >> data through single mount point. So, there is no shared files across mounts >> and hence its safe to turn off invalidations. 
>> * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >> >> With the above set of tunings I could reduce the init time of scale 8000 >> from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30% >> >> Since the workload is dominated by reads, we think a good read-cache >> where reads to regions just written are served from cache would greatly >> improve the performance. Since kernel page-cache already provides that >> functionality along with read-ahead (which is more intelligent and serves >> more read patterns than supported by Glusterfs read-ahead), we wanted to >> try that. But, Manoj found a bug where reads followed by writes are not >> served from page cache [5]. I am currently waiting for the resolution of >> this bug. As an alternative, I can modify io-cache to serve reads from the >> data just written. But, the change involves its challenges and hence would >> like to get a resolution on [5] (either positive or negative) before >> proceeding with modifications to io-cache. >> >> As to the rpc latency, Krutika had long back identified that reading a >> single rpc message involves atleast 4 reads to socket. These many number of >> reads were done to identify the structure of the message on the go. The >> reason we wanted to discover the rpc message was to identify the part of >> the rpc message containing read or write payload and make sure that payload >> is directly read into a buffer different than the one containing rest of >> the rpc message. This strategy will make sure payloads are not copied again >> when buffers are moved across caches (read-ahead, io-cache etc) and also >> the rest of the rpc message can be freed even though the payload outlives >> the rpc message (when payloads are cached). However, we can experiment an >> approach where we can either do away with zero-copy requirement or let the >> entire buffer containing rpc message and payload to live in the cache. >> >> From my observations and discussions with Manoj and Xavi, this workload >> is very sensitive to latency (than to concurrency). So, I am hopeful the >> above approaches will give positive results. >> > > Me, Manoj and Csaba figured out that invalidations by md-cache and Fuse > auto-invalidations were dropping the kernel page-cache (more details on > [5]). > Thanks to Miklos for the pointer on auto-invalidations. > Changes to stats by writes from same client (local writes) were triggering > both these codepaths dropping the cache. Since all the I/O done by this > workload goes through the caches of single client, the invalidations are > not necessary and I made code changes to fuse-bridge to disable > auto-invalidations completely and commented out inode-invalidations in > md-cache. Note that this doesn't regress the consistency/coherency of data > seen in the caches as its a single client use-case. With these two changes > coupled with earlier optimizations (client-io-threads=on, > server/client-event-threads=4, md-cache as a child of write-behind in > xlator graph, performance.md-cache-timeout=600), pgbench init of scale 8000 > on a volume with NVMe backend completed in 54m25s. This is a whopping 94% > improvement to the time we started out with (59280s vs 3360s). 
> > [root at shakthi4 ~]# gluster volume info > > Volume Name: nvme-r3 > Type: Replicate > Volume ID: d1490bcc-bcf1-4e09-91e8-ab01d9781263 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-1 > Brick2: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-2 > Brick3: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-3 > Options Reconfigured: > server.event-threads: 4 > client.event-threads: 4 > diagnostics.client-log-level: INFO > performance.md-cache-timeout: 600 > performance.io-cache: off > performance.read-ahead: off > diagnostics.count-fop-hits: on > diagnostics.latency-measurement: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: on > performance.stat-prefetch: on > > I'll be concentrating on how to disable fuse-auto-invalidations without > regressing on the consistency model we've been providing till now. The > consistency model Glusterfs has been providing till now is close to open > consistency similar to what NFS provides [6][7]. > > But the initial thoughts are, at least for the pgbench test-case there is > no harm in totally disabling fuse-auto-invalidations and md-cache > invalidations as this workload totally runs on single mount point and hence > invalidations itself are not necessary as all I/O goes through caches and > hence caches are in sync with the state of the file on backend. > > [6] http://nfs.sourceforge.net/#faq_a8 > [7] > https://lists.gluster.org/pipermail/gluster-users/2013-March/012805.html > > >> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1664934 >> >> regards, >> Raghavendra >> >> On Fri, Dec 28, 2018 at 12:44 PM Raghavendra Gowdappa < >> rgowdapp at redhat.com> wrote: >> >>> >>> >>> On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa < >>> rgowdapp at redhat.com> wrote: >>> >>>> >>>> >>>> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay < >>>> sankarshan.mukhopadhyay at gmail.com> wrote: >>>> >>>>> [pulling the conclusions up to enable better in-line] >>>>> >>>>> > Conclusions: >>>>> > >>>>> > We should never have a volume with caching-related xlators disabled. >>>>> The price we pay for it is too high. We need to make them work consistently >>>>> and aggressively to avoid as many requests as we can. >>>>> >>>>> Are there current issues in terms of behavior which are known/observed >>>>> when these are enabled? >>>>> >>>> >>>> We did have issues with pgbench in past. But they've have been fixed. >>>> Please refer to bz [1] for details. On 5.1, it runs successfully with all >>>> caching related xlators enabled. Having said that the only performance >>>> xlators which gave improved performance were open-behind and write-behind >>>> [2] (write-behind had some issues, which will be fixed by [3] and we'll >>>> have to measure performance again with fix to [3]). >>>> >>> >>> One quick update. Enabling write-behind and md-cache with fix for [3] >>> reduced the total time taken for pgbench init phase roughly by 20%-25% >>> (from 12.5 min to 9.75 min for a scale of 100). Though this is still a huge >>> time (around 12hrs for a db of scale 8000). I'll follow up with a detailed >>> report once my experiments are complete. Currently trying to optimize the >>> read path. >>> >>> >>>> For some reason, read-side caching didn't improve transactions per >>>> second. I am working on this problem currently. Note that these bugs >>>> measure transaction phase of pgbench, but what xavi measured in his mail is >>>> init phase. 
Nevertheless, evaluation of read caching (metadata/data) will >>>> still be relevant for init phase too. >>>> >>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691 >>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4 >>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >>>> >>>> >>>>> > We need to analyze client/server xlators deeper to see if we can >>>>> avoid some delays. However optimizing something that is already at the >>>>> microsecond level can be very hard. >>>>> >>>>> That is true - are there any significant gains which can be accrued by >>>>> putting efforts here or, should this be a lower priority? >>>>> >>>> >>>> The problem identified by xavi is also the one we (Manoj, Krutika, me >>>> and Milind) had encountered in the past [4]. The solution we used was to >>>> have multiple rpc connections between single brick and client. The solution >>>> indeed fixed the bottleneck. So, there is definitely work involved here - >>>> either to fix the single connection model or go with multiple connection >>>> model. Its preferred to improve single connection and resort to multiple >>>> connections only if bottlenecks in single connection are not fixable. >>>> Personally I think this is high priority along with having appropriate >>>> client side caching. >>>> >>>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52 >>>> >>>> >>>>> > We need to determine what causes the fluctuations in brick side and >>>>> avoid them. >>>>> > This scenario is very similar to a smallfile/metadata workload, so >>>>> this is probably one important cause of its bad performance. >>>>> >>>>> What kind of instrumentation is required to enable the determination? >>>>> >>>>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez >>>>> wrote: >>>>> > >>>>> > Hi, >>>>> > >>>>> > I've done some tracing of the latency that network layer introduces >>>>> in gluster. I've made the analysis as part of the pgbench performance issue >>>>> (in particulat the initialization and scaling phase), so I decided to look >>>>> at READV for this particular workload, but I think the results can be >>>>> extrapolated to other operations that also have small latency (cached data >>>>> from FS for example). >>>>> > >>>>> > Note that measuring latencies introduces some latency. It consists >>>>> in a call to clock_get_time() for each probe point, so the real latency >>>>> will be a bit lower, but still proportional to these numbers. >>>>> > >>>>> >>>>> [snip] >>>>> _______________________________________________ >>>>> Gluster-devel mailing list >>>>> Gluster-devel at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgowdapp at redhat.com Sat Jan 26 03:38:43 2019 From: rgowdapp at redhat.com (Raghavendra Gowdappa) Date: Sat, 26 Jan 2019 09:08:43 +0530 Subject: [Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench In-Reply-To: References: Message-ID: On Sat, Jan 26, 2019 at 8:03 AM Raghavendra Gowdappa wrote: > > > On Fri, Jan 11, 2019 at 8:09 PM Raghavendra Gowdappa > wrote: > >> Here is the update of the progress till now: >> * The client profile attached till now shows the tuple creation is >> dominated by writes and fstats. Note that fstats are side-effects of writes >> as writes invalidate attributes of the file from kernel attribute cache. >> * The rest of the init phase (which is marked by msgs "setting primary >> key" and "vaccuum") is dominated by reads. 
Next bigger set of operations >> are writes followed by fstats. >> >> So, only writes, reads and fstats are the operations we need to optimize >> to reduce the init time latency. As mentioned in my previous mail, I did >> following tunings: >> * Enabled only write-behind, md-cache and open-behind. >> - write-behind was configured with a cache-size/window-size of 20MB >> - open-behind was configured with read-after-open yes >> - md-cache was loaded as a child of write-behind in xlator graph. As >> a parent of write-behind, writes responses of writes cached in write-behind >> would invalidate stats. But when loaded as a child of write-behind this >> problem won't be there. Note that in both cases fstat would pass through >> write-behind (In the former case due to no stats in md-cache). However in >> the latter case fstats can be served by md-cache. >> - md-cache used to aggressively invalidate inodes. For the purpose of >> this test, I just commented out inode-invalidate code in md-cache. We need >> to fine tune the invalidation invocation logic. >> - set group-metadata-cache to on. But turned off upcall >> notifications. Note that since this workload basically accesses all its >> data through single mount point. So, there is no shared files across mounts >> and hence its safe to turn off invalidations. >> * Applied fix to https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >> >> With the above set of tunings I could reduce the init time of scale 8000 >> from 16.6 hrs to 11.4 hrs - an improvement in the range 25% to 30% >> >> Since the workload is dominated by reads, we think a good read-cache >> where reads to regions just written are served from cache would greatly >> improve the performance. Since kernel page-cache already provides that >> functionality along with read-ahead (which is more intelligent and serves >> more read patterns than supported by Glusterfs read-ahead), we wanted to >> try that. But, Manoj found a bug where reads followed by writes are not >> served from page cache [5]. I am currently waiting for the resolution of >> this bug. As an alternative, I can modify io-cache to serve reads from the >> data just written. But, the change involves its challenges and hence would >> like to get a resolution on [5] (either positive or negative) before >> proceeding with modifications to io-cache. >> >> As to the rpc latency, Krutika had long back identified that reading a >> single rpc message involves atleast 4 reads to socket. These many number of >> reads were done to identify the structure of the message on the go. The >> reason we wanted to discover the rpc message was to identify the part of >> the rpc message containing read or write payload and make sure that payload >> is directly read into a buffer different than the one containing rest of >> the rpc message. This strategy will make sure payloads are not copied again >> when buffers are moved across caches (read-ahead, io-cache etc) and also >> the rest of the rpc message can be freed even though the payload outlives >> the rpc message (when payloads are cached). However, we can experiment an >> approach where we can either do away with zero-copy requirement or let the >> entire buffer containing rpc message and payload to live in the cache. >> >> From my observations and discussions with Manoj and Xavi, this workload >> is very sensitive to latency (than to concurrency). So, I am hopeful the >> above approaches will give positive results. 
>> > > Me, Manoj and Csaba figured out that invalidations by md-cache and Fuse > auto-invalidations were dropping the kernel page-cache (more details on > [5]). Changes to stats by writes from same client (local writes) were > triggering both these codepaths dropping the cache. Since all the I/O done > by this workload goes through the caches of single client, the > invalidations are not necessary and I made code changes to fuse-bridge to > disable auto-invalidations completely and commented out inode-invalidations > in md-cache. Note that this doesn't regress the consistency/coherency of > data seen in the caches as its a single client use-case. With these two > changes coupled with earlier optimizations (client-io-threads=on, > server/client-event-threads=4, md-cache as a child of write-behind in > xlator graph, performance.md-cache-timeout=600), pgbench init of scale 8000 > on a volume with NVMe backend completed in 54m25s. This is a whopping 94% > improvement to the time we started out with (59280s vs 3360s). > These numbers were taken from the latest run I had scheduled. However, I didn't notice that the test had failed midway. From another test that had completed successfully, the numbers are 139m7s. That will be an improvement of 86% (59280s vs 8340s). I've scheduled another run just to be sure. The improvement is 86% and not 94%. > [root at shakthi4 ~]# gluster volume info > > Volume Name: nvme-r3 > Type: Replicate > Volume ID: d1490bcc-bcf1-4e09-91e8-ab01d9781263 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-1 > Brick2: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-2 > Brick3: shakthi4:/gluster/nvme0n1/bricks/nvme-r3-3 > Options Reconfigured: > server.event-threads: 4 > client.event-threads: 4 > diagnostics.client-log-level: INFO > performance.md-cache-timeout: 600 > performance.io-cache: off > performance.read-ahead: off > diagnostics.count-fop-hits: on > diagnostics.latency-measurement: on > transport.address-family: inet > nfs.disable: on > performance.client-io-threads: on > performance.stat-prefetch: on > > I'll be concentrating on how to disable fuse-auto-invalidations without > regressing on the consistency model we've been providing till now. The > consistency model Glusterfs has been providing till now is close to open > consistency similar to what NFS provides [6][7]. > > But the initial thoughts are, at least for the pgbench test-case there is > no harm in totally disabling fuse-auto-invalidations and md-cache > invalidations as this workload totally runs on single mount point and hence > invalidations itself are not necessary as all I/O goes through caches and > hence caches are in sync with the state of the file on backend. > > [6] http://nfs.sourceforge.net/#faq_a8 > [7] > https://lists.gluster.org/pipermail/gluster-users/2013-March/012805.html > > >> [5] https://bugzilla.redhat.com/show_bug.cgi?id=1664934 >> >> regards, >> Raghavendra >> >> On Fri, Dec 28, 2018 at 12:44 PM Raghavendra Gowdappa < >> rgowdapp at redhat.com> wrote: >> >>> >>> >>> On Mon, Dec 24, 2018 at 6:05 PM Raghavendra Gowdappa < >>> rgowdapp at redhat.com> wrote: >>> >>>> >>>> >>>> On Mon, Dec 24, 2018 at 3:40 PM Sankarshan Mukhopadhyay < >>>> sankarshan.mukhopadhyay at gmail.com> wrote: >>>> >>>>> [pulling the conclusions up to enable better in-line] >>>>> >>>>> > Conclusions: >>>>> > >>>>> > We should never have a volume with caching-related xlators disabled. 
>>>>> The price we pay for it is too high. We need to make them work consistently >>>>> and aggressively to avoid as many requests as we can. >>>>> >>>>> Are there current issues in terms of behavior which are known/observed >>>>> when these are enabled? >>>>> >>>> >>>> We did have issues with pgbench in past. But they've have been fixed. >>>> Please refer to bz [1] for details. On 5.1, it runs successfully with all >>>> caching related xlators enabled. Having said that the only performance >>>> xlators which gave improved performance were open-behind and write-behind >>>> [2] (write-behind had some issues, which will be fixed by [3] and we'll >>>> have to measure performance again with fix to [3]). >>>> >>> >>> One quick update. Enabling write-behind and md-cache with fix for [3] >>> reduced the total time taken for pgbench init phase roughly by 20%-25% >>> (from 12.5 min to 9.75 min for a scale of 100). Though this is still a huge >>> time (around 12hrs for a db of scale 8000). I'll follow up with a detailed >>> report once my experiments are complete. Currently trying to optimize the >>> read path. >>> >>> >>>> For some reason, read-side caching didn't improve transactions per >>>> second. I am working on this problem currently. Note that these bugs >>>> measure transaction phase of pgbench, but what xavi measured in his mail is >>>> init phase. Nevertheless, evaluation of read caching (metadata/data) will >>>> still be relevant for init phase too. >>>> >>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1512691 >>>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1629589#c4 >>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=1648781 >>>> >>>> >>>>> > We need to analyze client/server xlators deeper to see if we can >>>>> avoid some delays. However optimizing something that is already at the >>>>> microsecond level can be very hard. >>>>> >>>>> That is true - are there any significant gains which can be accrued by >>>>> putting efforts here or, should this be a lower priority? >>>>> >>>> >>>> The problem identified by xavi is also the one we (Manoj, Krutika, me >>>> and Milind) had encountered in the past [4]. The solution we used was to >>>> have multiple rpc connections between single brick and client. The solution >>>> indeed fixed the bottleneck. So, there is definitely work involved here - >>>> either to fix the single connection model or go with multiple connection >>>> model. Its preferred to improve single connection and resort to multiple >>>> connections only if bottlenecks in single connection are not fixable. >>>> Personally I think this is high priority along with having appropriate >>>> client side caching. >>>> >>>> [4] https://bugzilla.redhat.com/show_bug.cgi?id=1467614#c52 >>>> >>>> >>>>> > We need to determine what causes the fluctuations in brick side and >>>>> avoid them. >>>>> > This scenario is very similar to a smallfile/metadata workload, so >>>>> this is probably one important cause of its bad performance. >>>>> >>>>> What kind of instrumentation is required to enable the determination? >>>>> >>>>> On Fri, Dec 21, 2018 at 1:48 PM Xavi Hernandez >>>>> wrote: >>>>> > >>>>> > Hi, >>>>> > >>>>> > I've done some tracing of the latency that network layer introduces >>>>> in gluster. 
I've made the analysis as part of the pgbench performance issue >>>>> (in particulat the initialization and scaling phase), so I decided to look >>>>> at READV for this particular workload, but I think the results can be >>>>> extrapolated to other operations that also have small latency (cached data >>>>> from FS for example). >>>>> > >>>>> > Note that measuring latencies introduces some latency. It consists >>>>> in a call to clock_get_time() for each probe point, so the real latency >>>>> will be a bit lower, but still proportional to these numbers. >>>>> > >>>>> >>>>> [snip] >>>>> _______________________________________________ >>>>> Gluster-devel mailing list >>>>> Gluster-devel at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From xhernandez at redhat.com Sun Jan 27 07:03:16 2019 From: xhernandez at redhat.com (Xavi Hernandez) Date: Sun, 27 Jan 2019 08:03:16 +0100 Subject: [Gluster-devel] Performance improvements In-Reply-To: References: Message-ID: On Fri, 25 Jan 2019, 08:53 Vijay Bellur Thank you for the detailed update, Xavi! This looks very interesting. > > On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez > wrote: > >> Hi all, >> >> I've just updated a patch [1] that implements a new thread pool based on >> a wait-free queue provided by userspace-rcu library. The patch also >> includes an auto scaling mechanism that only keeps running the needed >> amount of threads for the current workload. >> >> This new approach has some advantages: >> >> - It's provided globally inside libglusterfs instead of inside an >> xlator >> >> This makes it possible that fuse thread and epoll threads transfer the >> received request to another thread sooner, wating less CPU and reacting >> sooner to other incoming requests. >> >> >> - Adding jobs to the queue used by the thread pool only requires an >> atomic operation >> >> This makes the producer side of the queue really fast, almost with no >> delay. >> >> >> - Contention is reduced >> >> The producer side has negligible contention thanks to the wait-free >> enqueue operation based on an atomic access. The consumer side requires a >> mutex, but the duration is very small and the scaling mechanism makes sure >> that there are no more threads than needed contending for the mutex. >> >> >> This change disables io-threads, since it replaces part of its >> functionality. However there are two things that could be needed from >> io-threads: >> >> - Prioritization of fops >> >> Currently, io-threads assigns priorities to each fop, so that some fops >> are handled before than others. >> >> >> - Fair distribution of execution slots between clients >> >> Currently, io-threads processes requests from each client in round-robin. >> >> >> These features are not implemented right now. If they are needed, >> probably the best thing to do would be to keep them inside io-threads, but >> change its implementation so that it uses the global threads from the >> thread pool instead of its own threads. >> > > > These features are indeed useful to have and hence modifying the > implementation of io-threads to provide this behavior would be welcome. > > > >> >> >> These tests have shown that the limiting factor has been the disk in most >> cases, so it's hard to tell if the change has really improved things. There >> is only one clear exception: self-heal on a dispersed volume completes >> 12.7% faster. 
From jenkins at build.gluster.org Mon Jan 28 01:45:03 2019
From: jenkins at build.gluster.org (jenkins at build.gluster.org)
Date: Mon, 28 Jan 2019 01:45:03 +0000 (UTC)
Subject: [Gluster-devel] Weekly Untriaged Bugs
Message-ID: <693909255.16.1548639904287.JavaMail.jenkins@jenkins-el7.rht.gluster.org>

[...truncated 6 lines...]
https://bugzilla.redhat.com/1667168 / arbiter: Thin Arbiter documentation refers commands don't exist "glustercli'
https://bugzilla.redhat.com/1668227 / core: gluster(8) - Add SELinux context glusterd_brick_t to man page
https://bugzilla.redhat.com/1665145 / core: Writes on Gluster 5 volumes fail with EIO when "cluster.consistent-metadata" is set
https://bugzilla.redhat.com/1668239 / disperse: [man page] Gluster(8) - Missing disperse-data parameter Gluster Console Manager man page
https://bugzilla.redhat.com/1663337 / doc: Gluster documentation on quorum-reads option is incorrect
https://bugzilla.redhat.com/1663205 / fuse: List dictionary is too slow
https://bugzilla.redhat.com/1668118 / geo-replication: Failure to start geo-replication for tiered volume.
https://bugzilla.redhat.com/1664524 / geo-replication: Non-root geo-replication session goes to faulty state, when the session is started
https://bugzilla.redhat.com/1668245 / glusterd: gluster(8) - Man page - create gluster example session
https://bugzilla.redhat.com/1663247 / glusterd: remove static memory allocations from code
https://bugzilla.redhat.com/1663519 / gluster-smb: Memory leak when smb.conf has "store dos attributes = yes"
https://bugzilla.redhat.com/1666326 / open-behind: reopening bug 1405147: Failed to dispatch handler: glusterfs seems to check for "write permission" instead for "file owner" during open() when writing to a file
https://bugzilla.redhat.com/1668259 / packaging: Glusterfs 5.3 RPMs can't be build on rhel7
https://bugzilla.redhat.com/1665361 / project-infrastructure: Alerts for offline nodes
https://bugzilla.redhat.com/1663780 / project-infrastructure: On docs.gluster.org, we should convert spaces in folder or file names to 301 redirects to hypens
https://bugzilla.redhat.com/1666634 / protocol: nfs client cannot compile files on dispersed volume
https://bugzilla.redhat.com/1665677 / rdma: volume create and transport change with rdma failed
https://bugzilla.redhat.com/1668286 / read-ahead: READDIRP incorrectly updates posix-acl inode ctx
https://bugzilla.redhat.com/1664215 / read-ahead: Toggling readdir-ahead translator off causes some clients to umount some of its volumes
https://bugzilla.redhat.com/1664398 / tests: ./tests/00-geo-rep/00-georep-verify-setup.t does not work with ./run-tests-in-vagrant.sh
[...truncated 2 lines...]
-------------- next part --------------
A non-text attachment was scrubbed...
Name: build.log
Type: application/octet-stream
Size: 2699 bytes
Desc: not available
URL: 

From sunkumar at redhat.com Tue Jan 29 06:05:49 2019
From: sunkumar at redhat.com (Sunny Kumar)
Date: Tue, 29 Jan 2019 11:35:49 +0530
Subject: [Gluster-devel] Infer results - Glusterfs
Message-ID: 

Hello folks,

As many of you already know, Coverity has been down for nearly 2 months, and during that period we were not able to perform any static analysis on our code base. Coverity is still in read-only mode. So I tried another tool, Infer [1], on the latest master; here is a summary of the report:

DEAD_STORE: 2994
MEMORY_LEAK: 1926
NULL_DEREFERENCE: 552
UNINITIALIZED_VALUE: 33
RESOURCE_LEAK: 6
USE_AFTER_FREE: 1
***************

Infer will be very useful to us, as it is inter-procedural and each procedure gets analyzed independently, so it will uncover some deep inter-procedural bugs.

I am analysing the results now; at first scan they look promising, and it will be good to fix these reported issues. Soon I will make the report public and automate Infer to run as a daily job so that we can track fixes.

[1] https://github.com/facebook/infer

- Sunny
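
For readers unfamiliar with Infer's report categories, the hypothetical C snippet below shows the kind of defects the three largest buckets above refer to; it is purely illustrative and not taken from the GlusterFS code base:

    #include <stdlib.h>
    #include <string.h>

    int example(int flag)
    {
        int rc = 0;                /* DEAD_STORE: this value is never read */
        char *buf = malloc(64);    /* MEMORY_LEAK: not freed on the early-return path */

        rc = flag ? 1 : 2;
        if (flag)
            return rc;             /* buf leaks here */

        memset(buf, 0, 64);        /* NULL_DEREFERENCE: malloc() may have returned NULL */
        free(buf);
        return rc;
    }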
From sheggodu at redhat.com Tue Jan 29 07:06:06 2019
From: sheggodu at redhat.com (Sunil Kumar Heggodu Gopala Acharya)
Date: Tue, 29 Jan 2019 12:36:06 +0530
Subject: [Gluster-devel] Improvements to Gluster upstream documentation
Message-ID: 

Hi,

As part of our continuous effort to improve the Gluster upstream documentation, we are proposing a change to the documentation theme that we are currently using, through glusterdocs pull request 454. A preview of the proposed changes can be viewed through this temporary website. Request you to review and share comments/concerns/feedback.

Regards,

Sunil kumar AcharYa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From anuradha.stalur at gmail.com Thu Jan 31 00:27:38 2019
From: anuradha.stalur at gmail.com (Anuradha Talur)
Date: Wed, 30 Jan 2019 16:27:38 -0800
Subject: [Gluster-devel] Release 6: Kick off!
In-Reply-To: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com>
References: <03fb0aaa-66c4-d9b1-1228-3ce527ce5fb9@redhat.com>
Message-ID: 

Patches being worked on for Gluster-6. These haven't been refreshed in a while; that will be done sometime this week or early next week.

1) https://review.gluster.org/#/c/glusterfs/+/21585/
2) https://review.gluster.org/#/c/glusterfs/+/21756/
3) https://review.gluster.org/#/c/glusterfs/+/21681/
4) https://review.gluster.org/#/c/glusterfs/+/21694/
5) https://review.gluster.org/#/c/glusterfs/+/21757/
6) https://review.gluster.org/#/c/glusterfs/+/21771/

--
Thanks,
Anuradha Talur.

On Tue, Nov 6, 2018 at 8:34 AM Shyam Ranganathan wrote:
>
> Hi,
>
> With release-5 out of the door, it is time to start some activities for release-6.
>
> ## Scope
> It is time to collect and determine scope for the release, so as usual, please send in features/enhancements that you are working towards reaching maturity for this release to the devel list, and mark/open the github issue with the required milestone [1].
>
> At a broader scale, in the maintainers meeting we discussed the enhancement wish list as in [2].
>
> Other than the above, we are continuing with our quality focus and would want to see a downward trend (or near-zero) in the following areas:
> - Coverity
> - clang
> - ASAN
>
> We would also like to tighten our nightly testing health, and would ideally not want to have tests retry and pass on the second attempt in the testing runs. Towards this, we would send in reports of retried and failed tests that need attention and fixes as required.
>
> ## Schedule
> NOTE: The schedule is going to get heavily impacted by the end-of-year holidays, but we will try to keep it up as much as possible.
>
> Working backwards on the schedule, here's what we have:
> - Announcement: Week of Feb 4th, 2019
> - GA tagging: Feb-01-2019
> - RC1: On demand before GA
> - RC0: Jan-02-2019
> - Late features cut-off: Week of Dec-24th, 2018
> - Branching (feature cutoff date): Dec-17-2018
>   (~45 days prior to branching)
> - Feature/scope proposal for the release (end date): Nov-21-2018
>
> ## Volunteers
> This is my usual call for volunteers to run the release with me or otherwise, but please do consider. We need more hands this time, and possibly some time sharing during the end of the year owing to the holidays.
>
> Thanks,
> Shyam
>
> [1] Release-6 github milestone: https://github.com/gluster/glusterfs/milestone/8
>
> [2] Release-6 enhancement wishlist: https://hackmd.io/sP5GsZ-uQpqnmGZmFKuWIg#
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel at gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

From xhernandez at redhat.com Thu Jan 31 17:08:37 2019
From: xhernandez at redhat.com (Xavi Hernandez)
Date: Thu, 31 Jan 2019 18:08:37 +0100
Subject: [Gluster-devel] Performance improvements
In-Reply-To: 
References: 
Message-ID: 

On Sun, Jan 27, 2019 at 8:03 AM Xavi Hernandez wrote:

> On Fri, 25 Jan 2019, 08:53 Vijay Bellur
>> Thank you for the detailed update, Xavi! This looks very interesting.
>>
>> On Thu, Jan 24, 2019 at 7:50 AM Xavi Hernandez wrote:
>>
>>> Hi all,
>>>
>>> I've just updated a patch [1] that implements a new thread pool based on a wait-free queue provided by the userspace-rcu library. The patch also includes an auto-scaling mechanism that only keeps running the needed number of threads for the current workload.
>>>
>>> This new approach has some advantages:
>>>
>>> - It's provided globally inside libglusterfs instead of inside an xlator
>>>
>>> This makes it possible for the fuse thread and epoll threads to transfer the received request to another thread sooner, wasting less CPU and reacting sooner to other incoming requests.
>>>
>>> - Adding jobs to the queue used by the thread pool only requires an atomic operation
>>>
>>> This makes the producer side of the queue really fast, with almost no delay.
>>>
>>> - Contention is reduced
>>>
>>> The producer side has negligible contention thanks to the wait-free enqueue operation based on an atomic access. The consumer side requires a mutex, but the duration is very small and the scaling mechanism makes sure that there are no more threads than needed contending for the mutex.
>>>
>>> This change disables io-threads, since it replaces part of its functionality. However, there are two things that could be needed from io-threads:
>>>
>>> - Prioritization of fops
>>>
>>> Currently, io-threads assigns priorities to each fop, so that some fops are handled before others.
>>>
>>> - Fair distribution of execution slots between clients
>>>
>>> Currently, io-threads processes requests from each client in round-robin order.
>>>
>>> These features are not implemented right now.
>>> If they are needed, probably the best thing to do would be to keep them inside io-threads, but change its implementation so that it uses the global threads from the thread pool instead of its own threads.
>>>
>>
>> These features are indeed useful to have and hence modifying the implementation of io-threads to provide this behavior would be welcome.
>>
>>> These tests have shown that the limiting factor has been the disk in most cases, so it's hard to tell if the change has really improved things. There is only one clear exception: self-heal on a dispersed volume completes 12.7% faster. The utilization of CPU has also dropped drastically:
>>>
>>> Old implementation: 12.30 user, 41.78 sys, 43.16 idle, 0.73 wait
>>> New implementation: 4.91 user, 5.52 sys, 81.60 idle, 5.91 wait
>>>
>>> Now I'm running some more tests on NVMe to try to see the effects of the change when the disk is not limiting performance. I'll update once I have more data.
>>>
>> Will look forward to these numbers.
>>
> I have identified an issue that limits the number of active threads when load is high, causing some regressions. I'll fix it and rerun the tests on Monday.
>
Once the issue was solved, it caused high load averages for some workloads, which actually resulted in a regression (too much I/O, I guess) instead of an improvement. So I added a configurable maximum number of threads and made the whole implementation optional, so that it can be safely used when required. I did some tests and was able to get at least the same performance we had before this patch in all cases, and in some cases even better. But each test needed manual configuration of the number of threads. I need to work on a way to automatically compute the maximum so that it can be used easily in any workload (or even combined workloads). I uploaded the latest version of the patch.

Xavi

> Xavi
>
>> Regards,
>> Vijay
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From xhernandez at redhat.com Thu Jan 31 18:01:04 2019
From: xhernandez at redhat.com (Xavi Hernandez)
Date: Thu, 31 Jan 2019 19:01:04 +0100
Subject: [Gluster-devel] I/O performance
Message-ID: 

Hi,

I've been doing some tests with the global thread pool [1], and I've observed one important thing: since this new thread pool has very low contention (apparently), it exposes other problems when the number of threads grows. What I've seen is that some workloads use all available threads on bricks to do I/O, causing avgload to grow rapidly and saturating the machine (or so it seems), which really makes everything slower. Reducing the maximum number of threads actually improves performance.

Other workloads, though, do little I/O (probably most of it is locking or smallfile operations). In this case, limiting the number of threads to a small value causes a performance reduction; to increase performance we need more threads.

So this is making me think that maybe we should implement some sort of I/O queue with a maximum I/O depth for each brick (or disk, if bricks share the same disk). This way we can limit the number of requests physically accessing the underlying FS concurrently, without actually limiting the number of threads that can be doing other things on each brick. I think this could improve performance.

Maybe this approach could also be useful on the client side, but I think it's not so critical there.

What do you think?
Xavi

[1] https://review.gluster.org/c/glusterfs/+/20636
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
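
As a rough sketch of the per-brick I/O depth limit proposed in the message above, a counting semaphore can bound how many requests touch the backend filesystem at once while leaving the number of worker threads unconstrained. MAX_IO_DEPTH, brick_read() and the use of a plain POSIX semaphore are illustrative assumptions here, not part of the actual proposal or the patch:

    /* Illustrative I/O-depth limiter; not GlusterFS code. */
    #include <semaphore.h>
    #include <unistd.h>

    #define MAX_IO_DEPTH 16      /* hypothetical per-brick queue depth */

    static sem_t io_slots;

    void brick_io_init(void)
    {
        sem_init(&io_slots, 0, MAX_IO_DEPTH);
    }

    ssize_t brick_read(int fd, void *buf, size_t count, off_t offset)
    {
        ssize_t ret;

        sem_wait(&io_slots);     /* blocks once MAX_IO_DEPTH reads are in flight */
        ret = pread(fd, buf, count, offset);
        sem_post(&io_slots);
        return ret;
    }

A real implementation would presumably keep one such limit per brick (or per underlying disk, as suggested above) and make the depth configurable, so that threads doing locking or metadata work are never throttled by it.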