From atumball at redhat.com Wed May 1 12:29:43 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 17:59:43 +0530 Subject: [Gluster-users] parallel-readdir prevents directories and files listing - Bug 1670382 In-Reply-To: References: Message-ID: On Mon, Apr 29, 2019 at 3:56 PM Jo?o Ba?to < joao.bauto at neuro.fchampalimaud.org> wrote: > Hi, > > I have an 8 brick distributed volume where Windows and Linux clients mount > the volume via samba and headless compute servers using gluster native > fuse. With parallel-readdir on, if a Windows client creates a new folder, > the folder is indeed created but invisible to the Windows client. Accessing > the same samba share in a Linux client, the folder is again visible and > with normal behavior. The same folder is also visible when mounting via > gluster native fuse. > > The Windows client can list existing directories and rename them while, > for files, everything seems to be working fine. > > Gluster servers: CentOS 7.5 with Gluster 5.3 and Samba 4.8.3-4.el7.0.1 > from @fasttrack > Clients tested: Windows 10, Ubuntu 18.10, CentOS 7.5 > > https://bugzilla.redhat.com/show_bug.cgi?id=1670382 > Thanks for the bug report. Will look into this, and get back. Last I knew, we recommended to avoid fuse and samba shares on same volume (Mainly as we couldn't spend a lot of effort on testing the configuration). Anyways, we would treat the behavior as bug for sure. One possible path looking at below volume info is to disable 'stat-prefetch' option and see if it helps. Next option I would try is to disable readdir-ahead. Regards, Amar > > > Volume Name: tank > Type: Distribute > Volume ID: 9582685f-07fa-41fd-b9fc-ebab3a6989cf > Status: Started > Snapshot Count: 0 > Number of Bricks: 8 > Transport-type: tcp > Bricks: > Brick1: swp-gluster-01:/tank/volume1/brick > Brick2: swp-gluster-02:/tank/volume1/brick > Brick3: swp-gluster-03:/tank/volume1/brick > Brick4: swp-gluster-04:/tank/volume1/brick > Brick5: swp-gluster-01:/tank/volume2/brick > Brick6: swp-gluster-02:/tank/volume2/brick > Brick7: swp-gluster-03:/tank/volume2/brick > Brick8: swp-gluster-04:/tank/volume2/brick > Options Reconfigured: > performance.parallel-readdir: on > performance.readdir-ahead: on > performance.cache-invalidation: on > performance.md-cache-timeout: 600 > storage.batch-fsync-delay-usec: 0 > performance.write-behind-window-size: 32MB > performance.stat-prefetch: on > performance.read-ahead: on > performance.read-ahead-page-count: 16 > performance.rda-request-size: 131072 > performance.quick-read: on > performance.open-behind: on > performance.nl-cache-timeout: 600 > performance.nl-cache: on > performance.io-thread-count: 64 > performance.io-cache: off > performance.flush-behind: on > performance.client-io-threads: off > performance.write-behind: off > performance.cache-samba-metadata: on > network.inode-lru-limit: 0 > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > cluster.readdir-optimize: on > cluster.lookup-optimize: on > client.event-threads: 4 > server.event-threads: 16 > features.quota-deem-statfs: on > nfs.disable: on > features.quota: on > features.inode-quota: on > cluster.enable-shared-storage: disable > > Cheers, > > Jo?o Ba?to > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
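For reference, the two options suggested in the reply above can be toggled at runtime from any of the gluster servers. This is only a sketch of those troubleshooting steps, assuming the volume name "tank" from the volume info quoted above; each option can be flipped back the same way (or reset to its default with "gluster volume reset") if it makes no difference:

    # try one option at a time and re-test directory listing from the Windows client
    gluster volume set tank performance.stat-prefetch off
    # parallel-readdir depends on readdir-ahead, so turn parallel-readdir off before (or instead of) readdir-ahead
    gluster volume set tank performance.parallel-readdir off
    gluster volume set tank performance.readdir-ahead off
    # revert any option that did not help
    gluster volume reset tank performance.stat-prefetch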
URL: From atumball at redhat.com Wed May 1 12:31:58 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:01:58 +0530 Subject: [Gluster-users] Gluster 5 Geo-replication Guide In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 7:00 PM Shon Stephens wrote: > Dear All, > Is there a good, step by step guide for setting up geo-replication > with Glusterfs 5? The docs are a difficult to decipher read, for me, and > seem more feature guide than actual instruction. > > Geo-Replication steps in glusterfs-5 is similar to the previous versions (and glusterfs-6.x too). If you are used to Ansible to setup gluster for you, we already have geo-replication setup automated with Ansible @ http://github.com/gluster/gluster-ansible -Amar > Thank you, > Shon > -- > > SHON STEPHENS > > SENIOR CONSULTANT > > Red Hat > > T: 571-781-0787 M: 703-297-0682 > > TRIED. TESTED. TRUSTED. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:34:52 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:04:52 +0530 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: References: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> Message-ID: On Tue, Apr 23, 2019 at 11:38 PM Cody Hill wrote: > > Thanks for the info Karli, > > I wasn?t aware ZFS Dedup was such a dog. I guess I?ll leave that off. My > data get?s 3.5:1 savings on compression alone. I was aware of stripped > sets. I will be doing 6x Striped sets across 12x disks. > > On top of this design I?m going to try and test Intel Optane DIMM (512GB) > as a ?Tier? for GlusterFS to try and get further write acceleration. And > issues with GlusterFS ?Tier? functionality that anyone is aware of? > > Hi Cody, I wanted to be honest about GlusterFS 'Tier' functionality. While it is functional and works, we had not seen the actual benefit we expected with the feature, and noticed it is better to use the tiering on each host machine (ie, on bricks) and use those bricks as glusterfs bricks. (like dmcache). Also note that from glusterfs-6.x releases, Tier feature is deprecated. -Amar > Thank you, > Cody Hill > > On Apr 18, 2019, at 2:32 AM, Karli Sj?berg wrote: > > > > Den 17 apr. 2019 16:30 skrev Cody Hill : > > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of > reading and would like to implement Deduplication and Compression in this > setup. My thought would be to run ZFS to handle the Compression and > Deduplication. > > > You _really_ don't want ZFS doing dedup for any reason. > > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the > network > 2. Zil & L2Arc should add a slight performance increase > > > Adding two really good NVME SSD's as a mirrored SLOG vdev does a huge deal > for synchronous write performance, turning every random write into large > streams that the spinning drives handle better. > > Don't know how picky Gluster is about synchronicity though, most > "performance" tweaking suggests setting stuff to async, which I wouldn't > recommend, but it's a huge boost for throughput obviously; not having to > wait for stuff to actually get written, but it's dangerous. 
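Since much of the advice in this thread comes down to a handful of ZFS dataset properties, here is a minimal sketch of how they are set and checked; the dataset name "tank/gluster" is a placeholder, and the values simply mirror the recommendations above (LZ4 on, dedup off, sync left at its default rather than forced to async):

    zfs set compression=lz4 tank/gluster     # near-free compression, as recommended above
    zfs set dedup=off tank/gluster           # ZFS dedup is the feature being warned against
    zfs get compressratio tank/gluster       # shows how much space compression actually saves
    zfs get sync tank/gluster                # 'standard' honours sync writes; 'disabled' trades safety for throughput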
> > With mirrored NVME SLOG's, you could probably get that throughput without > going asynchronous, which saves you from potential data corruption in a > sudden power loss. > > L2ARC on the other hand does a bit for read latency, but for a general > purpose file server- in practice- not a huge difference, the working set is > just too large. Also keep in mind that L2ARC isn't "free". You need more > RAM to know where you've cached stuff... > > 3. Deduplication and Compression are inline and have pretty good > performance with modern hardware (Intel Skylake) > > > ZFS deduplication has terrible performance. Watch your throughput > automatically drop from hundreds or thousands of MB/s down to, like 5. It's > a feature;) > > 4. Automated Snapshotting > > I can then layer GlusterFS on top to handle distribution to allow 3x > Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible idea > for some reason that I?m missing? > > > While it could save a lot of space in some hypothetical instance, the > drawbacks can never motivate it. E.g. if you want one node to suddenly die > and never recover because of RAM exhaustion, go with ZFS dedup ;) > > I?d be very interested to hear your thoughts. > > > Avoid ZFS dedup at all costs. LZ4 compression on the hand is awesome, > definitely use that! It's basically a free performance enhancer the also > saves space :) > > As another person has said, the best performance layout is RAID10- striped > mirrors. I understand you'd want to get as much volume as possible with > RAID-Z/RAID(5|6) since gluster also replicates/distributes, but it has a > huge impact on IOPS. If performance is the main concern, do striped mirrors > with replica 3 in Gluster. My advice is to test thoroughly with different > pool layouts to see what gives acceptable performance against your volume > requirements. > > /K > > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB > (Is this correct?) > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane > DIMM to really smooth out write latencies? Any issues here? > > Thank you, > Cody Hill > > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:36:45 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:06:45 +0530 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> Message-ID: On Tue, Apr 23, 2019 at 8:47 PM Darrell Budic wrote: > I was one of the folk who wanted a NA/EMEA scheduled meeting, and I?m > going to have to miss it due to some real life issues (clogged sewer I?m > going to have to be dealing with at the time). Apologies, I?ll work on > making the next one. > > No problem. We will continue to have these meetings every week (ie, bi-weekly in each timezone). 
Feel free to join when possible. We surely like to see more community participation for sure, but everyone would have their day jobs, so no pressure :-) -Amar > -Darrell > > On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath > wrote: > > > Hi, > > This is the agenda for tomorrow's community meeting for NA/EMEA timezone. > > https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both > ---- > > > > On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Hi All, >> >> Below is the final details of our community meeting, and I will be >> sending invites to mailing list following this email. You can add Gluster >> Community Calendar so you can get notifications on the meetings. >> >> We are starting the meetings from next week. For the first meeting, we >> need 1 volunteer from users to discuss the use case / what went well, and >> what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. >> >> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >> ---- >> Gluster Community Meeting >> Previous >> Meeting minutes: >> >> - http://github.com/gluster/community >> >> >> Date/Time: >> Check the community calendar >> >> Bridge >> >> - APAC friendly hours >> - Bridge: https://bluejeans.com/836554017 >> - NA/EMEA >> - Bridge: https://bluejeans.com/486278655 >> >> ------------------------------ >> Attendance >> >> - Name, Company >> >> Host >> >> - Who will host next meeting? >> - Host will need to send out the agenda 24hr - 12hrs in advance to >> mailing list, and also make sure to send the meeting minutes. >> - Host will need to reach out to one user at least who can talk >> about their usecase, their experience, and their needs. >> - Host needs to send meeting minutes as PR to >> http://github.com/gluster/community >> >> User stories >> >> - Discuss 1 usecase from a user. >> - How was the architecture derived, what volume type used, >> options, etc? >> - What were the major issues faced ? How to improve them? >> - What worked good? >> - How can we all collaborate well, so it is win-win for the >> community and the user? How can we >> >> Community >> >> - >> >> Any release updates? >> - >> >> Blocker issues across the project? >> - >> >> Metrics >> - Number of new bugs since previous meeting. How many are not triaged? >> - Number of emails, anything unanswered? >> >> Conferences >> / Meetups >> >> - Any conference in next 1 month where gluster-developers are going? >> gluster-users are going? So we can meet and discuss. >> >> Developer >> focus >> >> - >> >> Any design specs to discuss? >> - >> >> Metrics of the week? >> - Coverity >> - Clang-Scan >> - Number of patches from new developers. >> - Did we increase test coverage? >> - [Atin] Also talk about most frequent test failures in the CI and >> carve out an AI to get them fixed. >> >> RoundTable >> >> - >> >> ---- >> >> Regards, >> Amar >> >> On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Thanks for the feedback Darrell, >>> >>> The new proposal is to have one in North America 'morning' time. (10AM >>> PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, >>> 9pm Newzealand, 5pm Tokyo, 4pm Beijing. >>> >>> For example, if we choose Every other Tuesday for meeting, and 1st of >>> the month is Tuesday, we would have North America time for 1st, and on 15th >>> it would be ASIA/Pacific time. 
>>> >>> Hopefully, this way, we can cover all the timezones, and meeting minutes >>> would be committed to github repo, so that way, it will be easier for >>> everyone to be aware of what is happening. >>> >>> Regards, >>> Amar >>> >>> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic >>> wrote: >>> >>>> As a user, I?d like to visit more of these, but the time slot is my >>>> 3AM. Any possibility for a rolling schedule (move meeting +6 hours each >>>> week with rolling attendance from maintainers?) or an occasional regional >>>> meeting 12 hours opposed to the one you?re proposing? >>>> >>>> -Darrell >>>> >>>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >>>> atumball at redhat.com> wrote: >>>> >>>> All, >>>> >>>> We currently have 3 meetings which are public: >>>> >>>> 1. Maintainer's Meeting >>>> >>>> - Runs once in 2 weeks (on Mondays), and current attendance is around >>>> 3-5 on an avg, and not much is discussed. >>>> - Without majority attendance, we can't take any decisions too. >>>> >>>> 2. Community meeting >>>> >>>> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the >>>> only meeting which is for 'Community/Users'. Others are for developers >>>> as of now. >>>> Sadly attendance is getting closer to 0 in recent times. >>>> >>>> 3. GCS meeting >>>> >>>> - We started it as an effort inside Red Hat gluster team, and opened it >>>> up for community from Jan 2019, but the attendance was always from RHT >>>> members, and haven't seen any traction from wider group. >>>> >>>> So, I have a proposal to call out for cancelling all these meeting, >>>> and keeping just 1 weekly 'Community' meeting, where even topics >>>> related to maintainers and GCS and other projects can be discussed. >>>> >>>> I have a template of a draft template @ >>>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>>> >>>> Please feel free to suggest improvements, both in agenda and in >>>> timings. So, we can have more participation from members of community, >>>> which allows more user - developer interactions, and hence quality of >>>> project. >>>> >>>> Waiting for feedbacks, >>>> >>>> Regards, >>>> Amar >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> >> >> -- >> Amar Tumballi (amarts) >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:39:44 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:09:44 +0530 Subject: [Gluster-users] Community Happy Hour at Red Hat Summit In-Reply-To: References: Message-ID: On Mon, Apr 22, 2019 at 8:14 PM Amye Scavarda wrote: > The Ceph and Gluster teams are joining forces to put on a Community > Happy Hour in Boston on Tuesday, May 7th as part of Red Hat Summit. > > I will be there at Gluster Booth in Red Hat Summit. 
If you or your colleagues/friends are attending, let me know. Would like to catch up for sure! -Amar > More details, including RSVP at: > https://cephandglusterhappyhour_rhsummit.eventbrite.com > -- amye > > -- > Amye Scavarda | amye at redhat.com | Gluster Community Lead > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:43:25 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:13:25 +0530 Subject: [Gluster-users] adding thin arbiter In-Reply-To: References: Message-ID: On Mon, Apr 22, 2019 at 3:12 PM Karthik Subrahmanya wrote: > Hi, > > Currently we do not have support for converting an existing volume to a > thin-arbiter volume. It is also not supported to replace the thin-arbiter > brick with a new one. > You can create a fresh thin arbiter volume using the GD2 framework and play > around with it. Feel free to share your experience with thin-arbiter. > The GD1 CLIs are being implemented. We will keep things posted on this > list as and when they are ready to consume. > > Effort on this can be found @ https://review.gluster.org/22612 > Regards, > Karthik > > On Fri, Apr 19, 2019 at 8:39 PM wrote: > >> Hi guys, >> >> I have an existing volume with 3 replicas. One of them is an >> arbiter. Is there a way to change the arbiter to a thin-arbiter? I tried >> removing the arbiter brick and adding it back, but the add-brick command >> doesn't take the --thin-arbiter option. >> >> xpk >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:46:23 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:16:23 +0530 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: On Wed, Apr 17, 2019 at 1:33 PM David Spisla wrote: > Dear Gluster Community, > > I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 to > access the volumes (each node has a VIP) > > I was testing this failover scenario: > > 1. Start writing 940 GB of small files (64K-100K) from a Win10 Client to > node1 > 2. During the write process I hard-shutdown node1 (where the client is > connected via VIP) by turning off the power > > My expectation is that the write process stops and after a while the > Win10 Client offers me a Retry, so I can continue the write on a different > node (which now has the VIP of node1). > I have observed exactly that in the past, but now the system shows a strange > behaviour: > > The Win10 Client does nothing and the Explorer freezes; in the backend CTDB > cannot perform the failover and throws errors.
The glusterd on node2 and > node3 logs these messages:
>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage
>> >> > *In my opinion Samba/CTDB cannot perform the failover correctly and > continue the write process because glusterfs didn't release the lock.* > What do you think? It seems to me like a bug, because in the past the > failover worked correctly. > > Thanks for the report, David. It surely looks like a bug, and I will let experts in this domain answer the question. One request for such issues is to file a bug (preferred) or a GitHub issue, so that it is tracked in the system. > Regards > David Spisla > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:49:21 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:19:21 +0530 Subject: [Gluster-users] gluster mountbroker failed after upgrade to gluster 6 In-Reply-To: References: Message-ID: Few questions inline. On Fri, Apr 12, 2019 at 1:09 PM Benedikt Kaleß wrote: > |Hi,| > > |I updated to gluster 6 and now the geo-replication remains > in status "Faulty". > From which version did you upgrade? And what does the volume info look like? (Helps us to understand if this is something we have already tested or not).
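For anyone hitting the same problem, the details being asked for here can be gathered with the standard CLI; this is only a sketch, with the volume name "mastervol" used as a placeholder:

    gluster --version                          # version currently running after the upgrade
    gluster volume info mastervol              # volume type and reconfigured options
    gluster volume geo-replication status      # state of all geo-replication sessions
    tail -n 100 /var/log/glusterfs/geo-replication/*/*.log   # usually shows why a session went Faulty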
> | > > |If I run a "gluster-mountbroker status" I get: > | > > |Traceback (most recent call last): > File "/usr/sbin/gluster-mountbroker", line 396, in > runcli() > File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", > line 225, in runcli > cls.run(args) > File "/usr/sbin/gluster-mountbroker", line 275, in run > out = execute_in_peers("node-status") > File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", > line 127, in execute_in_peers > raise GlusterCmdException((rc, out, err, " ".join(cmd))) > gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. > Error : Success\n', 'gluster system:: execute mountbroker.py node-status') > | > > |What can I do: set up the geo-replication again?| > > Sorry for the delay, and we will surely try to get you back to a normal state. Can you check the logs in /var/log/glusterfs/geo-replication/* and see if there is anything concerning there? That would help in understanding the situation. -Amar > |Best regards| > > |Benedikt > | > > | > | > > -- > forumZFD > Entschieden für Frieden|Committed to Peace > > Benedikt Kaleß > Leiter Team IT|Head team IT > > Forum Ziviler Friedensdienst e.V.|Forum Civil Peace Service > Am Kölner Brett 8 | 50825 Köln | Germany > > Tel 0221 91273233 | Fax 0221 91273299 | > http://www.forumZFD.de > > Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board: > Oliver Knabe (Vorsitz|Chair), Sonja Wiekenberg-Mlalandle, Alexander Mauz > VR 17651 Amtsgericht Köln > > Spenden|Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:55:01 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:25:01 +0530 Subject: [Gluster-users] performance - what can I expect In-Reply-To: <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> Message-ID: Hi Pascal, Sorry for the complete delay on this one. And thanks for testing out in different scenarios. Few questions before others can have a look and advise you. 1. What is the volume info output? 2. Do you see any concerning logs in glusterfs log files? 3. Please use `gluster volume profile` while running the tests, and that gives a lot of information. 4. Considering you are using glusterfs-6.0, please take a statedump of the client process (on any node) before and after the test, so we can analyze the latency information of each translator. With this information, I hope we will be in a better state to answer the questions. On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter wrote: > i continued my testing with 5 clients, all attached over 100Gbit/s > omni-path via IP over IB. when i run the same iozone benchmark across > all 5 clients where gluster is mounted using the glusterfs client, i get > an aggregated write throughput of only about 400GB/s and an aggregated > read throughput of 1.5GB/s. Each node was writing a single 200Gb file in > 16MB chunks and the files were distributed across all three bricks on > the server. > > the connection was established over Omnipath for sure, as there is no > other link between the nodes and server. > > i have no clue what i'm doing wrong here.
i can't believe that this is a > normal performance people would expect to see from gluster. i guess > nobody would be using it if it was this slow. > > again, when written dreictly to the xfs filesystem on the bricks, i get > over 6GB/s read and write throughput using the same benchmark. > > any advise is appreciated > > cheers > > Pascal > > On 04.04.19 12:03, Pascal Suter wrote: > > I just noticed i left the most important parameters out :) > > > > here's the write command with filesize and recordsize in it as well :) > > > > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w > > -+S 0 -s 200G -r 16384k > > > > also i ran the benchmark without direct_io which resulted in an even > > worse performance. > > > > i also tried to mount the gluster volume via nfs-ganesha which further > > reduced throughput down to about 450MB/s > > > > if i run the iozone benchmark with 3 threads writing to all three > > bricks directly (from the xfs filesystem) i get throughputs of around > > 6GB/s .. if I run the same benchmark through gluster mounted locally > > using the fuse client and with enough threads so that each brick gets > > at least one file written to it, i end up seing throughputs around > > 1.5GB/s .. that's a 4x decrease in performance. at it actually is the > > same if i run the benchmark with less threads and files only get > > written to two out of three bricks. > > > > cpu load on the server is around 25% by the way, nicely distributed > > across all available cores. > > > > i can't believe that gluster should really be so slow and everybody is > > just happily using it. any hints on what i'm doing wrong are very > > welcome. > > > > i'm using gluster 6.0 by the way. > > > > regards > > > > Pascal > > > > On 03.04.19 12:28, Pascal Suter wrote: > >> Hi all > >> > >> I am currently testing gluster on a single server. I have three > >> bricks, each a hardware RAID6 volume with thin provisioned LVM that > >> was aligned to the RAID and then formatted with xfs. > >> > >> i've created a distributed volume so that entire files get > >> distributed across my three bricks. > >> > >> first I ran a iozone benchmark across each brick testing the read and > >> write perofrmance of a single large file per brick > >> > >> i then mounted my gluster volume locally and ran another iozone run > >> with the same parameters writing a single file. the file went to > >> brick 1 which, when used driectly, would write with 2.3GB/s and read > >> with 1.5GB/s. however, through gluster i got only 800MB/s read and > >> 750MB/s write throughput > >> > >> another run with two processes each writing a file, where one file > >> went to the first brick and the other file to the second brick (which > >> by itself when directly accessed wrote at 2.8GB/s and read at > >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated > >> read throughput. > >> > >> Is this a normal performance i can expect out of a glusterfs or is it > >> worth tuning in order to really get closer to the actual brick > >> filesystem performance? > >> > >> here are the iozone commands i use for writing and reading.. 
note > >> that i am using directIO in order to make sure i don't get fooled by > >> cache :) > >> > >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt > >> > >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt > >> > >> cheers > >> > >> Pascal > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sankarshan.mukhopadhyay at gmail.com Thu May 2 02:51:18 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Thu, 2 May 2019 08:21:18 +0530 Subject: [Gluster-users] Posting a set of conversations around troubleshooting Gluster Message-ID: is the link to the play list. We are at this point 2 episodes in. I'd like to (a) keep this first pass as basic introduction to troubleshooting (b) focus on the components which seem complicated enough. Requesting for feedback on which components we should cover next. Please reply to this thread. Additionally, as an administrator/user, if you have a set of scripts/tools which you use to troubleshoot and would like to talk about them, let me know and we will set up a conversation. From pascal.suter at dalco.ch Thu May 2 07:51:12 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Thu, 2 May 2019 09:51:12 +0200 Subject: [Gluster-users] performance - what can I expect In-Reply-To: References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> Message-ID: <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> Hi Amar thanks for rolling this back up. Actually i have done some more benchmarking and fiddled with the config to finally reach a performance figure i could live with. I now can squeeze about 3GB/s out of that server which seems to be close to what i can get out of its network uplink (using IP over Omni-Path). The system is now set up and in production so i can't run any benchmarks on it anymore but i will get back at benchmarking in the near future to test some storage related hardware, and i will try it with gluster on top again. embarassingly the biggest performance issue was that the default installation of the server was running the "performance" profile of tuned. once i switched it to "throughput-performance" performance increased dramatically. the volume info now looks pretty unspectacular: Volume Name: storage Type: Distribute Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c Status: Started Snapshot Count: 0 Number of Bricks: 3 Transport-type: tcp Bricks: Brick1: themis01:/data/brick1/brick Brick2: themis01:/data/brick2/brick Brick3: themis01:/data/brick3/brick Options Reconfigured: transport.address-family: inet nfs.disable: on thanks for pointing out gluster volume profile, i'll have a go with it during my next benchmarking session. so far i was using iostat to track brick-level io performance during my benchmarks. 
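For anyone wanting to reproduce the two changes described above, plus the profiling data that was asked for, a rough sketch looks like this; it assumes the volume name "storage" from the info above, and the pgrep pattern is only a heuristic for finding the fuse client process:

    tuned-adm active                             # show the currently active tuned profile
    tuned-adm profile throughput-performance     # the profile switch that made the difference here
    gluster volume profile storage start         # start collecting per-brick FOP statistics
    # ... run the benchmark ...
    gluster volume profile storage info          # cumulative latency/throughput per FOP
    gluster volume profile storage stop
    kill -USR1 $(pgrep -f 'glusterfs.*storage')  # statedump of the fuse client; files usually land under /var/run/gluster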
the main question i wanted to ask was, if there is a general rule of thumb, how much throughput of the original bare brick throughput would be expected to be left over once gluster is added on top of it. to give you an example: when I use a parallel filesystem like Lustre or BeeGFS i usually expect to get at least about 85% of the raw storage target throughput as aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS setup. I consider any numbers below that to be too low and therefore will have to dig into performance tuning to find the bottle neck. i was hoping someone could give me a rule-of-thumb number for a simple distributed gluster setup, like that 85% number i've established for a parallel file system. so at the moment my takeaway is, in a simple distributed volume across 3 bricks with an aggregated bandwidth of 6GB/s i can expect to get about 3GB/s aggregated bandwith out of the gluster mount, given there are no bottle necks in the network. the 3GB/s is a number conducted under ideal circumstances, meaning, i primed the storage to make sure i could run a benchmark run using three nodes, with each node running a single thread writing to a single file and each file was located on another bricke. this yielded the maximum perfomance as this was pure streaming IO without any overlapping file writing to the bricks other than the overhead created by gluster's own internal mechanisms. Interestingly, the performance didn't drop much when i added nodes and threads and introduced more random-ish io by having several processes write to the same brick. So I assume, what "eats" up the 50% performance in the end is probably Gluster writing all these additional hidden files which I assume is some sort of Metadata. This causes additional IO on the disk that i'm streaming my one file to and therefore turns my streaming IO into a random io load for the raid controller and underlying harddisks which on spinning disks would have about the performance impact i was seing in my benchmarks. I have yet to try gluster on a Flash based brick and test its performance there.. i would expect to see a better "efficiency" than the 50% i've measured on this system here as random io vs. streaming io should not make such a difference (or acutally almost no difference at all) on a flash based storage. but that's? me guessing now. so for the moment i'm fine but i would still be interested in hearing ball-park figure "efficiency" numbers from others using gluster in a similar setup. cheers Pascal On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote: > Hi Pascal, > > Sorry for complete delay in this one. And thanks for testing out in > different scenarios.? Few questions before others can have a look and > advice you. > > 1. What is the volume info output ? > > 2. Do you see any concerning logs in glusterfs log files? > > 3. Please use `gluster volume profile` while running the tests, and > that gives a lot of information. > > 4. Considering you are using glusterfs-6.0, please take statedump of > client process (on any node) before and after the test, so we can > analyze the latency information of each translators. > > With these information, I hope we will be in a better state to answer > the questions. > > > On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter > wrote: > > i continued my testing with 5 clients, all attached over 100Gbit/s > omni-path via IP over IB. 
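To get a feel for the brick-side bookkeeping mentioned above, the bricks can be inspected directly (read-only; nothing under a brick should be edited by hand). A sketch, using one of the brick paths from the volume info above and a placeholder file name:

    ls -a /data/brick1/brick                              # every brick carries a hidden .glusterfs directory
    ls /data/brick1/brick/.glusterfs | head               # gfid-indexed links maintained by gluster
    getfattr -d -m . -e hex /data/brick1/brick/somefile   # per-file xattrs such as trusted.gfid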
when i run the same iozone benchmark across > all 5 clients where gluster is mounted using the glusterfs client, > i get > an aggretated write throughput of only about 400GB/s and an > aggregated > read throughput of 1.5GB/s. Each node was writing a single 200Gb > file in > 16MB chunks and the files where distributed across all three > bricks on > the server. > > the connection was established over Omnipath for sure, as there is no > other link between the nodes and server. > > i have no clue what i'm doing wrong here. i can't believe that > this is a > normal performance people would expect to see from gluster. i guess > nobody would be using it if it was this slow. > > again, when written dreictly to the xfs filesystem on the bricks, > i get > over 6GB/s read and write throughput using the same benchmark. > > any advise is appreciated > > cheers > > Pascal > > On 04.04.19 12:03, Pascal Suter wrote: > > I just noticed i left the most important parameters out :) > > > > here's the write command with filesize and recordsize in it as > well :) > > > > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e > -I -w > > -+S 0 -s 200G -r 16384k > > > > also i ran the benchmark without direct_io which resulted in an > even > > worse performance. > > > > i also tried to mount the gluster volume via nfs-ganesha which > further > > reduced throughput down to about 450MB/s > > > > if i run the iozone benchmark with 3 threads writing to all three > > bricks directly (from the xfs filesystem) i get throughputs of > around > > 6GB/s .. if I run the same benchmark through gluster mounted > locally > > using the fuse client and with enough threads so that each brick > gets > > at least one file written to it, i end up seing throughputs around > > 1.5GB/s .. that's a 4x decrease in performance. at it actually > is the > > same if i run the benchmark with less threads and files only get > > written to two out of three bricks. > > > > cpu load on the server is around 25% by the way, nicely distributed > > across all available cores. > > > > i can't believe that gluster should really be so slow and > everybody is > > just happily using it. any hints on what i'm doing wrong are very > > welcome. > > > > i'm using gluster 6.0 by the way. > > > > regards > > > > Pascal > > > > On 03.04.19 12:28, Pascal Suter wrote: > >> Hi all > >> > >> I am currently testing gluster on a single server. I have three > >> bricks, each a hardware RAID6 volume with thin provisioned LVM > that > >> was aligned to the RAID and then formatted with xfs. > >> > >> i've created a distributed volume so that entire files get > >> distributed across my three bricks. > >> > >> first I ran a iozone benchmark across each brick testing the > read and > >> write perofrmance of a single large file per brick > >> > >> i then mounted my gluster volume locally and ran another iozone > run > >> with the same parameters writing a single file. the file went to > >> brick 1 which, when used driectly, would write with 2.3GB/s and > read > >> with 1.5GB/s. however, through gluster i got only 800MB/s read and > >> 750MB/s write throughput > >> > >> another run with two processes each writing a file, where one file > >> went to the first brick and the other file to the second brick > (which > >> by itself when directly accessed wrote at 2.8GB/s and read at > >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also > aggregated > >> read throughput. 
> >> > >> Is this a normal performance i can expect out of a glusterfs or > is it > >> worth tuning in order to really get closer to the actual brick > >> filesystem performance? > >> > >> here are the iozone commands i use for writing and reading.. note > >> that i am using directIO in order to make sure i don't get > fooled by > >> cache :) > >> > >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w > -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt > >> > >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w > -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt > >> > >> cheers > >> > >> Pascal > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Thu May 2 08:30:46 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 2 May 2019 14:00:46 +0530 Subject: [Gluster-users] performance - what can I expect In-Reply-To: <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> Message-ID: On Thu, May 2, 2019 at 1:21 PM Pascal Suter wrote: > Hi Amar > > thanks for rolling this back up. Actually i have done some more > benchmarking and fiddled with the config to finally reach a performance > figure i could live with. I now can squeeze about 3GB/s out of that server > which seems to be close to what i can get out of its network uplink (using > IP over Omni-Path). The system is now set up and in production so i can't > run any benchmarks on it anymore but i will get back at benchmarking in the > near future to test some storage related hardware, and i will try it with > gluster on top again. > > embarassingly the biggest performance issue was that the default > installation of the server was running the "performance" profile of tuned. > once i switched it to "throughput-performance" performance increased > dramatically. > > the volume info now looks pretty unspectacular: > > Volume Name: storage > Type: Distribute > Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 > Transport-type: tcp > Bricks: > Brick1: themis01:/data/brick1/brick > Brick2: themis01:/data/brick2/brick > Brick3: themis01:/data/brick3/brick > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > > thanks for pointing out gluster volume profile, i'll have a go with it > during my next benchmarking session. so far i was using iostat to track > brick-level io performance during my benchmarks. > > the main question i wanted to ask was, if there is a general rule of > thumb, how much throughput of the original bare brick throughput would be > expected to be left over once gluster is added on top of it. 
to give you an > example: when I use a parallel filesystem like Lustre or BeeGFS i usually > expect to get at least about 85% of the raw storage target throughput as > aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS > setup. I consider any numbers below that to be too low and therefore will > have to dig into performance tuning to find the bottle neck. > > i was hoping someone could give me a rule-of-thumb number for a simple > distributed gluster setup, like that 85% number i've established for a > parallel file system. > > so at the moment my takeaway is, in a simple distributed volume across 3 > bricks with an aggregated bandwidth of 6GB/s i can expect to get about > 3GB/s aggregated bandwith out of the gluster mount, given there are no > bottle necks in the network. the 3GB/s is a number conducted under ideal > circumstances, meaning, i primed the storage to make sure i could run a > benchmark run using three nodes, with each node running a single thread > writing to a single file and each file was located on another bricke. this > yielded the maximum perfomance as this was pure streaming IO without any > overlapping file writing to the bricks other than the overhead created by > gluster's own internal mechanisms. > > Interestingly, the performance didn't drop much when i added nodes and > threads and introduced more random-ish io by having several processes write > to the same brick. So I assume, what "eats" up the 50% performance in the > end is probably Gluster writing all these additional hidden files which I > assume is some sort of Metadata. This causes additional IO on the disk that > i'm streaming my one file to and therefore turns my streaming IO into a > random io load for the raid controller and underlying harddisks which on > spinning disks would have about the performance impact i was seing in my > benchmarks. > Thanks for all these details. I have yet to try gluster on a Flash based brick and test its performance > there.. i would expect to see a better "efficiency" than the 50% i've > measured on this system here as random io vs. streaming io should not make > such a difference (or acutally almost no difference at all) on a flash > based storage. but that's me guessing now. > > so for the moment i'm fine but i would still be interested in hearing > ball-park figure "efficiency" numbers from others using gluster in a > similar setup. > We couldn't get a single number on this yet. Mainly because of multiple reasons. * Gluster's volume type has different behavior (performance wise) * Network plays more significant role than that of disk performance. Mostly latency involved in n/w than the throughput. * Different work loads (like create heavy Vs read/write, sequential read/write Vs random read/write) needs different options (currently they are not auto-tuned). * If one has good n/w and disk speed, even back end filesystem configuration (because of the layout we have with gfid etc) too matter a bit. Best thing is to understand the workload first, and then tuning for it (at present). cheers > > Pascal > On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote: > > Hi Pascal, > > Sorry for complete delay in this one. And thanks for testing out in > different scenarios. Few questions before others can have a look and > advice you. > > 1. What is the volume info output ? > > 2. Do you see any concerning logs in glusterfs log files? > > 3. Please use `gluster volume profile` while running the tests, and that > gives a lot of information. > > 4. 
Considering you are using glusterfs-6.0, please take statedump of > client process (on any node) before and after the test, so we can analyze > the latency information of each translators. > > With these information, I hope we will be in a better state to answer the > questions. > > > On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter > wrote: > >> i continued my testing with 5 clients, all attached over 100Gbit/s >> omni-path via IP over IB. when i run the same iozone benchmark across >> all 5 clients where gluster is mounted using the glusterfs client, i get >> an aggretated write throughput of only about 400GB/s and an aggregated >> read throughput of 1.5GB/s. Each node was writing a single 200Gb file in >> 16MB chunks and the files where distributed across all three bricks on >> the server. >> >> the connection was established over Omnipath for sure, as there is no >> other link between the nodes and server. >> >> i have no clue what i'm doing wrong here. i can't believe that this is a >> normal performance people would expect to see from gluster. i guess >> nobody would be using it if it was this slow. >> >> again, when written dreictly to the xfs filesystem on the bricks, i get >> over 6GB/s read and write throughput using the same benchmark. >> >> any advise is appreciated >> >> cheers >> >> Pascal >> >> On 04.04.19 12:03, Pascal Suter wrote: >> > I just noticed i left the most important parameters out :) >> > >> > here's the write command with filesize and recordsize in it as well :) >> > >> > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w >> > -+S 0 -s 200G -r 16384k >> > >> > also i ran the benchmark without direct_io which resulted in an even >> > worse performance. >> > >> > i also tried to mount the gluster volume via nfs-ganesha which further >> > reduced throughput down to about 450MB/s >> > >> > if i run the iozone benchmark with 3 threads writing to all three >> > bricks directly (from the xfs filesystem) i get throughputs of around >> > 6GB/s .. if I run the same benchmark through gluster mounted locally >> > using the fuse client and with enough threads so that each brick gets >> > at least one file written to it, i end up seing throughputs around >> > 1.5GB/s .. that's a 4x decrease in performance. at it actually is the >> > same if i run the benchmark with less threads and files only get >> > written to two out of three bricks. >> > >> > cpu load on the server is around 25% by the way, nicely distributed >> > across all available cores. >> > >> > i can't believe that gluster should really be so slow and everybody is >> > just happily using it. any hints on what i'm doing wrong are very >> > welcome. >> > >> > i'm using gluster 6.0 by the way. >> > >> > regards >> > >> > Pascal >> > >> > On 03.04.19 12:28, Pascal Suter wrote: >> >> Hi all >> >> >> >> I am currently testing gluster on a single server. I have three >> >> bricks, each a hardware RAID6 volume with thin provisioned LVM that >> >> was aligned to the RAID and then formatted with xfs. >> >> >> >> i've created a distributed volume so that entire files get >> >> distributed across my three bricks. >> >> >> >> first I ran a iozone benchmark across each brick testing the read and >> >> write perofrmance of a single large file per brick >> >> >> >> i then mounted my gluster volume locally and ran another iozone run >> >> with the same parameters writing a single file. the file went to >> >> brick 1 which, when used driectly, would write with 2.3GB/s and read >> >> with 1.5GB/s. 
however, through gluster i got only 800MB/s read and >> >> 750MB/s write throughput >> >> >> >> another run with two processes each writing a file, where one file >> >> went to the first brick and the other file to the second brick (which >> >> by itself when directly accessed wrote at 2.8GB/s and read at >> >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated >> >> read throughput. >> >> >> >> Is this a normal performance i can expect out of a glusterfs or is it >> >> worth tuning in order to really get closer to the actual brick >> >> filesystem performance? >> >> >> >> here are the iozone commands i use for writing and reading.. note >> >> that i am using directIO in order to make sure i don't get fooled by >> >> cache :) >> >> >> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt >> >> >> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt >> >> >> >> cheers >> >> >> >> Pascal >> >> >> >> _______________________________________________ >> >> Gluster-users mailing list >> >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> > > -- > Amar Tumballi (amarts) > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joao.bauto at neuro.fchampalimaud.org Thu May 2 09:54:44 2019 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Thu, 2 May 2019 10:54:44 +0100 Subject: [Gluster-users] parallel-readdir prevents directories and files listing - Bug 1670382 In-Reply-To: References: Message-ID: Thanks for the reply Amar. Last I knew, we recommended to avoid fuse and samba shares on same volume > (Mainly as we couldn't spend a lot of effort on testing the configuration). Does this also apply to samba shares when using vfs glusterfs? Anyways, we would treat the behavior as bug for sure. One possible path > looking at below volume info is to disable 'stat-prefetch' option and see > if it helps. Next option I would try is to disable readdir-ahead. I'll try and give feedback. Thanks, Jo?o Amar Tumballi Suryanarayan escreveu no dia quarta, 1/05/2019 ?(s) 13:30: > > > On Mon, Apr 29, 2019 at 3:56 PM Jo?o Ba?to < > joao.bauto at neuro.fchampalimaud.org> wrote: > >> Hi, >> >> I have an 8 brick distributed volume where Windows and Linux clients >> mount the volume via samba and headless compute servers using gluster >> native fuse. With parallel-readdir on, if a Windows client creates a new >> folder, the folder is indeed created but invisible to the Windows client. >> Accessing the same samba share in a Linux client, the folder is again >> visible and with normal behavior. The same folder is also visible when >> mounting via gluster native fuse. >> >> The Windows client can list existing directories and rename them while, >> for files, everything seems to be working fine. 
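The "vfs glusterfs" mentioned in the question above refers to exporting the volume through Samba's vfs_glusterfs module (libgfapi) rather than re-sharing a FUSE mount. A minimal share definition typically looks like the sketch below; the share/volume name "tank" is taken from this thread, while the log path and option values are illustrative only:

    [tank]
        path = /
        vfs objects = glusterfs
        glusterfs:volume = tank
        glusterfs:logfile = /var/log/samba/glusterfs-tank.%M.log
        glusterfs:loglevel = 7
        kernel share modes = no
        read only = no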
>> >> Gluster servers: CentOS 7.5 with Gluster 5.3 and Samba 4.8.3-4.el7.0.1 >> from @fasttrack >> Clients tested: Windows 10, Ubuntu 18.10, CentOS 7.5 >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1670382 >> > > Thanks for the bug report. Will look into this, and get back. > > Last I knew, we recommended to avoid fuse and samba shares on same volume > (Mainly as we couldn't spend a lot of effort on testing the configuration). > Anyways, we would treat the behavior as bug for sure. One possible path > looking at below volume info is to disable 'stat-prefetch' option and see > if it helps. Next option I would try is to disable readdir-ahead. > > Regards, > Amar > > >> >> >> Volume Name: tank >> Type: Distribute >> Volume ID: 9582685f-07fa-41fd-b9fc-ebab3a6989cf >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 8 >> Transport-type: tcp >> Bricks: >> Brick1: swp-gluster-01:/tank/volume1/brick >> Brick2: swp-gluster-02:/tank/volume1/brick >> Brick3: swp-gluster-03:/tank/volume1/brick >> Brick4: swp-gluster-04:/tank/volume1/brick >> Brick5: swp-gluster-01:/tank/volume2/brick >> Brick6: swp-gluster-02:/tank/volume2/brick >> Brick7: swp-gluster-03:/tank/volume2/brick >> Brick8: swp-gluster-04:/tank/volume2/brick >> Options Reconfigured: >> performance.parallel-readdir: on >> performance.readdir-ahead: on >> performance.cache-invalidation: on >> performance.md-cache-timeout: 600 >> storage.batch-fsync-delay-usec: 0 >> performance.write-behind-window-size: 32MB >> performance.stat-prefetch: on >> performance.read-ahead: on >> performance.read-ahead-page-count: 16 >> performance.rda-request-size: 131072 >> performance.quick-read: on >> performance.open-behind: on >> performance.nl-cache-timeout: 600 >> performance.nl-cache: on >> performance.io-thread-count: 64 >> performance.io-cache: off >> performance.flush-behind: on >> performance.client-io-threads: off >> performance.write-behind: off >> performance.cache-samba-metadata: on >> network.inode-lru-limit: 0 >> features.cache-invalidation-timeout: 600 >> features.cache-invalidation: on >> cluster.readdir-optimize: on >> cluster.lookup-optimize: on >> client.event-threads: 4 >> server.event-threads: 16 >> features.quota-deem-statfs: on >> nfs.disable: on >> features.quota: on >> features.inode-quota: on >> cluster.enable-shared-storage: disable >> >> Cheers, >> >> Jo?o Ba?to >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Thu May 2 17:34:41 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Thu, 2 May 2019 23:04:41 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! Message-ID: Hello Gluster folks, Gluster-block team is happy to announce the v0.4 release [1]. This is the new stable version of gluster-block, lots of new and exciting features and interesting bug fixes are made available as part of this release. Please find the big list of release highlights and notable fixes at [2]. Details about installation can be found in the easy install guide at [3]. Find the details about prerequisites and setup guide at [4]. If you are a new user, checkout the demo video attached in the README doc [5], which will be a good source of intro to the project. 
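As a quick taste of the CLI before digging into the references above, the basic block operations look roughly like the following; the volume name, block name, host addresses and size are all placeholders, and the man pages remain the authoritative syntax reference:

    gluster-block create block-test/sample-block ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB
    gluster-block list block-test
    gluster-block info block-test/sample-block
    gluster-block delete block-test/sample-block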
There are good examples about how to use gluster-block both in the man pages [6] and test file [7] (also in the README). gluster-block is part of fedora package collection, an updated package with release version v0.4 will be soon made available. And the community provided packages will be soon made available at [8]. Please spend a minute to report any kind of issue that comes to your notice with this handy link [9]. We look forward to your feedback, which will help gluster-block get better! We would like to thank all our users, contributors for bug filing and fixes, also the whole team who involved in the huge effort with pre-release testing. [1] https://github.com/gluster/gluster-block [2] https://github.com/gluster/gluster-block/releases [3] https://github.com/gluster/gluster-block/blob/master/INSTALL [4] https://github.com/gluster/gluster-block#usage [5] https://github.com/gluster/gluster-block/blob/master/README.md [6] https://github.com/gluster/gluster-block/tree/master/docs [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t [8] https://download.gluster.org/pub/gluster/gluster-block/ [9] https://github.com/gluster/gluster-block/issues/new Cheers, Team Gluster-Block! From dcunningham at voisonics.com Fri May 3 02:10:03 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 3 May 2019 14:10:03 +1200 Subject: [Gluster-users] Thin-arbiter questions Message-ID: Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Fri May 3 02:30:11 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 2 May 2019 22:30:11 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: Message-ID: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish ----- Original Message ----- From: "David Cunningham" To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? 
We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 3 02:34:04 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 3 May 2019 14:34:04 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Message-ID: Hi Ashish, Thanks very much for that reply. How stable is GD2? Is there even a vague ETA on when it might be supported in gluster? On Fri, 3 May 2019 at 14:30, Ashish Pandey wrote: > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The > command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, > it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 7:40:03 AM > *Subject: *[Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some > questions. We've been following the documentation from > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > . > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume > create" with the --thin-arbiter option on 5.5 and got an "unrecognized > option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. > How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the > thin-arbiter. Is there a service that can do this instead? I found a > mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but > it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aspandey at redhat.com Fri May 3 02:43:25 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 2 May 2019 22:43:25 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Message-ID: <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> David, I am adding members who are working on glusterd2 (Aravinda) and thin-arbiter support in glusterd (Vishal) and who can better reply on these questions. Patch for glusterd has been sent and it only requires reviews. I hope it should be completed in next 1 month or so. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish ----- Original Message ----- From: "David Cunningham" To: "Ashish Pandey" Cc: gluster-users at gluster.org Sent: Friday, May 3, 2019 8:04:04 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Thanks very much for that reply. How stable is GD2? Is there even a vague ETA on when it might be supported in gluster? On Fri, 3 May 2019 at 14:30, Ashish Pandey < aspandey at redhat.com > wrote: Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" < dcunningham at voisonics.com > To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Fri May 3 06:04:50 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Fri, 3 May 2019 11:34:50 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature In-Reply-To: <7d75b62f0eb0495782c46ef8521790d5@ul-exc-pr-mbx13.ulaval.ca> References: <9BE7F129-DE42-46A5-896B-81460E605E9E@gmail.com> <7d75b62f0eb0495782c46ef8521790d5@ul-exc-pr-mbx13.ulaval.ca> Message-ID: On 30/04/19 6:41 PM, Renaud Fortier wrote: > > IMO, you should keep storhaug and maintain it. At the beginning, we > were with pacemaker and corosync. Then we move to storhaug with the > upgrade to gluster 4.1.x. 
Now you are talking about going back like it > was. Maybe it will be better with pacemake and corosync but the > important is to have a solution that will be stable and maintained. > I agree it is very frustrating, there is no longer development planned for future unless someone pick it and work on for its stabilization and improvement. My plan is just to get back what gluster and nfs-ganesha had before -- Jiffin > thanks > > Renaud > > *De?:*gluster-users-bounces at gluster.org > [mailto:gluster-users-bounces at gluster.org] *De la part de* Jim Kinney > *Envoy??:* 30 avril 2019 08:20 > *??:* gluster-users at gluster.org; Jiffin Tony Thottan > ; gluster-users at gluster.org; Gluster Devel > ; gluster-maintainers at gluster.org; > nfs-ganesha ; devel at lists.nfs-ganesha.org > *Objet?:* Re: [Gluster-users] Proposing to previous ganesha HA cluster > solution back to gluster code as gluster-7 feature > > +1! > I'm using nfs-ganesha in my next upgrade so my client systems can use > NFS instead of fuse mounts. Having an integrated, designed in process > to coordinate multiple nodes into an HA cluster will very welcome. > > On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan > > wrote: > > Hi all, > > Some of you folks may be familiar with HA solution provided for > nfs-ganesha by gluster using pacemaker and corosync. > > That feature was removed in glusterfs 3.10 in favour for common HA > project "Storhaug". Even Storhaug was not progressed > > much from last two years and current development is in halt state, > hence planning to restore old HA ganesha solution back > > to gluster code repository with some improvement and targetting > for next gluster release 7. > > I have opened up an issue [1] with details and posted initial set > of patches [2] > > Please share your thoughts on the same > > Regards, > > Jiffin > > [1]https://github.com/gluster/glusterfs/issues/663 > > > [2] > https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb > related and reflect authenticity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Fri May 3 06:08:07 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Fri, 3 May 2019 11:38:07 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature In-Reply-To: <1028413072.2343069.1556630991785@mail.yahoo.com> References: <1028413072.2343069.1556630991785@mail.yahoo.com> Message-ID: <84885b70-e6b0-6e9b-f43d-a13dbafc6b6a@redhat.com> On 30/04/19 6:59 PM, Strahil Nikolov wrote: > Hi, > > I'm posting this again as it got bounced. > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > I'm still trying to remediate the effects of poor configuration at work. > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. It do take care those, but need to follow certain prerequisite, but please fencing won't configured for this setup. May we think about in future. -- Jiffin > > Still, this will be a lot of work to achieve. 
> > Best Regards, > Strahil Nikolov > > On Apr 30, 2019 15:19, Jim Kinney wrote: >> >> +1! >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >> >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>> >>> Hi all, >>> >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>> >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>> >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>> >>> to gluster code repository with some improvement and targetting for next gluster release 7. >>> >>> ??I have opened up an issue [1] with details and posted initial set of patches [2] >>> >>> Please share your thoughts on the same >>> >>> >>> Regards, >>> >>> Jiffin >>> >>> [1] https://github.com/gluster/glusterfs/issues/663 >>> >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>> >>> >> -- >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > I'm still trying to remediate the effects of poor configuration at work. > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > Still, this will be a lot of work to achieve. > > Best Regards, > Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: >> +1! >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >> >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>> Hi all, >>> >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>> >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>> >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>> >>> to gluster code repository with some improvement and targetting for next gluster release 7. >>> >>> I have opened up an issue [1] with details and posted initial set of patches [2] >>> >>> Please share your thoughts on the same >>> >>> Regards, >>> >>> Jiffin >>> >>> [1] https://github.com/gluster/glusterfs/issues/663 >>> >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >> >> -- >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. From hunter86_bg at yahoo.com Fri May 3 18:40:01 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 03 May 2019 21:40:01 +0300 Subject: [Gluster-users] Thin-arbiter questions Message-ID: Hi Ashish, Can someone commit the doc change I have already proposed ? 
At least, the doc will clarify that fact . Best Regards, Strahil NikolovOn May 3, 2019 05:30, Ashish Pandey wrote: > > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ________________________________ > From: "David Cunningham" > To: gluster-users at gluster.org > Sent: Friday, May 3, 2019 7:40:03 AM > Subject: [Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/. > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 3 21:15:03 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Sat, 4 May 2019 09:15:03 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> Message-ID: OK, thank you Ashish. On Fri, 3 May 2019 at 14:43, Ashish Pandey wrote: > David, > > I am adding members who are working on glusterd2 (Aravinda) and > thin-arbiter support in glusterd (Vishal) and who can > better reply on these questions. > > Patch for glusterd has been sent and it only requires reviews. I hope it > should be completed in next 1 month or so. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *"Ashish Pandey" > *Cc: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 8:04:04 AM > *Subject: *Re: [Gluster-users] Thin-arbiter questions > > Hi Ashish, > > Thanks very much for that reply. How stable is GD2? Is there even a vague > ETA on when it might be supported in gluster? > > > On Fri, 3 May 2019 at 14:30, Ashish Pandey wrote: > >> Hi David, >> >> Creation of thin-arbiter volume is currently supported by GD2 only. The >> command "glustercli" is available when glusterd2 is running. >> We are also working on providing thin-arbiter support on glusted however, >> it is not available right now. 
>> https://review.gluster.org/#/c/glusterfs/+/22612/ >> >> --- >> Ashish >> >> ------------------------------ >> *From: *"David Cunningham" >> *To: *gluster-users at gluster.org >> *Sent: *Friday, May 3, 2019 7:40:03 AM >> *Subject: *[Gluster-users] Thin-arbiter questions >> >> Hello, >> >> We are setting up a thin-arbiter and hope someone can help with some >> questions. We've been following the documentation from >> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ >> . >> >> 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume >> create" with the --thin-arbiter option on 5.5 and got an "unrecognized >> option --thin-arbiter" error. >> >> 2. The instruction to create a new volume with a thin-arbiter is clear. >> How do you add a thin-arbiter to an already existing volume though? >> >> 3. The documentation suggests running glusterfsd manually to start the >> thin-arbiter. Is there a service that can do this instead? I found a >> mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 >> but it's not really documented. >> >> Thanks in advance for your help, >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Fri May 3 22:44:33 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Fri, 3 May 2019 15:44:33 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Just to update everyone on the nasty crash one of our servers continued having even after 5.5/5.6, I posted a summary of the results here: https://bugzilla.redhat.com/show_bug.cgi?id=1690769#c4. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Wed, Mar 20, 2019 at 12:57 PM Artem Russakovskii wrote: > Amar, > > I see debuginfo packages now and have installed them. I'm available via > Skype as before, just ping me there. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Tue, Mar 19, 2019 at 10:46 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> >> >> On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii >> wrote: >> >>> Can I roll back performance.write-behind: off and lru-limit=0 then? I'm >>> waiting for the debug packages to be available for OpenSUSE, then I can >>> help Amar with another debug session. >>> >>> >> Yes, the write-behind issue is now fixed. You can enable write-behind. >> Also remove lru-limit=0, so you can also utilize the benefit of garbage >> collection introduced in 5.4 >> >> Lets get to fixing the problem once the debuginfo packages are available. >> >> >> >>> In the meantime, have you had time to set up 1x4 replicate testing? 
I was >>> told you were only testing 1x3, and it's the 4th brick that may be >>> causing >>> the crash, which is consistent with this whole time only 1 of 4 bricks >>> constantly crashing. The other 3 have been rock solid. I'm hoping you >>> could >>> find the issue without a debug session this way. >>> >>> >> That is my gut feeling still. Added a basic test case with 4 bricks, >> https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this >> particular issue is happening only on certain pattern of access for 1x4 >> setup. Lets get to the root of it once we have debuginfo packages for Suse >> builds. >> >> -Amar >> >> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police , APK Mirror >>> , Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> | @ArtemR >>> >>> >>> >>> On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran >> > >>> wrote: >>> >>> > Hi Artem, >>> > >>> > I think you are running into a different crash. The ones reported which >>> > were prevented by turning off write-behind are now fixed. >>> > We will need to look into the one you are seeing to see why it is >>> > happening. >>> > >>> > Regards, >>> > Nithya >>> > >>> > >>> > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii >>> > wrote: >>> > >>> >> The flood is indeed fixed for us on 5.5. However, the crashes are not. >>> >> >>> >> Sincerely, >>> >> Artem >>> >> >>> >> -- >>> >> Founder, Android Police , APK Mirror >>> >> , Illogical Robot LLC >>> >> beerpla.net | +ArtemRussakovskii >>> >> | @ArtemR >>> >> >>> >> >>> >> >>> >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert >>> wrote: >>> >> >>> >>> Hi Amar, >>> >>> >>> >>> if you refer to this bug: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >>> >>> setup i haven't seen those entries, while copying & deleting a few >>> GBs >>> >>> of data. For a final statement we have to wait until i updated our >>> >>> live gluster servers - could take place on tuesday or wednesday. >>> >>> >>> >>> Maybe other users can do an update to 5.4 as well and report back >>> here. >>> >>> >>> >>> >>> >>> Hubert >>> >>> >>> >>> >>> >>> >>> >>> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >>> >>> : >>> >>> > >>> >>> > Hi Hu Bert, >>> >>> > >>> >>> > Appreciate the feedback. Also are the other boiling issues related >>> to >>> >>> logs fixed now? >>> >>> > >>> >>> > -Amar >>> >>> > >>> >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert >>> >>> wrote: >>> >>> >> >>> >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >>> >>> >> volumes done. In 'gluster peer status' the peers stay connected >>> during >>> >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in >>> the >>> >>> >> logs. Looks good :-) >>> >>> >> >>> >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < >>> >>> revirii at googlemail.com>: >>> >>> >> > >>> >>> >> > Good morning :-) >>> >>> >> > >>> >>> >> > for debian the packages are there: >>> >>> >> > >>> >>> >>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >>> >>> >> > >>> >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if >>> >>> there >>> >>> >> > are some errors etc. and report back. >>> >>> >> > >>> >>> >> > btw: no release notes for 5.4 and 5.5 so far? >>> >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? >>> >>> >> > >>> >>> >> > Am Fr., 15. 
M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >>> >>> >> > : >>> >>> >> > > >>> >>> >> > > We created a 5.5 release tag, and it is under packaging now. >>> It >>> >>> should >>> >>> >> > > be packaged and ready for testing early next week and should >>> be >>> >>> released >>> >>> >> > > close to mid-week next week. >>> >>> >> > > >>> >>> >> > > Thanks, >>> >>> >> > > Shyam >>> >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >>> >>> >> > > > Wednesday now with no update :-/ >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police , APK >>> >>> Mirror >>> >>> >> > > > , Illogical Robot LLC >>> >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < >>> >>> archon810 at gmail.com >>> >>> >> > > > > wrote: >>> >>> >> > > > >>> >>> >> > > > Hi Amar, >>> >>> >> > > > >>> >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE >>> >>> build >>> >>> >> > > > repos. Maybe later today? >>> >>> >> > > > >>> >>> >> > > > Thanks. >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police , >>> >>> APK Mirror >>> >>> >> > > > , Illogical Robot LLC >>> >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi >>> Suryanarayan >>> >>> >> > > > > >>> wrote: >>> >>> >> > > > >>> >>> >> > > > We are talking days. Not weeks. Considering already >>> it >>> >>> is >>> >>> >> > > > Thursday here. 1 more day for tagging, and >>> packaging. >>> >>> May be ok >>> >>> >> > > > to expect it on Monday. >>> >>> >> > > > >>> >>> >> > > > -Amar >>> >>> >> > > > >>> >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >>> >>> >> > > > > >>> >>> wrote: >>> >>> >> > > > >>> >>> >> > > > Is the next release going to be an imminent >>> hotfix, >>> >>> i.e. >>> >>> >> > > > something like today/tomorrow, or are we talking >>> >>> weeks? >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police < >>> >>> http://www.androidpolice.com>, APK >>> >>> >> > > > Mirror , Illogical >>> >>> Robot LLC >>> >>> >> > > > beerpla.net | >>> >>> +ArtemRussakovskii >>> >>> >> > > > | >>> >>> @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem >>> Russakovskii >>> >>> >> > > > >> archon810 at gmail.com>> >>> >>> wrote: >>> >>> >> > > > >>> >>> >> > > > Ended up downgrading to 5.3 just in case. >>> Peer >>> >>> status >>> >>> >> > > > and volume status are OK now. >>> >>> >> > > > >>> >>> >> > > > zypper install --oldpackage >>> >>> glusterfs-5.3-lp150.100.1 >>> >>> >> > > > Loading repository data... >>> >>> >> > > > Reading installed packages... >>> >>> >> > > > Resolving package dependencies... 
>>> >>> >> > > > >>> >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 >>> >>> requires >>> >>> >> > > > libgfapi0 = 5.3, but this requirement >>> cannot be >>> >>> provided >>> >>> >> > > > not installable providers: >>> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >>> >>> >> > > > Solution 1: Following actions will be done: >>> >>> >> > > > downgrade of >>> libgfapi0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to >>> >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> libgfrpc0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> libgfxdr0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> >>> libglusterfs0-5.4-lp150.100.1.x86_64 to >>> >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > Solution 2: do not install >>> >>> glusterfs-5.3-lp150.100.1.x86_64 >>> >>> >> > > > Solution 3: break >>> >>> glusterfs-5.3-lp150.100.1.x86_64 by >>> >>> >> > > > ignoring some of its dependencies >>> >>> >> > > > >>> >>> >> > > > Choose from above solutions by number or >>> cancel >>> >>> >> > > > [1/2/3/c] (c): 1 >>> >>> >> > > > Resolving dependencies... >>> >>> >> > > > Resolving package dependencies... >>> >>> >> > > > >>> >>> >> > > > The following 6 packages are going to be >>> >>> downgraded: >>> >>> >> > > > glusterfs libgfapi0 libgfchangelog0 >>> libgfrpc0 >>> >>> >> > > > libgfxdr0 libglusterfs0 >>> >>> >> > > > >>> >>> >> > > > 6 packages to downgrade. >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police >>> >>> >> > > > , APK Mirror >>> >>> >> > > > , Illogical >>> Robot >>> >>> LLC >>> >>> >> > > > beerpla.net | >>> >>> +ArtemRussakovskii >>> >>> >> > > > >>> | >>> >>> @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem >>> >>> Russakovskii >>> >>> >> > > > >> >>> archon810 at gmail.com>> wrote: >>> >>> >> > > > >>> >>> >> > > > Noticed the same when upgrading from >>> 5.3 to >>> >>> 5.4, as >>> >>> >> > > > mentioned. >>> >>> >> > > > >>> >>> >> > > > I'm confused though. Is actual >>> replication >>> >>> affected, >>> >>> >> > > > because the 5.4 server and the 3x 5.3 >>> >>> servers still >>> >>> >> > > > show heal info as all 4 connected, and >>> the >>> >>> files >>> >>> >> > > > seem to be replicating correctly as >>> well. >>> >>> >> > > > >>> >>> >> > > > So what's actually affected - just the >>> >>> status >>> >>> >> > > > command, or leaving 5.4 on one of the >>> nodes >>> >>> is doing >>> >>> >> > > > some damage to the underlying fs? Is it >>> >>> fixable by >>> >>> >> > > > tweaking transport.socket.ssl-enabled? >>> Does >>> >>> >> > > > upgrading all servers to 5.4 resolve >>> it, or >>> >>> should >>> >>> >> > > > we revert back to 5.3? 
>>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police >>> >>> >> > > > , APK >>> Mirror >>> >>> >> > > > , Illogical >>> >>> Robot LLC >>> >>> >> > > > beerpla.net | >>> >>> >> > > > +ArtemRussakovskii >>> >>> >> > > > < >>> https://plus.google.com/+ArtemRussakovskii >>> >>> > >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >>> >>> >> > > > >> >>> >> > > > > wrote: >>> >>> >> > > > >>> >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and >>> it >>> >>> worked. >>> >>> >> > > > all replicas are up and >>> >>> >> > > > running. Awaiting updated v5.4. >>> >>> >> > > > >>> >>> >> > > > thx :-) >>> >>> >> > > > >>> >>> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr >>> >>> schrieb Hari >>> >>> >> > > > Gowtham >> >>> >> > > > >: >>> >>> >> > > > > >>> >>> >> > > > > There are plans to revert the >>> patch >>> >>> causing >>> >>> >> > > > this error and rebuilt 5.4. >>> >>> >> > > > > This should happen faster. the >>> >>> rebuilt 5.4 >>> >>> >> > > > should be void of this upgrade >>> issue. >>> >>> >> > > > > >>> >>> >> > > > > In the meantime, you can use 5.3 >>> for >>> >>> this cluster. >>> >>> >> > > > > Downgrading to 5.3 will work if it >>> >>> was just >>> >>> >> > > > one node that was upgrade to 5.4 >>> >>> >> > > > > and the other nodes are still in >>> 5.3 >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat May 4 06:34:56 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 04 May 2019 09:34:56 +0300 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature Message-ID: Hi Jiffin, No vendor will support your corosync/pacemaker stack if you do not have proper fencing. As Gluster is already a cluster of its own, it makes sense to control everything from there. Best Regards, Strahil NikolovOn May 3, 2019 09:08, Jiffin Tony Thottan wrote: > > > On 30/04/19 6:59 PM, Strahil Nikolov wrote: > > Hi, > > > > I'm posting this again as it got bounced. > > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > > > I'm still trying to remediate the effects of poor configuration at work. > > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > > It do take care those, but need to follow certain prerequisite, but > please fencing won't configured for this setup. May we think about in > future. > > -- > > Jiffin > > > > > Still, this will be a lot of work to achieve. > > > > Best Regards, > > Strahil Nikolov > > > > On Apr 30, 2019 15:19, Jim Kinney wrote: > >>??? > >> +1! > >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > >> > >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: > >>>??? 
> >>> Hi all, > >>> > >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. > >>> > >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed > >>> > >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back > >>> > >>> to gluster code repository with some improvement and targetting for next gluster release 7. > >>> > >>>? ??I have opened up an issue [1] with details and posted initial set of patches [2] > >>> > >>> Please share your thoughts on the same > >>> > >>> > >>> Regards, > >>> > >>> Jiffin > >>> > >>> [1] https://github.com/gluster/glusterfs/issues/663 > >>> > >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > >>> > >>> > >> -- > >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. > > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > > > I'm still trying to remediate the effects of poor configuration at work. > > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > > > Still, this will be a lot of work to achieve. > > > > Best Regards, > > Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: > >> +1! > >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > >> > >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: > >>> Hi all, > >>> > >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. > >>> > >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed > >>> > >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back > >>> > >>> to gluster code repository with some improvement and targetting for next gluster release 7. > >>> > >>> I have opened up an issue [1] with details and posted initial set of patches [2] > >>> > >>> Please share your thoughts on the same > >>> > >>> Regards, > >>> > >>> Jiffin > >>> > >>> [1] https://github.com/gluster/glusterfs/issues/663 > >>> > >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > >> > >> -- > >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. 
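For readers who never ran the pre-3.10 integration being discussed in this thread, the old workflow was deliberately small: a shared-storage volume, one config file, and one command, with pacemaker/corosync and the VIP resources generated underneath. The sketch below is from memory of the old guide and only illustrative; node names and addresses are placeholders, and the restored implementation may change details.

# prerequisite on the trusted pool
gluster volume set all cluster.enable-shared-storage enable

# /etc/ganesha/ganesha-ha.conf, identical on every participating node:
#   HA_NAME="ganesha-ha-demo"
#   HA_CLUSTER_NODES="node1,node2,node3"
#   VIP_node1="192.168.1.201"
#   VIP_node2="192.168.1.202"
#   VIP_node3="192.168.1.203"

# bring the HA cluster up from any one node
gluster nfs-ganesha enable

Fencing is exactly the piece that recipe never set up, which is the gap Strahil points at; stonith resources can still be added afterwards with pcs, but that is left to the admin.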
From aspandey at redhat.com Mon May 6 06:21:22 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Mon, 6 May 2019 02:21:22 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: Message-ID: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Hi, I can see that Amar has already committed the changes and those are visible on https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ --- Ashish ----- Original Message ----- From: "Strahil" To: "Ashish" , "David" Cc: "gluster-users" Sent: Saturday, May 4, 2019 12:10:01 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Can someone commit the doc change I have already proposed ? At least, the doc will clarify that fact . Best Regards, Strahil Nikolov On May 3, 2019 05:30, Ashish Pandey wrote: Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Mon May 6 08:10:30 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 6 May 2019 20:10:30 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Message-ID: Hi Ashish, Thank you for the update. Does that mean they're now in the regular Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS packages to be updated with the latest code? 
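In case it helps while waiting: both distributions have community channels that normally pick up new gluster releases fairly quickly, so checking them directly is a reasonable first step. The repository names below are the usual community ones and worth double-checking against download.gluster.org.

# Ubuntu: the Gluster community PPA, one series per major release
sudo add-apt-repository ppa:gluster/glusterfs-6
sudo apt-get update
apt-cache policy glusterfs-server

# CentOS: the Storage SIG release package
sudo yum install centos-release-gluster6
yum info glusterfs-server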
On Mon, 6 May 2019 at 18:21, Ashish Pandey wrote: > Hi, > > I can see that Amar has already committed the changes and those are > visible on > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > > --- > Ashish > > > > ------------------------------ > *From: *"Strahil" > *To: *"Ashish" , "David" > *Cc: *"gluster-users" > *Sent: *Saturday, May 4, 2019 12:10:01 AM > *Subject: *Re: [Gluster-users] Thin-arbiter questions > > Hi Ashish, > > Can someone commit the doc change I have already proposed ? > At least, the doc will clarify that fact . > > Best Regards, > Strahil Nikolov > On May 3, 2019 05:30, Ashish Pandey wrote: > > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The > command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, > it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 7:40:03 AM > *Subject: *[Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some > questions. We've been following the documentation from > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > . > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume > create" with the --thin-arbiter option on 5.5 and got an "unrecognized > option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. > How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the > thin-arbiter. Is there a service that can do this instead? I found a > mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but > it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Mon May 6 08:34:07 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Mon, 6 May 2019 04:34:07 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Message-ID: <645227359.16980056.1557131647054.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "David Cunningham" To: "Ashish Pandey" Cc: "gluster-users" Sent: Monday, May 6, 2019 1:40:30 PM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Thank you for the update. Does that mean they're now in the regular Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS packages to be updated with the latest code? No, for regular glusterd, work is still in progress. It will be done soon. I don't have answer for the next question. 
Maybe Amar has information regarding this. Adding him in CC. On Mon, 6 May 2019 at 18:21, Ashish Pandey < aspandey at redhat.com > wrote: Hi, I can see that Amar has already committed the changes and those are visible on https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ --- Ashish From: "Strahil" < hunter86_bg at yahoo.com > To: "Ashish" < aspandey at redhat.com >, "David" < dcunningham at voisonics.com > Cc: "gluster-users" < gluster-users at gluster.org > Sent: Saturday, May 4, 2019 12:10:01 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Can someone commit the doc change I have already proposed ? At least, the doc will clarify that fact . Best Regards, Strahil Nikolov On May 3, 2019 05:30, Ashish Pandey < aspandey at redhat.com > wrote:
Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" < dcunningham at voisonics.com > To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
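On question 3 in the quoted mail above: until a packaged unit ships, one way to avoid starting glusterfsd by hand is a small systemd service. This is purely a hypothetical sketch; the unit name is made up and the ExecStart line must be taken from the thin-arbiter admin guide for your build rather than copied from here.

cat > /etc/systemd/system/gluster-ta-volume.service <<'EOF'
[Unit]
Description=GlusterFS thin-arbiter process
After=network.target

[Service]
Type=simple
# replace with the exact glusterfsd invocation from the admin guide
ExecStart=/usr/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now gluster-ta-volume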
-- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon May 6 10:08:39 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 6 May 2019 12:08:39 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read Message-ID: Hello folks, we have a client application (runs on Win10) which does some FOPs on a gluster volume which is accessed by SMB. *Scenario 1* is a READ Operation which reads all files successively and checks if the files data was correctly copied. While doing this, all brick processes crashes and in the logs one have this crash report on every brick log: > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] > pending frames: > frame : type(0) op(27) > frame : type(0) op(40) > patchset: git://git.gluster.org/glusterfs.git > signal received: 11 > time of crash: > 2019-04-16 08:32:21 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.5 > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > *Scenario 2 *The application just SET Read-Only on each file sucessively. 
After the 70th file was set, all the bricks crashes and again, one can read this crash report in every brick log: > > > [2019-05-02 07:43:39.953591] I [MSGID: 139001] > [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: > client: > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > gfid: 00000000-0000-0000-0000-000000000001, > req(uid:2000,gid:2000,perm:1,ngrps:1), > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > denied] > > pending frames: > > frame : type(0) op(27) > > patchset: git://git.gluster.org/glusterfs.git > > signal received: 11 > > time of crash: > > 2019-05-02 07:43:39 > > configuration details: > > argp 1 > > backtrace 1 > > dlfcn 1 > > libpthread 1 > > llistxattr 1 > > setfsid 1 > > spinlock 1 > > epoll.h 1 > > xattr.h 1 > > st_atim.tv_nsec 1 > > package-string: glusterfs 5.5 > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
But both volumes has the same settings: > Volume Name: shortterm > Type: Replicate > Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > Options Reconfigured: > storage.reserve: 1 > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > user.smb: disable > features.read-only: off > features.worm: off > features.worm-file-level: on > features.retention-mode: enterprise > features.default-retention-period: 120 > network.ping-timeout: 10 > features.cache-invalidation: on > features.cache-invalidation-timeout: 600 > performance.nl-cache: on > performance.nl-cache-timeout: 600 > client.event-threads: 32 > server.event-threads: 32 > cluster.lookup-optimize: on > performance.stat-prefetch: on > performance.cache-invalidation: on > performance.md-cache-timeout: 600 > performance.cache-samba-metadata: on > performance.cache-ima-xattrs: on > performance.io-thread-count: 64 > cluster.use-compound-fops: on > performance.cache-size: 512MB > performance.cache-refresh-timeout: 10 > performance.read-ahead: off > performance.write-behind-window-size: 4MB > performance.write-behind: on > storage.build-pgfid: on > features.utime: on > storage.ctime: on > cluster.quorum-type: fixed > cluster.quorum-count: 2 > features.bitrot: on > features.scrub: Active > features.scrub-freq: daily > cluster.enable-shared-storage: enable > > Why can this happen to all Brick processes? I don't understand the crash report. The FOPs are nothing special and after restart brick processes everything works fine and our application was succeed. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon May 6 11:51:32 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 6 May 2019 13:51:32 +0200 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: Hello, I create a Bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1706842 Regards David Spisla Am Mi., 1. Mai 2019 um 14:46 Uhr schrieb Amar Tumballi Suryanarayan < atumball at redhat.com>: > > > On Wed, Apr 17, 2019 at 1:33 PM David Spisla wrote: > >> Dear Gluster Community, >> >> I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 >> to access the volumes (each node has a VIP) >> >> I was testing this failover scenario: >> >> 1. Start Writing 940 GB with small files (64K-100K)from a Win10 Client >> to node1 >> 2. During the write process I hardly shutdown node1 (where the client >> is connect via VIP) by turn off the power >> >> My expectation is, that the write process stops and after a while the >> Win10 Client offers me a Retry, so I can continue the write on different >> node (which has now the VIP of node1). >> In past time I did this observation, but now the system shows a strange >> bahaviour: >> >> The Win10 Client do nothing and the Explorer freezes, in the backend CTDB >> can not perform the failover and throws errors. 
The glusterd from node2 and node3 logs these messages:
>>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
>>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
>>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
>>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
>>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
>>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage
>>>
>> *In my opinion Samba/CTDB cannot perform the failover correctly and
>> continue the write process because glusterfs didn't release the lock.*
>> What do you think? It seems to me like a bug because in the past the
>> failover worked correctly.
>>
> Thanks for the report David. It surely looks like a bug, and I would let
> some experts on this domain answer the question. One request on such things
> is to file a bug (preferred) or a GitHub issue, so it is tracked in the system.
>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lm at zork.pl Mon May 6 13:13:14 2019
From: lm at zork.pl (=?UTF-8?Q?=c5=81ukasz_Michalski?=)
Date: Mon, 6 May 2019 15:13:14 +0200
Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd
Message-ID: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl>

Hi,

I have a problem resolving split-brain in one of my installations.

CentOS 7, glusterfs 3.10.12, replica on two nodes:

[root at ixmed1 iscsi]# gluster volume status cluster
Status of volume: cluster
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ixmed2:/glusterfs-bricks/cluster/cluster  49153     0          Y       3028
Brick ixmed1:/glusterfs-bricks/cluster/cluster  49153     0          Y       2917
Self-heal Daemon on localhost                   N/A       N/A        Y       112929
Self-heal Daemon on ixmed2                      N/A       N/A        Y       
57774 Task Status of Volume cluster ------------------------------------------------------------------------------ There are no active volume tasks When I try to access one file glusterd reports split brain: [2019-05-06 12:36:43.785098] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: Failing READ on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain observed. [Input/output error] [2019-05-06 12:36:43.787952] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: Failing FGETXATTR on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain observed. [Input/output error] [2019-05-06 12:36:43.788778] W [MSGID: 108027] [afr-common.c:2722:afr_discover_done] 0-cluster-replicate-0: no read subvols for (null) [2019-05-06 12:36:43.790123] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352501: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde0803f390 (Input/output error) [2019-05-06 12:36:43.794979] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352506: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 (Input/output error) [2019-05-06 12:36:43.800468] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352508: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 (Input/output error) The problem is that "gluster volume heal info" hangs for 10 seconds and returns: ??? Not able to fetch volfile from glusterd ??? Volume heal failed glfsheal.log contains: [2019-05-06 12:40:25.589879] I [afr.c:94:fix_quorum_options] 0-cluster-replicate-0: reindeer: incoming qtype = none [2019-05-06 12:40:25.589967] I [afr.c:116:fix_quorum_options] 0-cluster-replicate-0: reindeer: quorum_count = 0 [2019-05-06 12:40:25.593294] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-cluster-readdir-ahead: option 'parallel-readdir' is not recognized [2019-05-06 12:40:25.593895] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 69786d65-6431-2d32-3037-3739322d3230 (0) coming up [2019-05-06 12:40:25.593972] I [MSGID: 114020] [client.c:2352:notify] 0-cluster-client-0: parent translators are ready, attempting connect on transport [2019-05-06 12:40:25.607836] I [MSGID: 114020] [client.c:2352:notify] 0-cluster-client-1: parent translators are ready, attempting connect on transport [2019-05-06 12:40:25.608556] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-cluster-client-0: changing port to 49153 (from 0) [2019-05-06 12:40:25.618167] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-cluster-client-1: changing port to 49153 (from 0) [2019-05-06 12:40:25.629595] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-06 12:40:25.632031] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-0: Connected to cluster-client-0, attached to remote volume '/glusterfs-bricks/cluster/cluster'. [2019-05-06 12:40:25.632100] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-06 12:40:25.632263] I [MSGID: 108005] [afr-common.c:4817:afr_notify] 0-cluster-replicate-0: Subvolume 'cluster-client-0' came back up; going online. 
[2019-05-06 12:40:25.637707] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-06 12:40:25.639285] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-1: Connected to cluster-client-1, attached to remote volume '/glusterfs-bricks/cluster/cluster'. [2019-05-06 12:40:25.639341] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-06 12:40:31.564407] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-0: server 10.0.104.26:49153 has not responded in the last 5 seconds, disconnecting. [2019-05-06 12:40:31.565764] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-1: server 10.0.7.26:49153 has not responded in the last 5 seconds, disconnecting. [2019-05-06 12:40:35.645545] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-cluster-client-0: disconnected from cluster-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-06 12:40:35.645683] I [socket.c:3534:socket_submit_request] 0-cluster-client-0: not connected (priv->connected = -1) [2019-05-06 12:40:35.645755] W [rpc-clnt.c:1693:rpc_clnt_submit] 0-cluster-client-0: failed to submit rpc-request (XID: 0x7 Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (cluster-client-0) [2019-05-06 12:40:35.645807] W [MSGID: 114031] [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-0: remote operation failed [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.645887] I [socket.c:3534:socket_submit_request] 0-cluster-client-1: not connected (priv->connected = -1) [2019-05-06 12:40:35.645918] W [rpc-clnt.c:1693:rpc_clnt_submit] 0-cluster-client-1: failed to submit rpc-request (XID: 0x7 Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (cluster-client-1) [2019-05-06 12:40:35.645955] W [MSGID: 114031] [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-1: remote operation failed [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.646008] W [MSGID: 109075] [dht-diskusage.c:44:dht_du_info_cbk] 0-cluster-dht: failed to get disk info from cluster-replicate-0 [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.647846] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-cluster-client-1: disconnected from cluster-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-06 12:40:35.647895] E [MSGID: 108006] [afr-common.c:4842:afr_notify] 0-cluster-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
[2019-05-06 12:40:35.647989] I [MSGID: 108006] [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no subvolumes up [2019-05-06 12:40:35.648051] I [MSGID: 108006] [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no subvolumes up [2019-05-06 12:40:35.648122] I [MSGID: 104039] [glfs-resolve.c:902:__glfs_active_subvol] 0-cluster: first lookup on graph 69786d65-6431-2d32-3037-3739322d3230 (0) failed (Drugi koniec nie jest po??czony) [Drugi koniec nie jest po??czony] "Drugi koniec nie jest po??czony" -> Transport endpoint not connected On brick process side there is an connection attempt: [2019-05-06 12:40:25.638032] I [addr.c:182:gf_auth] 0-/glusterfs-bricks/cluster/cluster: allowed = "*", received addr = "10.0.7.26" [2019-05-06 12:40:25.638080] I [login.c:111:gf_auth] 0-auth/login: allowed user names: e2f4c8f4-d040-4856-b6e3-62611fbab0ea [2019-05-06 12:40:25.638109] I [MSGID: 115029] [server-handshake.c:695:server_setvolume] 0-cluster-server: accepted client from ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 (version: 3.10.12) [2019-05-06 12:40:31.565931] I [MSGID: 115036] [server.c:559:server_rpc_notify] 0-cluster-server: disconnecting connection from ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 [2019-05-06 12:40:31.566420] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-cluster-server: Shutting down connection ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 I am not able to use any heal command because of this problem. I have three volumes configured on that nodes. Configuration is identical and "gluster volume heal" command fails for all of them. Can anyone help? Thanks, ?ukasz From vbellur at redhat.com Mon May 6 17:48:22 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Mon, 6 May 2019 10:48:22 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Thank you for the report, David. Do you have core files available on any of the servers? If yes, would it be possible for you to provide a backtrace. Regards, Vijay On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: > Hello folks, > > we have a client application (runs on Win10) which does some FOPs on a > gluster volume which is accessed by SMB. > > *Scenario 1* is a READ Operation which reads all files successively and > checks if the files data was correctly copied. 
While doing this, all brick > processes crashes and in the logs one have this crash report on every brick > log: > >> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >> pending frames: >> frame : type(0) op(27) >> frame : type(0) op(40) >> patchset: git://git.gluster.org/glusterfs.git >> signal received: 11 >> time of crash: >> 2019-04-16 08:32:21 >> configuration details: >> argp 1 >> backtrace 1 >> dlfcn 1 >> libpthread 1 >> llistxattr 1 >> setfsid 1 >> spinlock 1 >> epoll.h 1 >> xattr.h 1 >> st_atim.tv_nsec 1 >> package-string: glusterfs 5.5 >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >> >> *Scenario 2 *The application just SET Read-Only on each file > sucessively. 
After the 70th file was set, all the bricks crashes and again, > one can read this crash report in every brick log: > >> >> >> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >> client: >> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >> gfid: 00000000-0000-0000-0000-000000000001, >> req(uid:2000,gid:2000,perm:1,ngrps:1), >> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >> denied] >> >> pending frames: >> >> frame : type(0) op(27) >> >> patchset: git://git.gluster.org/glusterfs.git >> >> signal received: 11 >> >> time of crash: >> >> 2019-05-02 07:43:39 >> >> configuration details: >> >> argp 1 >> >> backtrace 1 >> >> dlfcn 1 >> >> libpthread 1 >> >> llistxattr 1 >> >> setfsid 1 >> >> spinlock 1 >> >> epoll.h 1 >> >> xattr.h 1 >> >> st_atim.tv_nsec 1 >> >> package-string: glusterfs 5.5 >> >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >> >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >> >> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >> >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >> >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >> >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >> >> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >> >> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >> >> >> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >> >> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >> >> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >> > > This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
> But both volumes has the same settings: > >> Volume Name: shortterm >> Type: Replicate >> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >> Options Reconfigured: >> storage.reserve: 1 >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> user.smb: disable >> features.read-only: off >> features.worm: off >> features.worm-file-level: on >> features.retention-mode: enterprise >> features.default-retention-period: 120 >> network.ping-timeout: 10 >> features.cache-invalidation: on >> features.cache-invalidation-timeout: 600 >> performance.nl-cache: on >> performance.nl-cache-timeout: 600 >> client.event-threads: 32 >> server.event-threads: 32 >> cluster.lookup-optimize: on >> performance.stat-prefetch: on >> performance.cache-invalidation: on >> performance.md-cache-timeout: 600 >> performance.cache-samba-metadata: on >> performance.cache-ima-xattrs: on >> performance.io-thread-count: 64 >> cluster.use-compound-fops: on >> performance.cache-size: 512MB >> performance.cache-refresh-timeout: 10 >> performance.read-ahead: off >> performance.write-behind-window-size: 4MB >> performance.write-behind: on >> storage.build-pgfid: on >> features.utime: on >> storage.ctime: on >> cluster.quorum-type: fixed >> cluster.quorum-count: 2 >> features.bitrot: on >> features.scrub: Active >> features.scrub-freq: daily >> cluster.enable-shared-storage: enable >> >> > Why can this happen to all Brick processes? I don't understand the crash > report. The FOPs are nothing special and after restart brick processes > everything works fine and our application was succeed. > > Regards > David Spisla > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon May 6 18:15:04 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 6 May 2019 14:15:04 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Good work Team (Prasanna and Xiubo Li to be precise)!! This was much needed release w.r.to gluster-block project, mainly because of the number of improvements done since last release. Also, gluster-block release 0.3 was not compatible with glusterfs-6.x series. All, feel free to use it if your deployment has any usecase for Block storage, and give us feedback. Happy to make sure gluster-block is stable for you. Regards, Amar > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. 
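To give a flavour of the CLI, a rough sketch of the typical life cycle of a block device (the volume name, hosts and size below are only placeholders; the man pages [6] and the basic test file [7] referenced below are the authoritative source for the exact syntax):

    # create a 1 GiB block device on an existing block-hosting volume,
    # exported from three nodes for multipath/HA
    gluster-block create block-test/sample-block ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB

    # inspect and clean up
    gluster-block list block-test
    gluster-block info block-test/sample-block
    gluster-block delete block-test/sample-block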
> There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. > > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon May 6 18:16:25 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 6 May 2019 14:16:25 -0400 Subject: [Gluster-users] [External] Re: anyone using gluster-block? In-Reply-To: References: Message-ID: Davide, With release 0.4, gluster-block is now having more functionality, and we did many stability fixes. Feel free to try out, and let us know how you feel. -Amar On Fri, Nov 9, 2018 at 3:36 AM Davide Obbi wrote: > Hi Vijay, > > The Volume has been created using heketi-cli blockvolume create command. > The block config is the config applied by heketi out of the box and in my > case ended up to be: > - 3 nodes each with 1 brick > - the brick is carved from a VG with a single PV > - the PV consists of a 1.2TB SSD, not partitioned and no HW RAID behind > - the volume does not have any custom setting aside what configured in > /etc/glusterfs/group-gluster-block by default > performance.quick-read=off > performance.read-ahead=off > performance.io-cache=off > performance.stat-prefetch=off > performance.open-behind=off > performance.readdir-ahead=off > performance.strict-o-direct=on > network.remote-dio=disable > cluster.eager-lock=enable > cluster.quorum-type=auto > cluster.data-self-heal-algorithm=full > cluster.locking-scheme=granular > cluster.shd-max-threads=8 > cluster.shd-wait-qlength=10000 > features.shard=on > features.shard-block-size=64MB > user.cifs=off > server.allow-insecure=on > cluster.choose-local=off > > Kernel: 3.10.0-862.11.6.el7.x86_64 > OS: Centos 7.5.1804 > tcmu-runner: 0.2rc4.el7 > > Each node has 32 cores and 128GB RAM and 10Gb connection. > > What i am trying to understand is what should be performance expectations > with gluster-block since i couldnt find many benchmarks online. > > Regards > Davide > > > On Fri, Nov 9, 2018 at 7:07 AM Vijay Bellur wrote: > >> Hi Davide, >> >> Can you please share the block hosting volume configuration? 
>> >> Also, more details about the kernel and tcmu-runner versions could help >> in understanding the problem better. >> >> Thanks, >> Vijay >> >> On Tue, Nov 6, 2018 at 6:16 AM Davide Obbi >> wrote: >> >>> Hi, >>> >>> i am testing gluster-block and i am wondering if someone has used it and >>> have some feedback regarding its performance.. just to set some >>> expectations... for example: >>> - i have deployed a block volume using heketi on a 3 nodes gluster4.1 >>> cluster. it's a replica3 volume. >>> - i have mounted via iscsi using multipath config suggested, created >>> vg/lv and put xfs on it >>> - all done without touching any volume setting or customizing xfs >>> parameters etc.. >>> - all baremetal running on 10Gb, gluster has a single block device, SSD >>> in use by heketi >>> >>> so i tried a dd and i get a 4.7 MB/s? >>> - on the gluster nodes i have in write ~200iops, ~15MB/s, 75% util >>> steady and spiky await time up to 100ms alternating between the servers. >>> CPUs are mostly idle but there is some waiting... >>> - Glusterd and fsd utilization is below 1% >>> >>> The thing is that a gluster fuse mount on same platform does not have >>> this slowness so there must be something wrong with my understanding of >>> gluster-block? >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Davide Obbi > System Administrator > > Booking.com B.V. > Vijzelstraat 66-80 Amsterdam 1017HL Netherlands > Direct +31207031558 > [image: Booking.com] > Empowering People to experience the world since 1996 > 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 > million reported listings > Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG) > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Tue May 7 04:10:11 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Tue, 7 May 2019 09:40:11 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature In-Reply-To: References: Message-ID: Hi On 04/05/19 12:04 PM, Strahil wrote: > Hi Jiffin, > > No vendor will support your corosync/pacemaker stack if you do not have proper fencing. > As Gluster is already a cluster of its own, it makes sense to control everything from there. > > Best Regards, Yeah I agree with your point. What I meant to say by default this feature won't provide any fencing mechanism, user need to manually configure fencing for the cluster. In future we can try to include to default fencing configuration for the ganesha cluster as part of the Ganesha HA configuration Regards, Jiffin > Strahil NikolovOn May 3, 2019 09:08, Jiffin Tony Thottan wrote: >> >> On 30/04/19 6:59 PM, Strahil Nikolov wrote: >>> Hi, >>> >>> I'm posting this again as it got bounced. >>> Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. >>> >>> I'm still trying to remediate the effects of poor configuration at work. >>> Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. 
>>> Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. >>> I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. >> >> It do take care those, but need to follow certain prerequisite, but >> please fencing won't configured for this setup. May we think about in >> future. >> >> -- >> >> Jiffin >> >>> Still, this will be a lot of work to achieve. >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> On Apr 30, 2019 15:19, Jim Kinney wrote: >>>> >>>> +1! >>>> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >>>> >>>> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>>>> >>>>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>>>> >>>>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>>>> >>>>> to gluster code repository with some improvement and targetting for next gluster release 7. >>>>> >>>>> ? ??I have opened up an issue [1] with details and posted initial set of patches [2] >>>>> >>>>> Please share your thoughts on the same >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Jiffin >>>>> >>>>> [1] https://github.com/gluster/glusterfs/issues/663 >>>>> >>>>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>>>> >>>>> >>>> -- >>>> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. >>> Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. >>> >>> I'm still trying to remediate the effects of poor configuration at work. >>> Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. >>> Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. >>> I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. >>> >>> Still, this will be a lot of work to achieve. >>> >>> Best Regards, >>> Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: >>>> +1! >>>> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >>>> >>>> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>>>> Hi all, >>>>> >>>>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>>>> >>>>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>>>> >>>>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>>>> >>>>> to gluster code repository with some improvement and targetting for next gluster release 7. 
>>>>> >>>>> I have opened up an issue [1] with details and posted initial set of patches [2] >>>>> >>>>> Please share your thoughts on the same >>>>> >>>>> Regards, >>>>> >>>>> Jiffin >>>>> >>>>> [1] https://github.com/gluster/glusterfs/issues/663 >>>>> >>>>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>>> -- >>>> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. From ndevos at redhat.com Tue May 7 05:35:34 2019 From: ndevos at redhat.com (Niels de Vos) Date: Tue, 7 May 2019 07:35:34 +0200 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: <20190507053534.GF5209@ndevos-x270> On Thu, May 02, 2019 at 11:04:41PM +0530, Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. > There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. Updates for Fedora are available in the testing repositories: Fedora 30: https://bodhi.fedoraproject.org/updates/FEDORA-2019-76730d7230 Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2019-cc7cdce2a4 Fedora 28: https://bodhi.fedoraproject.org/updates/FEDORA-2019-9e9a210110 Installation instructions can be found at the above links. Please leave testing feedback as comments on the Fedora Update pages. Thanks, Niels > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! 
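For anyone who wants to help test before these builds reach stable, the usual Bodhi workflow applies; a rough sketch for Fedora 30, using the advisory ID from the link above (the other releases work the same way with their own IDs):

    sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2019-76730d7230

The --advisory filter limits the transaction to the packages belonging to that update.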
> _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel From ravishankar at redhat.com Tue May 7 06:25:07 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Tue, 7 May 2019 11:55:07 +0530 Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd In-Reply-To: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> References: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> Message-ID: On 06/05/19 6:43 PM, ?ukasz Michalski wrote: > Hi, > > I have problem resolving split-brain in one of my installations. > > CenOS 7, glusterfs 3.10.12, replica on two nodes: > > [root at ixmed1 iscsi]# gluster volume status cluster > Status of volume: cluster > Gluster process???????????????????????????? TCP Port? RDMA Port > Online? Pid > ------------------------------------------------------------------------------ > > Brick ixmed2:/glusterfs-bricks/cluster/clus > ter???????????????????????????????????????? 49153???? 0 Y 3028 > Brick ixmed1:/glusterfs-bricks/cluster/clus > ter???????????????????????????????????????? 49153???? 0 Y 2917 > Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 112929 > Self-heal Daemon on ixmed2????????????????? N/A?????? N/A Y 57774 > > Task Status of Volume cluster > ------------------------------------------------------------------------------ > > There are no active volume tasks > > When I try to access one file glusterd reports split brain: > > [2019-05-06 12:36:43.785098] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: > Failing READ on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain > observed. [Input/output error] > [2019-05-06 12:36:43.787952] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: > Failing FGETXATTR on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: > split-brain observed. [Input/output error] > [2019-05-06 12:36:43.788778] W [MSGID: 108027] > [afr-common.c:2722:afr_discover_done] 0-cluster-replicate-0: no read > subvols for (null) > [2019-05-06 12:36:43.790123] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352501: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde0803f390 > (Input/output error) > [2019-05-06 12:36:43.794979] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352506: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 > (Input/output error) > [2019-05-06 12:36:43.800468] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352508: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 > (Input/output error) > > The problem is that "gluster volume heal info" hangs for 10 seconds > and returns: > > ??? Not able to fetch volfile from glusterd > ??? 
Volume heal failed > > glfsheal.log contains: > > [2019-05-06 12:40:25.589879] I [afr.c:94:fix_quorum_options] > 0-cluster-replicate-0: reindeer: incoming qtype = none > [2019-05-06 12:40:25.589967] I [afr.c:116:fix_quorum_options] > 0-cluster-replicate-0: reindeer: quorum_count = 0 > [2019-05-06 12:40:25.593294] W [MSGID: 101174] > [graph.c:361:_log_if_unknown_option] 0-cluster-readdir-ahead: option > 'parallel-readdir' is not recognized > [2019-05-06 12:40:25.593895] I [MSGID: 104045] > [glfs-master.c:91:notify] 0-gfapi: New graph > 69786d65-6431-2d32-3037-3739322d3230 (0) coming up > [2019-05-06 12:40:25.593972] I [MSGID: 114020] [client.c:2352:notify] > 0-cluster-client-0: parent translators are ready, attempting connect > on transport > [2019-05-06 12:40:25.607836] I [MSGID: 114020] [client.c:2352:notify] > 0-cluster-client-1: parent translators are ready, attempting connect > on transport > [2019-05-06 12:40:25.608556] I [rpc-clnt.c:2000:rpc_clnt_reconfig] > 0-cluster-client-0: changing port to 49153 (from 0) > [2019-05-06 12:40:25.618167] I [rpc-clnt.c:2000:rpc_clnt_reconfig] > 0-cluster-client-1: changing port to 49153 (from 0) > [2019-05-06 12:40:25.629595] I [MSGID: 114057] > [client-handshake.c:1451:select_server_supported_programs] > 0-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), > Version (330) > [2019-05-06 12:40:25.632031] I [MSGID: 114046] > [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-0: > Connected to cluster-client-0, attached to remote volume > '/glusterfs-bricks/cluster/cluster'. > [2019-05-06 12:40:25.632100] I [MSGID: 114047] > [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-0: > Server and Client lk-version numbers are not same, reopening the fds > [2019-05-06 12:40:25.632263] I [MSGID: 108005] > [afr-common.c:4817:afr_notify] 0-cluster-replicate-0: Subvolume > 'cluster-client-0' came back up; going online. > [2019-05-06 12:40:25.637707] I [MSGID: 114057] > [client-handshake.c:1451:select_server_supported_programs] > 0-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), > Version (330) > [2019-05-06 12:40:25.639285] I [MSGID: 114046] > [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-1: > Connected to cluster-client-1, attached to remote volume > '/glusterfs-bricks/cluster/cluster'. > [2019-05-06 12:40:25.639341] I [MSGID: 114047] > [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-1: > Server and Client lk-version numbers are not same, reopening the fds > [2019-05-06 12:40:31.564407] C > [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-0: > server 10.0.104.26:49153 has not responded in the last 5 seconds, > disconnecting. > [2019-05-06 12:40:31.565764] C > [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-1: > server 10.0.7.26:49153 has not responded in the last 5 seconds, > disconnecting. This seems to be a problem.? Have you changed the value of ping-timeout ? Could you share the output of `gluster volume info`? Does the same issue occur if you try to resolve the split-brain on the gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 using the |gluster volume heal split-brain |CLI? -Ravi > [2019-05-06 12:40:35.645545] I [MSGID: 114018] > [client.c:2276:client_rpc_notify] 0-cluster-client-0: disconnected > from cluster-client-0. 
Client process will keep trying to connect to > glusterd until brick's port is available > [2019-05-06 12:40:35.645683] I [socket.c:3534:socket_submit_request] > 0-cluster-client-0: not connected (priv->connected = -1) > [2019-05-06 12:40:35.645755] W [rpc-clnt.c:1693:rpc_clnt_submit] > 0-cluster-client-0: failed to submit rpc-request (XID: 0x7 Program: > GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport > (cluster-client-0) > [2019-05-06 12:40:35.645807] W [MSGID: 114031] > [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-0: > remote operation failed [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.645887] I [socket.c:3534:socket_submit_request] > 0-cluster-client-1: not connected (priv->connected = -1) > [2019-05-06 12:40:35.645918] W [rpc-clnt.c:1693:rpc_clnt_submit] > 0-cluster-client-1: failed to submit rpc-request (XID: 0x7 Program: > GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport > (cluster-client-1) > [2019-05-06 12:40:35.645955] W [MSGID: 114031] > [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-1: > remote operation failed [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.646008] W [MSGID: 109075] > [dht-diskusage.c:44:dht_du_info_cbk] 0-cluster-dht: failed to get disk > info from cluster-replicate-0 [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.647846] I [MSGID: 114018] > [client.c:2276:client_rpc_notify] 0-cluster-client-1: disconnected > from cluster-client-1. Client process will keep trying to connect to > glusterd until brick's port is available > [2019-05-06 12:40:35.647895] E [MSGID: 108006] > [afr-common.c:4842:afr_notify] 0-cluster-replicate-0: All subvolumes > are down. Going offline until atleast one of them comes back up. > [2019-05-06 12:40:35.647989] I [MSGID: 108006] > [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no > subvolumes up > [2019-05-06 12:40:35.648051] I [MSGID: 108006] > [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no > subvolumes up > [2019-05-06 12:40:35.648122] I [MSGID: 104039] > [glfs-resolve.c:902:__glfs_active_subvol] 0-cluster: first lookup on > graph 69786d65-6431-2d32-3037-3739322d3230 (0) failed (Drugi koniec > nie jest po??czony) [Drugi koniec nie jest po??czony] > > "Drugi koniec nie jest po??czony" -> Transport endpoint not connected > > On brick process side there is an connection attempt: > > [2019-05-06 12:40:25.638032] I [addr.c:182:gf_auth] > 0-/glusterfs-bricks/cluster/cluster: allowed = "*", received addr = > "10.0.7.26" > [2019-05-06 12:40:25.638080] I [login.c:111:gf_auth] 0-auth/login: > allowed user names: e2f4c8f4-d040-4856-b6e3-62611fbab0ea > [2019-05-06 12:40:25.638109] I [MSGID: 115029] > [server-handshake.c:695:server_setvolume] 0-cluster-server: accepted > client from > ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > (version: 3.10.12) > [2019-05-06 12:40:31.565931] I [MSGID: 115036] > [server.c:559:server_rpc_notify] 0-cluster-server: disconnecting > connection from > ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > [2019-05-06 12:40:31.566420] I [MSGID: 101055] > [client_t.c:436:gf_client_unref] 0-cluster-server: Shutting down > connection ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > > I am not able to use any heal command because of this problem. > > I have three volumes configured on that nodes. Configuration is > identical and "gluster volume heal" command fails for all of them. > > Can anyone help? 
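For reference, once `heal info` responds again, a split-brain like the one on the gfid above can usually be resolved straight from the CLI; a rough sketch using the gfid and brick names from these logs (which copy to keep is of course a judgement call):

    gluster volume heal cluster info split-brain

    # keep the copy with the newest modification time
    gluster volume heal cluster split-brain latest-mtime gfid:2584a0e2-c0fa-4fde-8537-5d5b6a5a4635

    # or explicitly keep the copy that lives on one brick
    gluster volume heal cluster split-brain source-brick ixmed1:/glusterfs-bricks/cluster/cluster gfid:2584a0e2-c0fa-4fde-8537-5d5b6a5a4635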
> > Thanks, > ?ukasz > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue May 7 09:15:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 7 May 2019 11:15:52 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, how can I create such a core file? Or will it be created automatically if a gluster process crashes? Maybe you can give me a hint and will try to get a backtrace. Unfortunately this bug is not easy to reproduce because it appears only sometimes. Regards David Spisla Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur : > Thank you for the report, David. Do you have core files available on any > of the servers? If yes, would it be possible for you to provide a backtrace. > > Regards, > Vijay > > On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: > >> Hello folks, >> >> we have a client application (runs on Win10) which does some FOPs on a >> gluster volume which is accessed by SMB. >> >> *Scenario 1* is a READ Operation which reads all files successively and >> checks if the files data was correctly copied. While doing this, all brick >> processes crashes and in the logs one have this crash report on every brick >> log: >> >>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>> pending frames: >>> frame : type(0) op(27) >>> frame : type(0) op(40) >>> patchset: git://git.gluster.org/glusterfs.git >>> signal received: 11 >>> time of crash: >>> 2019-04-16 08:32:21 >>> configuration details: >>> argp 1 >>> backtrace 1 >>> dlfcn 1 >>> libpthread 1 >>> llistxattr 1 >>> setfsid 1 >>> spinlock 1 >>> epoll.h 1 >>> xattr.h 1 >>> st_atim.tv_nsec 1 >>> package-string: glusterfs 5.5 >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>> >>> *Scenario 2 *The application just SET Read-Only on each file >> sucessively. 
After the 70th file was set, all the bricks crashes and again, >> one can read this crash report in every brick log: >> >>> >>> >>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>> client: >>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>> gfid: 00000000-0000-0000-0000-000000000001, >>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>> denied] >>> >>> pending frames: >>> >>> frame : type(0) op(27) >>> >>> patchset: git://git.gluster.org/glusterfs.git >>> >>> signal received: 11 >>> >>> time of crash: >>> >>> 2019-05-02 07:43:39 >>> >>> configuration details: >>> >>> argp 1 >>> >>> backtrace 1 >>> >>> dlfcn 1 >>> >>> libpthread 1 >>> >>> llistxattr 1 >>> >>> setfsid 1 >>> >>> spinlock 1 >>> >>> epoll.h 1 >>> >>> xattr.h 1 >>> >>> st_atim.tv_nsec 1 >>> >>> package-string: glusterfs 5.5 >>> >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>> >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>> >>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>> >>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>> >>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>> >> >> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>> But both volumes has the same settings: >> >>> Volume Name: shortterm >>> Type: Replicate >>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>> Options Reconfigured: >>> storage.reserve: 1 >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> user.smb: disable >>> features.read-only: off >>> features.worm: off >>> features.worm-file-level: on >>> features.retention-mode: enterprise >>> features.default-retention-period: 120 >>> network.ping-timeout: 10 >>> features.cache-invalidation: on >>> features.cache-invalidation-timeout: 600 >>> performance.nl-cache: on >>> performance.nl-cache-timeout: 600 >>> client.event-threads: 32 >>> server.event-threads: 32 >>> cluster.lookup-optimize: on >>> performance.stat-prefetch: on >>> performance.cache-invalidation: on >>> performance.md-cache-timeout: 600 >>> performance.cache-samba-metadata: on >>> performance.cache-ima-xattrs: on >>> performance.io-thread-count: 64 >>> cluster.use-compound-fops: on >>> performance.cache-size: 512MB >>> performance.cache-refresh-timeout: 10 >>> performance.read-ahead: off >>> performance.write-behind-window-size: 4MB >>> performance.write-behind: on >>> storage.build-pgfid: on >>> features.utime: on >>> storage.ctime: on >>> cluster.quorum-type: fixed >>> cluster.quorum-count: 2 >>> features.bitrot: on >>> features.scrub: Active >>> features.scrub-freq: daily >>> cluster.enable-shared-storage: enable >>> >>> >> Why can this happen to all Brick processes? I don't understand the crash >> report. The FOPs are nothing special and after restart brick processes >> everything works fine and our application was succeed. >> >> Regards >> David Spisla >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Tue May 7 09:19:05 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Tue, 7 May 2019 05:19:05 -0400 (EDT) Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> Message-ID: <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Hi, While we send a mail on gluster-devel or gluster-user mailing list, following content gets auto generated and placed at the end of mail. Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? Like this - Meeting schedule - * APAC friendly hours * Tuesday 14th May 2019 , 11:30AM IST * Bridge: https://bluejeans.com/836554017 * NA/EMEA * Tuesday 7th May 2019 , 01:00 PM EDT * Bridge: https://bluejeans.com/486278655 Or just a link to meeting minutes details?? https://github.com/gluster/community/tree/master/meetings This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
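Assuming the lists are still plain Mailman 2 (as the listinfo URLs suggest), this would presumably just mean extending each list's msg_footer template under the Non-digest options, for example by appending:

    Community Meetings: https://github.com/gluster/community/tree/master/meetings

so that only the link has to live in the footer and the actual schedule stays maintained in one place.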
--- Ashish -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue May 7 10:20:05 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 7 May 2019 12:20:05 +0200 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: All answers to this questions are in this bugreport: https://bugzilla.redhat.com/show_bug.cgi?GoAheadAndLogIn=Log%20in&id=1706842 Am Do., 18. Apr. 2019 um 09:21 Uhr schrieb hgichon : > Hi. > > I have a some question about your testing. > > 1. What was the glusterfs version you used in past time? > 2. How about a volume configuration? > 3. Was CTDB vip failed over correctly? If so, Clould you attach > /var/log/samba/glusterfs-volname.win10.ip.log ? > > Best Regards > > - kpkim > > > 2019? 4? 17? (?) ?? 5:02, David Spisla ?? ??: > >> Dear Gluster Community, >> >> I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 >> to access the volumes (each node has a VIP) >> >> I was testing this failover scenario: >> >> 1. Start Writing 940 GB with small files (64K-100K)from a Win10 Client >> to node1 >> 2. During the write process I hardly shutdown node1 (where the client >> is connect via VIP) by turn off the power >> >> My expectation is, that the write process stops and after a while the >> Win10 Client offers me a Retry, so I can continue the write on different >> node (which has now the VIP of node1). >> In past time I did this observation, but now the system shows a strange >> bahaviour: >> >> The Win10 Client do nothing and the Explorer freezes, in the backend CTDB >> can not perform the failover and throws errors. The glusterd from node2 and >> node3 logs this messages: >> >>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held >>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1 >>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held >>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2 >>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held >>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage >>> >>> >> *In my oponion Samba/CTDB can not perform the failover correctly and >> continue the write process because glusterfs didn't released the lock.* >> What do you think? 
It seems to me like a bug because in the past the
>> failover worked correctly.
>>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.orth at gmail.com Tue May 7 13:12:06 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Tue, 7 May 2019 16:12:06 +0300
Subject: [Gluster-users] "No space left on device" during rebalance with failed brick on Gluster 4.1.7
Message-ID: 

Dear list,

We are using a Distributed-Replicate volume with replica 2 on Gluster 4.1.7 on CentOS 7. One of our nodes died recently and we will add new nodes and bricks to replace it soon. In preparation for the maintenance I wanted to rebalance the volume to make the disk thrashing less intense when we add/remove bricks, but after eight hours of scanning I see millions of "failures" in the rebalance status. The volume rebalance log shows many errors like:

[2019-05-07 06:06:02.310843] E [MSGID: 109023] [dht-rebalance.c:2907:gf_defrag_migrate_single_file] 0-data-dht: migrate-data failed for /ilri/miseq/MiSeq2/MiSeq2Output2018/180912_M03021_0002_000000000-BVM95/Thumbnail_Images/L001/C174.1/s_1_2103_c.jpg [No space left on device]

The bricks on the healthy nodes all have 1.5TB of free space so I'm not sure what this error means. Could it be because one of the replicas is unavailable? I saw a similar bug report [1] about that. I've started a simple fix-layout without data migration and it is working fine.

Thank you,

[1] https://access.redhat.com/solutions/456333
--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." - Friedrich Nietzsche
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.orth at gmail.com Tue May 7 13:51:23 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Tue, 7 May 2019 16:51:23 +0300
Subject: [Gluster-users] "No space left on device" during rebalance with failed brick on Gluster 4.1.7
In-Reply-To: 
References: 
Message-ID: 

Dear list,

After looking at my rebalance log more I saw another message that helped me solve the problem:

[2019-05-06 22:46:01.074035] W [MSGID: 0] [dht-rebalance.c:1075:__dht_check_free_space] 0-data-dht: Write will cross min-free-disk for file - /ilri/miseq/MiSeq1/MiSeq1Output_2014/140624_M01601_0035_000000000-A6L82/Data/TileStatus/TileStatusL1T1114.tpl on subvol - data-replicate-1. Looking for new subvol

I had not set cluster.min-free-disk, but it appears that its default value is 10%, and my bricks are at about 97% capacity, so the "No space" error in my previous message makes sense. I reduced cluster.min-free-disk to 2% and restarted the data rebalance, and now I see that it is already rebalancing files. The issue is solved. Sorry about that!

Thank you,

On Tue, May 7, 2019 at 4:12 PM Alan Orth wrote:

> Dear list,
>
> We are using a Distributed-Replicate volume with replica 2 on Gluster
> 4.1.7 on CentOS 7. One of our nodes died recently and we will add new nodes
> and bricks to replace it soon. In preparation for the maintenance I wanted
> to rebalance the volume to make the disk thrashing less intense when we
> add/remove bricks, but after eight hours of scanning I see millions of
> "failures" in the rebalance status.
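(For the archives, the knobs involved here are just the standard volume-option and rebalance commands; a sketch with the volume name from the logs:

    gluster volume set data cluster.min-free-disk 2%
    gluster volume rebalance data start
    gluster volume rebalance data status

cluster.min-free-disk exists to stop DHT from placing new files on nearly-full bricks, so lowering it this far is best treated as a temporary measure until the new bricks are added.)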
The volume rebalance log shows many > errors like: > > [2019-05-07 06:06:02.310843] E [MSGID: 109023] > [dht-rebalance.c:2907:gf_defrag_migrate_single_file] 0-data-dht: > migrate-data failed for > /ilri/miseq/MiSeq2/MiSeq2Output2018/180912_M03021_0002_000000000-BVM95/Thumbnail_Images/L001/C174.1/s_1_2103_c.jpg > [No space left on device] > > The bricks on the healthy nodes all have 1.5TB of free space so I'm not > sure what this error means. Could it be because one of the replicas is > unavailable? I saw a similar bug report? about that. I've started a simple > fix-layout without data migration and it is working fine. > > Thank you, > > ? https://access.redhat.com/solutions/456333 > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From lm at zork.pl Tue May 7 14:01:03 2019 From: lm at zork.pl (=?UTF-8?Q?=c5=81ukasz_Michalski?=) Date: Tue, 7 May 2019 16:01:03 +0200 Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd In-Reply-To: References: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> Message-ID: > > Does the same issue occur if you try to resolve the split-brain on the > gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 using the |gluster volume heal > split-brain |CLI? > Many thanks for responding! gluster volume info: Volume Name: cluster Type: Replicate Volume ID: 8787d95e-8e66-4476-a990-4e27fc47c765 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: ixmed2:/glusterfs-bricks/cluster/cluster Brick2: ixmed1:/glusterfs-bricks/cluster/cluster Options Reconfigured: network.ping-timeout: 5 user.smb: disable transport.address-family: inet nfs.disable: on The problem was in network.ping-timeout set to 5 seconds. It is set for such a short value to prevent smb session from disconnecting when one node goes offline. It seems that for split-brain resolution and management I have to temporarily set this value to 30 seconds or more. Regards, ?ukasz From vbellur at redhat.com Tue May 7 18:08:13 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 7 May 2019 11:08:13 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: > Hello Vijay, > > how can I create such a core file? Or will it be created automatically if > a gluster process crashes? > Maybe you can give me a hint and will try to get a backtrace. > Generation of core file is dependent on the system configuration. `man 5 core` contains useful information to generate a core file in a directory. Once a core file is generated, you can use gdb to get a backtrace of all threads (using "thread apply all bt full"). > Unfortunately this bug is not easy to reproduce because it appears only > sometimes. > If the bug is not easy to reproduce, having a backtrace from the generated core would be very useful! Thanks, Vijay > > Regards > David Spisla > > Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur >: > >> Thank you for the report, David. Do you have core files available on any >> of the servers? If yes, would it be possible for you to provide a backtrace. 
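A minimal sketch of that core-plus-backtrace workflow, assuming the crashing process is a brick daemon (glusterfsd), that gdb and matching glusterfs debug symbols are installed, and that /var/crash is an acceptable place for cores (the path and file names here are assumptions, not Gluster defaults):

    # send cores to a predictable location (assumed path)
    mkdir -p /var/crash
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
    # lift the core size limit; daemons started by systemd may additionally
    # need LimitCORE=infinity in their unit file (assumption, distro-dependent)
    ulimit -c unlimited
    # after the next crash, dump every thread's stack from the core
    gdb /usr/sbin/glusterfsd /var/crash/core.glusterfsd.<pid> \
        -ex "set pagination off" -ex "thread apply all bt full" -ex quit > backtrace.txt

The backtrace.txt produced this way is the kind of output being asked for above.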
>> >> Regards, >> Vijay >> >> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >> >>> Hello folks, >>> >>> we have a client application (runs on Win10) which does some FOPs on a >>> gluster volume which is accessed by SMB. >>> >>> *Scenario 1* is a READ Operation which reads all files successively and >>> checks if the files data was correctly copied. While doing this, all brick >>> processes crashes and in the logs one have this crash report on every brick >>> log: >>> >>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>> pending frames: >>>> frame : type(0) op(27) >>>> frame : type(0) op(40) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 11 >>>> time of crash: >>>> 2019-04-16 08:32:21 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.5 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>> >>>> *Scenario 2 *The application just SET Read-Only on each file >>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>> one can read this crash report in every brick log: >>> >>>> >>>> >>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>> client: >>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>> gfid: 00000000-0000-0000-0000-000000000001, >>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>> denied] >>>> >>>> pending frames: >>>> >>>> frame : type(0) op(27) >>>> >>>> patchset: git://git.gluster.org/glusterfs.git >>>> >>>> signal received: 11 >>>> >>>> time of crash: >>>> >>>> 2019-05-02 07:43:39 >>>> >>>> configuration details: >>>> >>>> argp 1 >>>> >>>> backtrace 1 >>>> >>>> dlfcn 1 >>>> >>>> libpthread 1 >>>> >>>> llistxattr 1 >>>> >>>> setfsid 1 >>>> >>>> spinlock 1 >>>> >>>> epoll.h 1 >>>> >>>> xattr.h 1 >>>> >>>> st_atim.tv_nsec 1 >>>> >>>> package-string: glusterfs 5.5 >>>> >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>> >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>> >>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>> >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>> >>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>> >>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>> >>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>> >>> >>> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>>> But both volumes has the same settings: >>> >>>> Volume Name: shortterm >>>> Type: Replicate >>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x 3 = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>> Options Reconfigured: >>>> storage.reserve: 1 >>>> performance.client-io-threads: off >>>> nfs.disable: on >>>> transport.address-family: inet >>>> user.smb: disable >>>> features.read-only: off >>>> features.worm: off >>>> features.worm-file-level: on >>>> features.retention-mode: enterprise >>>> features.default-retention-period: 120 >>>> network.ping-timeout: 10 >>>> features.cache-invalidation: on >>>> features.cache-invalidation-timeout: 600 >>>> performance.nl-cache: on >>>> performance.nl-cache-timeout: 600 >>>> client.event-threads: 32 >>>> server.event-threads: 32 >>>> cluster.lookup-optimize: on >>>> performance.stat-prefetch: on >>>> performance.cache-invalidation: on >>>> performance.md-cache-timeout: 600 >>>> performance.cache-samba-metadata: on >>>> performance.cache-ima-xattrs: on >>>> performance.io-thread-count: 64 >>>> cluster.use-compound-fops: on >>>> performance.cache-size: 512MB >>>> performance.cache-refresh-timeout: 10 >>>> performance.read-ahead: off >>>> performance.write-behind-window-size: 4MB >>>> performance.write-behind: on >>>> storage.build-pgfid: on >>>> features.utime: on >>>> storage.ctime: on >>>> cluster.quorum-type: fixed >>>> cluster.quorum-count: 2 >>>> features.bitrot: on >>>> features.scrub: Active >>>> features.scrub-freq: daily >>>> cluster.enable-shared-storage: enable >>>> >>>> >>> Why can this happen to all Brick processes? I don't understand the crash >>> report. The FOPs are nothing special and after restart brick processes >>> everything works fine and our application was succeed. >>> >>> Regards >>> David Spisla >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rabhat at redhat.com Tue May 7 18:14:33 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Tue, 7 May 2019 14:14:33 -0400 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: + 1 to this. There is also one more thing. For some reason, the community meeting is not visible in my calendar (especially NA region). I am not sure if anyone else also facing this issue. Regards, Raghavendra On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > Hi, > > While we send a mail on gluster-devel or gluster-user mailing list, > following content gets auto generated and placed at the end of mail. > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? 
> Like this - > > Meeting schedule - > > > - APAC friendly hours > - Tuesday 14th May 2019, 11:30AM IST > - Bridge: https://bluejeans.com/836554017 > - NA/EMEA > - Tuesday 7th May 2019, 01:00 PM EDT > - Bridge: https://bluejeans.com/486278655 > > Or just a link to meeting minutes details?? > https://github.com/gluster/community/tree/master/meetings > > This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. > > --- > Ashish > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Tue May 7 18:37:27 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 7 May 2019 11:37:27 -0700 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath wrote: > > + 1 to this. > I have updated the footer of gluster-devel. If that looks ok, we can extend it to gluster-users too. In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always stick to the first 4 Tuesdays of every month. That will help in describing the community meeting schedule better. If we want to keep the schedule running on alternate Tuesdays, please let me know and the mailing list footers can be updated accordingly :-). > There is also one more thing. For some reason, the community meeting is > not visible in my calendar (especially NA region). I am not sure if anyone > else also facing this issue. > I did face this issue. Realized that we had a meeting today and showed up at the meeting a while later but did not see many participants. Perhaps, the calendar invite has to be made a recurring one. Thanks, Vijay > > Regards, > Raghavendra > > On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > >> Hi, >> >> While we send a mail on gluster-devel or gluster-user mailing list, >> following content gets auto generated and placed at the end of mail. >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? >> Like this - >> >> Meeting schedule - >> >> >> - APAC friendly hours >> - Tuesday 14th May 2019, 11:30AM IST >> - Bridge: https://bluejeans.com/836554017 >> - NA/EMEA >> - Tuesday 7th May 2019, 01:00 PM EDT >> - Bridge: https://bluejeans.com/486278655 >> >> Or just a link to meeting minutes details?? >> https://github.com/gluster/community/tree/master/meetings >> >> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
>> >> --- >> Ashish >> >> >> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Wed May 8 04:15:10 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Wed, 8 May 2019 09:45:10 +0530 Subject: [Gluster-users] [Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Wed, May 8, 2019 at 12:08 AM Vijay Bellur wrote: > > > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < > rabhat at redhat.com> wrote: > >> >> + 1 to this. >> > > I have updated the footer of gluster-devel. If that looks ok, we can > extend it to gluster-users too. > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always > stick to the first 4 Tuesdays of every month. That will help in describing > the community meeting schedule better. If we want to keep the schedule > running on alternate Tuesdays, please let me know and the mailing list > footers can be updated accordingly :-). > > >> There is also one more thing. For some reason, the community meeting is >> not visible in my calendar (especially NA region). I am not sure if anyone >> else also facing this issue. >> > > I did face this issue. Realized that we had a meeting today and showed up > at the meeting a while later but did not see many participants. Perhaps, > the calendar invite has to be made a recurring one. > We'd need to explicitly import the invite and add it to our calendar, otherwise it doesn't reflect. > Thanks, > Vijay > > >> >> Regards, >> Raghavendra >> >> On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: >> >>> Hi, >>> >>> While we send a mail on gluster-devel or gluster-user mailing list, >>> following content gets auto generated and placed at the end of mail. >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? >>> Like this - >>> >>> Meeting schedule - >>> >>> >>> - APAC friendly hours >>> - Tuesday 14th May 2019, 11:30AM IST >>> - Bridge: https://bluejeans.com/836554017 >>> - NA/EMEA >>> - Tuesday 7th May 2019, 01:00 PM EDT >>> - Bridge: https://bluejeans.com/486278655 >>> >>> Or just a link to meeting minutes details?? >>> https://github.com/gluster/community/tree/master/meetings >>> >>> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
>>> >>> --- >>> Ashish >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Wed May 8 04:16:47 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Wed, 8 May 2019 09:46:47 +0530 Subject: [Gluster-users] [Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Wed, May 8, 2019 at 9:45 AM Atin Mukherjee wrote: > > > On Wed, May 8, 2019 at 12:08 AM Vijay Bellur wrote: > >> >> >> On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < >> rabhat at redhat.com> wrote: >> >>> >>> + 1 to this. >>> >> >> I have updated the footer of gluster-devel. If that looks ok, we can >> extend it to gluster-users too. >> >> In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and >> always stick to the first 4 Tuesdays of every month. That will help in >> describing the community meeting schedule better. If we want to keep the >> schedule running on alternate Tuesdays, please let me know and the mailing >> list footers can be updated accordingly :-). >> >> >>> There is also one more thing. For some reason, the community meeting is >>> not visible in my calendar (especially NA region). I am not sure if anyone >>> else also facing this issue. >>> >> >> I did face this issue. Realized that we had a meeting today and showed up >> at the meeting a while later but did not see many participants. Perhaps, >> the calendar invite has to be made a recurring one. >> > > We'd need to explicitly import the invite and add it to our calendar, > otherwise it doesn't reflect. > And you're right that the last series wasn't a recurring one either. > >> Thanks, >> Vijay >> >> >>> >>> Regards, >>> Raghavendra >>> >>> On Tue, May 7, 2019 at 5:19 AM Ashish Pandey >>> wrote: >>> >>>> Hi, >>>> >>>> While we send a mail on gluster-devel or gluster-user mailing list, >>>> following content gets auto generated and placed at the end of mail. >>>> >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>>> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? 
>>>> Like this - >>>> >>>> Meeting schedule - >>>> >>>> >>>> - APAC friendly hours >>>> - Tuesday 14th May 2019, 11:30AM IST >>>> - Bridge: https://bluejeans.com/836554017 >>>> - NA/EMEA >>>> - Tuesday 7th May 2019, 01:00 PM EDT >>>> - Bridge: https://bluejeans.com/486278655 >>>> >>>> Or just a link to meeting minutes details?? >>>> https://github.com/gluster/community/tree/master/meetings >>>> >>>> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. >>>> >>>> --- >>>> Ashish >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> >> Community Meeting Calendar: >> >> APAC Schedule - >> Every 2nd and 4th Tuesday at 11:30 AM IST >> Bridge: https://bluejeans.com/836554017 >> >> NA/EMEA Schedule - >> Every 1st and 3rd Tuesday at 01:00 PM EDT >> Bridge: https://bluejeans.com/486278655 >> >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Wed May 8 07:08:08 2019 From: ndevos at redhat.com (Niels de Vos) Date: Wed, 8 May 2019 09:08:08 +0200 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: <20190508070808.GA22482@ndevos-x270> On Tue, May 07, 2019 at 11:37:27AM -0700, Vijay Bellur wrote: > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath > wrote: > > > > > + 1 to this. > > > > I have updated the footer of gluster-devel. If that looks ok, we can extend > it to gluster-users too. > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always > stick to the first 4 Tuesdays of every month. That will help in describing > the community meeting schedule better. If we want to keep the schedule > running on alternate Tuesdays, please let me know and the mailing list > footers can be updated accordingly :-). > > > > There is also one more thing. For some reason, the community meeting is > > not visible in my calendar (especially NA region). I am not sure if anyone > > else also facing this issue. > > > > I did face this issue. Realized that we had a meeting today and showed up > at the meeting a while later but did not see many participants. Perhaps, > the calendar invite has to be made a recurring one. Maybe a new invite can be sent with the minutes after a meeting has finished. This makes it easier for people that recently subscribed to the list to add it to their calendar? Niels > > Thanks, > Vijay > > > > > > Regards, > > Raghavendra > > > > On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > > > >> Hi, > >> > >> While we send a mail on gluster-devel or gluster-user mailing list, > >> following content gets auto generated and placed at the end of mail. 
> >> > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > >> > >> Gluster-devel mailing list > >> Gluster-devel at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-devel > >> > >> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? > >> Like this - > >> > >> Meeting schedule - > >> > >> > >> - APAC friendly hours > >> - Tuesday 14th May 2019, 11:30AM IST > >> - Bridge: https://bluejeans.com/836554017 > >> - NA/EMEA > >> - Tuesday 7th May 2019, 01:00 PM EDT > >> - Bridge: https://bluejeans.com/486278655 > >> > >> Or just a link to meeting minutes details?? > >> https://github.com/gluster/community/tree/master/meetings > >> > >> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. > >> > >> --- > >> Ashish > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From vbellur at redhat.com Wed May 8 07:31:37 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Wed, 8 May 2019 00:31:37 -0700 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <20190508070808.GA22482@ndevos-x270> References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> <20190508070808.GA22482@ndevos-x270> Message-ID: On Wed, May 8, 2019 at 12:08 AM Niels de Vos wrote: > On Tue, May 07, 2019 at 11:37:27AM -0700, Vijay Bellur wrote: > > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < > rabhat at redhat.com> > > wrote: > > > > > > > > + 1 to this. > > > > > > > I have updated the footer of gluster-devel. If that looks ok, we can > extend > > it to gluster-users too. > > > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and > always > > stick to the first 4 Tuesdays of every month. That will help in > describing > > the community meeting schedule better. If we want to keep the schedule > > running on alternate Tuesdays, please let me know and the mailing list > > footers can be updated accordingly :-). > > > > > > > There is also one more thing. For some reason, the community meeting is > > > not visible in my calendar (especially NA region). I am not sure if > anyone > > > else also facing this issue. > > > > > > > I did face this issue. Realized that we had a meeting today and showed up > > at the meeting a while later but did not see many participants. Perhaps, > > the calendar invite has to be made a recurring one. > > Maybe a new invite can be sent with the minutes after a meeting has > finished. This makes it easier for people that recently subscribed to > the list to add it to their calendar? > > > That is a good point. I have observed in google groups based mailing lists that a calendar invite for a recurring event is sent automatically to people after they subscribe to the list. I don't think mailman has a similar feature yet. 
Thanks, Vijay -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Thu May 9 12:52:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 9 May 2019 14:52:52 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine Message-ID:
Hello Kaleb, I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using a SLES15 system to create them. I got the following error message:
rpmbuild --define '_topdir /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' > --with gnfs -bb rpmbuild/SPECS/glusterfs.spec > warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at redhat.com > warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at redhat.com > error: Failed build dependencies: > rpcgen is needed by glusterfs-5.5-100.x86_64 > make: *** [Makefile:579: rpms] Error 1 > >
In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in Repo glusterfs-suse) there is rpcgen listed as a dependency. But unfortunately there is no rpcgen package provided on SLES15. Or in other words: I only found RPMs for other SUSE distributions, but not for SLES15. Do you know that issue? What is the name of the distribution which you are using to create Packages for SLES15? Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Thu May 9 14:12:03 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 9 May 2019 16:12:03 +0200 Subject: [Gluster-users] Improve stability between SMB/CTDB and Glusterfs (together with Samba Core Developer) Message-ID:
Dear Gluster Community, at the moment we are improving the stability of SMB/CTDB and Gluster. For this purpose we are working together with an advanced SAMBA Core Developer. He did some debugging but needs more information about Gluster Core Behaviour. *Would any of the Gluster developers want to have an online conference with him and me?* I would organize everything. In my opinion this is a good chance to improve the stability of Glusterfs, and this is at the moment one of the major issues in the Community. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL:
From hunter86_bg at yahoo.com Fri May 10 10:36:38 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 10 May 2019 13:36:38 +0300 Subject: [Gluster-users] Advice needed for network change. Message-ID: <4ie49rarxdk9rx1mer4ol71w.1557484034416@email.android.com>
Hello Community, I'm making some changes and I would like to hear your opinion on the topic. First, let me share my setup. I have 3 systems in a replica 3 arbiter 1 hyperconverged setup (oVirt) which use 1 gbit networks for any connectivity. I have added 4 dual-port 1 gbit NICs (8 ports per machine in total) and connected them directly between ovirt1 and ovirt2 /data nodes/ with LACP aggregation (layer3+layer4 hashing). As ovirt1 & ovirt2 are directly connected /trying to reduce costs by avoiding the switch/ I have set up /etc/hosts for the arbiter /ovirt3/ to point to the old IPs.
So they look like: ovirt1 & ovirt2 /data nodes/ /etc/hosts: 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.1.90 ovirt1.localdomain ovirt1 192.168.1.64 ovirt2.localdomain ovirt2 192.168.1.41 ovirt3.localdomain ovirt3 10.10.10.1 gluster1.localdomain gluster1 10.10.10.2 gluster2.localdomain gluster2 ovirt3 /etc/hosts: 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 #As gluster1 & gluster2 are directly connected, we cannot reach them. 192.168.1.90 ovirt1.localdomain ovirt1 gluster1 192.168.1.64 ovirt2.localdomain ovirt2 gluster2 192.168.1.41 ovirt3.localdomain ovirt3 Do you see any obstacles to 'peer probe' and then 'replace brick' on the 2 data nodes? Downtime is not an issue, but I prefer not to wipe the setup. Thanks for reading this long post and don't hesitate to recommend any tunings. I am still considering what values to put for the client/server thread count. Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL:
From kkeithle at redhat.com Fri May 10 14:24:15 2019 From: kkeithle at redhat.com (Kaleb Keithley) Date: Fri, 10 May 2019 10:24:15 -0400 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID:
Seems I accidentally omitted gluster-users in my first reply. On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley wrote: > On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: > >> Hello Kaleb, >> >> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using >> a SLES15 system to create them. I got the following error message: >> >> rpmbuild --define '_topdir >>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>> rpmbuild/SPECS/glusterfs.spec >>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>> redhat.com >>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>> redhat.com >>> error: Failed build dependencies: >>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>> make: *** [Makefile:579: rpms] Error 1 >>> >>> >> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >> Repo glusterfs-suse) there is rpcgen listed as a dependency. But >> unfortunately there is no rpcgen package provided on SLES15. Or in other >> words: >> I only found RPMs for other SUSE distributions, but not for SLES15. >> >> Do you know that issue? >> > > I'm afraid I don't. > > >> What is the name of the distribution which you are using to create >> Packages for SLES15? >> > > The community packages are built on the OpenSUSE OBS and they are built on > SLES15 - the one that OBS provides. I don't know any details beyond that. It > could be a real SLES15 system, or it could be a build in mock, or SUSE's > chroot build tool if they don't have mock. > > You can see the build logs from the community builds of glusterfs-5.5 and > glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a > completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. > Finding things in the OBS repos seems to be hit or miss sometimes. I can't > find the SLE_15 rpcgen package. > > (Back in SLES11 days I had a free eval license that let me update and > install add-on packages on my own system. I tried to get a similar license > for SLES12 and was advised to just use OBS. I haven't even bothered trying > to get one for SLES15.
It makes it harder IMO to figure things out.) > > I recommend asking the OBS team on #opensuse-buildservice on (freenode) > IRC. They've always been very helpful to me. >
Miuku on #opensuse-buildservice poked around and found that the unbundled rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it does in Fedora and RHEL8.) All the gluster community packages for SLE_15 going back to glusterfs-5.0 in October 2018 have used the unbundled rpcgen. You can do the same, or remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. HTH -- Kaleb -------------- next part -------------- An HTML attachment was scrubbed... URL:
From pgurusid at redhat.com Mon May 13 05:22:06 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Mon, 13 May 2019 10:52:06 +0530 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID:
Hi, We would definitely be interested in this. Thank you for contacting us. For a start we can have an online conference. Please suggest a few possible dates and times for the week (preferably between 7:00 AM and 9:00 PM IST). Adding Anoop and Gunther, who are also the main contributors to the Gluster-Samba integration. Thanks, Poornima
On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: > Dear Gluster Community, > at the moment we are improving the stability of SMB/CTDB and Gluster. For > this purpose we are working together with an advanced SAMBA Core Developer. > He did some debugging but needs more information about Gluster Core > Behaviour. > > *Would any of the Gluster developers want to have an online conference with > him and me?* > > I would organize everything. In my opinion this is a good chance to > improve stability of Glusterfs and this is at the moment one of the major > issues in the Community. > > Regards > David Spisla > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Mon May 13 06:10:35 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 13 May 2019 08:10:35 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID:
Hello Kaleb, thank you for the info. I'll try this out. Regards David
Am Fr., 10. Mai 2019 um 16:24 Uhr schrieb Kaleb Keithley < kkeithle at redhat.com>: > Seems I accidentally omitted gluster-users in my first reply. > > On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley wrote: >> On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: >>> Hello Kaleb, >>> >>> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using >>> a SLES15 system to create them.
I got the following error message: >>> >>> rpmbuild --define '_topdir >>>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>>> rpmbuild/SPECS/glusterfs.spec >>>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>>> redhat.com >>>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>>> redhat.com >>>> error: Failed build dependencies: >>>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>>> make: *** [Makefile:579: rpms] Error 1 >>>> >>>> >>> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >>> Repo glusterfs-suse) there is rpcgen listed as dependency. But >>> unfortunately there is no rpcgen package provided on SLES15. Or with other >>> words: >>> I did only find RPMs for other SUSE distributions, but not for SLES15. >>> >>> Do you know that issue? >>> >> >> I'm afraid I don't. >> >> >>> What is the name of the distribution which you are using to create >>> Packages for SLES15? >>> >> >> The community packages are built on the OpenSUSE OBS and they are built >> on SLES15 ?the one that OBS provides. I don't know any details beyond that. >> It could be a real SLES15 system, or it could be a build in mock, or SUSE's >> chroot build tool if they don't have mock. >> >> You can see the build logs from the community builds of glusterfs-5.5 and >> glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a >> completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. >> Finding things in the OBS repos seems to be hit or miss sometimes. I can't >> find the SLE_15 rpcgen package. >> >> (Back in SLES11 days I had a free eval license that let me update and >> install add-on packages on my own system. I tried to get a similar license >> for SLES12 and was advised to just use OBS. I haven't even bothered trying >> to get one for SLES15. It makes it harder IMO to figure things out.) >> >> I recommend asking the OBS team on #opensuse-buildservice on (freenode) >> IRC. They've always been very helpful to me. >> > > Miuku on #opensuse-buildservice poked around and found that the unbundled > rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it > does in Fedora and RHEL8.) > > All the gluster community packages for SLE_15 going back to glusterfs-5.0 > in October 2018 have used the unbundled rpcgen. You can do the same, or > remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. > > HTH > > -- > > Kaleb > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 13 06:47:45 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 08:47:45 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds Message-ID: Hi all, I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. 
BR, Martin These are the volume settings: Type: Replicate Volume ID: b021bbb6-fa99-4cc7-88f6-49152a22cb9e Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: node1:/imagestore/brick1 Brick2: node2:/imagestore/brick1 Brick3: node3:/imagestore/brick1 Options Reconfigured: performance.client-io-threads: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: on cluster.min-free-disk: 10% cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable cluster.data-self-heal-algorithm: full network.remote-dio: enable network.ping-timeout: 30 diagnostics.count-fop-hits: on diagnostics.latency-measurement: on client.event-threads: 4 server.event-threads: 4 storage.owner-gid: 9869 storage.owner-uid: 9869 server.allow-insecure: on nfs.disable: on performance.readdir-ahead: on -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2019-05-13 at 08.32.24.png Type: image/png Size: 144426 bytes Desc: not available URL:
From lemonnierk at ulrar.net Mon May 13 06:55:48 2019 From: lemonnierk at ulrar.net (lemonnierk at ulrar.net) Date: Mon, 13 May 2019 07:55:48 +0100 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: Message-ID: <20190513065548.GI25080@althea.ulrar.net>
On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > Hi all, Hi > > I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in the Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds". > The only solution is to power off (hard) the VM and then boot it up again. I am unable to SSH and also to log in with the console; it is probably stuck on some disk operation. No error/warning logs or messages are stored in the VM's logs. > As far as I know this should be unrelated, I get this during heals without any freezes, it just means the storage is slow I think. > KVM/Libvirt(qemu) uses libgfapi and a fuse mount to access VM disks on the replica volume. Can someone advise how to debug this problem or what can cause these issues? > It's really annoying; I've tried to google everything but nothing came up. I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not related. > Any chance your gluster goes readonly ? Have you checked your gluster logs to see if maybe they lose each other some times ? /var/log/glusterfs For libgfapi accesses you'd have its log on qemu's standard output, that might contain the actual error at the time of the freeze.
From snowmailer at gmail.com Mon May 13 07:03:40 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 09:03:40 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <20190513065548.GI25080@althea.ulrar.net> References: <20190513065548.GI25080@althea.ulrar.net> Message-ID:
Hi, there is no healing operation, no peer disconnects, no readonly filesystem. Yes, the storage is slow and unavailable for 120 seconds, but why? It is SSD with 10G and performance is good. > you'd have its log on qemu's standard output, If you mean /var/log/libvirt/qemu/vm.log there is nothing. I have been looking into this problem for more than a month and have tried everything. Can't find anything. Any more clues or leads?
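One low-impact way to gather more data on freezes like these is GlusterFS's built-in volume profiling, which Krutika also suggests further down in this thread. A minimal sketch, assuming the affected volume is named imagestore (the name that appears in the gluster:// disk URL later in the thread); note that diagnostics.latency-measurement and diagnostics.count-fop-hits are already on in the settings above, so some counters may already be populated:

    # enable per-brick FOP statistics on the volume (small extra overhead)
    gluster volume profile imagestore start
    # ...wait for a VM to freeze, then capture the counters...
    gluster volume profile imagestore info > /tmp/imagestore-profile.txt
    # 'info incremental' limits the output to the interval since the last 'info'
    gluster volume profile imagestore info incremental >> /tmp/imagestore-profile.txt
    gluster volume profile imagestore stop

Sharing the resulting file together with the exact Gluster version usually makes it much easier to spot which FOPs are stalling during a freeze.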
BR, Martin > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> Hi all, > > Hi > >> >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. >> Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. >> > > As far as I know this should be unrelated, I get this during heals > without any freezes, it just means the storage is slow I think. > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? >> It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. >> > > Any chance your gluster goes readonly ? Have you checked your gluster > logs to see if maybe they lose each other some times ? > /var/log/glusterfs > > For libgfapi accesses you'd have it's log on qemu's standard output, > that might contain the actual error at the time of the freez. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From kdhananj at redhat.com Mon May 13 07:19:25 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 12:49:25 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: What version of gluster are you using? Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command Let me know if you have any questions. -Krutika On Mon, May 13, 2019 at 12:34 PM Martin Toth wrote: > Hi, > > there is no healing operation, not peer disconnects, no readonly > filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, > its SSD with 10G, performance is good. > > > you'd have it's log on qemu's standard output, > > If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking > for problem for more than month, tried everything. Can?t find anything. Any > more clues or leads? > > BR, > Martin > > > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > >> Hi all, > > > > Hi > > > >> > >> I am running replica 3 on SSDs with 10G networking, everything works OK > but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked > for more than 120 seconds?. > >> Only solution is to poweroff (hard) VM and than boot it up again. I am > unable to SSH and also login with console, its stuck probably on some disk > operation. No error/warning logs or messages are store in VMs logs. > >> > > > > As far as I know this should be unrelated, I get this during heals > > without any freezes, it just means the storage is slow I think. > > > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on > replica volume. Can someone advice how to debug this problem or what can > cause these issues? > >> It?s really annoying, I?ve tried to google everything but nothing came > up. 
I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but > its not related. > >> > > > > Any chance your gluster goes readonly ? Have you checked your gluster > > logs to see if maybe they lose each other some times ? > > /var/log/glusterfs > > > > For libgfapi accesses you'd have it's log on qemu's standard output, > > that might contain the actual error at the time of the freez. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Mon May 13 07:21:19 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 12:51:19 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: Also, what's the caching policy that qemu is using on the affected vms? Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. -Krutika On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay wrote: > What version of gluster are you using? > Also, can you capture and share volume-profile output for a run where you > manage to recreate this issue? > > https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command > Let me know if you have any questions. > > -Krutika > > On Mon, May 13, 2019 at 12:34 PM Martin Toth wrote: > >> Hi, >> >> there is no healing operation, not peer disconnects, no readonly >> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >> its SSD with 10G, performance is good. >> >> > you'd have it's log on qemu's standard output, >> >> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >> for problem for more than month, tried everything. Can?t find anything. Any >> more clues or leads? >> >> BR, >> Martin >> >> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >> > >> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> >> Hi all, >> > >> > Hi >> > >> >> >> >> I am running replica 3 on SSDs with 10G networking, everything works >> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >> blocked for more than 120 seconds?. >> >> Only solution is to poweroff (hard) VM and than boot it up again. I am >> unable to SSH and also login with console, its stuck probably on some disk >> operation. No error/warning logs or messages are store in VMs logs. >> >> >> > >> > As far as I know this should be unrelated, I get this during heals >> > without any freezes, it just means the storage is slow I think. >> > >> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >> replica volume. Can someone advice how to debug this problem or what can >> cause these issues? >> >> It?s really annoying, I?ve tried to google everything but nothing came >> up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but >> its not related. >> >> >> > >> > Any chance your gluster goes readonly ? Have you checked your gluster >> > logs to see if maybe they lose each other some times ? 
>> > /var/log/glusterfs >> > >> > For libgfapi accesses you'd have it's log on qemu's standard output, >> > that might contain the actual error at the time of the freez. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 13 07:31:51 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 09:31:51 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Cache in qemu is none. That should be correct. This is full command : /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/one//datastores/116/312/disk.0,format=raw,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=gluster://localhost:24007/imagestore/7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/var/lib/one//datastores/116/312/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on I?ve highlighted disks. First is VM context disk - Fuse used, second is SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. Krutika, I will start profiling on Gluster Volumes and wait for next VM to fail. Than I will attach/send profiling info after some VM will be failed. I suppose this is correct profiling strategy. Thanks, BR! Martin > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > What version of gluster are you using? 
> Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? > https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command > Let me know if you have any questions. > > -Krutika > > On Mon, May 13, 2019 at 12:34 PM Martin Toth > wrote: > Hi, > > there is no healing operation, not peer disconnects, no readonly filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, its SSD with 10G, performance is good. > > > you'd have it's log on qemu's standard output, > > If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking for problem for more than month, tried everything. Can?t find anything. Any more clues or leads? > > BR, > Martin > > > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > >> Hi all, > > > > Hi > > > >> > >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. > >> Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. > >> > > > > As far as I know this should be unrelated, I get this during heals > > without any freezes, it just means the storage is slow I think. > > > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? > >> It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. > >> > > > > Any chance your gluster goes readonly ? Have you checked your gluster > > logs to see if maybe they lose each other some times ? > > /var/log/glusterfs > > > > For libgfapi accesses you'd have it's log on qemu's standard output, > > that might contain the actual error at the time of the freez. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Mon May 13 07:33:55 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Mon, 13 May 2019 07:33:55 +0000 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: as per https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. , the informational warning could be suppressed with : "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" Moreover, as per their website : "*This message is not an error*. It is an indication that a program has had to wait for a very long time, and what it was doing. " More reference: https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds Regards, Andrei On Mon, May 13, 2019 at 7:32 AM Martin Toth wrote: > Cache in qemu is none. 
That should be correct. This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine > pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp > 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 > -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime > -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/*disk.0* > ,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ > *7b64d6757acc47a39503f68731f89b8e* > ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/*disk.1* > ,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 > -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 > -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is > SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. > Than I will attach/send profiling info after some VM will be failed. I > suppose this is correct profiling strategy. > > Thanks, > BR! > Martin > > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the > command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you >> manage to recreate this issue? >> >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:34 PM Martin Toth >> wrote: >> >>> Hi, >>> >>> there is no healing operation, not peer disconnects, no readonly >>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>> its SSD with 10G, performance is good. >>> >>> > you'd have it's log on qemu's standard output, >>> >>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>> for problem for more than month, tried everything. Can?t find anything. Any >>> more clues or leads? 
>>> >>> BR, >>> Martin >>> >>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>> > >>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>> >> Hi all, >>> > >>> > Hi >>> > >>> >> >>> >> I am running replica 3 on SSDs with 10G networking, everything works >>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>> blocked for more than 120 seconds?. >>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>> am unable to SSH and also login with console, its stuck probably on some >>> disk operation. No error/warning logs or messages are store in VMs logs. >>> >> >>> > >>> > As far as I know this should be unrelated, I get this during heals >>> > without any freezes, it just means the storage is slow I think. >>> > >>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >>> replica volume. Can someone advice how to debug this problem or what can >>> cause these issues? >>> >> It?s really annoying, I?ve tried to google everything but nothing >>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>> drivers, but its not related. >>> >> >>> > >>> > Any chance your gluster goes readonly ? Have you checked your gluster >>> > logs to see if maybe they lose each other some times ? >>> > /var/log/glusterfs >>> > >>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>> > that might contain the actual error at the time of the freez. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Mon May 13 07:37:15 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Mon, 13 May 2019 07:37:15 +0000 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: what is the context from dmesg ? On Mon, May 13, 2019 at 7:33 AM Andrey Volodin wrote: > as per > https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. , > the informational warning could be suppressed with : > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > Moreover, as per their website : "*This message is not an error*. > It is an indication that a program has had to wait for a very long time, > and what it was doing. " > More reference: > https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds > > Regards, > Andrei > > On Mon, May 13, 2019 at 7:32 AM Martin Toth wrote: > >> Cache in qemu is none. That should be correct. 
This is full command : >> >> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine >> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp >> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 >> -no-user-config -nodefaults -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait >> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime >> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 >> >> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 >> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 >> -drive file=/var/lib/one//datastores/116/312/*disk.0* >> ,format=raw,if=none,id=drive-virtio-disk1,cache=none >> -device >> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 >> -drive file=gluster://localhost:24007/imagestore/ >> *7b64d6757acc47a39503f68731f89b8e* >> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none >> -device >> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 >> -drive file=/var/lib/one//datastores/116/312/*disk.1* >> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on >> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 >> >> -netdev tap,fd=26,id=hostnet0 >> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 >> -chardev pty,id=charserial0 -device >> isa-serial,chardev=charserial0,id=serial0 >> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait >> -device >> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 >> -vnc 0.0.0.0:312,password -device >> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device >> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on >> >> I?ve highlighted disks. First is VM context disk - Fuse used, second is >> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. >> >> Krutika, >> I will start profiling on Gluster Volumes and wait for next VM to fail. >> Than I will attach/send profiling info after some VM will be failed. I >> suppose this is correct profiling strategy. >> >> Thanks, >> BR! >> Martin >> >> On 13 May 2019, at 09:21, Krutika Dhananjay wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the >> command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay >> wrote: >> >>> What version of gluster are you using? >>> Also, can you capture and share volume-profile output for a run where >>> you manage to recreate this issue? >>> >>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >>> Let me know if you have any questions. >>> >>> -Krutika >>> >>> On Mon, May 13, 2019 at 12:34 PM Martin Toth >>> wrote: >>> >>>> Hi, >>>> >>>> there is no healing operation, not peer disconnects, no readonly >>>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>>> its SSD with 10G, performance is good. >>>> >>>> > you'd have it's log on qemu's standard output, >>>> >>>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>>> for problem for more than month, tried everything. Can?t find anything. 
Any >>>> more clues or leads? >>>> >>>> BR, >>>> Martin >>>> >>>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>>> > >>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>>> >> Hi all, >>>> > >>>> > Hi >>>> > >>>> >> >>>> >> I am running replica 3 on SSDs with 10G networking, everything works >>>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>>> blocked for more than 120 seconds?. >>>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>>> am unable to SSH and also login with console, its stuck probably on some >>>> disk operation. No error/warning logs or messages are store in VMs logs. >>>> >> >>>> > >>>> > As far as I know this should be unrelated, I get this during heals >>>> > without any freezes, it just means the storage is slow I think. >>>> > >>>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks >>>> on replica volume. Can someone advice how to debug this problem or what >>>> can cause these issues? >>>> >> It?s really annoying, I?ve tried to google everything but nothing >>>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>>> drivers, but its not related. >>>> >> >>>> > >>>> > Any chance your gluster goes readonly ? Have you checked your gluster >>>> > logs to see if maybe they lose each other some times ? >>>> > /var/log/glusterfs >>>> > >>>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>>> > that might contain the actual error at the time of the freez. >>>> > _______________________________________________ >>>> > Gluster-users mailing list >>>> > Gluster-users at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon May 13 07:44:05 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 13 May 2019 10:44:05 +0300 Subject: [Gluster-users] Advice needed for network change. Message-ID: <78m66mxb8mlgq3hspi04g8vn.1557733445126@email.android.com> Hey All, I have managed to migrate but with some slight changes. I'm using teaming with balance runner (l3 + l4 for balance hashing) on 6 ports (2 dual port + 2 nics using only 1 port ) per machine. Running some tests (8 network connections in parallel) between the replica nodes shows an aggregated bandwidth of 400+ MB/s (megabytes). Now I need to 'force' gluster to open more connections in order to spread the load to as many ports as possible. I have tried with client & server event-threads set to 6 , but the performance is bellow my expectations . Any hints will be appreciated. Best Regards, Strahil Nikolov On May 10, 2019 13:36, Strahil wrote: > > Hello Community, > > I'm making some changes and I would like to hear? your opinion on the topic. > > First, let me share my setup. > I have 3 systems with in a replica 3 arbiter 1 hyperconverged setup (oVirt) which use 1 gbit networks for any connectivity. > > I have added 4 dual-port 1 gbit NICs ( 8 ports per machine in total) and connected them directly between ovirt1 and ovirt2 /data nodes/ with LACP aggregation (layer3+layer4 hashing). 
> > As ovirt1 & ovirt2 are directly connected /trying to reduce costs by avoiding the switch/ I have? setup /etc/hosts for the arbiter? /ovirt3/ to point tothe old IPs . > > So they look like: > ovirt1 & ovirt2 /data nodes/? /etc/hosts: > > 127.0.0.1?? localhost localhost.localdomain localhost4 localhost4.localdomain4 > ::1???????? localhost localhost.localdomain localhost6 localhost6.localdomain6 > 192.168.1.90 ovirt1.localdomain ovirt1 > 192.168.1.64 ovirt2.localdomain ovirt2 > 192.168.1.41 ovirt3.localdomain ovirt3 > 10.10.10.1?? gluster1.localdomain gluster1 > 10.10.10.2?? gluster2.localdomain gluster2 > > ovirt3 /etc/hosts: > > 127.0.0.1?? localhost localhost.localdomain localhost4 localhost4.localdomain4 > ::1???????? localhost localhost.localdomain localhost6 localhost6.localdomain6 > #As gluster1 & gluster2 are directly connected , we cannot reach them. > 192.168.1.90 ovirt1.localdomain ovirt1 gluster1 > 192.168.1.64 ovirt2.localdomain ovirt2 gluster2 > 192.168.1.41 ovirt3.localdomain ovirt3 > > Do you see any obstacles to 'peer probe' and then 'replace brick' the 2 data nodes. > Downtime is not an issue, but I preffer not to wipe the setup. > > Thanks for reading this long post and don't hesitate to recommend any tunings. > I am still considering what values to put for the? client/server thread count. > > Best Regards, > Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Mon May 13 08:20:14 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 13:50:14 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: OK. In that case, can you check if the following two changes help: # gluster volume set $VOL network.remote-dio off # gluster volume set $VOL performance.strict-o-direct on preferably one option changed at a time, its impact tested and then the next change applied and tested. Also, gluster version please? -Krutika On Mon, May 13, 2019 at 1:02 PM Martin Toth wrote: > Cache in qemu is none. That should be correct. 
This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine > pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp > 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 > -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime > -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/*disk.0* > ,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ > *7b64d6757acc47a39503f68731f89b8e* > ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/*disk.1* > ,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 > -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 > -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is > SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. > Than I will attach/send profiling info after some VM will be failed. I > suppose this is correct profiling strategy. > About this, how many vms do you need to recreate it? A single vm? Or multiple vms doing IO in parallel? > Thanks, > BR! > Martin > > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the > command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you >> manage to recreate this issue? >> >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:34 PM Martin Toth >> wrote: >> >>> Hi, >>> >>> there is no healing operation, not peer disconnects, no readonly >>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>> its SSD with 10G, performance is good. >>> >>> > you'd have it's log on qemu's standard output, >>> >>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>> for problem for more than month, tried everything. Can?t find anything. 
Any >>> more clues or leads? >>> >>> BR, >>> Martin >>> >>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>> > >>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>> >> Hi all, >>> > >>> > Hi >>> > >>> >> >>> >> I am running replica 3 on SSDs with 10G networking, everything works >>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>> blocked for more than 120 seconds?. >>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>> am unable to SSH and also login with console, its stuck probably on some >>> disk operation. No error/warning logs or messages are store in VMs logs. >>> >> >>> > >>> > As far as I know this should be unrelated, I get this during heals >>> > without any freezes, it just means the storage is slow I think. >>> > >>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >>> replica volume. Can someone advice how to debug this problem or what can >>> cause these issues? >>> >> It?s really annoying, I?ve tried to google everything but nothing >>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>> drivers, but its not related. >>> >> >>> > >>> > Any chance your gluster goes readonly ? Have you checked your gluster >>> > logs to see if maybe they lose each other some times ? >>> > /var/log/glusterfs >>> > >>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>> > that might contain the actual error at the time of the freez. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue May 14 04:36:21 2019 From: pgurusid at redhat.com (pgurusid at redhat.com) Date: Tue, 14 May 2019 04:36:21 +0000 Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC friendly hours) @ Every 2 weeks at 11:30am on Tuesday 15 times (IST) (gluster-users@gluster.org) Message-ID: <0000000000001e34cf0588d19307@google.com> You have been invited to the following event. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Every 2 weeks at 11:30am on Tuesday 15 times India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * pgurusid at redhat.com - organizer * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=NTEwOGJvMGZjMnRjN3Z0YzY0OGNmb3E4dXQgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjcGd1cnVzaWRAcmVkaGF0LmNvbTk4OTgxMGM4NWE4YjNlMjU0ZjM2YjAxNDBjNTlhMjdjYWY2ODA5Mjk&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. 
Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2142 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 2194 bytes Desc: not available URL: From pgurusid at redhat.com Tue May 14 04:47:10 2019 From: pgurusid at redhat.com (pgurusid at redhat.com) Date: Tue, 14 May 2019 04:47:10 +0000 Subject: [Gluster-users] Updated invitation: Gluster Community Meeting (APAC friendly hours) @ Every 2 weeks from 11:30am to 12:30pm on Tuesday 15 times (IST) (gluster-users@gluster.org) Message-ID: <000000000000d5a2c10588d1b9e2@google.com> This event has been changed. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Every 2 weeks from 11:30am to 12:30pm on Tuesday 15 times India Standard Time - Kolkata (changed) Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * pgurusid at redhat.com - organizer * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org * ranaraya at redhat.com * khiremat at redhat.com * dcunningham at voisonics.com Event details: https://www.google.com/calendar/event?action=VIEW&eid=NTEwOGJvMGZjMnRjN3Z0YzY0OGNmb3E4dXQgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjcGd1cnVzaWRAcmVkaGF0LmNvbTk4OTgxMGM4NWE4YjNlMjU0ZjM2YjAxNDBjNTlhMjdjYWY2ODA5Mjk&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2585 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 2644 bytes Desc: not available URL: From paul at vandervlis.nl Wed May 15 11:24:41 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 13:24:41 +0200 Subject: [Gluster-users] Cannot see all data in mount Message-ID: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> Hello, I am the new sysadmin of an organization what uses Glusterfs. I did not set it up, and I don't know much about Glusterfs. What I do not understand is that I do not see all data in the mount. 
Not as root, not as a normal user who has privileges. When I do "ls" in one of the subdirectories I don't see any data, but this data exists at the server! In another subdirectory I see everything fine, the rights of the directories and files inside are the same. I mount with something like: /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. I don't see something special in /etc/exports or in /etc/glusterfs on the server. Is there maybe a mechanism in Glusterfs what can exclude data from export? Or is there a way to debug this problem? With regards, Paul van der Vlis ---- # file: VOORBEELD # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- # file: ALGEMEEN # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- ------ -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From hunter86_bg at yahoo.com Wed May 15 12:59:24 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 12:59:24 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> Message-ID: <1716249284.809654.1557925164742@mail.yahoo.com> Most probably you use sharding , which splits the files into smaller chunks so you can fit a 1TB file into gluster nodes with bricks of smaller size.So if you have 2 dispersed servers each having 500Gb brick->? without sharding you won't be able to store files larger than the brick size - no matter you have free space on the other server. When sharding is enabled - you will see on the brick the first shard as a file and the rest is in a hidden folder called ".shards" (or something like that). The benefit is also viewable when you need to do some maintenance on a gluster node, as you will need to heal only the shards containing modified by the customers' data. Best Regards,Strahil Nikolov ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis ??????: Hello, I am the new sysadmin of an organization what uses Glusterfs. I did not set it up, and I don't know much about Glusterfs. What I do not understand is that I do not see all data in the mount. Not as root, not as a normal user who has privileges. When I do "ls" in one of the subdirectories I don't see any data, but this data exists at the server! In another subdirectory I see everything fine, the rights of the directories and files inside are the same. I mount with something like: /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. I don't see something special in /etc/exports or in /etc/glusterfs on the server. Is there maybe a mechanism in Glusterfs what can exclude data from export?? Or is there a way to debug this problem? 
With regards, Paul van der Vlis ---- # file: VOORBEELD # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- # file: ALGEMEEN # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- ------ -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Wed May 15 13:24:17 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 15:24:17 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <1716249284.809654.1557925164742@mail.yahoo.com> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> Message-ID: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Hello Strahil, Thanks for your answer. I don't find the word "sharding" in the configfiles. There is not much shared data (24GB), and only 1 brick: --- root at xxx:/etc/glusterfs# gluster volume info DATA Volume Name: DATA Type: Distribute Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: xxx-vpn:/DATA Options Reconfigured: transport.address-family: inet nfs.disable: on ---- (I have edited this a bit for privacy of my customer). I think they have used glusterfs because it can do ACLs. With regards, Paul van der Vlis Op 15-05-19 om 14:59 schreef Strahil Nikolov: > Most probably you use sharding , which splits the files into smaller > chunks so you can fit a 1TB file into gluster nodes with bricks of > smaller size. > So if you have 2 dispersed servers each having 500Gb brick->? without > sharding you won't be able to store files larger than the brick size - > no matter you have free space on the other server. > > When sharding is enabled - you will see on the brick the first shard as > a file and the rest is in a hidden folder called ".shards" (or something > like that). > > The benefit is also viewable when you need to do some maintenance on a > gluster node, as you will need to heal only the shards containing > modified by the customers' data. > > Best Regards, > Strahil Nikolov > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > ??????: > > > Hello, > > I am the new sysadmin of an organization what uses Glusterfs. > I did not set it up, and I don't know much about Glusterfs. > > What I do not understand is that I do not see all data in the mount. > Not as root, not as a normal user who has privileges. > > When I do "ls" in one of the subdirectories I don't see any data, but > this data exists at the server! > > In another subdirectory I see everything fine, the rights of the > directories and files inside are the same. > > I mount with something like: > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. > > I don't see something special in /etc/exports or in /etc/glusterfs on > the server. 
> > Is there maybe a mechanism in Glusterfs what can exclude data from > export?? Or is there a way to debug this problem? > > With regards, > Paul van der Vlis > > ---- > # file: VOORBEELD > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > > # file: ALGEMEEN > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > ------ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From nbalacha at redhat.com Wed May 15 13:45:10 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Wed, 15 May 2019 19:15:10 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: Hi Paul, A few questions: Which version of gluster are you using? Did this behaviour start recently? As in were the contents of that directory visible earlier? Regards, Nithya On Wed, 15 May 2019 at 18:55, Paul van der Vlis wrote: > Hello Strahil, > > Thanks for your answer. I don't find the word "sharding" in the > configfiles. There is not much shared data (24GB), and only 1 brick: > --- > root at xxx:/etc/glusterfs# gluster volume info DATA > > Volume Name: DATA > Type: Distribute > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: xxx-vpn:/DATA > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > ---- > (I have edited this a bit for privacy of my customer). > > I think they have used glusterfs because it can do ACLs. > > With regards, > Paul van der Vlis > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > Most probably you use sharding , which splits the files into smaller > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > smaller size. > > So if you have 2 dispersed servers each having 500Gb brick-> without > > sharding you won't be able to store files larger than the brick size - > > no matter you have free space on the other server. > > > > When sharding is enabled - you will see on the brick the first shard as > > a file and the rest is in a hidden folder called ".shards" (or something > > like that). > > > > The benefit is also viewable when you need to do some maintenance on a > > gluster node, as you will need to heal only the shards containing > > modified by the customers' data. > > > > Best Regards, > > Strahil Nikolov > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > ??????: > > > > > > Hello, > > > > I am the new sysadmin of an organization what uses Glusterfs. > > I did not set it up, and I don't know much about Glusterfs. > > > > What I do not understand is that I do not see all data in the mount. 
> > Not as root, not as a normal user who has privileges. > > > > When I do "ls" in one of the subdirectories I don't see any data, but > > this data exists at the server! > > > > In another subdirectory I see everything fine, the rights of the > > directories and files inside are the same. > > > > I mount with something like: > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > I see data in /data/VOORBEELD/, and I don't see any data in > /data/ALGEMEEN/. > > > > I don't see something special in /etc/exports or in /etc/glusterfs on > > the server. > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > export? Or is there a way to debug this problem? > > > > With regards, > > Paul van der Vlis > > > > ---- > > # file: VOORBEELD > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > > > # file: ALGEMEEN > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > ------ > > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Wed May 15 21:34:54 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 23:34:54 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Op 15-05-19 om 15:45 schreef Nithya Balachandran: > Hi Paul, > > A few questions: > Which version of gluster are you using? On the server and some clients: glusterfs 4.1.2 On a new client: glusterfs 5.5 > Did this behaviour start recently? As in were the contents of that > directory visible earlier? This directory was normally used in the headoffice, and there is direct access to the files without Glusterfs. So I don't know. With regards, Paul van der Vlis > Regards, > Nithya > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > wrote: > > Hello Strahil, > > Thanks for your answer. I don't find the word "sharding" in the > configfiles. There is not much shared data (24GB), and only 1 brick: > --- > root at xxx:/etc/glusterfs# gluster volume info DATA > > Volume Name: DATA > Type: Distribute > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: xxx-vpn:/DATA > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > ---- > (I have edited this a bit for privacy of my customer). 
> > I think they have used glusterfs because it can do ACLs. > > With regards, > Paul van der Vlis > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > Most probably you use sharding , which splits the files into smaller > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > smaller size. > > So if you have 2 dispersed servers each having 500Gb brick->? without > > sharding you won't be able to store files larger than the brick size - > > no matter you have free space on the other server. > > > > When sharding is enabled - you will see on the brick the first > shard as > > a file and the rest is in a hidden folder called ".shards" (or > something > > like that). > > > > The benefit is also viewable when you need to do some maintenance on a > > gluster node, as you will need to heal only the shards containing > > modified by the customers' data. > > > > Best Regards, > > Strahil Nikolov > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > > ??????: > > > > > > Hello, > > > > I am the new sysadmin of an organization what uses Glusterfs. > > I did not set it up, and I don't know much about Glusterfs. > > > > What I do not understand is that I do not see all data in the mount. > > Not as root, not as a normal user who has privileges. > > > > When I do "ls" in one of the subdirectories I don't see any data, but > > this data exists at the server! > > > > In another subdirectory I see everything fine, the rights of the > > directories and files inside are the same. > > > > I mount with something like: > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > I see data in /data/VOORBEELD/, and I don't see any data in > /data/ALGEMEEN/. > > > > I don't see something special in /etc/exports or in /etc/glusterfs on > > the server. > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > export?? Or is there a way to debug this problem? 
> > > > With regards, > > Paul van der Vlis > > > > ---- > > # file: VOORBEELD > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > > > # file: ALGEMEEN > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > ------ > > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From hunter86_bg at yahoo.com Wed May 15 21:46:22 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 21:46:22 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: <1841695718.1050162.1557956782965@mail.yahoo.com> Check with 'gluster volume info | grep shard' If you have it enabled it should show:features.shard: on Keep in mind that disabling sharding is really bad, so if you really use it - do not disable sharding - will cause a real mess. Best Regards,Strahil Nikolov ? ?????, 15 ??? 2019 ?., 16:24:20 ?. ???????+3, Paul van der Vlis ??????: Hello Strahil, Thanks for your answer. I don't find the word "sharding" in the configfiles. There is not much shared data (24GB), and only 1 brick: --- root at xxx:/etc/glusterfs# gluster volume info DATA Volume Name: DATA Type: Distribute Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: xxx-vpn:/DATA Options Reconfigured: transport.address-family: inet nfs.disable: on ---- (I have edited this a bit for privacy of my customer). I think they have used glusterfs because it can do ACLs. With regards, Paul van der Vlis Op 15-05-19 om 14:59 schreef Strahil Nikolov: > Most probably you use sharding , which splits the files into smaller > chunks so you can fit a 1TB file into gluster nodes with bricks of > smaller size. > So if you have 2 dispersed servers each having 500Gb brick->? without > sharding you won't be able to store files larger than the brick size - > no matter you have free space on the other server. > > When sharding is enabled - you will see on the brick the first shard as > a file and the rest is in a hidden folder called ".shards" (or something > like that). > > The benefit is also viewable when you need to do some maintenance on a > gluster node, as you will need to heal only the shards containing > modified by the customers' data. > > Best Regards, > Strahil Nikolov > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. 
???????-4, Paul van der Vlis > ??????: > > > Hello, > > I am the new sysadmin of an organization what uses Glusterfs. > I did not set it up, and I don't know much about Glusterfs. > > What I do not understand is that I do not see all data in the mount. > Not as root, not as a normal user who has privileges. > > When I do "ls" in one of the subdirectories I don't see any data, but > this data exists at the server! > > In another subdirectory I see everything fine, the rights of the > directories and files inside are the same. > > I mount with something like: > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. > > I don't see something special in /etc/exports or in /etc/glusterfs on > the server. > > Is there maybe a mechanism in Glusterfs what can exclude data from > export?? Or is there a way to debug this problem? > > With regards, > Paul van der Vlis > > ---- > # file: VOORBEELD > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > > # file: ALGEMEEN > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > ------ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Wed May 15 21:48:20 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 21:48:20 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: <1246047051.1043872.1557956900813@mail.yahoo.com> It seems that I got confused.So you see the files on the bricks (servers) , but not when you mount glusterfs on the clients ? If so - this is not the sharding feature as it works the opposite way. Best Regards,Strahil Nikolov ? ?????????, 16 ??? 2019 ?., 0:35:04 ?. ???????+3, Paul van der Vlis ??????: Op 15-05-19 om 15:45 schreef Nithya Balachandran: > Hi Paul, > > A few questions: > Which version of gluster are you using? On the server and some clients: glusterfs 4.1.2 On a new client: glusterfs 5.5 > Did this behaviour start recently? As in were the contents of that > directory visible earlier? This directory was normally used in the headoffice, and there is direct access to the files without Glusterfs. So I don't know. With regards, Paul van der Vlis > Regards, > Nithya > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > wrote: > >? ? Hello Strahil, > >? ? Thanks for your answer. I don't find the word "sharding" in the >? ? configfiles. There is not much shared data (24GB), and only 1 brick: >? ? --- >? ? root at xxx:/etc/glusterfs# gluster volume info DATA > >? ? 
Volume Name: DATA >? ? Type: Distribute >? ? Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a >? ? Status: Started >? ? Snapshot Count: 0 >? ? Number of Bricks: 1 >? ? Transport-type: tcp >? ? Bricks: >? ? Brick1: xxx-vpn:/DATA >? ? Options Reconfigured: >? ? transport.address-family: inet >? ? nfs.disable: on >? ? ---- >? ? (I have edited this a bit for privacy of my customer). > >? ? I think they have used glusterfs because it can do ACLs. > >? ? With regards, >? ? Paul van der Vlis > > >? ? Op 15-05-19 om 14:59 schreef Strahil Nikolov: >? ? > Most probably you use sharding , which splits the files into smaller >? ? > chunks so you can fit a 1TB file into gluster nodes with bricks of >? ? > smaller size. >? ? > So if you have 2 dispersed servers each having 500Gb brick->? without >? ? > sharding you won't be able to store files larger than the brick size - >? ? > no matter you have free space on the other server. >? ? > >? ? > When sharding is enabled - you will see on the brick the first >? ? shard as >? ? > a file and the rest is in a hidden folder called ".shards" (or >? ? something >? ? > like that). >? ? > >? ? > The benefit is also viewable when you need to do some maintenance on a >? ? > gluster node, as you will need to heal only the shards containing >? ? > modified by the customers' data. >? ? > >? ? > Best Regards, >? ? > Strahil Nikolov >? ? > >? ? > >? ? > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis >? ? > > ??????: >? ? > >? ? > >? ? > Hello, >? ? > >? ? > I am the new sysadmin of an organization what uses Glusterfs. >? ? > I did not set it up, and I don't know much about Glusterfs. >? ? > >? ? > What I do not understand is that I do not see all data in the mount. >? ? > Not as root, not as a normal user who has privileges. >? ? > >? ? > When I do "ls" in one of the subdirectories I don't see any data, but >? ? > this data exists at the server! >? ? > >? ? > In another subdirectory I see everything fine, the rights of the >? ? > directories and files inside are the same. >? ? > >? ? > I mount with something like: >? ? > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data >? ? > I see data in /data/VOORBEELD/, and I don't see any data in >? ? /data/ALGEMEEN/. >? ? > >? ? > I don't see something special in /etc/exports or in /etc/glusterfs on >? ? > the server. >? ? > >? ? > Is there maybe a mechanism in Glusterfs what can exclude data from >? ? > export?? Or is there a way to debug this problem? >? ? > >? ? > With regards, >? ? > Paul van der Vlis >? ? > >? ? > ---- >? ? > # file: VOORBEELD >? ? > # owner: root >? ? > # group: secretariaat >? ? > # flags: -s- >? ? > user::rwx >? ? > group::rwx >? ? > group:medewerkers:r-x >? ? > mask::rwx >? ? > other::--- >? ? > default:user::rwx >? ? > default:group::rwx >? ? > default:group:medewerkers:r-x >? ? > default:mask::rwx >? ? > default:other::--- >? ? > >? ? > # file: ALGEMEEN >? ? > # owner: root >? ? > # group: secretariaat >? ? > # flags: -s- >? ? > user::rwx >? ? > group::rwx >? ? > group:medewerkers:r-x >? ? > mask::rwx >? ? > other::--- >? ? > default:user::rwx >? ? > default:group::rwx >? ? > default:group:medewerkers:r-x >? ? > default:mask::rwx >? ? > default:other::--- >? ? > ------ >? ? > >? ? > >? ? > >? ? > >? ? > >? ? > -- >? ? > Paul van der Vlis Linux systeembeheer Groningen >? ? > https://www.vandervlis.nl/ >? ? > _______________________________________________ >? ? > Gluster-users mailing list >? ? > Gluster-users at gluster.org >? ? > >? ? 
> https://lists.gluster.org/mailman/listinfo/gluster-users > > > >? ? -- >? ? Paul van der Vlis Linux systeembeheer Groningen >? ? https://www.vandervlis.nl/ >? ? _______________________________________________ >? ? Gluster-users mailing list >? ? Gluster-users at gluster.org >? ? https://lists.gluster.org/mailman/listinfo/gluster-users > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From order at rikus.com Thu May 16 02:18:51 2019 From: order at rikus.com (Jeff Bischoff) Date: Wed, 15 May 2019 22:18:51 -0400 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts have a problem and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. Diagnostics The tell-tale error message is seeing the following when describing a pod that is in a crash loop: Message: error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists We always see that "file exists" message when this error occurs. 
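For reference, "manually starting them" above means roughly the following, run from inside the Gluster pod on the affected node (heketidbstorage is just an example volume name; we repeat it for whichever volumes show offline bricks):

  gluster volume status heketidbstorage         # offline bricks show Online "N" and no port
  gluster volume start heketidbstorage force    # respawns only the dead brick processes
  gluster volume status heketidbstorage         # verify the bricks are back online

So far that has been enough to get the crash-looping pods mounting again without restarting the node.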
Looking at the glusterd.log file, there had been nothing in the log for over a day and then suddenly, at the time the crash loop started, this: [2019-05-08 13:49:04.733147] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a3cef78a5914a2808da0b5736e3daec7/brick on port 49168 [2019-05-08 13:49:04.733374] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_7614e5014a0e402630a0e1fd776acf0a/brick on port 49167 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) [2019-05-08 13:49:05.065420] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/85e9fb223aa121f2.socket failed (No data available) [2019-05-08 13:49:05.066479] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/e2a66e8cd8f5f606.socket failed (No data available) [2019-05-08 13:49:05.067444] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a0625e5b78d69bb8.socket failed (No data available) [2019-05-08 13:49:05.068471] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/770bc294526d0360.socket failed (No data available) [2019-05-08 13:49:05.074278] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/adbd37fe3e1eed36.socket failed (No data available) [2019-05-08 13:49:05.075497] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/17712138f3370e53.socket failed (No data available) [2019-05-08 13:49:05.076545] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a6cf1aca8b23f394.socket failed (No data available) [2019-05-08 13:49:05.077511] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d0f83b191213e877.socket failed (No data available) [2019-05-08 13:49:05.078447] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d5dd08945d4f7f6d.socket failed (No data available) [2019-05-08 13:49:05.079424] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/c8d7b10108758e2f.socket failed (No data available) [2019-05-08 13:49:14.778619] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_0ed4f7f941de388cda678fe273e9ceb4/brick on port 49166 ... (and more of the same) Nothing further has been printed to the gluster log since. The bricks do not come back on their own. 
The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of glusterfs running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Our gluster settings/volume options: apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: gluster-heketi selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi parameters: gidMax: "50000" gidMin: "2000" resturl: http://10.233.35.158:8080 restuser: "null" restuserkey: "null" volumetype: "none" volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads off, performance.open-behind off, performance.readdir-ahead off, performance.read-ahead off, performance.stat-prefetch off, performance.write-behind off, performance.io-cache off, cluster.consistent-metadata on, performance.quick-read off, performance.strict-o-direct on provisioner: kubernetes.io/glusterfs reclaimPolicy: Delete Volume info for the heketi volume: gluster> volume info heketidbstorage Volume Name: heketidbstorage Type: Distribute Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick Options Reconfigured: user.heketi.id: 1d2400626dac780fce12e45a07494853 transport.address-family: inet nfs.disable: on Full Gluster logs available if needed, just let me know how best to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu May 16 03:43:30 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 16 May 2019 09:13:30 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: On Thu, 16 May 2019 at 03:05, Paul van der Vlis wrote: > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > Hi Paul, > > > > A few questions: > > Which version of gluster are you using? 
> > On the server and some clients: glusterfs 4.1.2 > On a new client: glusterfs 5.5 > > Is the same behaviour seen on both client versions? > > Did this behaviour start recently? As in were the contents of that > > directory visible earlier? > > This directory was normally used in the headoffice, and there is direct > access to the files without Glusterfs. So I don't know. > Do you mean that they access the files on the gluster volume without using the client or that these files were stored elsewhere earlier (not on gluster)? Files on a gluster volume should never be accessed directly. To debug this further, please send the following: 1. The directory contents when the listing is performed directly on the brick. 2. The tcpdump of the gluster client when listing the directory using the following command: tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 You can send these directly to me in case you want to keep the information private. Regards, Nithya > > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > wrote: > > > > Hello Strahil, > > > > Thanks for your answer. I don't find the word "sharding" in the > > configfiles. There is not much shared data (24GB), and only 1 brick: > > --- > > root at xxx:/etc/glusterfs# gluster volume info DATA > > > > Volume Name: DATA > > Type: Distribute > > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 > > Transport-type: tcp > > Bricks: > > Brick1: xxx-vpn:/DATA > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > ---- > > (I have edited this a bit for privacy of my customer). > > > > I think they have used glusterfs because it can do ACLs. > > > > With regards, > > Paul van der Vlis > > > > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > > Most probably you use sharding , which splits the files into > smaller > > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > > smaller size. > > > So if you have 2 dispersed servers each having 500Gb brick-> > without > > > sharding you won't be able to store files larger than the brick > size - > > > no matter you have free space on the other server. > > > > > > When sharding is enabled - you will see on the brick the first > > shard as > > > a file and the rest is in a hidden folder called ".shards" (or > > something > > > like that). > > > > > > The benefit is also viewable when you need to do some maintenance > on a > > > gluster node, as you will need to heal only the shards containing > > > modified by the customers' data. > > > > > > Best Regards, > > > Strahil Nikolov > > > > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > > > ??????: > > > > > > > > > Hello, > > > > > > I am the new sysadmin of an organization what uses Glusterfs. > > > I did not set it up, and I don't know much about Glusterfs. > > > > > > What I do not understand is that I do not see all data in the > mount. > > > Not as root, not as a normal user who has privileges. > > > > > > When I do "ls" in one of the subdirectories I don't see any data, > but > > > this data exists at the server! > > > > > > In another subdirectory I see everything fine, the rights of the > > > directories and files inside are the same. > > > > > > I mount with something like: > > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > > I see data in /data/VOORBEELD/, and I don't see any data in > > /data/ALGEMEEN/. 
> > > > > > I don't see something special in /etc/exports or in /etc/glusterfs > on > > > the server. > > > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > > export? Or is there a way to debug this problem? > > > > > > With regards, > > > Paul van der Vlis > > > > > > ---- > > > # file: VOORBEELD > > > # owner: root > > > # group: secretariaat > > > # flags: -s- > > > user::rwx > > > group::rwx > > > group:medewerkers:r-x > > > mask::rwx > > > other::--- > > > default:user::rwx > > > default:group::rwx > > > default:group:medewerkers:r-x > > > default:mask::rwx > > > default:other::--- > > > > > > # file: ALGEMEEN > > > # owner: root > > > # group: secretariaat > > > # flags: -s- > > > user::rwx > > > group::rwx > > > group:medewerkers:r-x > > > mask::rwx > > > other::--- > > > default:user::rwx > > > default:group::rwx > > > default:group:medewerkers:r-x > > > default:mask::rwx > > > default:other::--- > > > ------ > > > > > > > > > > > > > > > > > > -- > > > Paul van der Vlis Linux systeembeheer Groningen > > > https://www.vandervlis.nl/ > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Thu May 16 05:06:14 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Thu, 16 May 2019 10:36:14 +0530 Subject: [Gluster-users] Memory leak in glusterfs process Message-ID: Hi Team, I upload some valgrind logs from my gluster 5.4 setup. This is writing to the volume every 15 minutes. I stopped glusterd and then copy away the logs. The test was running for some simulated days. They are zipped in valgrind-54.zip. Lots of info in valgrind-2730.log. Lots of possibly lost bytes in glusterfs and even some definitely lost bytes. ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 of 391 ==2737== at 0x4C29C25: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2737== by 0xA22485E: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA217C94: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21D9F8: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21DED9: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21E685: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA1B9D8C: init (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8A2B8: ??? 
(in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) ==2737== ==2737== LEAK SUMMARY: ==2737== definitely lost: 1,053 bytes in 10 blocks ==2737== indirectly lost: 317 bytes in 3 blocks ==2737== possibly lost: 2,374,971 bytes in 524 blocks ==2737== still reachable: 53,277 bytes in 201 blocks ==2737== suppressed: 0 bytes in 0 blocks -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-54.zip Type: application/zip Size: 45897 bytes Desc: not available URL: From abhishpaliwal at gmail.com Thu May 16 05:19:49 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Thu, 16 May 2019 10:49:49 +0530 Subject: [Gluster-users] Memory leak in glusterfs Message-ID: Hi Team, I upload some valgrind logs from my gluster 5.4 setup. This is writing to the volume every 15 minutes. I stopped glusterd and then copy away the logs. The test was running for some simulated days. They are zipped in valgrind-54.zip. Lots of info in valgrind-2730.log. Lots of possibly lost bytes in glusterfs and even some definitely lost bytes. ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 of 391 ==2737== at 0x4C29C25: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2737== by 0xA22485E: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA217C94: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21D9F8: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21DED9: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21E685: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA1B9D8C: init (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) ==2737== ==2737== LEAK SUMMARY: ==2737== definitely lost: 1,053 bytes in 10 blocks ==2737== indirectly lost: 317 bytes in 3 blocks ==2737== possibly lost: 2,374,971 bytes in 524 blocks ==2737== still reachable: 53,277 bytes in 201 blocks ==2737== suppressed: 0 bytes in 0 blocks -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-2748.log Type: text/x-log Size: 23721 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-2746.log Type: text/x-log Size: 24526 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: valgrind-2730.log Type: text/x-log Size: 1239130 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 16 07:38:12 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 09:38:12 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello everyone, if there is any problem in finding a date and time, please contact me. It would be fine to have a meeting soon. Regards David Spisla Am Mo., 13. Mai 2019 um 12:38 Uhr schrieb David Spisla < david.spisla at iternity.com>: > Hi Poornima, > > > > thats fine. I would suggest this dates and times: > > > > May 15th ? 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) > > May 20th ? 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) > > > > I add Volker Lendecke from Sernet to the mail. He is the Samba Expert. > > Can someone of you provide a host via bluejeans.com? If not, I will try > it with GoToMeeting (https://www.gotomeeting.com). > > > > @all Please write your prefered dates and times. For me, all oft the above > dates and times are fine > > > > Regards > > David > > > > > > *Von:* Poornima Gurusiddaiah > *Gesendet:* Montag, 13. Mai 2019 07:22 > *An:* David Spisla ; Anoop C S ; > Gunther Deschner > *Cc:* Gluster Devel ; gluster-users at gluster.org > List > *Betreff:* Re: [Gluster-devel] Improve stability between SMB/CTDB and > Gluster (together with Samba Core Developer) > > > > Hi, > > > > We would be definitely interested in this. Thank you for contacting us. > For the starter we can have an online conference. Please suggest few > possible date and times for the week(preferably between IST 7.00AM - 9.PM > )? > > Adding Anoop and Gunther who are also the main contributors to the > Gluster-Samba integration. > > > > Thanks, > > Poornima > > > > > > > > On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: > > Dear Gluster Community, > > at the moment we are improving the stability of SMB/CTDB and Gluster. For > this purpose we are working together with an advanced SAMBA Core Developer. > He did some debugging but needs more information about Gluster Core > Behaviour. > > > > *Would any of the Gluster Developer wants to have a online conference with > him and me?* > > > > I would organize everything. In my opinion this is a good chance to > improve stability of Glusterfs and this is at the moment one of the major > issues in the Community. > > > > Regards > > David Spisla > > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image860747.png Type: image/png Size: 382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image735814.png Type: image/png Size: 412 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image116096.png Type: image/png Size: 6545 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image142576.png Type: image/png Size: 37146 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image714843.png Type: image/png Size: 522 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image293410.png Type: image/png Size: 591 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image570372.png Type: image/png Size: 775 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image031225.png Type: image/png Size: 508 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 16 07:53:36 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 09:53:36 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, I could reproduce the issue. After doing a simple DIR Listing from Win10 powershell, all brick processes crashes. Its not the same scenario mentioned before but the crash report in the bricks log is the same. Attached you find the backtrace. Regards David Spisla Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur : > Hello David, > > On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: > >> Hello Vijay, >> >> how can I create such a core file? Or will it be created automatically if >> a gluster process crashes? >> Maybe you can give me a hint and will try to get a backtrace. >> > > Generation of core file is dependent on the system configuration. `man 5 > core` contains useful information to generate a core file in a directory. > Once a core file is generated, you can use gdb to get a backtrace of all > threads (using "thread apply all bt full"). > > >> Unfortunately this bug is not easy to reproduce because it appears only >> sometimes. >> > > If the bug is not easy to reproduce, having a backtrace from the generated > core would be very useful! > > Thanks, > Vijay > > >> >> Regards >> David Spisla >> >> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur > >: >> >>> Thank you for the report, David. Do you have core files available on any >>> of the servers? If yes, would it be possible for you to provide a backtrace. >>> >>> Regards, >>> Vijay >>> >>> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >>> >>>> Hello folks, >>>> >>>> we have a client application (runs on Win10) which does some FOPs on a >>>> gluster volume which is accessed by SMB. >>>> >>>> *Scenario 1* is a READ Operation which reads all files successively >>>> and checks if the files data was correctly copied. 
While doing this, all >>>> brick processes crashes and in the logs one have this crash report on every >>>> brick log: >>>> >>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>> pending frames: >>>>> frame : type(0) op(27) >>>>> frame : type(0) op(40) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 11 >>>>> time of crash: >>>>> 2019-04-16 08:32:21 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.5 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>> >>>>> *Scenario 2 *The application just SET Read-Only on each file >>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>> one can read this crash report in every brick log: >>>> >>>>> >>>>> >>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>> client: >>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>> denied] >>>>> >>>>> pending frames: >>>>> >>>>> frame : type(0) op(27) >>>>> >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> >>>>> signal received: 11 >>>>> >>>>> time of crash: >>>>> >>>>> 2019-05-02 07:43:39 >>>>> >>>>> configuration details: >>>>> >>>>> argp 1 >>>>> >>>>> backtrace 1 >>>>> >>>>> dlfcn 1 >>>>> >>>>> libpthread 1 >>>>> >>>>> llistxattr 1 >>>>> >>>>> setfsid 1 >>>>> >>>>> spinlock 1 >>>>> >>>>> epoll.h 1 >>>>> >>>>> xattr.h 1 >>>>> >>>>> st_atim.tv_nsec 1 >>>>> >>>>> package-string: glusterfs 5.5 >>>>> >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>> >>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>> >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>> >>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>> >>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>> >>>> >>>> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>>>> But both volumes has the same settings: >>>> >>>>> Volume Name: shortterm >>>>> Type: Replicate >>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>> Options Reconfigured: >>>>> storage.reserve: 1 >>>>> performance.client-io-threads: off >>>>> nfs.disable: on >>>>> transport.address-family: inet >>>>> user.smb: disable >>>>> features.read-only: off >>>>> features.worm: off >>>>> features.worm-file-level: on >>>>> features.retention-mode: enterprise >>>>> features.default-retention-period: 120 >>>>> network.ping-timeout: 10 >>>>> features.cache-invalidation: on >>>>> features.cache-invalidation-timeout: 600 >>>>> performance.nl-cache: on >>>>> performance.nl-cache-timeout: 600 >>>>> client.event-threads: 32 >>>>> server.event-threads: 32 >>>>> cluster.lookup-optimize: on >>>>> performance.stat-prefetch: on >>>>> performance.cache-invalidation: on >>>>> performance.md-cache-timeout: 600 >>>>> performance.cache-samba-metadata: on >>>>> performance.cache-ima-xattrs: on >>>>> performance.io-thread-count: 64 >>>>> cluster.use-compound-fops: on >>>>> performance.cache-size: 512MB >>>>> performance.cache-refresh-timeout: 10 >>>>> performance.read-ahead: off >>>>> performance.write-behind-window-size: 4MB >>>>> performance.write-behind: on >>>>> storage.build-pgfid: on >>>>> features.utime: on >>>>> storage.ctime: on >>>>> cluster.quorum-type: fixed >>>>> cluster.quorum-count: 2 >>>>> features.bitrot: on >>>>> features.scrub: Active >>>>> features.scrub-freq: daily >>>>> cluster.enable-shared-storage: enable >>>>> >>>>> >>>> Why can this happen to all Brick processes? I don't understand the >>>> crash report. The FOPs are nothing special and after restart brick >>>> processes everything works fine and our application was succeed. >>>> >>>> Regards >>>> David Spisla >>>> >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: backtrace.log Type: application/octet-stream Size: 36515 bytes Desc: not available URL: From vbellur at redhat.com Thu May 16 08:05:22 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 16 May 2019 01:05:22 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, Do you have any custom patches in your deployment? I looked up v5.5 but could not find the following functions referred to in the core: map_atime_from_server() worm_lookup_cbk() Neither do I see xlator_helper.c in the codebase. 
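(A rough sketch of how a brick core like this can be opened so that custom xlators show up with their symbols -- the binary and core paths are only examples:

    gdb /usr/sbin/glusterfsd /path/to/core
    (gdb) info sharedlibrary worm        # which worm.so was loaded and whether its symbols were read
    (gdb) thread apply all bt full       # full backtrace of every thread, as suggested earlier in this thread
)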
Thanks, Vijay #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at ../../../../xlators/lib/src/xlator_helper.c:21 __FUNCTION__ = "map_to_atime_from_server" #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, op_errno=op_errno at entry=13, inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at worm.c:531 priv = 0x7fdef4075378 ret = 0 __FUNCTION__ = "worm_lookup_cbk" On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: > Hello Vijay, > > I could reproduce the issue. After doing a simple DIR Listing from Win10 > powershell, all brick processes crashes. Its not the same scenario > mentioned before but the crash report in the bricks log is the same. > Attached you find the backtrace. > > Regards > David Spisla > > Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur >: > >> Hello David, >> >> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >> >>> Hello Vijay, >>> >>> how can I create such a core file? Or will it be created automatically >>> if a gluster process crashes? >>> Maybe you can give me a hint and will try to get a backtrace. >>> >> >> Generation of core file is dependent on the system configuration. `man 5 >> core` contains useful information to generate a core file in a directory. >> Once a core file is generated, you can use gdb to get a backtrace of all >> threads (using "thread apply all bt full"). >> >> >>> Unfortunately this bug is not easy to reproduce because it appears only >>> sometimes. >>> >> >> If the bug is not easy to reproduce, having a backtrace from the >> generated core would be very useful! >> >> Thanks, >> Vijay >> >> >>> >>> Regards >>> David Spisla >>> >>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>> vbellur at redhat.com>: >>> >>>> Thank you for the report, David. Do you have core files available on >>>> any of the servers? If yes, would it be possible for you to provide a >>>> backtrace. >>>> >>>> Regards, >>>> Vijay >>>> >>>> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >>>> >>>>> Hello folks, >>>>> >>>>> we have a client application (runs on Win10) which does some FOPs on a >>>>> gluster volume which is accessed by SMB. >>>>> >>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>> brick processes crashes and in the logs one have this crash report on every >>>>> brick log: >>>>> >>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>> pending frames: >>>>>> frame : type(0) op(27) >>>>>> frame : type(0) op(40) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 11 >>>>>> time of crash: >>>>>> 2019-04-16 08:32:21 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.5 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>> >>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>> one can read this crash report in every brick log: >>>>> >>>>>> >>>>>> >>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>> client: >>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>> denied] >>>>>> >>>>>> pending frames: >>>>>> >>>>>> frame : type(0) op(27) >>>>>> >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> >>>>>> signal received: 11 >>>>>> >>>>>> time of crash: >>>>>> >>>>>> 2019-05-02 07:43:39 >>>>>> >>>>>> configuration details: >>>>>> >>>>>> argp 1 >>>>>> >>>>>> backtrace 1 >>>>>> >>>>>> dlfcn 1 >>>>>> >>>>>> libpthread 1 >>>>>> >>>>>> llistxattr 1 >>>>>> >>>>>> setfsid 1 >>>>>> >>>>>> spinlock 1 >>>>>> >>>>>> epoll.h 1 >>>>>> >>>>>> xattr.h 1 >>>>>> >>>>>> st_atim.tv_nsec 1 >>>>>> >>>>>> package-string: glusterfs 5.5 >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>> >>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>> >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>> >>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>> >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>> >>>>> >>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>> volumes. 
But both volumes has the same settings: >>>>> >>>>>> Volume Name: shortterm >>>>>> Type: Replicate >>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>> Options Reconfigured: >>>>>> storage.reserve: 1 >>>>>> performance.client-io-threads: off >>>>>> nfs.disable: on >>>>>> transport.address-family: inet >>>>>> user.smb: disable >>>>>> features.read-only: off >>>>>> features.worm: off >>>>>> features.worm-file-level: on >>>>>> features.retention-mode: enterprise >>>>>> features.default-retention-period: 120 >>>>>> network.ping-timeout: 10 >>>>>> features.cache-invalidation: on >>>>>> features.cache-invalidation-timeout: 600 >>>>>> performance.nl-cache: on >>>>>> performance.nl-cache-timeout: 600 >>>>>> client.event-threads: 32 >>>>>> server.event-threads: 32 >>>>>> cluster.lookup-optimize: on >>>>>> performance.stat-prefetch: on >>>>>> performance.cache-invalidation: on >>>>>> performance.md-cache-timeout: 600 >>>>>> performance.cache-samba-metadata: on >>>>>> performance.cache-ima-xattrs: on >>>>>> performance.io-thread-count: 64 >>>>>> cluster.use-compound-fops: on >>>>>> performance.cache-size: 512MB >>>>>> performance.cache-refresh-timeout: 10 >>>>>> performance.read-ahead: off >>>>>> performance.write-behind-window-size: 4MB >>>>>> performance.write-behind: on >>>>>> storage.build-pgfid: on >>>>>> features.utime: on >>>>>> storage.ctime: on >>>>>> cluster.quorum-type: fixed >>>>>> cluster.quorum-count: 2 >>>>>> features.bitrot: on >>>>>> features.scrub: Active >>>>>> features.scrub-freq: daily >>>>>> cluster.enable-shared-storage: enable >>>>>> >>>>>> >>>>> Why can this happen to all Brick processes? I don't understand the >>>>> crash report. The FOPs are nothing special and after restart brick >>>>> processes everything works fine and our application was succeed. >>>>> >>>>> Regards >>>>> David Spisla >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Thu May 16 08:47:46 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Thu, 16 May 2019 10:47:46 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> Op 16-05-19 om 05:43 schreef Nithya Balachandran: > > > On Thu, 16 May 2019 at 03:05, Paul van der Vlis > wrote: > > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > Hi Paul, > > > > A few questions: > > Which version of gluster are you using? > > On the server and some clients: glusterfs 4.1.2 > On a new client: glusterfs 5.5 > > Is the same behaviour seen on both client versions? Yes. > > Did this behaviour start recently? As in were the contents of that > > directory visible earlier? > > This directory was normally used in the headoffice, and there is direct > access to the files without Glusterfs. 
So I don't know. > > > Do you mean that they access the files on the gluster volume without > using the client or that these files were stored elsewhere > earlier (not on gluster)? Files on a gluster volume should never be > accessed directly. The central server (this is the only gluster-brick) is a thin-client server, people are working directly on the server using LTSP terminals: http://ltsp.org/). The data is exported using Gluster to some other machines in smaller offices. And to a new thin-client server what I am making (using X2go). The goal is that this server will replace all of the excisting machines in the future. X2go is something like "Citrix for Linux", you can use it over the internet. I did not setup Gluster and I have never met the old sysadmin. I guess it's also very strange to use Gluster with only one brick. So when I understand you right, the whole setup is wrong, and you may not access the files without client? > To debug this further, please send the following: > > 1. The directory contents when the listing is performed directly on the > brick. > 2. The tcpdump of the gluster client when listing the directory using > the following command: > > tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 > > > You can send these directly to me in case you want to keep the > information private. I have just heard (during writing this message) that the owner of the firm where I make this for, is in hospital in very critical condition. They've asked me to stop with the work at the moment. I did also hear that there where more problems with the filesystem. Especially when a directory was renamed. And this directory was renamed in the past. With regards, Paul van der Vlis > Regards, > Nithya > ? > > > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > > >> wrote: > > > >? ? ?Hello Strahil, > > > >? ? ?Thanks for your answer. I don't find the word "sharding" in the > >? ? ?configfiles. There is not much shared data (24GB), and only 1 > brick: > >? ? ?--- > >? ? ?root at xxx:/etc/glusterfs# gluster volume info DATA > > > >? ? ?Volume Name: DATA > >? ? ?Type: Distribute > >? ? ?Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > >? ? ?Status: Started > >? ? ?Snapshot Count: 0 > >? ? ?Number of Bricks: 1 > >? ? ?Transport-type: tcp > >? ? ?Bricks: > >? ? ?Brick1: xxx-vpn:/DATA > >? ? ?Options Reconfigured: > >? ? ?transport.address-family: inet > >? ? ?nfs.disable: on > >? ? ?---- > >? ? ?(I have edited this a bit for privacy of my customer). > > > >? ? ?I think they have used glusterfs because it can do ACLs. > > > >? ? ?With regards, > >? ? ?Paul van der Vlis > > > > > >? ? ?Op 15-05-19 om 14:59 schreef Strahil Nikolov: > >? ? ?> Most probably you use sharding , which splits the files into > smaller > >? ? ?> chunks so you can fit a 1TB file into gluster nodes with > bricks of > >? ? ?> smaller size. > >? ? ?> So if you have 2 dispersed servers each having 500Gb > brick->? without > >? ? ?> sharding you won't be able to store files larger than the > brick size - > >? ? ?> no matter you have free space on the other server. > >? ? ?> > >? ? ?> When sharding is enabled - you will see on the brick the first > >? ? ?shard as > >? ? ?> a file and the rest is in a hidden folder called ".shards" (or > >? ? ?something > >? ? ?> like that). > >? ? ?> > >? ? ?> The benefit is also viewable when you need to do some > maintenance on a > >? ? ?> gluster node, as you will need to heal only the shards > containing > >? 
? ?> modified by the customers' data. > >? ? ?> > >? ? ?> Best Regards, > >? ? ?> Strahil Nikolov > >? ? ?> > >? ? ?> > >? ? ?> ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > >? ? ?> > >> ??????: > >? ? ?> > >? ? ?> > >? ? ?> Hello, > >? ? ?> > >? ? ?> I am the new sysadmin of an organization what uses Glusterfs. > >? ? ?> I did not set it up, and I don't know much about Glusterfs. > >? ? ?> > >? ? ?> What I do not understand is that I do not see all data in > the mount. > >? ? ?> Not as root, not as a normal user who has privileges. > >? ? ?> > >? ? ?> When I do "ls" in one of the subdirectories I don't see any > data, but > >? ? ?> this data exists at the server! > >? ? ?> > >? ? ?> In another subdirectory I see everything fine, the rights of the > >? ? ?> directories and files inside are the same. > >? ? ?> > >? ? ?> I mount with something like: > >? ? ?> /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > >? ? ?> I see data in /data/VOORBEELD/, and I don't see any data in > >? ? ?/data/ALGEMEEN/. > >? ? ?> > >? ? ?> I don't see something special in /etc/exports or in > /etc/glusterfs on > >? ? ?> the server. > >? ? ?> > >? ? ?> Is there maybe a mechanism in Glusterfs what can exclude > data from > >? ? ?> export?? Or is there a way to debug this problem? > >? ? ?> > >? ? ?> With regards, > >? ? ?> Paul van der Vlis > >? ? ?> > >? ? ?> ---- > >? ? ?> # file: VOORBEELD > >? ? ?> # owner: root > >? ? ?> # group: secretariaat > >? ? ?> # flags: -s- > >? ? ?> user::rwx > >? ? ?> group::rwx > >? ? ?> group:medewerkers:r-x > >? ? ?> mask::rwx > >? ? ?> other::--- > >? ? ?> default:user::rwx > >? ? ?> default:group::rwx > >? ? ?> default:group:medewerkers:r-x > >? ? ?> default:mask::rwx > >? ? ?> default:other::--- > >? ? ?> > >? ? ?> # file: ALGEMEEN > >? ? ?> # owner: root > >? ? ?> # group: secretariaat > >? ? ?> # flags: -s- > >? ? ?> user::rwx > >? ? ?> group::rwx > >? ? ?> group:medewerkers:r-x > >? ? ?> mask::rwx > >? ? ?> other::--- > >? ? ?> default:user::rwx > >? ? ?> default:group::rwx > >? ? ?> default:group:medewerkers:r-x > >? ? ?> default:mask::rwx > >? ? ?> default:other::--- > >? ? ?> ------ > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> -- > >? ? ?> Paul van der Vlis Linux systeembeheer Groningen > >? ? ?> https://www.vandervlis.nl/ > >? ? ?> _______________________________________________ > >? ? ?> Gluster-users mailing list > >? ? ?> Gluster-users at gluster.org > > > >? ? ? >> > >? ? ?> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > >? ? ?-- > >? ? ?Paul van der Vlis Linux systeembeheer Groningen > >? ? ?https://www.vandervlis.nl/ > >? ? ?_______________________________________________ > >? ? ?Gluster-users mailing list > >? ? ?Gluster-users at gluster.org > > > >? ? 
?https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From nbalacha at redhat.com Thu May 16 09:04:58 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 16 May 2019 14:34:58 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> Message-ID: On Thu, 16 May 2019 at 14:17, Paul van der Vlis wrote: > Op 16-05-19 om 05:43 schreef Nithya Balachandran: > > > > > > On Thu, 16 May 2019 at 03:05, Paul van der Vlis > > wrote: > > > > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > > Hi Paul, > > > > > > A few questions: > > > Which version of gluster are you using? > > > > On the server and some clients: glusterfs 4.1.2 > > On a new client: glusterfs 5.5 > > > > Is the same behaviour seen on both client versions? > > Yes. > > > > Did this behaviour start recently? As in were the contents of that > > > directory visible earlier? > > > > This directory was normally used in the headoffice, and there is > direct > > access to the files without Glusterfs. So I don't know. > > > > > > Do you mean that they access the files on the gluster volume without > > using the client or that these files were stored elsewhere > > earlier (not on gluster)? Files on a gluster volume should never be > > accessed directly. > > The central server (this is the only gluster-brick) is a thin-client > server, people are working directly on the server using LTSP terminals: > http://ltsp.org/). > > The data is exported using Gluster to some other machines in smaller > offices. > > And to a new thin-client server what I am making (using X2go). The goal > is that this server will replace all of the excisting machines in the > future. X2go is something like "Citrix for Linux", you can use it over > the internet. > > I did not setup Gluster and I have never met the old sysadmin. I guess > it's also very strange to use Gluster with only one brick. So when I > understand you right, the whole setup is wrong, and you may not access > the files without client? > > That is correct - any files on a gluster volume should be accessed only via a gluster client (if using fuse). > > To debug this further, please send the following: > > > > 1. The directory contents when the listing is performed directly on the > > brick. > > 2. The tcpdump of the gluster client when listing the directory using > > the following command: > > > > tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 > > > > > > You can send these directly to me in case you want to keep the > > information private. > > I have just heard (during writing this message) that the owner of the > firm where I make this for, is in hospital in very critical condition. > They've asked me to stop with the work at the moment. > > I did also hear that there where more problems with the filesystem. > Especially when a directory was renamed. > And this directory was renamed in the past. > > Let me know when you plan to continue with this . We can take a look. 
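(A sketch of the comparison being asked for -- the brick path /DATA and the mount point /data are the ones from this thread; adjust as needed:

    # on the server, directly on the brick (inspection only, do not modify anything here)
    ls -la /DATA/ALGEMEEN
    getfattr -d -m . -e hex /DATA/ALGEMEEN    # gfid / layout xattrs of the directory

    # on a client, through the fuse mount, while the capture below is running
    ls -la /data/ALGEMEEN
    tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22
)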
Regards, Nithya > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > > > With regards, > > Paul van der Vlis > > > > > Regards, > > > Nithya > > > > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > > > > >> wrote: > > > > > > Hello Strahil, > > > > > > Thanks for your answer. I don't find the word "sharding" in the > > > configfiles. There is not much shared data (24GB), and only 1 > > brick: > > > --- > > > root at xxx:/etc/glusterfs# gluster volume info DATA > > > > > > Volume Name: DATA > > > Type: Distribute > > > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > > > Status: Started > > > Snapshot Count: 0 > > > Number of Bricks: 1 > > > Transport-type: tcp > > > Bricks: > > > Brick1: xxx-vpn:/DATA > > > Options Reconfigured: > > > transport.address-family: inet > > > nfs.disable: on > > > ---- > > > (I have edited this a bit for privacy of my customer). > > > > > > I think they have used glusterfs because it can do ACLs. > > > > > > With regards, > > > Paul van der Vlis > > > > > > > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > > > Most probably you use sharding , which splits the files into > > smaller > > > > chunks so you can fit a 1TB file into gluster nodes with > > bricks of > > > > smaller size. > > > > So if you have 2 dispersed servers each having 500Gb > > brick-> without > > > > sharding you won't be able to store files larger than the > > brick size - > > > > no matter you have free space on the other server. > > > > > > > > When sharding is enabled - you will see on the brick the > first > > > shard as > > > > a file and the rest is in a hidden folder called ".shards" > (or > > > something > > > > like that). > > > > > > > > The benefit is also viewable when you need to do some > > maintenance on a > > > > gluster node, as you will need to heal only the shards > > containing > > > > modified by the customers' data. > > > > > > > > Best Regards, > > > > Strahil Nikolov > > > > > > > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der > Vlis > > > > > > >> ??????: > > > > > > > > > > > > Hello, > > > > > > > > I am the new sysadmin of an organization what uses Glusterfs. > > > > I did not set it up, and I don't know much about Glusterfs. > > > > > > > > What I do not understand is that I do not see all data in > > the mount. > > > > Not as root, not as a normal user who has privileges. > > > > > > > > When I do "ls" in one of the subdirectories I don't see any > > data, but > > > > this data exists at the server! > > > > > > > > In another subdirectory I see everything fine, the rights of > the > > > > directories and files inside are the same. > > > > > > > > I mount with something like: > > > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > > > I see data in /data/VOORBEELD/, and I don't see any data in > > > /data/ALGEMEEN/. > > > > > > > > I don't see something special in /etc/exports or in > > /etc/glusterfs on > > > > the server. > > > > > > > > Is there maybe a mechanism in Glusterfs what can exclude > > data from > > > > export? Or is there a way to debug this problem? 
> > > > > > > > With regards, > > > > Paul van der Vlis > > > > > > > > ---- > > > > # file: VOORBEELD > > > > # owner: root > > > > # group: secretariaat > > > > # flags: -s- > > > > user::rwx > > > > group::rwx > > > > group:medewerkers:r-x > > > > mask::rwx > > > > other::--- > > > > default:user::rwx > > > > default:group::rwx > > > > default:group:medewerkers:r-x > > > > default:mask::rwx > > > > default:other::--- > > > > > > > > # file: ALGEMEEN > > > > # owner: root > > > > # group: secretariaat > > > > # flags: -s- > > > > user::rwx > > > > group::rwx > > > > group:medewerkers:r-x > > > > mask::rwx > > > > other::--- > > > > default:user::rwx > > > > default:group::rwx > > > > default:group:medewerkers:r-x > > > > default:mask::rwx > > > > default:other::--- > > > > ------ > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Paul van der Vlis Linux systeembeheer Groningen > > > > https://www.vandervlis.nl/ > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > >> > > > > > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Paul van der Vlis Linux systeembeheer Groningen > > > https://www.vandervlis.nl/ > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrmeyer at chrmeyer.de Thu May 16 09:27:15 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 11:27:15 +0200 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with three Nodes and three volumes (one is the gluster shared storage). The other are replicated volumes. Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help to find the problem. Could someone please have a look on it? Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From spisla80 at gmail.com Thu May 16 09:36:04 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 11:36:04 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, yes, we are using custom patches. It s a helper function, which is defined in xlator_helper.c and used in worm_lookup_cbk. Do you think this could be the problem? The functions only manipulates the atime in struct iattr Regards David Spisla Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur : > Hello David, > > Do you have any custom patches in your deployment? I looked up v5.5 but > could not find the following functions referred to in the core: > > map_atime_from_server() > worm_lookup_cbk() > > Neither do I see xlator_helper.c in the codebase. 
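(Judging from the frames quoted below, worm_lookup_cbk reaches the helper with stbuf == NULL whenever the LOOKUP itself fails, e.g. with the EACCES reported by posix-acl. A purely hypothetical sketch of the kind of guard that would avoid the crash -- the names are taken from the backtrace, the body is an assumption and not the actual patch:

    /* hypothetical: only touch the stat buffer when the lookup succeeded */
    int32_t
    worm_lookup_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                     int32_t op_ret, int32_t op_errno, inode_t *inode,
                     struct iatt *buf, dict_t *xdata, struct iatt *postparent)
    {
            if (op_ret == 0 && buf != NULL)
                    map_atime_from_server (this, buf);  /* custom helper from xlator_helper.c */

            STACK_UNWIND_STRICT (lookup, frame, op_ret, op_errno, inode, buf,
                                 xdata, postparent);
            return 0;
    }
)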
> > Thanks, > Vijay > > > #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > ../../../../xlators/lib/src/xlator_helper.c:21 > __FUNCTION__ = "map_to_atime_from_server" > #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, > cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > op_errno=op_errno at entry=13, > inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > worm.c:531 > priv = 0x7fdef4075378 > ret = 0 > __FUNCTION__ = "worm_lookup_cbk" > > On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: > >> Hello Vijay, >> >> I could reproduce the issue. After doing a simple DIR Listing from Win10 >> powershell, all brick processes crashes. Its not the same scenario >> mentioned before but the crash report in the bricks log is the same. >> Attached you find the backtrace. >> >> Regards >> David Spisla >> >> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur > >: >> >>> Hello David, >>> >>> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >>> >>>> Hello Vijay, >>>> >>>> how can I create such a core file? Or will it be created automatically >>>> if a gluster process crashes? >>>> Maybe you can give me a hint and will try to get a backtrace. >>>> >>> >>> Generation of core file is dependent on the system configuration. `man >>> 5 core` contains useful information to generate a core file in a directory. >>> Once a core file is generated, you can use gdb to get a backtrace of all >>> threads (using "thread apply all bt full"). >>> >>> >>>> Unfortunately this bug is not easy to reproduce because it appears only >>>> sometimes. >>>> >>> >>> If the bug is not easy to reproduce, having a backtrace from the >>> generated core would be very useful! >>> >>> Thanks, >>> Vijay >>> >>> >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>> vbellur at redhat.com>: >>>> >>>>> Thank you for the report, David. Do you have core files available on >>>>> any of the servers? If yes, would it be possible for you to provide a >>>>> backtrace. >>>>> >>>>> Regards, >>>>> Vijay >>>>> >>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>> wrote: >>>>> >>>>>> Hello folks, >>>>>> >>>>>> we have a client application (runs on Win10) which does some FOPs on >>>>>> a gluster volume which is accessed by SMB. >>>>>> >>>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>>> brick processes crashes and in the logs one have this crash report on every >>>>>> brick log: >>>>>> >>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>> pending frames: >>>>>>> frame : type(0) op(27) >>>>>>> frame : type(0) op(40) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 11 >>>>>>> time of crash: >>>>>>> 2019-04-16 08:32:21 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.5 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>> >>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>> one can read this crash report in every brick log: >>>>>> >>>>>>> >>>>>>> >>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>> client: >>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>> denied] >>>>>>> >>>>>>> pending frames: >>>>>>> >>>>>>> frame : type(0) op(27) >>>>>>> >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> >>>>>>> signal received: 11 >>>>>>> >>>>>>> time of crash: >>>>>>> >>>>>>> 2019-05-02 07:43:39 >>>>>>> >>>>>>> configuration details: >>>>>>> >>>>>>> argp 1 >>>>>>> >>>>>>> backtrace 1 >>>>>>> >>>>>>> dlfcn 1 >>>>>>> >>>>>>> libpthread 1 >>>>>>> >>>>>>> llistxattr 1 >>>>>>> >>>>>>> setfsid 1 >>>>>>> >>>>>>> spinlock 1 >>>>>>> >>>>>>> epoll.h 1 >>>>>>> >>>>>>> xattr.h 1 >>>>>>> >>>>>>> st_atim.tv_nsec 1 >>>>>>> >>>>>>> package-string: glusterfs 5.5 >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>> >>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>> >>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>> >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>> >>>>>> >>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>> volumes. 
But both volumes has the same settings: >>>>>> >>>>>>> Volume Name: shortterm >>>>>>> Type: Replicate >>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>> Status: Started >>>>>>> Snapshot Count: 0 >>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>> Transport-type: tcp >>>>>>> Bricks: >>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>> Options Reconfigured: >>>>>>> storage.reserve: 1 >>>>>>> performance.client-io-threads: off >>>>>>> nfs.disable: on >>>>>>> transport.address-family: inet >>>>>>> user.smb: disable >>>>>>> features.read-only: off >>>>>>> features.worm: off >>>>>>> features.worm-file-level: on >>>>>>> features.retention-mode: enterprise >>>>>>> features.default-retention-period: 120 >>>>>>> network.ping-timeout: 10 >>>>>>> features.cache-invalidation: on >>>>>>> features.cache-invalidation-timeout: 600 >>>>>>> performance.nl-cache: on >>>>>>> performance.nl-cache-timeout: 600 >>>>>>> client.event-threads: 32 >>>>>>> server.event-threads: 32 >>>>>>> cluster.lookup-optimize: on >>>>>>> performance.stat-prefetch: on >>>>>>> performance.cache-invalidation: on >>>>>>> performance.md-cache-timeout: 600 >>>>>>> performance.cache-samba-metadata: on >>>>>>> performance.cache-ima-xattrs: on >>>>>>> performance.io-thread-count: 64 >>>>>>> cluster.use-compound-fops: on >>>>>>> performance.cache-size: 512MB >>>>>>> performance.cache-refresh-timeout: 10 >>>>>>> performance.read-ahead: off >>>>>>> performance.write-behind-window-size: 4MB >>>>>>> performance.write-behind: on >>>>>>> storage.build-pgfid: on >>>>>>> features.utime: on >>>>>>> storage.ctime: on >>>>>>> cluster.quorum-type: fixed >>>>>>> cluster.quorum-count: 2 >>>>>>> features.bitrot: on >>>>>>> features.scrub: Active >>>>>>> features.scrub-freq: daily >>>>>>> cluster.enable-shared-storage: enable >>>>>>> >>>>>>> >>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>> processes everything works fine and our application was succeed. >>>>>> >>>>>> Regards >>>>>> David Spisla >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Thu May 16 10:42:27 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 12:42:27 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello Amar, thank you for the information. Of course, we should wait for Poornima because of her knowledge. Regards David Spisla Am Do., 16. Mai 2019 um 12:23 Uhr schrieb Amar Tumballi Suryanarayan < atumball at redhat.com>: > David, Poornima is on leave from today till 21st May. So having it after > she comes back is better. She has more experience in SMB integration than > many of us. > > -Amar > > On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: > >> Hello everyone, >> >> if there is any problem in finding a date and time, please contact me. It >> would be fine to have a meeting soon. >> >> Regards >> David Spisla >> >> Am Mo., 13. 
Mai 2019 um 12:38 Uhr schrieb David Spisla < >> david.spisla at iternity.com>: >> >>> Hi Poornima, >>> >>> >>> >>> that's fine. I would suggest these dates and times: >>> >>> >>> >>> May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>> >>> May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>> >>> >>> >>> I have added Volker Lendecke from Sernet to the mail. He is the Samba expert. >>> >>> Can one of you provide a host via bluejeans.com? If not, I will try >>> it with GoToMeeting (https://www.gotomeeting.com). >>> >>> >>> >>> @all Please write your preferred dates and times. For me, all of the >>> above dates and times are fine. >>> >>> >>> >>> Regards >>> >>> David >>> >>> >>> >>> >>> >>> *From:* Poornima Gurusiddaiah >>> *Sent:* Monday, May 13, 2019 07:22 >>> *To:* David Spisla ; Anoop C S ; >>> Gunther Deschner >>> *Cc:* Gluster Devel ; >>> gluster-users at gluster.org List >>> *Subject:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>> Gluster (together with Samba Core Developer) >>> >>> >>> >>> Hi, >>> >>> >>> >>> We would definitely be interested in this. Thank you for contacting us. >>> To start with, we can have an online conference. Please suggest a few >>> possible dates and times for the week (preferably between IST 7.00 AM - >>> 9.00 PM)? >>> >>> Adding Anoop and Gunther, who are also the main contributors to the >>> Gluster-Samba integration. >>> >>> >>> >>> Thanks, >>> >>> Poornima >>> >>> >>> >>> >>> >>> >>> >>> On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: >>> >>> Dear Gluster Community, >>> >>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>> For this purpose we are working together with an advanced Samba core >>> developer. He did some debugging but needs more information about Gluster >>> core behaviour. >>> >>> >>> >>> *Would any of the Gluster developers want to have an online conference >>> with him and me?* >>> >>> >>> >>> I would organize everything. In my opinion this is a good chance to >>> improve the stability of Glusterfs, and this is at the moment one of the major >>> issues in the community. >>> >>> >>> >>> Regards >>> >>> David Spisla >>> >>> _______________________________________________ >>> >>> Community Meeting Calendar: >>> >>> APAC Schedule - >>> Every 2nd and 4th Tuesday at 11:30 AM IST >>> Bridge: https://bluejeans.com/836554017 >>> >>> NA/EMEA Schedule - >>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>> Bridge: https://bluejeans.com/486278655 >>> >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From order at rikus.com Thu May 16 20:50:16 2019 From: order at rikus.com (Jeff Bischoff) Date: Thu, 16 May 2019 16:50:16 -0400 Subject: [Gluster-users] How to prevent Brick terminated by socket temporarily unavailable Message-ID: <0F6B141E-3903-4A8E-8BA5-F2925C782905@rikus.com> I'm having a frequent problem where some temporary condition causes bricks to be shut down. The health-check feature is shutting them down, and according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/ the brick will stay off and not be restarted (by design). 
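For reference, bricks taken down by the health-check can be brought back without restarting the whole node, and the check itself is tunable per volume. A minimal sketch, with the volume name as a placeholder and 30 seconds only as an example value:

# gluster volume start <volname> force
# gluster volume get <volname> storage.health-check-interval
# gluster volume set <volname> storage.health-check-interval 30

The "force" start only starts brick processes that are currently offline, the interval is in seconds, and 0 disables the health-check entirely.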
What I don't understand is: What is causing this "Resource temporarily unavailable" in the first place. From searching the web, it sounds like a socket timeout. Have you guys seen this before? If this is truly a temporary failure, why do we shut down the brick indefinitely? Should I try any of the following: Increase 'network.ping-timeout' or 'client.grace-timeout' Disable the health check feature by setting: # gluster volume set storage.health-check-interval 0 The brick log looks like this at the time it is shut down: ------------------ [2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable] [2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down [2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM [2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down ------------------ The GlusterD log shows this shortly after: ------------------ [2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) ------------------ Any guidance would be greatly appreciated! Best, Jeff Bischoff -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Thu May 16 23:50:50 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 16 May 2019 16:50:50 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, >From the backtrace it looks like stbuf is NULL in map_atime_from_server() as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you please check if there is an unconditional dereference of stbuf in map_atime_from_server()? Regards, Vijay On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > Hello Vijay, > > yes, we are using custom patches. It s a helper function, which is defined > in xlator_helper.c and used in worm_lookup_cbk. > Do you think this could be the problem? The functions only manipulates the > atime in struct iattr > > Regards > David Spisla > > Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur >: > >> Hello David, >> >> Do you have any custom patches in your deployment? I looked up v5.5 but >> could not find the following functions referred to in the core: >> >> map_atime_from_server() >> worm_lookup_cbk() >> >> Neither do I see xlator_helper.c in the codebase. 
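As an aside on the op_errno of 13: the posix-acl messages in the quoted logs show "Permission denied" on gfid 00000000-0000-0000-0000-000000000001, which is the root of the volume, for a request with uid/gid 2000 while the inode context has uid 0 and mode 700. Whether the on-disk ownership/ACL of the brick root is really the trigger is an assumption that would need checking; read-only checks like these, using the brick path from the volume info quoted in this thread, would confirm it:

# ls -ld /gluster/brick4/glusterbrick
# getfacl /gluster/brick4/glusterbrick
# getfattr -d -m . -e hex /gluster/brick4/glusterbrick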
>> >> Thanks, >> Vijay >> >> >> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at >> ../../../../xlators/lib/src/xlator_helper.c:21 >> __FUNCTION__ = "map_to_atime_from_server" >> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, >> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, >> op_errno=op_errno at entry=13, >> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at >> worm.c:531 >> priv = 0x7fdef4075378 >> ret = 0 >> __FUNCTION__ = "worm_lookup_cbk" >> >> On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: >> >>> Hello Vijay, >>> >>> I could reproduce the issue. After doing a simple DIR Listing from Win10 >>> powershell, all brick processes crashes. Its not the same scenario >>> mentioned before but the crash report in the bricks log is the same. >>> Attached you find the backtrace. >>> >>> Regards >>> David Spisla >>> >>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < >>> vbellur at redhat.com>: >>> >>>> Hello David, >>>> >>>> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >>>> >>>>> Hello Vijay, >>>>> >>>>> how can I create such a core file? Or will it be created automatically >>>>> if a gluster process crashes? >>>>> Maybe you can give me a hint and will try to get a backtrace. >>>>> >>>> >>>> Generation of core file is dependent on the system configuration. `man >>>> 5 core` contains useful information to generate a core file in a directory. >>>> Once a core file is generated, you can use gdb to get a backtrace of all >>>> threads (using "thread apply all bt full"). >>>> >>>> >>>>> Unfortunately this bug is not easy to reproduce because it appears >>>>> only sometimes. >>>>> >>>> >>>> If the bug is not easy to reproduce, having a backtrace from the >>>> generated core would be very useful! >>>> >>>> Thanks, >>>> Vijay >>>> >>>> >>>>> >>>>> Regards >>>>> David Spisla >>>>> >>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>>> vbellur at redhat.com>: >>>>> >>>>>> Thank you for the report, David. Do you have core files available on >>>>>> any of the servers? If yes, would it be possible for you to provide a >>>>>> backtrace. >>>>>> >>>>>> Regards, >>>>>> Vijay >>>>>> >>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>>> wrote: >>>>>> >>>>>>> Hello folks, >>>>>>> >>>>>>> we have a client application (runs on Win10) which does some FOPs on >>>>>>> a gluster volume which is accessed by SMB. >>>>>>> >>>>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>>>> brick processes crashes and in the logs one have this crash report on every >>>>>>> brick log: >>>>>>> >>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>>> pending frames: >>>>>>>> frame : type(0) op(27) >>>>>>>> frame : type(0) op(40) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 11 >>>>>>>> time of crash: >>>>>>>> 2019-04-16 08:32:21 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.5 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>>> >>>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>>> one can read this crash report in every brick log: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>>> client: >>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>>> denied] >>>>>>>> >>>>>>>> pending frames: >>>>>>>> >>>>>>>> frame : type(0) op(27) >>>>>>>> >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> >>>>>>>> signal received: 11 >>>>>>>> >>>>>>>> time of crash: >>>>>>>> >>>>>>>> 2019-05-02 07:43:39 >>>>>>>> >>>>>>>> configuration details: >>>>>>>> >>>>>>>> argp 1 >>>>>>>> >>>>>>>> backtrace 1 >>>>>>>> >>>>>>>> dlfcn 1 >>>>>>>> >>>>>>>> libpthread 1 >>>>>>>> >>>>>>>> llistxattr 1 >>>>>>>> >>>>>>>> setfsid 1 >>>>>>>> >>>>>>>> spinlock 1 >>>>>>>> >>>>>>>> epoll.h 1 >>>>>>>> >>>>>>>> xattr.h 1 >>>>>>>> >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> >>>>>>>> package-string: glusterfs 5.5 >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>>> >>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>>> >>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>>> >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>>> >>>>>>> >>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>>> volumes. 
But both volumes has the same settings: >>>>>>> >>>>>>>> Volume Name: shortterm >>>>>>>> Type: Replicate >>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>>> Options Reconfigured: >>>>>>>> storage.reserve: 1 >>>>>>>> performance.client-io-threads: off >>>>>>>> nfs.disable: on >>>>>>>> transport.address-family: inet >>>>>>>> user.smb: disable >>>>>>>> features.read-only: off >>>>>>>> features.worm: off >>>>>>>> features.worm-file-level: on >>>>>>>> features.retention-mode: enterprise >>>>>>>> features.default-retention-period: 120 >>>>>>>> network.ping-timeout: 10 >>>>>>>> features.cache-invalidation: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> performance.nl-cache: on >>>>>>>> performance.nl-cache-timeout: 600 >>>>>>>> client.event-threads: 32 >>>>>>>> server.event-threads: 32 >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-samba-metadata: on >>>>>>>> performance.cache-ima-xattrs: on >>>>>>>> performance.io-thread-count: 64 >>>>>>>> cluster.use-compound-fops: on >>>>>>>> performance.cache-size: 512MB >>>>>>>> performance.cache-refresh-timeout: 10 >>>>>>>> performance.read-ahead: off >>>>>>>> performance.write-behind-window-size: 4MB >>>>>>>> performance.write-behind: on >>>>>>>> storage.build-pgfid: on >>>>>>>> features.utime: on >>>>>>>> storage.ctime: on >>>>>>>> cluster.quorum-type: fixed >>>>>>>> cluster.quorum-count: 2 >>>>>>>> features.bitrot: on >>>>>>>> features.scrub: Active >>>>>>>> features.scrub-freq: daily >>>>>>>> cluster.enable-shared-storage: enable >>>>>>>> >>>>>>>> >>>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>>> processes everything works fine and our application was succeed. >>>>>>> >>>>>>> Regards >>>>>>> David Spisla >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 17 00:29:58 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 17 May 2019 12:29:58 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: Hello, We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. We are running glusterfs 5.6.1. Thanks in advance for any assistance! On existing node gfs1, trying to add new arbiter node gfs3: # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 volume add-brick: failed: Commit failed on gfs3. Please check log file for details. 
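A commit failure like this usually means the temporary mount that glusterd creates on the new node could not reach glusterd or the existing bricks, so a few reachability checks from gfs3 are worth running first (hostnames as used in this report; 49152 is only the typical first brick port, the real one is shown by volume status):

# gluster peer status
# gluster volume status gvol0
# nc -zv gfs1 24007
# nc -zv gfs1 49152

Peer status should show the other nodes as "Peer in Cluster (Connected)", and 24007 is the glusterd management port.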
On new node gfs3 in gvol0-add-brick-mount.log: [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'. [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'. Processes running on new node gfs3: # ps -ef | grep gluster root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Fri May 17 07:50:28 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 09:50:28 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, thank you for the clarification. Yes, there is an unconditional dereference in stbuf. It seems plausible that this causes the crash. I think a check like this should help: if (buf == NULL) { goto out; } map_atime_from_server(this, buf); Is there a reason why buf can be NULL? Regards David Spisla Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur : > Hello David, > > From the backtrace it looks like stbuf is NULL in map_atime_from_server() > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you > please check if there is an unconditional dereference of stbuf in > map_atime_from_server()? > > Regards, > Vijay > > On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > >> Hello Vijay, >> >> yes, we are using custom patches. It s a helper function, which is >> defined in xlator_helper.c and used in worm_lookup_cbk. >> Do you think this could be the problem? The functions only manipulates >> the atime in struct iattr >> >> Regards >> David Spisla >> >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < >> vbellur at redhat.com>: >> >>> Hello David, >>> >>> Do you have any custom patches in your deployment? 
I looked up v5.5 but >>> could not find the following functions referred to in the core: >>> >>> map_atime_from_server() >>> worm_lookup_cbk() >>> >>> Neither do I see xlator_helper.c in the codebase. >>> >>> Thanks, >>> Vijay >>> >>> >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at >>> ../../../../xlators/lib/src/xlator_helper.c:21 >>> __FUNCTION__ = "map_to_atime_from_server" >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, >>> op_errno=op_errno at entry=13, >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at >>> worm.c:531 >>> priv = 0x7fdef4075378 >>> ret = 0 >>> __FUNCTION__ = "worm_lookup_cbk" >>> >>> On Thu, May 16, 2019 at 12:53 AM David Spisla >>> wrote: >>> >>>> Hello Vijay, >>>> >>>> I could reproduce the issue. After doing a simple DIR Listing from >>>> Win10 powershell, all brick processes crashes. Its not the same scenario >>>> mentioned before but the crash report in the bricks log is the same. >>>> Attached you find the backtrace. >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < >>>> vbellur at redhat.com>: >>>> >>>>> Hello David, >>>>> >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla >>>>> wrote: >>>>> >>>>>> Hello Vijay, >>>>>> >>>>>> how can I create such a core file? Or will it be created >>>>>> automatically if a gluster process crashes? >>>>>> Maybe you can give me a hint and will try to get a backtrace. >>>>>> >>>>> >>>>> Generation of core file is dependent on the system configuration. >>>>> `man 5 core` contains useful information to generate a core file in a >>>>> directory. Once a core file is generated, you can use gdb to get a >>>>> backtrace of all threads (using "thread apply all bt full"). >>>>> >>>>> >>>>>> Unfortunately this bug is not easy to reproduce because it appears >>>>>> only sometimes. >>>>>> >>>>> >>>>> If the bug is not easy to reproduce, having a backtrace from the >>>>> generated core would be very useful! >>>>> >>>>> Thanks, >>>>> Vijay >>>>> >>>>> >>>>>> >>>>>> Regards >>>>>> David Spisla >>>>>> >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>>>> vbellur at redhat.com>: >>>>>> >>>>>>> Thank you for the report, David. Do you have core files available on >>>>>>> any of the servers? If yes, would it be possible for you to provide a >>>>>>> backtrace. >>>>>>> >>>>>>> Regards, >>>>>>> Vijay >>>>>>> >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>>>> wrote: >>>>>>> >>>>>>>> Hello folks, >>>>>>>> >>>>>>>> we have a client application (runs on Win10) which does some FOPs >>>>>>>> on a gluster volume which is accessed by SMB. >>>>>>>> >>>>>>>> *Scenario 1* is a READ Operation which reads all files >>>>>>>> successively and checks if the files data was correctly copied. 
While doing >>>>>>>> this, all brick processes crashes and in the logs one have this crash >>>>>>>> report on every brick log: >>>>>>>> >>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>>>> pending frames: >>>>>>>>> frame : type(0) op(27) >>>>>>>>> frame : type(0) op(40) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 11 >>>>>>>>> time of crash: >>>>>>>>> 2019-04-16 08:32:21 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.5 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>>>> >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>>>> one can read this crash report in every brick log: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>>>> client: >>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>>>> denied] >>>>>>>>> >>>>>>>>> pending frames: >>>>>>>>> >>>>>>>>> frame : type(0) op(27) >>>>>>>>> >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> >>>>>>>>> signal received: 11 >>>>>>>>> >>>>>>>>> time of crash: >>>>>>>>> >>>>>>>>> 2019-05-02 07:43:39 >>>>>>>>> >>>>>>>>> configuration details: >>>>>>>>> >>>>>>>>> argp 1 >>>>>>>>> >>>>>>>>> backtrace 1 >>>>>>>>> >>>>>>>>> dlfcn 1 >>>>>>>>> >>>>>>>>> libpthread 1 >>>>>>>>> >>>>>>>>> llistxattr 1 >>>>>>>>> >>>>>>>>> setfsid 1 >>>>>>>>> >>>>>>>>> spinlock 1 >>>>>>>>> >>>>>>>>> epoll.h 1 >>>>>>>>> >>>>>>>>> xattr.h 1 >>>>>>>>> >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> >>>>>>>>> package-string: glusterfs 5.5 >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>>>> >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>>>> >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>>>> >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>>>> >>>>>>>> >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>>>> volumes. 
But both volumes has the same settings: >>>>>>>> >>>>>>>>> Volume Name: shortterm >>>>>>>>> Type: Replicate >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>>>> Status: Started >>>>>>>>> Snapshot Count: 0 >>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>> Transport-type: tcp >>>>>>>>> Bricks: >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>>>> Options Reconfigured: >>>>>>>>> storage.reserve: 1 >>>>>>>>> performance.client-io-threads: off >>>>>>>>> nfs.disable: on >>>>>>>>> transport.address-family: inet >>>>>>>>> user.smb: disable >>>>>>>>> features.read-only: off >>>>>>>>> features.worm: off >>>>>>>>> features.worm-file-level: on >>>>>>>>> features.retention-mode: enterprise >>>>>>>>> features.default-retention-period: 120 >>>>>>>>> network.ping-timeout: 10 >>>>>>>>> features.cache-invalidation: on >>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>> performance.nl-cache: on >>>>>>>>> performance.nl-cache-timeout: 600 >>>>>>>>> client.event-threads: 32 >>>>>>>>> server.event-threads: 32 >>>>>>>>> cluster.lookup-optimize: on >>>>>>>>> performance.stat-prefetch: on >>>>>>>>> performance.cache-invalidation: on >>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>> performance.cache-samba-metadata: on >>>>>>>>> performance.cache-ima-xattrs: on >>>>>>>>> performance.io-thread-count: 64 >>>>>>>>> cluster.use-compound-fops: on >>>>>>>>> performance.cache-size: 512MB >>>>>>>>> performance.cache-refresh-timeout: 10 >>>>>>>>> performance.read-ahead: off >>>>>>>>> performance.write-behind-window-size: 4MB >>>>>>>>> performance.write-behind: on >>>>>>>>> storage.build-pgfid: on >>>>>>>>> features.utime: on >>>>>>>>> storage.ctime: on >>>>>>>>> cluster.quorum-type: fixed >>>>>>>>> cluster.quorum-count: 2 >>>>>>>>> features.bitrot: on >>>>>>>>> features.scrub: Active >>>>>>>>> features.scrub-freq: daily >>>>>>>>> cluster.enable-shared-storage: enable >>>>>>>>> >>>>>>>>> >>>>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>>>> processes everything works fine and our application was succeed. >>>>>>>> >>>>>>>> Regards >>>>>>>> David Spisla >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 08:21:41 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 10:21:41 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: <20190517082141.GA24535@ndevos-x270> On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > Hello Vijay, > thank you for the clarification. Yes, there is an unconditional dereference > in stbuf. It seems plausible that this causes the crash. I think a check > like this should help: > > if (buf == NULL) { > goto out; > } > map_atime_from_server(this, buf); > > Is there a reason why buf can be NULL? It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). This is probably something you need to handle in worm_lookup_cbk. 
There can be many reasons for a FOP to return an error, why it happened in this case is a little difficult to say without (much) more details. HTH, Niels > > Regards > David Spisla > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur : > > > Hello David, > > > > From the backtrace it looks like stbuf is NULL in map_atime_from_server() > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you > > please check if there is an unconditional dereference of stbuf in > > map_atime_from_server()? > > > > Regards, > > Vijay > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > > > >> Hello Vijay, > >> > >> yes, we are using custom patches. It s a helper function, which is > >> defined in xlator_helper.c and used in worm_lookup_cbk. > >> Do you think this could be the problem? The functions only manipulates > >> the atime in struct iattr > >> > >> Regards > >> David Spisla > >> > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > >> vbellur at redhat.com>: > >> > >>> Hello David, > >>> > >>> Do you have any custom patches in your deployment? I looked up v5.5 but > >>> could not find the following functions referred to in the core: > >>> > >>> map_atime_from_server() > >>> worm_lookup_cbk() > >>> > >>> Neither do I see xlator_helper.c in the codebase. > >>> > >>> Thanks, > >>> Vijay > >>> > >>> > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > >>> __FUNCTION__ = "map_to_atime_from_server" > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > >>> op_errno=op_errno at entry=13, > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > >>> worm.c:531 > >>> priv = 0x7fdef4075378 > >>> ret = 0 > >>> __FUNCTION__ = "worm_lookup_cbk" > >>> > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > >>> wrote: > >>> > >>>> Hello Vijay, > >>>> > >>>> I could reproduce the issue. After doing a simple DIR Listing from > >>>> Win10 powershell, all brick processes crashes. Its not the same scenario > >>>> mentioned before but the crash report in the bricks log is the same. > >>>> Attached you find the backtrace. > >>>> > >>>> Regards > >>>> David Spisla > >>>> > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > >>>> vbellur at redhat.com>: > >>>> > >>>>> Hello David, > >>>>> > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > >>>>> wrote: > >>>>> > >>>>>> Hello Vijay, > >>>>>> > >>>>>> how can I create such a core file? Or will it be created > >>>>>> automatically if a gluster process crashes? > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > >>>>>> > >>>>> > >>>>> Generation of core file is dependent on the system configuration. > >>>>> `man 5 core` contains useful information to generate a core file in a > >>>>> directory. Once a core file is generated, you can use gdb to get a > >>>>> backtrace of all threads (using "thread apply all bt full"). > >>>>> > >>>>> > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > >>>>>> only sometimes. > >>>>>> > >>>>> > >>>>> If the bug is not easy to reproduce, having a backtrace from the > >>>>> generated core would be very useful! > >>>>> > >>>>> Thanks, > >>>>> Vijay > >>>>> > >>>>> > >>>>>> > >>>>>> Regards > >>>>>> David Spisla > >>>>>> > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > >>>>>> vbellur at redhat.com>: > >>>>>> > >>>>>>> Thank you for the report, David. 
Do you have core files available on > >>>>>>> any of the servers? If yes, would it be possible for you to provide a > >>>>>>> backtrace. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Vijay > >>>>>>> > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hello folks, > >>>>>>>> > >>>>>>>> we have a client application (runs on Win10) which does some FOPs > >>>>>>>> on a gluster volume which is accessed by SMB. > >>>>>>>> > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > >>>>>>>> successively and checks if the files data was correctly copied. While doing > >>>>>>>> this, all brick processes crashes and in the logs one have this crash > >>>>>>>> report on every brick log: > >>>>>>>> > >>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] > >>>>>>>>> pending frames: > >>>>>>>>> frame : type(0) op(27) > >>>>>>>>> frame : type(0) op(40) > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > >>>>>>>>> signal received: 11 > >>>>>>>>> time of crash: > >>>>>>>>> 2019-04-16 08:32:21 > >>>>>>>>> configuration details: > >>>>>>>>> argp 1 > >>>>>>>>> backtrace 1 > >>>>>>>>> dlfcn 1 > >>>>>>>>> libpthread 1 > >>>>>>>>> llistxattr 1 > >>>>>>>>> setfsid 1 > >>>>>>>>> spinlock 1 > >>>>>>>>> epoll.h 1 > >>>>>>>>> xattr.h 1 > >>>>>>>>> st_atim.tv_nsec 1 > >>>>>>>>> package-string: glusterfs 5.5 > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > >>>>>>>>> > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, > >>>>>>>> one can read this crash report in every brick log: > >>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: > >>>>>>>>> client: > >>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > >>>>>>>>> denied] > >>>>>>>>> > >>>>>>>>> pending frames: > >>>>>>>>> > >>>>>>>>> frame : type(0) op(27) > >>>>>>>>> > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > >>>>>>>>> > >>>>>>>>> signal received: 11 > >>>>>>>>> > >>>>>>>>> time of crash: > >>>>>>>>> > >>>>>>>>> 2019-05-02 07:43:39 > >>>>>>>>> > >>>>>>>>> configuration details: > >>>>>>>>> > >>>>>>>>> argp 1 > >>>>>>>>> > >>>>>>>>> backtrace 1 > >>>>>>>>> > >>>>>>>>> dlfcn 1 > >>>>>>>>> > >>>>>>>>> libpthread 1 > >>>>>>>>> > >>>>>>>>> llistxattr 1 > >>>>>>>>> > >>>>>>>>> setfsid 1 > >>>>>>>>> > >>>>>>>>> spinlock 1 > >>>>>>>>> > >>>>>>>>> epoll.h 1 > >>>>>>>>> > >>>>>>>>> xattr.h 1 > >>>>>>>>> > >>>>>>>>> st_atim.tv_nsec 1 > >>>>>>>>> > >>>>>>>>> package-string: glusterfs 5.5 > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > >>>>>>>>> > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > >>>>>>>>> > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > >>>>>>>>> > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > >>>>>>>>> > >>>>>>>> > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different > >>>>>>>> volumes. 
But both volumes has the same settings: > >>>>>>>> > >>>>>>>>> Volume Name: shortterm > >>>>>>>>> Type: Replicate > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > >>>>>>>>> Status: Started > >>>>>>>>> Snapshot Count: 0 > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > >>>>>>>>> Transport-type: tcp > >>>>>>>>> Bricks: > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > >>>>>>>>> Options Reconfigured: > >>>>>>>>> storage.reserve: 1 > >>>>>>>>> performance.client-io-threads: off > >>>>>>>>> nfs.disable: on > >>>>>>>>> transport.address-family: inet > >>>>>>>>> user.smb: disable > >>>>>>>>> features.read-only: off > >>>>>>>>> features.worm: off > >>>>>>>>> features.worm-file-level: on > >>>>>>>>> features.retention-mode: enterprise > >>>>>>>>> features.default-retention-period: 120 > >>>>>>>>> network.ping-timeout: 10 > >>>>>>>>> features.cache-invalidation: on > >>>>>>>>> features.cache-invalidation-timeout: 600 > >>>>>>>>> performance.nl-cache: on > >>>>>>>>> performance.nl-cache-timeout: 600 > >>>>>>>>> client.event-threads: 32 > >>>>>>>>> server.event-threads: 32 > >>>>>>>>> cluster.lookup-optimize: on > >>>>>>>>> performance.stat-prefetch: on > >>>>>>>>> performance.cache-invalidation: on > >>>>>>>>> performance.md-cache-timeout: 600 > >>>>>>>>> performance.cache-samba-metadata: on > >>>>>>>>> performance.cache-ima-xattrs: on > >>>>>>>>> performance.io-thread-count: 64 > >>>>>>>>> cluster.use-compound-fops: on > >>>>>>>>> performance.cache-size: 512MB > >>>>>>>>> performance.cache-refresh-timeout: 10 > >>>>>>>>> performance.read-ahead: off > >>>>>>>>> performance.write-behind-window-size: 4MB > >>>>>>>>> performance.write-behind: on > >>>>>>>>> storage.build-pgfid: on > >>>>>>>>> features.utime: on > >>>>>>>>> storage.ctime: on > >>>>>>>>> cluster.quorum-type: fixed > >>>>>>>>> cluster.quorum-count: 2 > >>>>>>>>> features.bitrot: on > >>>>>>>>> features.scrub: Active > >>>>>>>>> features.scrub-freq: daily > >>>>>>>>> cluster.enable-shared-storage: enable > >>>>>>>>> > >>>>>>>>> > >>>>>>>> Why can this happen to all Brick processes? I don't understand the > >>>>>>>> crash report. The FOPs are nothing special and after restart brick > >>>>>>>> processes everything works fine and our application was succeed. > >>>>>>>> > >>>>>>>> Regards > >>>>>>>> David Spisla > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Gluster-users mailing list > >>>>>>>> Gluster-users at gluster.org > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > >>>>>>> > >>>>>>> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From abhishpaliwal at gmail.com Fri May 17 09:16:20 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Fri, 17 May 2019 14:46:20 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Anyone please reply.... On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL wrote: > Hi Team, > > I upload some valgrind logs from my gluster 5.4 setup. This is writing to > the volume every 15 minutes. I stopped glusterd and then copy away the > logs. The test was running for some simulated days. They are zipped in > valgrind-54.zip. > > Lots of info in valgrind-2730.log. 
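Alongside valgrind, gluster statedumps are the usual way to see which allocation pools keep growing between two points in time. A minimal sketch, with the volume name as a placeholder (dumps land in /var/run/gluster by default):

# gluster volume statedump <volname>
# kill -USR1 $(pidof glusterd)

Taking one dump now and another after a few hours of the 15-minute write workload, then comparing the mem-pool sections, should narrow down where the growth is.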
Lots of possibly lost bytes in > glusterfs and even some definitely lost bytes. > > ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 > of 391 > ==2737== at 0x4C29C25: calloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==2737== by 0xA22485E: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA217C94: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21D9F8: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21DED9: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21E685: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA1B9D8C: init (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in > /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) > ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) > ==2737== > ==2737== LEAK SUMMARY: > ==2737== definitely lost: 1,053 bytes in 10 blocks > ==2737== indirectly lost: 317 bytes in 3 blocks > ==2737== possibly lost: 2,374,971 bytes in 524 blocks > ==2737== still reachable: 53,277 bytes in 201 blocks > ==2737== suppressed: 0 bytes in 0 blocks > > -- > > > > > Regards > Abhishek Paliwal > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Fri May 17 09:17:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 11:17:52 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517082141.GA24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> Message-ID: Hello Niels, Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > Hello Vijay, > > thank you for the clarification. Yes, there is an unconditional > dereference > > in stbuf. It seems plausible that this causes the crash. I think a check > > like this should help: > > > > if (buf == NULL) { > > goto out; > > } > > map_atime_from_server(this, buf); > > > > Is there a reason why buf can be NULL? > > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). > This is probably something you need to handle in worm_lookup_cbk. There > can be many reasons for a FOP to return an error, why it happened in > this case is a little difficult to say without (much) more details. > Yes, I will look for a way to handle that case. It is intended, that the struct stbuf ist NULL when an error happens? Regards David Spisla > HTH, > Niels > > > > > > Regards > > David Spisla > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > vbellur at redhat.com>: > > > > > Hello David, > > > > > > From the backtrace it looks like stbuf is NULL in > map_atime_from_server() > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can > you > > > please check if there is an unconditional dereference of stbuf in > > > map_atime_from_server()? > > > > > > Regards, > > > Vijay > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > wrote: > > > > > >> Hello Vijay, > > >> > > >> yes, we are using custom patches. It s a helper function, which is > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > >> Do you think this could be the problem? 
The functions only manipulates > > >> the atime in struct iattr > > >> > > >> Regards > > >> David Spisla > > >> > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > >> vbellur at redhat.com>: > > >> > > >>> Hello David, > > >>> > > >>> Do you have any custom patches in your deployment? I looked up v5.5 > but > > >>> could not find the following functions referred to in the core: > > >>> > > >>> map_atime_from_server() > > >>> worm_lookup_cbk() > > >>> > > >>> Neither do I see xlator_helper.c in the codebase. > > >>> > > >>> Thanks, > > >>> Vijay > > >>> > > >>> > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > >>> __FUNCTION__ = "map_to_atime_from_server" > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > =0x7fdeac0015c8, > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > > >>> op_errno=op_errno at entry=13, > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > > >>> worm.c:531 > > >>> priv = 0x7fdef4075378 > > >>> ret = 0 > > >>> __FUNCTION__ = "worm_lookup_cbk" > > >>> > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > > >>> wrote: > > >>> > > >>>> Hello Vijay, > > >>>> > > >>>> I could reproduce the issue. After doing a simple DIR Listing from > > >>>> Win10 powershell, all brick processes crashes. Its not the same > scenario > > >>>> mentioned before but the crash report in the bricks log is the same. > > >>>> Attached you find the backtrace. > > >>>> > > >>>> Regards > > >>>> David Spisla > > >>>> > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > >>>> vbellur at redhat.com>: > > >>>> > > >>>>> Hello David, > > >>>>> > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > > >>>>> wrote: > > >>>>> > > >>>>>> Hello Vijay, > > >>>>>> > > >>>>>> how can I create such a core file? Or will it be created > > >>>>>> automatically if a gluster process crashes? > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > >>>>>> > > >>>>> > > >>>>> Generation of core file is dependent on the system configuration. > > >>>>> `man 5 core` contains useful information to generate a core file > in a > > >>>>> directory. Once a core file is generated, you can use gdb to get a > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > >>>>> > > >>>>> > > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > > >>>>>> only sometimes. > > >>>>>> > > >>>>> > > >>>>> If the bug is not easy to reproduce, having a backtrace from the > > >>>>> generated core would be very useful! > > >>>>> > > >>>>> Thanks, > > >>>>> Vijay > > >>>>> > > >>>>> > > >>>>>> > > >>>>>> Regards > > >>>>>> David Spisla > > >>>>>> > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > >>>>>> vbellur at redhat.com>: > > >>>>>> > > >>>>>>> Thank you for the report, David. Do you have core files > available on > > >>>>>>> any of the servers? If yes, would it be possible for you to > provide a > > >>>>>>> backtrace. > > >>>>>>> > > >>>>>>> Regards, > > >>>>>>> Vijay > > >>>>>>> > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hello folks, > > >>>>>>>> > > >>>>>>>> we have a client application (runs on Win10) which does some > FOPs > > >>>>>>>> on a gluster volume which is accessed by SMB. > > >>>>>>>> > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > >>>>>>>> successively and checks if the files data was correctly copied. 
> While doing > > >>>>>>>> this, all brick processes crashes and in the logs one have this > crash > > >>>>>>>> report on every brick log: > > >>>>>>>> > > >>>>>>>>> > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > gfid: 00000000-0000-0000-0000-000000000001, > req(uid:2000,gid:2000,perm:1,ngrps:1), > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > denied] > > >>>>>>>>> pending frames: > > >>>>>>>>> frame : type(0) op(27) > > >>>>>>>>> frame : type(0) op(40) > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > >>>>>>>>> signal received: 11 > > >>>>>>>>> time of crash: > > >>>>>>>>> 2019-04-16 08:32:21 > > >>>>>>>>> configuration details: > > >>>>>>>>> argp 1 > > >>>>>>>>> backtrace 1 > > >>>>>>>>> dlfcn 1 > > >>>>>>>>> libpthread 1 > > >>>>>>>>> llistxattr 1 > > >>>>>>>>> setfsid 1 > > >>>>>>>>> spinlock 1 > > >>>>>>>>> epoll.h 1 > > >>>>>>>>> xattr.h 1 > > >>>>>>>>> st_atim.tv_nsec 1 > > >>>>>>>>> package-string: glusterfs 5.5 > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > >>>>>>>>> > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > crashes and again, > > >>>>>>>> one can read this crash report in every brick log: > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > 0-longterm-access-control: > > >>>>>>>>> client: > > >>>>>>>>> > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > acl:-) [Permission > > >>>>>>>>> denied] > > >>>>>>>>> > > >>>>>>>>> pending frames: > > >>>>>>>>> > > >>>>>>>>> frame : type(0) op(27) > > >>>>>>>>> > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > >>>>>>>>> > > >>>>>>>>> signal received: 11 > > >>>>>>>>> > > >>>>>>>>> time of crash: > > >>>>>>>>> > > >>>>>>>>> 2019-05-02 07:43:39 > > >>>>>>>>> > > >>>>>>>>> configuration details: > > >>>>>>>>> > > >>>>>>>>> argp 1 > > >>>>>>>>> > > >>>>>>>>> backtrace 1 > > >>>>>>>>> > > >>>>>>>>> dlfcn 1 > > >>>>>>>>> > > >>>>>>>>> libpthread 1 > > >>>>>>>>> > > >>>>>>>>> llistxattr 1 > > >>>>>>>>> > > >>>>>>>>> setfsid 1 > > >>>>>>>>> > > >>>>>>>>> spinlock 1 > > >>>>>>>>> > > >>>>>>>>> epoll.h 1 > > >>>>>>>>> > > >>>>>>>>> xattr.h 1 > > >>>>>>>>> > > >>>>>>>>> st_atim.tv_nsec 1 > > >>>>>>>>> > > >>>>>>>>> package-string: glusterfs 5.5 > > >>>>>>>>> > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > >>>>>>>>> > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > >>>>>>>>> > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > >>>>>>>>> > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > >>>>>>>>> > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different > > >>>>>>>> volumes. 
But both volumes has the same settings: > > >>>>>>>> > > >>>>>>>>> Volume Name: shortterm > > >>>>>>>>> Type: Replicate > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > >>>>>>>>> Status: Started > > >>>>>>>>> Snapshot Count: 0 > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > >>>>>>>>> Transport-type: tcp > > >>>>>>>>> Bricks: > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > >>>>>>>>> Options Reconfigured: > > >>>>>>>>> storage.reserve: 1 > > >>>>>>>>> performance.client-io-threads: off > > >>>>>>>>> nfs.disable: on > > >>>>>>>>> transport.address-family: inet > > >>>>>>>>> user.smb: disable > > >>>>>>>>> features.read-only: off > > >>>>>>>>> features.worm: off > > >>>>>>>>> features.worm-file-level: on > > >>>>>>>>> features.retention-mode: enterprise > > >>>>>>>>> features.default-retention-period: 120 > > >>>>>>>>> network.ping-timeout: 10 > > >>>>>>>>> features.cache-invalidation: on > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > >>>>>>>>> performance.nl-cache: on > > >>>>>>>>> performance.nl-cache-timeout: 600 > > >>>>>>>>> client.event-threads: 32 > > >>>>>>>>> server.event-threads: 32 > > >>>>>>>>> cluster.lookup-optimize: on > > >>>>>>>>> performance.stat-prefetch: on > > >>>>>>>>> performance.cache-invalidation: on > > >>>>>>>>> performance.md-cache-timeout: 600 > > >>>>>>>>> performance.cache-samba-metadata: on > > >>>>>>>>> performance.cache-ima-xattrs: on > > >>>>>>>>> performance.io-thread-count: 64 > > >>>>>>>>> cluster.use-compound-fops: on > > >>>>>>>>> performance.cache-size: 512MB > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > >>>>>>>>> performance.read-ahead: off > > >>>>>>>>> performance.write-behind-window-size: 4MB > > >>>>>>>>> performance.write-behind: on > > >>>>>>>>> storage.build-pgfid: on > > >>>>>>>>> features.utime: on > > >>>>>>>>> storage.ctime: on > > >>>>>>>>> cluster.quorum-type: fixed > > >>>>>>>>> cluster.quorum-count: 2 > > >>>>>>>>> features.bitrot: on > > >>>>>>>>> features.scrub: Active > > >>>>>>>>> features.scrub-freq: daily > > >>>>>>>>> cluster.enable-shared-storage: enable > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>> Why can this happen to all Brick processes? I don't understand > the > > >>>>>>>> crash report. The FOPs are nothing special and after restart > brick > > >>>>>>>> processes everything works fine and our application was succeed. > > >>>>>>>> > > >>>>>>>> Regards > > >>>>>>>> David Spisla > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> _______________________________________________ > > >>>>>>>> Gluster-users mailing list > > >>>>>>>> Gluster-users at gluster.org > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > >>>>>>> > > >>>>>>> > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 09:35:02 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 11:35:02 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: <20190517082141.GA24535@ndevos-x270> Message-ID: <20190517093502.GB24535@ndevos-x270> On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > Hello Niels, > > Am Fr., 17. 
Mai 2019 um 10:21 Uhr schrieb Niels de Vos : > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > Hello Vijay, > > > thank you for the clarification. Yes, there is an unconditional > > dereference > > > in stbuf. It seems plausible that this causes the crash. I think a check > > > like this should help: > > > > > > if (buf == NULL) { > > > goto out; > > > } > > > map_atime_from_server(this, buf); > > > > > > Is there a reason why buf can be NULL? > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). > > This is probably something you need to handle in worm_lookup_cbk. There > > can be many reasons for a FOP to return an error, why it happened in > > this case is a little difficult to say without (much) more details. > > > Yes, I will look for a way to handle that case. > It is intended, that the struct stbuf ist NULL when an error happens? Yes, in most error occasions it will not be possible to get a valid stbuf. Niels > > Regards > David Spisla > > > > HTH, > > Niels > > > > > > > > > > Regards > > > David Spisla > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > vbellur at redhat.com>: > > > > > > > Hello David, > > > > > > > > From the backtrace it looks like stbuf is NULL in > > map_atime_from_server() > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can > > you > > > > please check if there is an unconditional dereference of stbuf in > > > > map_atime_from_server()? > > > > > > > > Regards, > > > > Vijay > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > wrote: > > > > > > > >> Hello Vijay, > > > >> > > > >> yes, we are using custom patches. It s a helper function, which is > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > >> Do you think this could be the problem? The functions only manipulates > > > >> the atime in struct iattr > > > >> > > > >> Regards > > > >> David Spisla > > > >> > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > >> vbellur at redhat.com>: > > > >> > > > >>> Hello David, > > > >>> > > > >>> Do you have any custom patches in your deployment? I looked up v5.5 > > but > > > >>> could not find the following functions referred to in the core: > > > >>> > > > >>> map_atime_from_server() > > > >>> worm_lookup_cbk() > > > >>> > > > >>> Neither do I see xlator_helper.c in the codebase. > > > >>> > > > >>> Thanks, > > > >>> Vijay > > > >>> > > > >>> > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > =0x7fdeac0015c8, > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > > > >>> op_errno=op_errno at entry=13, > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > > > >>> worm.c:531 > > > >>> priv = 0x7fdef4075378 > > > >>> ret = 0 > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > >>> > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > > > >>> wrote: > > > >>> > > > >>>> Hello Vijay, > > > >>>> > > > >>>> I could reproduce the issue. After doing a simple DIR Listing from > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > scenario > > > >>>> mentioned before but the crash report in the bricks log is the same. > > > >>>> Attached you find the backtrace. > > > >>>> > > > >>>> Regards > > > >>>> David Spisla > > > >>>> > > > >>>> Am Di., 7. 
Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > >>>> vbellur at redhat.com>: > > > >>>> > > > >>>>> Hello David, > > > >>>>> > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Hello Vijay, > > > >>>>>> > > > >>>>>> how can I create such a core file? Or will it be created > > > >>>>>> automatically if a gluster process crashes? > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > >>>>>> > > > >>>>> > > > >>>>> Generation of core file is dependent on the system configuration. > > > >>>>> `man 5 core` contains useful information to generate a core file > > in a > > > >>>>> directory. Once a core file is generated, you can use gdb to get a > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > >>>>> > > > >>>>> > > > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > > > >>>>>> only sometimes. > > > >>>>>> > > > >>>>> > > > >>>>> If the bug is not easy to reproduce, having a backtrace from the > > > >>>>> generated core would be very useful! > > > >>>>> > > > >>>>> Thanks, > > > >>>>> Vijay > > > >>>>> > > > >>>>> > > > >>>>>> > > > >>>>>> Regards > > > >>>>>> David Spisla > > > >>>>>> > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > >>>>>> vbellur at redhat.com>: > > > >>>>>> > > > >>>>>>> Thank you for the report, David. Do you have core files > > available on > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > provide a > > > >>>>>>> backtrace. > > > >>>>>>> > > > >>>>>>> Regards, > > > >>>>>>> Vijay > > > >>>>>>> > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > > > >>>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> Hello folks, > > > >>>>>>>> > > > >>>>>>>> we have a client application (runs on Win10) which does some > > FOPs > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > >>>>>>>> > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > >>>>>>>> successively and checks if the files data was correctly copied. 
> > While doing > > > >>>>>>>> this, all brick processes crashes and in the logs one have this > > crash > > > >>>>>>>> report on every brick log: > > > >>>>>>>> > > > >>>>>>>>> > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > gfid: 00000000-0000-0000-0000-000000000001, > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > > denied] > > > >>>>>>>>> pending frames: > > > >>>>>>>>> frame : type(0) op(27) > > > >>>>>>>>> frame : type(0) op(40) > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > >>>>>>>>> signal received: 11 > > > >>>>>>>>> time of crash: > > > >>>>>>>>> 2019-04-16 08:32:21 > > > >>>>>>>>> configuration details: > > > >>>>>>>>> argp 1 > > > >>>>>>>>> backtrace 1 > > > >>>>>>>>> dlfcn 1 > > > >>>>>>>>> libpthread 1 > > > >>>>>>>>> llistxattr 1 > > > >>>>>>>>> setfsid 1 > > > >>>>>>>>> spinlock 1 > > > >>>>>>>>> epoll.h 1 > > > >>>>>>>>> xattr.h 1 > > > >>>>>>>>> st_atim.tv_nsec 1 > > > >>>>>>>>> package-string: glusterfs 5.5 > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > >>>>>>>>> > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > crashes and again, > > > >>>>>>>> one can read this crash report in every brick log: > > > >>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > 0-longterm-access-control: > > > >>>>>>>>> client: > > > >>>>>>>>> > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > acl:-) [Permission > > > >>>>>>>>> denied] > > > >>>>>>>>> > > > >>>>>>>>> pending frames: > > > >>>>>>>>> > > > >>>>>>>>> frame : type(0) op(27) > > > >>>>>>>>> > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > >>>>>>>>> > > > >>>>>>>>> signal received: 11 > > > >>>>>>>>> > > > >>>>>>>>> time of crash: > > > >>>>>>>>> > > > >>>>>>>>> 2019-05-02 07:43:39 > > > >>>>>>>>> > > > >>>>>>>>> configuration details: > > > >>>>>>>>> > > > >>>>>>>>> argp 1 > > > >>>>>>>>> > > > >>>>>>>>> backtrace 1 > > > >>>>>>>>> > > > >>>>>>>>> dlfcn 1 > > > >>>>>>>>> > > > >>>>>>>>> libpthread 1 > > > >>>>>>>>> > > > >>>>>>>>> llistxattr 1 > > > >>>>>>>>> > > > >>>>>>>>> setfsid 1 > > > >>>>>>>>> > > > >>>>>>>>> spinlock 1 > > > >>>>>>>>> > > > >>>>>>>>> epoll.h 1 > > > >>>>>>>>> > > > >>>>>>>>> xattr.h 1 > > > >>>>>>>>> > > > >>>>>>>>> st_atim.tv_nsec 1 > > > >>>>>>>>> > > > >>>>>>>>> package-string: glusterfs 5.5 > > > >>>>>>>>> > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > >>>>>>>>> > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> This happens 
on a 3-Node Gluster v5.5 Cluster on two different > > > >>>>>>>> volumes. But both volumes has the same settings: > > > >>>>>>>> > > > >>>>>>>>> Volume Name: shortterm > > > >>>>>>>>> Type: Replicate > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > >>>>>>>>> Status: Started > > > >>>>>>>>> Snapshot Count: 0 > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > >>>>>>>>> Transport-type: tcp > > > >>>>>>>>> Bricks: > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > >>>>>>>>> Options Reconfigured: > > > >>>>>>>>> storage.reserve: 1 > > > >>>>>>>>> performance.client-io-threads: off > > > >>>>>>>>> nfs.disable: on > > > >>>>>>>>> transport.address-family: inet > > > >>>>>>>>> user.smb: disable > > > >>>>>>>>> features.read-only: off > > > >>>>>>>>> features.worm: off > > > >>>>>>>>> features.worm-file-level: on > > > >>>>>>>>> features.retention-mode: enterprise > > > >>>>>>>>> features.default-retention-period: 120 > > > >>>>>>>>> network.ping-timeout: 10 > > > >>>>>>>>> features.cache-invalidation: on > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > >>>>>>>>> performance.nl-cache: on > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > >>>>>>>>> client.event-threads: 32 > > > >>>>>>>>> server.event-threads: 32 > > > >>>>>>>>> cluster.lookup-optimize: on > > > >>>>>>>>> performance.stat-prefetch: on > > > >>>>>>>>> performance.cache-invalidation: on > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > >>>>>>>>> performance.cache-samba-metadata: on > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > >>>>>>>>> performance.io-thread-count: 64 > > > >>>>>>>>> cluster.use-compound-fops: on > > > >>>>>>>>> performance.cache-size: 512MB > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > >>>>>>>>> performance.read-ahead: off > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > >>>>>>>>> performance.write-behind: on > > > >>>>>>>>> storage.build-pgfid: on > > > >>>>>>>>> features.utime: on > > > >>>>>>>>> storage.ctime: on > > > >>>>>>>>> cluster.quorum-type: fixed > > > >>>>>>>>> cluster.quorum-count: 2 > > > >>>>>>>>> features.bitrot: on > > > >>>>>>>>> features.scrub: Active > > > >>>>>>>>> features.scrub-freq: daily > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>> Why can this happen to all Brick processes? I don't understand > > the > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > brick > > > >>>>>>>> processes everything works fine and our application was succeed. 
> > > >>>>>>>> > > > >>>>>>>> Regards > > > >>>>>>>> David Spisla > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> _______________________________________________ > > > >>>>>>>> Gluster-users mailing list > > > >>>>>>>> Gluster-users at gluster.org > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > >>>>>>> > > > >>>>>>> > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > From spisla80 at gmail.com Fri May 17 09:57:47 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 11:57:47 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517093502.GB24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> Message-ID: Hello Niels, Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > Hello Niels, > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > ndevos at redhat.com>: > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > Hello Vijay, > > > > thank you for the clarification. Yes, there is an unconditional > > > dereference > > > > in stbuf. It seems plausible that this causes the crash. I think a > check > > > > like this should help: > > > > > > > > if (buf == NULL) { > > > > goto out; > > > > } > > > > map_atime_from_server(this, buf); > > > > > > > > Is there a reason why buf can be NULL? > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > denied). > > > This is probably something you need to handle in worm_lookup_cbk. There > > > can be many reasons for a FOP to return an error, why it happened in > > > this case is a little difficult to say without (much) more details. > > > > > Yes, I will look for a way to handle that case. > > It is intended, that the struct stbuf ist NULL when an error happens? > > Yes, in most error occasions it will not be possible to get a valid > stbuf. > I will do a check like this assuming that in case of an error op_errno != 0 and ret = -1 if (buf == NULL || op_errno != 0 || ret = -1) { goto out; } map_atime_from_server(this, buf); Does this fit? Regards David > > Niels > > > > > > Regards > > David Spisla > > > > > > > HTH, > > > Niels > > > > > > > > > > > > > > Regards > > > > David Spisla > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > vbellur at redhat.com>: > > > > > > > > > Hello David, > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > map_atime_from_server() > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). > Can > > > you > > > > > please check if there is an unconditional dereference of stbuf in > > > > > map_atime_from_server()? > > > > > > > > > > Regards, > > > > > Vijay > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > > wrote: > > > > > > > > > >> Hello Vijay, > > > > >> > > > > >> yes, we are using custom patches. It s a helper function, which is > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > >> Do you think this could be the problem? The functions only > manipulates > > > > >> the atime in struct iattr > > > > >> > > > > >> Regards > > > > >> David Spisla > > > > >> > > > > >> Am Do., 16. 
Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > >> vbellur at redhat.com>: > > > > >> > > > > >>> Hello David, > > > > >>> > > > > >>> Do you have any custom patches in your deployment? I looked up > v5.5 > > > but > > > > >>> could not find the following functions referred to in the core: > > > > >>> > > > > >>> map_atime_from_server() > > > > >>> worm_lookup_cbk() > > > > >>> > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > >>> > > > > >>> Thanks, > > > > >>> Vijay > > > > >>> > > > > >>> > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > =0x7fdeac0015c8, > > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry > =-1, > > > > >>> op_errno=op_errno at entry=13, > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) > at > > > > >>> worm.c:531 > > > > >>> priv = 0x7fdef4075378 > > > > >>> ret = 0 > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > >>> > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > spisla80 at gmail.com> > > > > >>> wrote: > > > > >>> > > > > >>>> Hello Vijay, > > > > >>>> > > > > >>>> I could reproduce the issue. After doing a simple DIR Listing > from > > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > > scenario > > > > >>>> mentioned before but the crash report in the bricks log is the > same. > > > > >>>> Attached you find the backtrace. > > > > >>>> > > > > >>>> Regards > > > > >>>> David Spisla > > > > >>>> > > > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > >>>> vbellur at redhat.com>: > > > > >>>> > > > > >>>>> Hello David, > > > > >>>>> > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > spisla80 at gmail.com> > > > > >>>>> wrote: > > > > >>>>> > > > > >>>>>> Hello Vijay, > > > > >>>>>> > > > > >>>>>> how can I create such a core file? Or will it be created > > > > >>>>>> automatically if a gluster process crashes? > > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > > >>>>>> > > > > >>>>> > > > > >>>>> Generation of core file is dependent on the system > configuration. > > > > >>>>> `man 5 core` contains useful information to generate a core > file > > > in a > > > > >>>>> directory. Once a core file is generated, you can use gdb to > get a > > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > > >>>>> > > > > >>>>> > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > appears > > > > >>>>>> only sometimes. > > > > >>>>>> > > > > >>>>> > > > > >>>>> If the bug is not easy to reproduce, having a backtrace from > the > > > > >>>>> generated core would be very useful! > > > > >>>>> > > > > >>>>> Thanks, > > > > >>>>> Vijay > > > > >>>>> > > > > >>>>> > > > > >>>>>> > > > > >>>>>> Regards > > > > >>>>>> David Spisla > > > > >>>>>> > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > >>>>>> vbellur at redhat.com>: > > > > >>>>>> > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > available on > > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > > provide a > > > > >>>>>>> backtrace. 
> > > > >>>>>>> > > > > >>>>>>> Regards, > > > > >>>>>>> Vijay > > > > >>>>>>> > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > spisla80 at gmail.com> > > > > >>>>>>> wrote: > > > > >>>>>>> > > > > >>>>>>>> Hello folks, > > > > >>>>>>>> > > > > >>>>>>>> we have a client application (runs on Win10) which does some > > > FOPs > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > >>>>>>>> > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > >>>>>>>> successively and checks if the files data was correctly > copied. > > > While doing > > > > >>>>>>>> this, all brick processes crashes and in the logs one have > this > > > crash > > > > >>>>>>>> report on every brick log: > > > > >>>>>>>> > > > > >>>>>>>>> > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > gfid: 00000000-0000-0000-0000-000000000001, > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > [Permission > > > denied] > > > > >>>>>>>>> pending frames: > > > > >>>>>>>>> frame : type(0) op(27) > > > > >>>>>>>>> frame : type(0) op(40) > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > >>>>>>>>> signal received: 11 > > > > >>>>>>>>> time of crash: > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > >>>>>>>>> configuration details: > > > > >>>>>>>>> argp 1 > > > > >>>>>>>>> backtrace 1 > > > > >>>>>>>>> dlfcn 1 > > > > >>>>>>>>> libpthread 1 > > > > >>>>>>>>> llistxattr 1 > > > > >>>>>>>>> setfsid 1 > > > > >>>>>>>>> spinlock 1 > > > > >>>>>>>>> epoll.h 1 > > > > >>>>>>>>> xattr.h 1 > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > >>>>>>>>> > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > file > > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > > crashes and again, > > > > >>>>>>>> one can read this crash report in every brick log: > > > > >>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > 0-longterm-access-control: > > > > >>>>>>>>> client: > > > > >>>>>>>>> > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > acl:-) [Permission > > > > >>>>>>>>> denied] > > > > >>>>>>>>> > > > > >>>>>>>>> pending frames: > > > > >>>>>>>>> > > > > >>>>>>>>> frame : type(0) op(27) > > > > >>>>>>>>> > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > >>>>>>>>> > > > > >>>>>>>>> signal received: 11 > > > > >>>>>>>>> > > > > >>>>>>>>> time of crash: > > > > >>>>>>>>> > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > >>>>>>>>> > > > > >>>>>>>>> configuration details: > > > > >>>>>>>>> > > > > >>>>>>>>> argp 1 > > > > >>>>>>>>> > > > > >>>>>>>>> backtrace 1 > > > > >>>>>>>>> > > > > >>>>>>>>> dlfcn 1 > > > > >>>>>>>>> > > > > >>>>>>>>> libpthread 1 > > > > >>>>>>>>> > > > > >>>>>>>>> llistxattr 1 > > > > >>>>>>>>> > > > > >>>>>>>>> setfsid 1 > > > > >>>>>>>>> > > > > >>>>>>>>> spinlock 1 > > > > >>>>>>>>> > > > > >>>>>>>>> epoll.h 1 > > > > >>>>>>>>> > > > > >>>>>>>>> xattr.h 1 > > > > >>>>>>>>> > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > >>>>>>>>> > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > >>>>>>>>> > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > >>>>>>>>> > > > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > 
/usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > different > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > >>>>>>>> > > > > >>>>>>>>> Volume Name: shortterm > > > > >>>>>>>>> Type: Replicate > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > >>>>>>>>> Status: Started > > > > >>>>>>>>> Snapshot Count: 0 > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > >>>>>>>>> Transport-type: tcp > > > > >>>>>>>>> Bricks: > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Options Reconfigured: > > > > >>>>>>>>> storage.reserve: 1 > > > > >>>>>>>>> performance.client-io-threads: off > > > > >>>>>>>>> nfs.disable: on > > > > >>>>>>>>> transport.address-family: inet > > > > >>>>>>>>> user.smb: disable > > > > >>>>>>>>> features.read-only: off > > > > >>>>>>>>> features.worm: off > > > > >>>>>>>>> features.worm-file-level: on > > > > >>>>>>>>> features.retention-mode: enterprise > > > > >>>>>>>>> features.default-retention-period: 120 > > > > >>>>>>>>> network.ping-timeout: 10 > > > > >>>>>>>>> features.cache-invalidation: on > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > >>>>>>>>> performance.nl-cache: on > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > >>>>>>>>> client.event-threads: 32 > > > > >>>>>>>>> server.event-threads: 32 > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > >>>>>>>>> performance.stat-prefetch: on > > > > >>>>>>>>> performance.cache-invalidation: on > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > >>>>>>>>> performance.cache-size: 512MB > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > >>>>>>>>> performance.read-ahead: off > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > >>>>>>>>> performance.write-behind: on > > > > >>>>>>>>> storage.build-pgfid: on > > > > >>>>>>>>> features.utime: on > > > > >>>>>>>>> storage.ctime: on > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > >>>>>>>>> features.bitrot: on > > > > >>>>>>>>> features.scrub: Active > > > > >>>>>>>>> features.scrub-freq: daily > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > understand > > > the > > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > > brick > > > > >>>>>>>> processes everything works fine and our application was > succeed. 
> > > > >>>>>>>> > > > > >>>>>>>> Regards > > > > >>>>>>>> David Spisla > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> _______________________________________________ > > > > >>>>>>>> Gluster-users mailing list > > > > >>>>>>>> Gluster-users at gluster.org > > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > >>>>>>> > > > > >>>>>>> > > > > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 10:15:54 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 12:15:54 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> Message-ID: <20190517101328.GC24535@ndevos-x270> On Fri, May 17, 2019 at 11:57:47AM +0200, David Spisla wrote: > Hello Niels, > > Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos : > > > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > > Hello Niels, > > > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > > ndevos at redhat.com>: > > > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > > Hello Vijay, > > > > > thank you for the clarification. Yes, there is an unconditional > > > > dereference > > > > > in stbuf. It seems plausible that this causes the crash. I think a > > check > > > > > like this should help: > > > > > > > > > > if (buf == NULL) { > > > > > goto out; > > > > > } > > > > > map_atime_from_server(this, buf); > > > > > > > > > > Is there a reason why buf can be NULL? > > > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > > denied). > > > > This is probably something you need to handle in worm_lookup_cbk. There > > > > can be many reasons for a FOP to return an error, why it happened in > > > > this case is a little difficult to say without (much) more details. > > > > > > > Yes, I will look for a way to handle that case. > > > It is intended, that the struct stbuf ist NULL when an error happens? > > > > Yes, in most error occasions it will not be possible to get a valid > > stbuf. > > > I will do a check like this assuming that in case of an error op_errno != 0 > and ret = -1 > > if (buf == NULL || op_errno != 0 || ret = -1) { > goto out; > } > map_atime_from_server(this, buf); > > Does this fit? I think it is more common to do if (ret == -1) { /* error handling and unwind */ goto out; } map_atime_from_server(this, buf); Niels > Regards > David > > > > > Niels > > > > > > > > > > Regards > > > David Spisla > > > > > > > > > > HTH, > > > > Niels > > > > > > > > > > > > > > > > > > Regards > > > > > David Spisla > > > > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > > vbellur at redhat.com>: > > > > > > > > > > > Hello David, > > > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > > map_atime_from_server() > > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). > > Can > > > > you > > > > > > please check if there is an unconditional dereference of stbuf in > > > > > > map_atime_from_server()? 
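For readers following this exchange, a minimal sketch of the guard being discussed, assuming the custom worm_lookup_cbk() signature visible in the gdb frame earlier in the thread; map_atime_from_server() and xlator_helper.c belong to David's local patch set, not to upstream GlusterFS, so the names below are only illustrative:

/* Sketch only: on a failed LOOKUP (op_ret < 0, e.g. op_errno == EACCES as
 * logged by posix-acl) there is no valid iatt, so skip the atime mapping
 * and unwind with the error unchanged. */
int32_t
worm_lookup_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
                int32_t op_ret, int32_t op_errno, inode_t *inode,
                struct iatt *buf, dict_t *xdata, struct iatt *postparent)
{
        if (op_ret < 0 || buf == NULL)
                goto unwind;              /* nothing to map on error */

        map_atime_from_server(this, buf); /* custom helper from xlator_helper.c */

unwind:
        STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, inode, buf,
                            xdata, postparent);
        return 0;
}

Two small details worth noting about the condition proposed above: inside the callback the error indicator is the op_ret parameter rather than a local ret (and "ret = -1" would be an assignment, not a comparison), and op_errno is only meaningful when op_ret is negative, so testing op_ret < 0 or buf == NULL is sufficient before touching the iatt.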
> > > > > > > > > > > > Regards, > > > > > > Vijay > > > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > > > wrote: > > > > > > > > > > > >> Hello Vijay, > > > > > >> > > > > > >> yes, we are using custom patches. It s a helper function, which is > > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > > >> Do you think this could be the problem? The functions only > > manipulates > > > > > >> the atime in struct iattr > > > > > >> > > > > > >> Regards > > > > > >> David Spisla > > > > > >> > > > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > > >> vbellur at redhat.com>: > > > > > >> > > > > > >>> Hello David, > > > > > >>> > > > > > >>> Do you have any custom patches in your deployment? I looked up > > v5.5 > > > > but > > > > > >>> could not find the following functions referred to in the core: > > > > > >>> > > > > > >>> map_atime_from_server() > > > > > >>> worm_lookup_cbk() > > > > > >>> > > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > > >>> > > > > > >>> Thanks, > > > > > >>> Vijay > > > > > >>> > > > > > >>> > > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > > =0x7fdeac0015c8, > > > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry > > =-1, > > > > > >>> op_errno=op_errno at entry=13, > > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) > > at > > > > > >>> worm.c:531 > > > > > >>> priv = 0x7fdef4075378 > > > > > >>> ret = 0 > > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > > >>> > > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>> wrote: > > > > > >>> > > > > > >>>> Hello Vijay, > > > > > >>>> > > > > > >>>> I could reproduce the issue. After doing a simple DIR Listing > > from > > > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > > > scenario > > > > > >>>> mentioned before but the crash report in the bricks log is the > > same. > > > > > >>>> Attached you find the backtrace. > > > > > >>>> > > > > > >>>> Regards > > > > > >>>> David Spisla > > > > > >>>> > > > > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > > >>>> vbellur at redhat.com>: > > > > > >>>> > > > > > >>>>> Hello David, > > > > > >>>>> > > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>>>> wrote: > > > > > >>>>> > > > > > >>>>>> Hello Vijay, > > > > > >>>>>> > > > > > >>>>>> how can I create such a core file? Or will it be created > > > > > >>>>>> automatically if a gluster process crashes? > > > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > > > >>>>>> > > > > > >>>>> > > > > > >>>>> Generation of core file is dependent on the system > > configuration. > > > > > >>>>> `man 5 core` contains useful information to generate a core > > file > > > > in a > > > > > >>>>> directory. Once a core file is generated, you can use gdb to > > get a > > > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > > > >>>>> > > > > > >>>>> > > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > > appears > > > > > >>>>>> only sometimes. 
> > > > > >>>>>> > > > > > >>>>> > > > > > >>>>> If the bug is not easy to reproduce, having a backtrace from > > the > > > > > >>>>> generated core would be very useful! > > > > > >>>>> > > > > > >>>>> Thanks, > > > > > >>>>> Vijay > > > > > >>>>> > > > > > >>>>> > > > > > >>>>>> > > > > > >>>>>> Regards > > > > > >>>>>> David Spisla > > > > > >>>>>> > > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > > >>>>>> vbellur at redhat.com>: > > > > > >>>>>> > > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > > available on > > > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > > > provide a > > > > > >>>>>>> backtrace. > > > > > >>>>>>> > > > > > >>>>>>> Regards, > > > > > >>>>>>> Vijay > > > > > >>>>>>> > > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>>>>>> wrote: > > > > > >>>>>>> > > > > > >>>>>>>> Hello folks, > > > > > >>>>>>>> > > > > > >>>>>>>> we have a client application (runs on Win10) which does some > > > > FOPs > > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > > >>>>>>>> > > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > > >>>>>>>> successively and checks if the files data was correctly > > copied. > > > > While doing > > > > > >>>>>>>> this, all brick processes crashes and in the logs one have > > this > > > > crash > > > > > >>>>>>>> report on every brick log: > > > > > >>>>>>>> > > > > > >>>>>>>>> > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > > gfid: 00000000-0000-0000-0000-000000000001, > > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > > [Permission > > > > denied] > > > > > >>>>>>>>> pending frames: > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > >>>>>>>>> frame : type(0) op(40) > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > >>>>>>>>> signal received: 11 > > > > > >>>>>>>>> time of crash: > > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > > >>>>>>>>> configuration details: > > > > > >>>>>>>>> argp 1 > > > > > >>>>>>>>> backtrace 1 > > > > > >>>>>>>>> dlfcn 1 > > > > > >>>>>>>>> libpthread 1 > > > > > >>>>>>>>> llistxattr 1 > > > > > >>>>>>>>> setfsid 1 > > > > > >>>>>>>>> spinlock 1 > > > > > >>>>>>>>> epoll.h 1 > > > > > >>>>>>>>> xattr.h 1 > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > 
>>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > > >>>>>>>>> > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > > >>>>>>>>> > > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > > file > > > > > >>>>>>>> sucessively. After the 70th file was set, all the bricks > > > > crashes and again, > > > > > >>>>>>>> one can read this crash report in every brick log: > > > > > >>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > > 0-longterm-access-control: > > > > > >>>>>>>>> client: > > > > > >>>>>>>>> > > > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > > acl:-) [Permission > > > > > >>>>>>>>> denied] > > > > > >>>>>>>>> > > > > > >>>>>>>>> pending frames: > > > > > >>>>>>>>> > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > >>>>>>>>> > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > >>>>>>>>> > > > > > >>>>>>>>> signal received: 11 > > > > > >>>>>>>>> > > > > > >>>>>>>>> time of crash: > > > > > >>>>>>>>> > > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > > >>>>>>>>> > > > > > >>>>>>>>> configuration details: > > > > > >>>>>>>>> > > > > > >>>>>>>>> argp 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> backtrace 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> dlfcn 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> libpthread 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> llistxattr 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> setfsid 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> spinlock 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> epoll.h 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> xattr.h 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > >>>>>>>>> > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > 
> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > > >>>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > > different > > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > > >>>>>>>> > > > > > >>>>>>>>> Volume Name: shortterm > > > > > >>>>>>>>> Type: Replicate > > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > > >>>>>>>>> Status: Started > > > > > >>>>>>>>> Snapshot Count: 0 > > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > > >>>>>>>>> Transport-type: tcp > > > > > >>>>>>>>> Bricks: > > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Options Reconfigured: > > > > > >>>>>>>>> storage.reserve: 1 > > > > > >>>>>>>>> performance.client-io-threads: off > > > > > >>>>>>>>> nfs.disable: on > > > > > >>>>>>>>> transport.address-family: inet > > > > > >>>>>>>>> user.smb: disable > > > > > >>>>>>>>> features.read-only: off > > > > > >>>>>>>>> features.worm: off > > > > > >>>>>>>>> features.worm-file-level: on > > > > > >>>>>>>>> features.retention-mode: enterprise > > > > > >>>>>>>>> features.default-retention-period: 120 > > > > > >>>>>>>>> network.ping-timeout: 10 > > > > > >>>>>>>>> features.cache-invalidation: on > > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > > >>>>>>>>> performance.nl-cache: on > > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > > >>>>>>>>> client.event-threads: 32 > > > > > >>>>>>>>> server.event-threads: 32 > > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > > >>>>>>>>> performance.stat-prefetch: on > > > > > >>>>>>>>> performance.cache-invalidation: on > > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > > >>>>>>>>> performance.cache-size: 512MB > > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > > >>>>>>>>> performance.read-ahead: off > > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > > >>>>>>>>> performance.write-behind: on > > > > > >>>>>>>>> storage.build-pgfid: on > > > > > >>>>>>>>> 
features.utime: on > > > > > >>>>>>>>> storage.ctime: on > > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > > >>>>>>>>> features.bitrot: on > > > > > >>>>>>>>> features.scrub: Active > > > > > >>>>>>>>> features.scrub-freq: daily > > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > > understand > > > > the > > > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > > > brick > > > > > >>>>>>>> processes everything works fine and our application was > > successful. > > > > > >>>>>>>> > > > > > >>>>>>>> Regards > > > > > >>>>>>>> David Spisla > > > > > >>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> _______________________________________________ > > > > > >>>>>>>> Gluster-users mailing list > > > > > >>>>>>>> Gluster-users at gluster.org > > > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > >>>>>>> > > > > > >>>>>>> > > > > > > > > > _______________________________________________ > > > > > Gluster-users mailing list > > > > > Gluster-users at gluster.org > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > From spisla80 at gmail.com Fri May 17 10:34:19 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 12:34:19 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517101328.GC24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> <20190517101328.GC24535@ndevos-x270> Message-ID: Thank you all for the clarification. This is very helpful! Regards David Spisla Am Fr., 17. Mai 2019 um 12:15 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 11:57:47AM +0200, David Spisla wrote: > > Hello Niels, > > > > Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos < > ndevos at redhat.com>: > > > > > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > > > Hello Niels, > > > > > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > > > ndevos at redhat.com>: > > > > > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > > > Hello Vijay, > > > > > > thank you for the clarification. Yes, there is an unconditional > > > > > dereference > > > > > > in stbuf. It seems plausible that this causes the crash. I think > a > > > check > > > > > > like this should help: > > > > > > > > > > > > if (buf == NULL) { > > > > > > goto out; > > > > > > } > > > > > > map_atime_from_server(this, buf); > > > > > > > > > > > > Is there a reason why buf can be NULL? > > > > > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > > > denied). > > > > > This is probably something you need to handle in worm_lookup_cbk. > There > > > > > can be many reasons for a FOP to return an error, why it happened > in > > > > > this case is a little difficult to say without (much) more details. > > > > > > > > > Yes, I will look for a way to handle that case. > > > > Is it intended that the struct stbuf is NULL when an error happens? > > > > > > Yes, in most error occasions it will not be possible to get a valid > > > stbuf. > > > > > I will do a check like this, assuming that in case of an error op_errno > != 0 > > and ret == -1: > > > > if (buf == NULL || op_errno != 0 || ret == -1) { > > goto out; > > } > > map_atime_from_server(this, buf); > > > > Does this fit?
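A minimal, self-contained sketch of the guard pattern being discussed follows. The callback name, the extern prototype and the header path are illustrative assumptions for this sketch, not the actual worm.c/xlator_helper.c code; only the op_ret/NULL check mirrors what the thread proposes.

/* Hypothetical example: only call the atime-mapping helper when the
 * LOOKUP succeeded and a valid iatt was returned. On failure (op_ret
 * == -1, e.g. op_errno == EACCES as in the brick logs above) the brick
 * cannot supply a valid stat, so buf may be NULL and dereferencing it
 * unconditionally is what produces the SIGSEGV. */
#include <glusterfs/xlator.h>  /* header location may differ per GlusterFS version */

/* assumed signature, based on the backtrace (this, stbuf) */
extern void map_atime_from_server(xlator_t *this, struct iatt *stbuf);

int32_t
my_lookup_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
              int32_t op_ret, int32_t op_errno, inode_t *inode,
              struct iatt *buf, dict_t *xdata, struct iatt *postparent)
{
    if (op_ret == -1 || buf == NULL)
        goto out;  /* error path: skip the helper entirely */

    map_atime_from_server(this, buf);  /* safe: buf is non-NULL here */

out:
    /* pass the result (including any error) up the graph unchanged */
    STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, inode, buf,
                        xdata, postparent);
    return 0;
}

Checking op_ret alone is normally sufficient, since a valid buf is only supplied when the lookup succeeded; the extra buf == NULL test simply makes the guard defensive.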
> > I think it is more common to do > > if (ret == -1) { > /* error handling and unwind */ > goto out; > } > > map_atime_from_server(this, buf); > > Niels > > > Regards > > David > > > > > > > > Niels > > > > > > > > > > > > > > Regards > > > > David Spisla > > > > > > > > > > > > > HTH, > > > > > Niels > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > David Spisla > > > > > > > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > > > vbellur at redhat.com>: > > > > > > > > > > > > > Hello David, > > > > > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > > > map_atime_from_server() > > > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = > 13). > > > Can > > > > > you > > > > > > > please check if there is an unconditional dereference of stbuf > in > > > > > > > map_atime_from_server()? > > > > > > > > > > > > > > Regards, > > > > > > > Vijay > > > > > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla < > spisla80 at gmail.com> > > > > > wrote: > > > > > > > > > > > > > >> Hello Vijay, > > > > > > >> > > > > > > >> yes, we are using custom patches. It s a helper function, > which is > > > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > > > >> Do you think this could be the problem? The functions only > > > manipulates > > > > > > >> the atime in struct iattr > > > > > > >> > > > > > > >> Regards > > > > > > >> David Spisla > > > > > > >> > > > > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > > > >> vbellur at redhat.com>: > > > > > > >> > > > > > > >>> Hello David, > > > > > > >>> > > > > > > >>> Do you have any custom patches in your deployment? I looked > up > > > v5.5 > > > > > but > > > > > > >>> could not find the following functions referred to in the > core: > > > > > > >>> > > > > > > >>> map_atime_from_server() > > > > > > >>> worm_lookup_cbk() > > > > > > >>> > > > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > > > >>> > > > > > > >>> Thanks, > > > > > > >>> Vijay > > > > > > >>> > > > > > > >>> > > > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > > > =0x7fdeac0015c8, > > > > > > >>> cookie=, this=0x7fdef401af00, > op_ret=op_ret at entry > > > =-1, > > > > > > >>> op_errno=op_errno at entry=13, > > > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, > postparent=0x0) > > > at > > > > > > >>> worm.c:531 > > > > > > >>> priv = 0x7fdef4075378 > > > > > > >>> ret = 0 > > > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > > > >>> > > > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>> wrote: > > > > > > >>> > > > > > > >>>> Hello Vijay, > > > > > > >>>> > > > > > > >>>> I could reproduce the issue. After doing a simple DIR > Listing > > > from > > > > > > >>>> Win10 powershell, all brick processes crashes. Its not the > same > > > > > scenario > > > > > > >>>> mentioned before but the crash report in the bricks log is > the > > > same. > > > > > > >>>> Attached you find the backtrace. > > > > > > >>>> > > > > > > >>>> Regards > > > > > > >>>> David Spisla > > > > > > >>>> > > > > > > >>>> Am Di., 7. 
Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > > > >>>> vbellur at redhat.com>: > > > > > > >>>> > > > > > > >>>>> Hello David, > > > > > > >>>>> > > > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>>>> wrote: > > > > > > >>>>> > > > > > > >>>>>> Hello Vijay, > > > > > > >>>>>> > > > > > > >>>>>> how can I create such a core file? Or will it be created > > > > > > >>>>>> automatically if a gluster process crashes? > > > > > > >>>>>> Maybe you can give me a hint and will try to get a > backtrace. > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> Generation of core file is dependent on the system > > > configuration. > > > > > > >>>>> `man 5 core` contains useful information to generate a core > > > file > > > > > in a > > > > > > >>>>> directory. Once a core file is generated, you can use gdb > to > > > get a > > > > > > >>>>> backtrace of all threads (using "thread apply all bt > full"). > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > > > appears > > > > > > >>>>>> only sometimes. > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> If the bug is not easy to reproduce, having a backtrace > from > > > the > > > > > > >>>>> generated core would be very useful! > > > > > > >>>>> > > > > > > >>>>> Thanks, > > > > > > >>>>> Vijay > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>>> > > > > > > >>>>>> Regards > > > > > > >>>>>> David Spisla > > > > > > >>>>>> > > > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > > > >>>>>> vbellur at redhat.com>: > > > > > > >>>>>> > > > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > > > available on > > > > > > >>>>>>> any of the servers? If yes, would it be possible for you > to > > > > > provide a > > > > > > >>>>>>> backtrace. > > > > > > >>>>>>> > > > > > > >>>>>>> Regards, > > > > > > >>>>>>> Vijay > > > > > > >>>>>>> > > > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>>>>>> wrote: > > > > > > >>>>>>> > > > > > > >>>>>>>> Hello folks, > > > > > > >>>>>>>> > > > > > > >>>>>>>> we have a client application (runs on Win10) which does > some > > > > > FOPs > > > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > > > >>>>>>>> > > > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > > > >>>>>>>> successively and checks if the files data was correctly > > > copied. 
> > > > > While doing > > > > > > >>>>>>>> this, all brick processes crashes and in the logs one > have > > > this > > > > > crash > > > > > > >>>>>>>> report on every brick log: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > > > gfid: 00000000-0000-0000-0000-000000000001, > > > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > > > [Permission > > > > > denied] > > > > > > >>>>>>>>> pending frames: > > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > > >>>>>>>>> frame : type(0) op(40) > > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > > >>>>>>>>> signal received: 11 > > > > > > >>>>>>>>> time of crash: > > > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > > > >>>>>>>>> configuration details: > > > > > > >>>>>>>>> argp 1 > > > > > > >>>>>>>>> backtrace 1 > > > > > > >>>>>>>>> dlfcn 1 > > > > > > >>>>>>>>> libpthread 1 > > > > > > >>>>>>>>> llistxattr 1 > > > > > > >>>>>>>>> setfsid 1 > > > > > > >>>>>>>>> spinlock 1 > > > > > > >>>>>>>>> epoll.h 1 > > > > > > >>>>>>>>> xattr.h 1 > > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > > > file > > > > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > > > > crashes and again, > > > > > > >>>>>>>> one can read this crash report in every brick log: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > > > 0-longterm-access-control: > > > > > > >>>>>>>>> client: > > > > > > >>>>>>>>> > > > > > > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > > >>>>>>>>> > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > > > acl:-) [Permission > > > > > > >>>>>>>>> denied] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> pending frames: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> signal received: 11 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> time of crash: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> configuration details: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> argp 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> backtrace 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> dlfcn 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> libpthread 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> llistxattr 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> setfsid 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> spinlock 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> epoll.h 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> xattr.h 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > 
/usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > > > different > > > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> Volume Name: shortterm > > > > > > >>>>>>>>> Type: Replicate > > > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > > > >>>>>>>>> Status: Started > > > > > > >>>>>>>>> Snapshot Count: 0 > > > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > > > >>>>>>>>> Transport-type: tcp > > > > > > >>>>>>>>> Bricks: > > > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Options Reconfigured: > > > > > > >>>>>>>>> storage.reserve: 1 > > > > > > >>>>>>>>> performance.client-io-threads: off > > > > > > >>>>>>>>> nfs.disable: on > > > > > > >>>>>>>>> transport.address-family: inet > > > > > > >>>>>>>>> user.smb: disable > > > > > > >>>>>>>>> features.read-only: off > > > > > > >>>>>>>>> features.worm: off > > > > > > >>>>>>>>> features.worm-file-level: on > > > > > > >>>>>>>>> features.retention-mode: enterprise > > > > > > >>>>>>>>> features.default-retention-period: 120 > > > > > > >>>>>>>>> network.ping-timeout: 10 > > > > > > >>>>>>>>> features.cache-invalidation: on > > > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > > > >>>>>>>>> performance.nl-cache: on > > > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > > > >>>>>>>>> client.event-threads: 32 > > > > > > >>>>>>>>> server.event-threads: 32 > > > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > > > >>>>>>>>> performance.stat-prefetch: on > > > > > > >>>>>>>>> performance.cache-invalidation: on > > > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > > > >>>>>>>>> performance.cache-size: 512MB > > > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > > > >>>>>>>>> performance.read-ahead: off > > > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > > > >>>>>>>>> performance.write-behind: on > > > > > > >>>>>>>>> storage.build-pgfid: on > > > > > > >>>>>>>>> features.utime: on > > > > > > >>>>>>>>> storage.ctime: on > > > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > > > >>>>>>>>> features.bitrot: on > > > > > > >>>>>>>>> features.scrub: Active > > > > > > >>>>>>>>> features.scrub-freq: daily > > > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > > > >>>>>>>>> > > > > > > 
>>>>>>>>> > > > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > > > understand > > > > > the > > > > > > >>>>>>>> crash report. The FOPs are nothing special and after > restart > > > > > brick > > > > > > >>>>>>>> processes everything works fine and our application was > > > succeed. > > > > > > >>>>>>>> > > > > > > >>>>>>>> Regards > > > > > > >>>>>>>> David Spisla > > > > > > >>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> _______________________________________________ > > > > > > >>>>>>>> Gluster-users mailing list > > > > > > >>>>>>>> Gluster-users at gluster.org > > > > > > >>>>>>>> > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > >>>>>>> > > > > > > >>>>>>> > > > > > > > > > > > _______________________________________________ > > > > > > Gluster-users mailing list > > > > > > Gluster-users at gluster.org > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 17 10:42:59 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 17 May 2019 16:12:59 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: On 17/05/19 5:59 AM, David Cunningham wrote: > Hello, > > We're adding an arbiter node to an existing volume and having an > issue. Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > Was your root directory of the replica 2 volume? in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. -Ravi > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file > for details. > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 > kernel 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W > [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: > SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: > Closing fuse connection to '/tmp/mntQAtu3f'. 
> > Processes running on new node gfs3: > > # ps -ef | grep gluster > root????? 6832???? 1? 0 20:17 ???????? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root???? 15799???? 1? 0 20:17 ???????? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 17 23:01:24 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Sat, 18 May 2019 11:01:24 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: Hi Ravi, The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. I'm not sure what "in metadata" means. Can you please explain that one? On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: > > On 17/05/19 5:59 AM, David Cunningham wrote: > > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > Was your root directory of the replica 2 volume in metadata or entry > split-brain? If yes, you need to resolve it before proceeding with the > add-brick. > > -Ravi > > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. 
> > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntQAtu3f'. > > Processes running on new node gfs3: > > # ps -ef | grep gluster > root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladkopy at gmail.com Fri May 17 23:18:36 2019 From: vladkopy at gmail.com (Vlad Kopylov) Date: Fri, 17 May 2019 19:18:36 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: straight from ./autogen.sh && ./configure && make -j install CentOS Linux release 7.6.1810 (Core) May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. 
On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. > There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. > > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat May 18 10:34:57 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 18 May 2019 13:34:57 +0300 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> Just run 'gluster volume heal my_volume info summary'. It will report any issues - everything should be 'Connected' and show '0'. Best Regards, Strahil NikolovOn May 18, 2019 02:01, David Cunningham wrote: > > Hi Ravi, > > The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. > > I'm not sure what "in metadata" means. Can you please explain that one? > > > On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: >> >> >> On 17/05/19 5:59 AM, David Cunningham wrote: >>> >>> Hello, >>> >>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. >>> >> Was your root directory of the replica 2 volume? in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. >> >> -Ravi >> >> >>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! 
>>> >>> On existing node gfs1, trying to add new arbiter node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. Please check log file for details. >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 >>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Sun May 19 23:31:16 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 20 May 2019 11:31:16 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> References: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> Message-ID: Hello, It does show everything as Connected and 0 for the existing bricks, gfs1 and gfs2. The new brick gfs3 isn't listed, presumably because of the failure as per my original email. Would anyone have any further suggestions on how to prevent the "Transport endpoint is not connected" error when adding the new brick? # gluster volume heal gvol0 info summary Brick gfs1:/nodirectwritedata/gluster/gvol0 Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick gfs2:/nodirectwritedata/gluster/gvol0 Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 # gluster volume status all Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7624 Self-heal Daemon on localhost N/A N/A Y 47636 Self-heal Daemon on gfs3 N/A N/A Y 18542 Self-heal Daemon on gfs2 N/A N/A Y 37192 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume task On Sat, 18 May 2019 at 22:34, Strahil wrote: > Just run 'gluster volume heal my_volume info summary'. > > It will report any issues - everything should be 'Connected' and show '0'. > > Best Regards, > Strahil Nikolov > On May 18, 2019 02:01, David Cunningham wrote: > > Hi Ravi, > > The existing two nodes aren't in split-brain, at least that I'm aware of. > Running "gluster volume status all" doesn't show any problem. > > I'm not sure what "in metadata" means. 
Can you please explain that one? > > > On Fri, 17 May 2019 at 22:43, Ravishankar N > wrote: > > > On 17/05/19 5:59 AM, David Cunningham wrote: > > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > Was your root directory of the replica 2 volume in metadata or entry > split-brain? If yes, you need to resolve it before proceeding with the > add-brick. > > -Ravi > > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon May 20 03:57:30 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 20 May 2019 06:57:30 +0300 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: As everything seems OK, you can check if your arbiter is ok. Run 'gluster peer status' on all nodes. If all peers report 2 peers connected ,you can run: gluster volume add-brick gvol0 replica 2 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 Bewt Regards, Strahil NikolovOn May 20, 2019 02:31, David Cunningham wrote: > > Hello, > > It does show everything as Connected and 0 for the existing bricks, gfs1 and gfs2. The new brick gfs3 isn't listed, presumably because of the failure as per my original email. Would anyone have any further suggestions on how to prevent the "Transport endpoint is not connected" error when adding the new brick? 
> > # gluster volume heal gvol0 info summary > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Total Number of entries: 0 > Number of entries in heal pending: 0 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Connected > Total Number of entries: 0 > Number of entries in heal pending: 0 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > > # gluster volume status all > Status of volume: gvol0 > Gluster process                             TCP Port  RDMA Port  Online  Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624 > Self-heal Daemon on localhost               N/A       N/A        Y       47636 > Self-heal Daemon on gfs3                    N/A       N/A        Y       18542 > Self-heal Daemon on gfs2                    N/A       N/A        Y       37192 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume task > > > On Sat, 18 May 2019 at 22:34, Strahil wrote: >> >> Just run 'gluster volume heal my_volume info summary'. >> >> It will report any issues - everything should be 'Connected' and show '0'. >> >> Best Regards, >> Strahil Nikolov >> >> On May 18, 2019 02:01, David Cunningham wrote: >>> >>> Hi Ravi, >>> >>> The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. >>> >>> I'm not sure what "in metadata" means. Can you please explain that one? >>> >>> >>> On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: >>>> >>>> >>>> On 17/05/19 5:59 AM, David Cunningham wrote: >>>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. >>>>> >>>> Was your root directory of the replica 2 volume in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. >>>> >>>> -Ravi >>>> >>>> >>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance!
>>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Mon May 20 04:39:44 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Mon, 20 May 2019 10:09:44 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: On Fri, 17 May 2019 at 06:01, David Cunningham wrote: > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. > This looks like a glusterd issue. Please check the glusterd logs for more info. Adding the glusterd dev to this thread. Sanju, can you take a look? 
Regards, Nithya > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntQAtu3f'. > > Processes running on new node gfs3: > > # ps -ef | grep gluster > root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 20 10:07:54 2019 From: snowmailer at gmail.com (Martin) Date: Mon, 20 May 2019 12:07:54 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> Hi Krutika, > Also, gluster version please? I am running old 3.7.6. (Yes I know I should upgrade asap) I?ve applied firstly "network.remote-dio off", behaviour did not changed, VMs got stuck after some time again. Then I?ve set "performance.strict-o-direct on" and problem completly disappeared. No more stucks at all (7 days without any problems at all). This SOLVED the issue. Can you explain what remote-dio and strict-o-direct variables changed in behaviour of my Gluster? It would be great for later archive/users to understand what and why this solved my issue. Anyway, Thanks a LOT!!! BR, Martin > On 13 May 2019, at 10:20, Krutika Dhananjay wrote: > > OK. 
In that case, can you check if the following two changes help: > > # gluster volume set $VOL network.remote-dio off > # gluster volume set $VOL performance.strict-o-direct on > > preferably one option changed at a time, its impact tested and then the next change applied and tested. > > Also, gluster version please? > > -Krutika > > On Mon, May 13, 2019 at 1:02 PM Martin Toth > wrote: > Cache in qemu is none. That should be correct. This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/disk.0,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ <>7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 0.0.0.0:312 ,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. Than I will attach/send profiling info after some VM will be failed. I suppose this is correct profiling strategy. > > About this, how many vms do you need to recreate it? A single vm? Or multiple vms doing IO in parallel? > > > Thanks, > BR! > Martin > >> On 13 May 2019, at 09:21, Krutika Dhananjay > wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. 
>> -Krutika >> On Mon, May 13, 2019 at 12:34 PM Martin Toth > wrote: >> Hi, >> >> there is no healing operation, no peer disconnects, no readonly filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, it's SSD with 10G, performance is good. >> >> > you'd have its log on qemu's standard output, >> >> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I have been looking into the problem for more than a month, tried everything. Can't find anything. Any more clues or leads? >> >> BR, >> Martin >> >> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >> > >> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> >> Hi all, >> > >> > Hi >> > >> >> >> >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in the Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds". >> >> The only solution is to power off (hard) the VM and then boot it up again. I am unable to SSH and also to log in with the console; it's probably stuck on some disk operation. No error/warning logs or messages are stored in the VM's logs. >> >> >> > >> > As far as I know this should be unrelated, I get this during heals >> > without any freezes, it just means the storage is slow I think. >> > >> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on the replica volume. Can someone advise how to debug this problem or what can cause these issues? >> >> It's really annoying, I've tried to google everything but nothing came up. I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not related. >> >> >> > >> > Any chance your gluster goes readonly ? Have you checked your gluster >> > logs to see if maybe they lose each other some times ? >> > /var/log/glusterfs >> > >> > For libgfapi accesses you'd have its log on qemu's standard output, >> > that might contain the actual error at the time of the freeze. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Mon May 20 12:10:25 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Mon, 20 May 2019 17:40:25 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: David, can you please attach glusterd.logs? As the error message says, Commit failed on the arbiter node, we might be able to find some issue on that node. On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran wrote: > > > On Fri, 17 May 2019 at 06:01, David Cunningham > wrote: > >> Hello, >> >> We're adding an arbiter node to an existing volume and having an issue. >> Can anyone help? The root cause error appears to be >> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected)", as below. >> >> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >> >> On existing node gfs1, trying to add new arbiter node gfs3: >> >> # gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0 >> volume add-brick: failed: Commit failed on gfs3. Please check log file >> for details. >> > > This looks like a glusterd issue. Please check the glusterd logs for more > info.
> Adding the glusterd dev to this thread. Sanju, can you take a look? > > Regards, > Nithya > >> >> On new node gfs3 in gvol0-add-brick-mount.log: >> >> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >> 7.22 >> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >> 0-fuse: switched to graph 0 >> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected) >> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >> 0-fuse: initating unmount of /tmp/mntQAtu3f >> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >> received signum (15), shutting down >> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntQAtu3f'. >> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >> fuse connection to '/tmp/mntQAtu3f'. >> >> Processes running on new node gfs3: >> >> # ps -ef | grep gluster >> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >> glustershd >> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Mon May 20 12:36:41 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Mon, 20 May 2019 18:06:41 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: Hey Vlad, Thanks for trying gluster-block. Appreciate your feedback. Here is the patch which should fix the issue you have noticed: https://github.com/gluster/gluster-block/pull/233 Thanks! -- Prasanna On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: > > > straight from > > ./autogen.sh && ./configure && make -j install > > > CentOS Linux release 7.6.1810 (Core) > > > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. 
> May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. > > > > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: >> >> Hello Gluster folks, >> >> Gluster-block team is happy to announce the v0.4 release [1]. >> >> This is the new stable version of gluster-block, lots of new and >> exciting features and interesting bug fixes are made available as part >> of this release. >> Please find the big list of release highlights and notable fixes at [2]. >> >> Details about installation can be found in the easy install guide at >> [3]. Find the details about prerequisites and setup guide at [4]. >> If you are a new user, checkout the demo video attached in the README >> doc [5], which will be a good source of intro to the project. >> There are good examples about how to use gluster-block both in the man >> pages [6] and test file [7] (also in the README). >> >> gluster-block is part of fedora package collection, an updated package >> with release version v0.4 will be soon made available. And the >> community provided packages will be soon made available at [8]. >> >> Please spend a minute to report any kind of issue that comes to your >> notice with this handy link [9]. >> We look forward to your feedback, which will help gluster-block get better! >> >> We would like to thank all our users, contributors for bug filing and >> fixes, also the whole team who involved in the huge effort with >> pre-release testing. >> >> >> [1] https://github.com/gluster/gluster-block >> [2] https://github.com/gluster/gluster-block/releases >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL >> [4] https://github.com/gluster/gluster-block#usage >> [5] https://github.com/gluster/gluster-block/blob/master/README.md >> [6] https://github.com/gluster/gluster-block/tree/master/docs >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t >> [8] https://download.gluster.org/pub/gluster/gluster-block/ >> [9] https://github.com/gluster/gluster-block/issues/new >> >> Cheers, >> Team Gluster-Block! >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From vladkopy at gmail.com Mon May 20 15:35:20 2019 From: vladkopy at gmail.com (Vlad Kopylov) Date: Mon, 20 May 2019 11:35:20 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: Thank you Prasanna. Do we have architecture somewhere? Dies it bypass Fuse and go directly gfapi ? v On Mon, May 20, 2019, 8:36 AM Prasanna Kalever wrote: > Hey Vlad, > > Thanks for trying gluster-block. Appreciate your feedback. > > Here is the patch which should fix the issue you have noticed: > https://github.com/gluster/gluster-block/pull/233 > > Thanks! 
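A side note on the "No such path /backstores/user:glfs" line in the journal above: that path is normally provided by the tcmu-runner glfs handler which gluster-block drives through targetcli, so it is worth confirming the handler is actually available before restarting the service. A rough check, assuming targetcli and tcmu-runner are installed, would be:

# systemctl status tcmu-runner
# targetcli ls /backstores
  (the listing should contain a user:glfs entry; if it does not, the glfs
   handler is missing or tcmu-runner has not started)
# systemctl restart tcmu-runner gluster-blockd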
> -- > Prasanna > > On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: > > > > > > straight from > > > > ./autogen.sh && ./configure && make -j install > > > > > > CentOS Linux release 7.6.1810 (Core) > > > > > > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No > such file or directory > > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. > > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] > CRIT: trying to change logDir from /var/log/gluster-block to > /var/log/gluster-block [at utils.c+495 :] > > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path > /backstores/user:glfs > > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process > exited, code=exited, status=1/FAILURE > > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered > failed state. > > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. > > > > > > > > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever > wrote: > >> > >> Hello Gluster folks, > >> > >> Gluster-block team is happy to announce the v0.4 release [1]. > >> > >> This is the new stable version of gluster-block, lots of new and > >> exciting features and interesting bug fixes are made available as part > >> of this release. > >> Please find the big list of release highlights and notable fixes at [2]. > >> > >> Details about installation can be found in the easy install guide at > >> [3]. Find the details about prerequisites and setup guide at [4]. > >> If you are a new user, checkout the demo video attached in the README > >> doc [5], which will be a good source of intro to the project. > >> There are good examples about how to use gluster-block both in the man > >> pages [6] and test file [7] (also in the README). > >> > >> gluster-block is part of fedora package collection, an updated package > >> with release version v0.4 will be soon made available. And the > >> community provided packages will be soon made available at [8]. > >> > >> Please spend a minute to report any kind of issue that comes to your > >> notice with this handy link [9]. > >> We look forward to your feedback, which will help gluster-block get > better! > >> > >> We would like to thank all our users, contributors for bug filing and > >> fixes, also the whole team who involved in the huge effort with > >> pre-release testing. > >> > >> > >> [1] https://github.com/gluster/gluster-block > >> [2] https://github.com/gluster/gluster-block/releases > >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > >> [4] https://github.com/gluster/gluster-block#usage > >> [5] https://github.com/gluster/gluster-block/blob/master/README.md > >> [6] https://github.com/gluster/gluster-block/tree/master/docs > >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > >> [8] https://download.gluster.org/pub/gluster/gluster-block/ > >> [9] https://github.com/gluster/gluster-block/issues/new > >> > >> Cheers, > >> Team Gluster-Block! > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spisla80 at gmail.com Tue May 21 09:24:18 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 21 May 2019 11:24:18 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello everyone, we are still seeking a day and time to talk about interesting Samba / Glusterfs issues. Here is a new list of possible dates and times. May 22nd - 24th at 12:30 - 14:30 IST (9:00 - 11:00 CEST) May 27th - 29th and 31st at 12:30 - 14:30 IST (9:00 - 11:00 CEST) On May 30th there is a holiday here in Germany. @Poornima Gurusiddaiah If there is any problem finding a date please contact me. I will look for alternatives. Regards David Spisla On Thu, 16 May 2019 at 12:42, David Spisla wrote: > Hello Amar, > > thank you for the information. Of course, we should wait for Poornima > because of her knowledge. > > Regards > David Spisla > > On Thu, 16 May 2019 at 12:23, Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> David, Poornima is on leave from today till 21st May. So having it after >> she comes back is better. She has more experience in SMB integration than >> many of us. >> >> -Amar >> >> On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: >> >>> Hello everyone, >>> >>> if there is any problem in finding a date and time, please contact me. >>> It would be fine to have a meeting soon. >>> >>> Regards >>> David Spisla >>> >>> On Mon, 13 May 2019 at 12:38, David Spisla < >>> david.spisla at iternity.com> wrote: >>> >>>> Hi Poornima, >>>> >>>> that's fine. I would suggest these dates and times: >>>> >>>> May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>> >>>> May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>> >>>> I have added Volker Lendecke from Sernet to this mail. He is the Samba expert. >>>> >>>> Can someone of you provide a host via bluejeans.com? If not, I will >>>> try it with GoToMeeting (https://www.gotomeeting.com). >>>> >>>> @all Please write your preferred dates and times. For me, all of the >>>> above dates and times are fine. >>>> >>>> Regards >>>> >>>> David >>>> >>>> *From:* Poornima Gurusiddaiah >>>> *Sent:* Monday, 13 May 2019 07:22 >>>> *To:* David Spisla ; Anoop C S ; >>>> Gunther Deschner >>>> *Cc:* Gluster Devel ; >>>> gluster-users at gluster.org List >>>> *Subject:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>>> Gluster (together with Samba Core Developer) >>>> >>>> Hi, >>>> >>>> We would definitely be interested in this. Thank you for contacting us. >>>> To start with, we can have an online conference. Please suggest a few >>>> possible dates and times for the week (preferably between IST 7:00 AM - >>>> 9:00 PM)? >>>> >>>> Adding Anoop and Gunther, who are also the main contributors to the >>>> Gluster-Samba integration. >>>> >>>> Thanks, >>>> >>>> Poornima >>>> >>>> On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: >>>> >>>> Dear Gluster Community, >>>> >>>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>>> For this purpose we are working together with an advanced Samba core >>>> developer. He did some debugging but needs more information about Gluster >>>> Core Behaviour.
>>>> >>>> >>>> >>>> *Would any of the Gluster Developer wants to have a online conference >>>> with him and me?* >>>> >>>> >>>> >>>> I would organize everything. In my opinion this is a good chance to >>>> improve stability of Glusterfs and this is at the moment one of the major >>>> issues in the Community. >>>> >>>> >>>> >>>> Regards >>>> >>>> David Spisla >>>> >>>> _______________________________________________ >>>> >>>> Community Meeting Calendar: >>>> >>>> APAC Schedule - >>>> Every 2nd and 4th Tuesday at 11:30 AM IST >>>> Bridge: https://bluejeans.com/836554017 >>>> >>>> NA/EMEA Schedule - >>>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>>> Bridge: https://bluejeans.com/486278655 >>>> >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Tue May 21 10:05:07 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Tue, 21 May 2019 15:35:07 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> Message-ID: Hi Martin, Glad it worked! And yes, 3.7.6 is really old! :) So the issue is occurring when the vm flushes outstanding data to disk. And this is taking > 120s because there's lot of buffered writes to flush, possibly followed by an fsync too which needs to sync them to disk (volume profile would have been helpful in confirming this). All these two options do is to truly honor O_DIRECT flag (which is what we want anyway given the vms are opened with 'cache=none' qemu option). This will skip write-caching on gluster client side and also bypass the page-cache on the gluster-bricks, and so data gets flushed faster, thereby eliminating these timeouts. -Krutika On Mon, May 20, 2019 at 3:38 PM Martin wrote: > Hi Krutika, > > Also, gluster version please? > > I am running old 3.7.6. (Yes I know I should upgrade asap) > > I?ve applied firstly "network.remote-dio off", behaviour did not changed, > VMs got stuck after some time again. > Then I?ve set "performance.strict-o-direct on" and problem completly > disappeared. No more stucks at all (7 days without any problems at all). > This SOLVED the issue. > > Can you explain what remote-dio and strict-o-direct variables changed in > behaviour of my Gluster? It would be great for later archive/users to > understand what and why this solved my issue. > > Anyway, Thanks a LOT!!! > > BR, > Martin > > On 13 May 2019, at 10:20, Krutika Dhananjay wrote: > > OK. In that case, can you check if the following two changes help: > > # gluster volume set $VOL network.remote-dio off > # gluster volume set $VOL performance.strict-o-direct on > > preferably one option changed at a time, its impact tested and then the > next change applied and tested. > > Also, gluster version please? > > -Krutika > > On Mon, May 13, 2019 at 1:02 PM Martin Toth wrote: > >> Cache in qemu is none. That should be correct. 
This is full command : >> >> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine >> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp >> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 >> -no-user-config -nodefaults -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait >> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime >> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 >> >> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 >> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 >> -drive file=/var/lib/one//datastores/116/312/*disk.0* >> ,format=raw,if=none,id=drive-virtio-disk1,cache=none >> -device >> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 >> -drive file=gluster://localhost:24007/imagestore/ >> *7b64d6757acc47a39503f68731f89b8e* >> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none >> -device >> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 >> -drive file=/var/lib/one//datastores/116/312/*disk.1* >> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on >> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 >> >> -netdev tap,fd=26,id=hostnet0 >> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 >> -chardev pty,id=charserial0 -device >> isa-serial,chardev=charserial0,id=serial0 >> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait >> -device >> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 >> -vnc 0.0.0.0:312,password -device >> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device >> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on >> >> I?ve highlighted disks. First is VM context disk - Fuse used, second is >> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. >> >> Krutika, >> I will start profiling on Gluster Volumes and wait for next VM to fail. >> Than I will attach/send profiling info after some VM will be failed. I >> suppose this is correct profiling strategy. >> > > About this, how many vms do you need to recreate it? A single vm? Or > multiple vms doing IO in parallel? > > >> Thanks, >> BR! >> Martin >> >> On 13 May 2019, at 09:21, Krutika Dhananjay wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the >> command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay >> wrote: >> >>> What version of gluster are you using? >>> Also, can you capture and share volume-profile output for a run where >>> you manage to recreate this issue? >>> >>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >>> Let me know if you have any questions. >>> >>> -Krutika >>> >>> On Mon, May 13, 2019 at 12:34 PM Martin Toth >>> wrote: >>> >>>> Hi, >>>> >>>> there is no healing operation, not peer disconnects, no readonly >>>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>>> its SSD with 10G, performance is good. >>>> >>>> > you'd have it's log on qemu's standard output, >>>> >>>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. 
I am looking >>>> for problem for more than month, tried everything. Can?t find anything. Any >>>> more clues or leads? >>>> >>>> BR, >>>> Martin >>>> >>>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>>> > >>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>>> >> Hi all, >>>> > >>>> > Hi >>>> > >>>> >> >>>> >> I am running replica 3 on SSDs with 10G networking, everything works >>>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>>> blocked for more than 120 seconds?. >>>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>>> am unable to SSH and also login with console, its stuck probably on some >>>> disk operation. No error/warning logs or messages are store in VMs logs. >>>> >> >>>> > >>>> > As far as I know this should be unrelated, I get this during heals >>>> > without any freezes, it just means the storage is slow I think. >>>> > >>>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks >>>> on replica volume. Can someone advice how to debug this problem or what >>>> can cause these issues? >>>> >> It?s really annoying, I?ve tried to google everything but nothing >>>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>>> drivers, but its not related. >>>> >> >>>> > >>>> > Any chance your gluster goes readonly ? Have you checked your gluster >>>> > logs to see if maybe they lose each other some times ? >>>> > /var/log/glusterfs >>>> > >>>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>>> > that might contain the actual error at the time of the freez. >>>> > _______________________________________________ >>>> > Gluster-users mailing list >>>> > Gluster-users at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Tue May 21 14:39:22 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Tue, 21 May 2019 20:09:22 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: On Mon, May 20, 2019 at 9:05 PM Vlad Kopylov wrote: > > Thank you Prasanna. > > Do we have architecture somewhere? Vlad, Although the complete set of details might be missing at one place right now, some pointers to start are available at, https://github.com/gluster/gluster-block#gluster-block and https://pkalever.wordpress.com/2019/05/06/starting-with-gluster-block, hopefully that should give some clarity about the project. Also checkout the man pages. > Dies it bypass Fuse and go directly gfapi ? yes, we don't use Fuse access with gluster-block. The management as-well-as IO happens over gfapi. Please go through the docs pointed above, if you have any specific queries, feel free to ask them here or on github. Best Regards, -- Prasanna > > v > > On Mon, May 20, 2019, 8:36 AM Prasanna Kalever wrote: >> >> Hey Vlad, >> >> Thanks for trying gluster-block. Appreciate your feedback. >> >> Here is the patch which should fix the issue you have noticed: >> https://github.com/gluster/gluster-block/pull/233 >> >> Thanks! 
>> -- >> Prasanna >> >> On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: >> > >> > >> > straight from >> > >> > ./autogen.sh && ./configure && make -j install >> > >> > >> > CentOS Linux release 7.6.1810 (Core) >> > >> > >> > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory >> > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. >> > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] >> > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs >> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE >> > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. >> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. >> > >> > >> > >> > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: >> >> >> >> Hello Gluster folks, >> >> >> >> Gluster-block team is happy to announce the v0.4 release [1]. >> >> >> >> This is the new stable version of gluster-block, lots of new and >> >> exciting features and interesting bug fixes are made available as part >> >> of this release. >> >> Please find the big list of release highlights and notable fixes at [2]. >> >> >> >> Details about installation can be found in the easy install guide at >> >> [3]. Find the details about prerequisites and setup guide at [4]. >> >> If you are a new user, checkout the demo video attached in the README >> >> doc [5], which will be a good source of intro to the project. >> >> There are good examples about how to use gluster-block both in the man >> >> pages [6] and test file [7] (also in the README). >> >> >> >> gluster-block is part of fedora package collection, an updated package >> >> with release version v0.4 will be soon made available. And the >> >> community provided packages will be soon made available at [8]. >> >> >> >> Please spend a minute to report any kind of issue that comes to your >> >> notice with this handy link [9]. >> >> We look forward to your feedback, which will help gluster-block get better! >> >> >> >> We would like to thank all our users, contributors for bug filing and >> >> fixes, also the whole team who involved in the huge effort with >> >> pre-release testing. >> >> >> >> >> >> [1] https://github.com/gluster/gluster-block >> >> [2] https://github.com/gluster/gluster-block/releases >> >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL >> >> [4] https://github.com/gluster/gluster-block#usage >> >> [5] https://github.com/gluster/gluster-block/blob/master/README.md >> >> [6] https://github.com/gluster/gluster-block/tree/master/docs >> >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t >> >> [8] https://download.gluster.org/pub/gluster/gluster-block/ >> >> [9] https://github.com/gluster/gluster-block/issues/new >> >> >> >> Cheers, >> >> Team Gluster-Block! 
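To give a concrete flavour of the commands the announcement above refers to, creating and inspecting a first block device on an existing replica volume looks roughly like this; the volume name, block name, host addresses and size here are made-up placeholders, and the exact options are described in the man pages linked above:

# gluster-block create blockvol/block0 ha 3 192.168.10.11,192.168.10.12,192.168.10.13 10GiB
# gluster-block list blockvol
# gluster-block info blockvol/block0
# gluster-block delete blockvol/block0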
>> >> _______________________________________________ >> >> Gluster-users mailing list >> >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users From rabhat at redhat.com Tue May 21 15:13:18 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Tue, 21 May 2019 11:13:18 -0400 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> Message-ID: Today's meeting will happen couple of hours from now. i.e. 1PM EST at ( https://bluejeans.com/486278655) I am not able to see the meeting in my calendar. I am not sure whether this is the case just for me or is it not visible to others as well. Either way, I will be waiting at the above mentioned bluejeans link. Regards, Raghavendra On Wed, May 1, 2019 at 8:37 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > > > On Tue, Apr 23, 2019 at 8:47 PM Darrell Budic > wrote: > >> I was one of the folk who wanted a NA/EMEA scheduled meeting, and I?m >> going to have to miss it due to some real life issues (clogged sewer I?m >> going to have to be dealing with at the time). Apologies, I?ll work on >> making the next one. >> >> > No problem. We will continue to have these meetings every week (ie, > bi-weekly in each timezone). Feel free to join when possible. We surely > like to see more community participation for sure, but everyone would have > their day jobs, so no pressure :-) > > -Amar > > >> -Darrell >> >> On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath >> wrote: >> >> >> Hi, >> >> This is the agenda for tomorrow's community meeting for NA/EMEA timezone. >> >> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both >> ---- >> >> >> >> On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Hi All, >>> >>> Below is the final details of our community meeting, and I will be >>> sending invites to mailing list following this email. You can add Gluster >>> Community Calendar so you can get notifications on the meetings. >>> >>> We are starting the meetings from next week. For the first meeting, we >>> need 1 volunteer from users to discuss the use case / what went well, and >>> what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. >>> >>> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>> ---- >>> Gluster Community Meeting >>> Previous >>> Meeting minutes: >>> >>> - http://github.com/gluster/community >>> >>> >>> Date/Time: >>> Check the community calendar >>> >>> Bridge >>> >>> - APAC friendly hours >>> - Bridge: https://bluejeans.com/836554017 >>> - NA/EMEA >>> - Bridge: https://bluejeans.com/486278655 >>> >>> ------------------------------ >>> Attendance >>> >>> - Name, Company >>> >>> Host >>> >>> - Who will host next meeting? >>> - Host will need to send out the agenda 24hr - 12hrs in advance >>> to mailing list, and also make sure to send the meeting minutes. >>> - Host will need to reach out to one user at least who can talk >>> about their usecase, their experience, and their needs. >>> - Host needs to send meeting minutes as PR to >>> http://github.com/gluster/community >>> >>> User stories >>> >>> - Discuss 1 usecase from a user. >>> - How was the architecture derived, what volume type used, >>> options, etc? >>> - What were the major issues faced ? How to improve them? >>> - What worked good? 
>>> - How can we all collaborate well, so it is win-win for the >>> community and the user? How can we >>> >>> Community >>> >>> - >>> >>> Any release updates? >>> - >>> >>> Blocker issues across the project? >>> - >>> >>> Metrics >>> - Number of new bugs since previous meeting. How many are not >>> triaged? >>> - Number of emails, anything unanswered? >>> >>> Conferences >>> / Meetups >>> >>> - Any conference in next 1 month where gluster-developers are going? >>> gluster-users are going? So we can meet and discuss. >>> >>> Developer >>> focus >>> >>> - >>> >>> Any design specs to discuss? >>> - >>> >>> Metrics of the week? >>> - Coverity >>> - Clang-Scan >>> - Number of patches from new developers. >>> - Did we increase test coverage? >>> - [Atin] Also talk about most frequent test failures in the CI >>> and carve out an AI to get them fixed. >>> >>> RoundTable >>> >>> - >>> >>> ---- >>> >>> Regards, >>> Amar >>> >>> On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < >>> atumball at redhat.com> wrote: >>> >>>> Thanks for the feedback Darrell, >>>> >>>> The new proposal is to have one in North America 'morning' time. (10AM >>>> PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, >>>> 9pm Newzealand, 5pm Tokyo, 4pm Beijing. >>>> >>>> For example, if we choose Every other Tuesday for meeting, and 1st of >>>> the month is Tuesday, we would have North America time for 1st, and on 15th >>>> it would be ASIA/Pacific time. >>>> >>>> Hopefully, this way, we can cover all the timezones, and meeting >>>> minutes would be committed to github repo, so that way, it will be easier >>>> for everyone to be aware of what is happening. >>>> >>>> Regards, >>>> Amar >>>> >>>> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic >>>> wrote: >>>> >>>>> As a user, I?d like to visit more of these, but the time slot is my >>>>> 3AM. Any possibility for a rolling schedule (move meeting +6 hours each >>>>> week with rolling attendance from maintainers?) or an occasional regional >>>>> meeting 12 hours opposed to the one you?re proposing? >>>>> >>>>> -Darrell >>>>> >>>>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >>>>> atumball at redhat.com> wrote: >>>>> >>>>> All, >>>>> >>>>> We currently have 3 meetings which are public: >>>>> >>>>> 1. Maintainer's Meeting >>>>> >>>>> - Runs once in 2 weeks (on Mondays), and current attendance is around >>>>> 3-5 on an avg, and not much is discussed. >>>>> - Without majority attendance, we can't take any decisions too. >>>>> >>>>> 2. Community meeting >>>>> >>>>> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the >>>>> only meeting which is for 'Community/Users'. Others are for >>>>> developers as of now. >>>>> Sadly attendance is getting closer to 0 in recent times. >>>>> >>>>> 3. GCS meeting >>>>> >>>>> - We started it as an effort inside Red Hat gluster team, and opened >>>>> it up for community from Jan 2019, but the attendance was always from >>>>> RHT members, and haven't seen any traction from wider group. >>>>> >>>>> So, I have a proposal to call out for cancelling all these meeting, >>>>> and keeping just 1 weekly 'Community' meeting, where even topics >>>>> related to maintainers and GCS and other projects can be discussed. >>>>> >>>>> I have a template of a draft template @ >>>>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>>>> >>>>> Please feel free to suggest improvements, both in agenda and in >>>>> timings. 
So, we can have more participation from members of community, >>>>> which allows more user - developer interactions, and hence quality of >>>>> project. >>>>> >>>>> Waiting for feedbacks, >>>>> >>>>> Regards, >>>>> Amar >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Amar Tumballi (amarts) >>>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Tue May 21 23:27:09 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 11:27:09 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: Hi Sanju, Here's what glusterd.log says on the new arbiter server when trying to add the node: [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3 [2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1 [2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick As before gvol0-add-brick-mount.log said: [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9 [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'. [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. Here are the processes running on the new arbiter server: # ps -ef | grep gluster root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs Here are the files created on the new arbiter server: # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs Thank you for your help! On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: > David, > > can you please attach glusterd.logs? As the error message says, Commit > failed on the arbitar node, we might be able to find some issue on that > node. > > On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran > wrote: > >> >> >> On Fri, 17 May 2019 at 06:01, David Cunningham >> wrote: >> >>> Hello, >>> >>> We're adding an arbiter node to an existing volume and having an issue. >>> Can anyone help? The root cause error appears to be >>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>> endpoint is not connected)", as below. >>> >>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>> >>> On existing node gfs1, trying to add new arbiter node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. 
Please check log file >>> for details. >>> >> >> This looks like a glusterd issue. Please check the glusterd logs for more >> info. >> Adding the glusterd dev to this thread. Sanju, can you take a look? >> >> Regards, >> Nithya >> >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>> 7.22 >>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>> 0-fuse: switched to graph 0 >>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699770] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>> is not connected) >>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>> received signum (15), shutting down >>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>> Unmounting '/tmp/mntQAtu3f'. >>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >>> fuse connection to '/tmp/mntQAtu3f'. >>> >>> Processes running on new node gfs3: >>> >>> # ps -ef | grep gluster >>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>> /var/run/glusterd.pid --log-level INFO >>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>> localhost --volfile-id gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>> glustershd >>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Thanks, > Sanju > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 22 00:43:11 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 06:13:11 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Hi David, Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`? 
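One way to gather exactly that from all three nodes in a single pass, assuming root ssh access to gfs1, gfs2 and gfs3 and the brick path used throughout this thread:

# for h in gfs1 gfs2 gfs3; do echo "== $h =="; ssh root@$h 'getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0'; done
# gluster volume info gvol0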
Thanks, Ravi On 22/05/19 4:57 AM, David Cunningham wrote: > Hi Sanju, > > Here's what glusterd.log says on the new arbiter server when trying to > add the node: > > [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) > [0x7fe4ca9102cd] > -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) > [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe4d5ecc955] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=gvol0 --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > [2019-05-22 00:15:05.963177] I [MSGID: 106578] > [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] > 0-management: replica-count is set 3 > [2019-05-22 00:15:05.963228] I [MSGID: 106578] > [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] > 0-management: arbiter-count is set 1 > [2019-05-22 00:15:05.963257] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] > 0-management: type is set 0, need to change it > [2019-05-22 00:15:17.015268] E [MSGID: 106053] > [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] > 0-management: Failed to set extended attribute trusted.add-brick : > Transport endpoint is not connected [Transport endpoint is not connected] > [2019-05-22 00:15:17.036479] E [MSGID: 106073] > [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable > to add bricks > [2019-05-22 00:15:17.036595] E [MSGID: 106122] > [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick > commit failed. > [2019-05-22 00:15:17.036710] E [MSGID: 106122] > [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > As before gvol0-add-brick-mount.log said: > > [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 > kernel 7.22 > [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] > 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not > connected) > [2019-05-22 00:15:17.015097] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-22 00:15:17.015158] W > [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: > SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntYGNbj9 > [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: > received signum (15), shutting down > [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntYGNbj9'. > [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: > Closing fuse connection to '/tmp/mntYGNbj9'. > > Here are the processes running on the new arbiter server: > # ps -ef | grep gluster > root????? 3466???? 1? 0 20:13 ???????? 
00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root????? 6832???? 1? 0 May16 ???????? 00:02:10 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root???? 17841???? 1? 0 May16 ???????? 00:00:58 /usr/sbin/glusterfs > --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 > /mnt/glusterfs > > Here are the files created on the new arbiter server: > # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald > drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 > drw------- 2 root root 4096 May 21 20:15 > /nodirectwritedata/gluster/gvol0/.glusterfs > > Thank you for your help! > > > On Tue, 21 May 2019 at 00:10, Sanju Rakonde > wrote: > > David, > > can you please attach glusterd.logs? As the error message says, > Commit failed on the arbitar node, we might be able to find some > issue on that node. > > On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran > > wrote: > > > > On Fri, 17 May 2019 at 06:01, David Cunningham > > > wrote: > > Hello, > > We're adding an arbiter node to an existing volume and > having an issue. Can anyone help? The root cause error > appears to be "00000000-0000-0000-0000-000000000001: > failed to resolve (Transport endpoint is not connected)", > as below. > > We are running glusterfs 5.6.1. Thanks in advance for any > assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please > check log file for details. > > > This looks like a glusterd issue. Please check the glusterd > logs for more info. > Adding the glusterd dev to this thread. Sanju, can you take a > look? > Regards, > Nithya > > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I > [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE > inited with protocol versions: glusterfs 7.24 kernel 7.22 > [2019-05-17 01:20:22.689778] I > [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to > graph 0 > [2019-05-17 01:20:22.694897] E > [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first > lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport endpoint is not connected) > [2019-05-17 01:20:22.699834] W > [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR > 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-17 01:20:22.715656] I > [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating > unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W > [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560886581ceb] ) 0-: received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] > 0-fuse: Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] > 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'. 
> > Processes running on new node gfs3: > > # ps -ef | grep gluster > root????? 6832???? 1? 0 20:17 ???????? 00:00:00 > /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO > root???? 15799???? 1? 0 20:17 ???????? 00:00:00 > /usr/sbin/glusterfs -s localhost --volfile-id > gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep > --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 22 01:20:56 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 13:20:56 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Message-ID: Hi Ravi, Certainly. On the existing two nodes: gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.gvol0-client-2=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.gvol0-client-0=0x000000000000000000000000 trusted.afr.gvol0-client-2=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 On the new node: gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000001 trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 Output of "gluster volume info" is the same on all 3 nodes and is: # gluster volume info Volume Name: gvol0 Type: Replicate Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: gfs1:/nodirectwritedata/gluster/gvol0 Brick2: gfs2:/nodirectwritedata/gluster/gvol0 Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet On Wed, 22 May 2019 at 12:43, Ravishankar N wrote: > Hi David, > Could you provide the `getfattr -d -m. 
-e hex > /nodirectwritedata/gluster/gvol0` output of all bricks and the output of > `gluster volume info`? > > Thanks, > Ravi > On 22/05/19 4:57 AM, David Cunningham wrote: > > Hi Sanju, > > Here's what glusterd.log says on the new arbiter server when trying to add > the node: > > [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) > [0x7fe4ca9102cd] > -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) > [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe4d5ecc955] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=gvol0 --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > [2019-05-22 00:15:05.963177] I [MSGID: 106578] > [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 3 > [2019-05-22 00:15:05.963228] I [MSGID: 106578] > [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: > arbiter-count is set 1 > [2019-05-22 00:15:05.963257] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > [2019-05-22 00:15:17.015268] E [MSGID: 106053] > [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: > Failed to set extended attribute trusted.add-brick : Transport endpoint is > not connected [Transport endpoint is not connected] > [2019-05-22 00:15:17.036479] E [MSGID: 106073] > [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > [2019-05-22 00:15:17.036595] E [MSGID: 106122] > [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit > failed. > [2019-05-22 00:15:17.036710] E [MSGID: 106122] > [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > As before gvol0-add-brick-mount.log said: > > [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] > 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) > [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntYGNbj9 > [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: > received signum (15), shutting down > [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntYGNbj9'. > [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntYGNbj9'. 
> > Here are the processes running on the new arbiter server: > # ps -ef | grep gluster > root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs > --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs > > Here are the files created on the new arbiter server: > # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald > drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 > drw------- 2 root root 4096 May 21 20:15 > /nodirectwritedata/gluster/gvol0/.glusterfs > > Thank you for your help! > > > On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: > >> David, >> >> can you please attach glusterd.logs? As the error message says, Commit >> failed on the arbitar node, we might be able to find some issue on that >> node. >> >> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >> wrote: >> >>> >>> >>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>> dcunningham at voisonics.com> wrote: >>> >>>> Hello, >>>> >>>> We're adding an arbiter node to an existing volume and having an issue. >>>> Can anyone help? The root cause error appears to be >>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>> endpoint is not connected)", as below. >>>> >>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>> >>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>> >>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>> volume add-brick: failed: Commit failed on gfs3. Please check log file >>>> for details. >>>> >>> >>> This looks like a glusterd issue. Please check the glusterd logs for >>> more info. >>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>> >>> Regards, >>> Nithya >>> >>>> >>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>> >>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>> 7.22 >>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>> 0-fuse: switched to graph 0 >>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>> [2019-05-17 01:20:22.699770] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>> is not connected) >>>> [2019-05-17 01:20:22.699834] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>> received signum (15), shutting down >>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>> Unmounting '/tmp/mntQAtu3f'. >>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>> >>>> Processes running on new node gfs3: >>>> >>>> # ps -ef | grep gluster >>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>> /var/run/glusterd.pid --log-level INFO >>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>>> localhost --volfile-id gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>> glustershd >>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>> gluster >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> Thanks, >> Sanju >> > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
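A quick way to gather those xattrs from every brick in one pass, so the
three outputs can be compared side by side. This is only a sketch: it
assumes passwordless SSH between the nodes and uses the hostnames and
brick path from this thread.

# run from any one of the servers
for h in gfs1 gfs2 gfs3; do
    echo "== $h =="
    # -d dumps the values, -m. matches all attribute names, -e hex prints them in hex
    ssh $h getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
done

On a healthy replica, trusted.gfid and trusted.glusterfs.volume-id should
be identical on every brick; a brick root with no trusted.gfid at all (as
on gfs3 above) suggests the add-brick never finished initialising it,
which is what the next message points out.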
URL: From ravishankar at redhat.com Wed May 22 01:55:58 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 07:25:58 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Message-ID: <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Hmm, so the volume info seems to indicate that the add-brick was successful but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail). Do you want to try removing and adding it again? 1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 2. Check that gluster volume info is now back to a 1x2 volume on all nodes and `gluster peer status` is? connected on all nodes. 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. 5. Check that the files are getting healed on to the new brick. Thanks, Ravi On 22/05/19 6:50 AM, David Cunningham wrote: > Hi Ravi, > > Certainly. On the existing two nodes: > > gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-0=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > On the new node: > > gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000001 > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > Output of "gluster volume info" is the same on all 3 nodes and is: > > # gluster volume info > > Volume Name: gvol0 > Type: Replicate > Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: gfs1:/nodirectwritedata/gluster/gvol0 > Brick2: gfs2:/nodirectwritedata/gluster/gvol0 > Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > > > On Wed, 22 May 2019 at 12:43, Ravishankar N > wrote: > > Hi David, > Could you provide the `getfattr -d -m. -e hex > /nodirectwritedata/gluster/gvol0` output of all bricks and the > output of `gluster volume info`? 
> > Thanks, > Ravi > On 22/05/19 4:57 AM, David Cunningham wrote: >> Hi Sanju, >> >> Here's what glusterd.log says on the new arbiter server when >> trying to add the node: >> >> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >> [0x7fe4ca9102cd] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7fe4d5ecc955] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=gvol0 --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >> 0-management: replica-count is set 3 >> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >> 0-management: arbiter-count is set 1 >> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >> 0-management: type is set 0, need to change it >> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >> 0-management: Failed to set extended attribute trusted.add-brick >> : Transport endpoint is not connected [Transport endpoint is not >> connected] >> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: >> Unable to add bricks >> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >> Add-brick commit failed. >> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >> 0-management: commit failed on operation Add brick >> >> As before gvol0-add-brick-mount.log said: >> >> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs >> 7.24 kernel 7.22 >> [2019-05-22 00:15:17.005749] I >> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >> [2019-05-22 00:15:17.010101] E >> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on >> root failed (Transport endpoint is not connected) >> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not >> connected) >> [2019-05-22 00:15:17.015097] W >> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >> 00000000-0000-0000-0000-000000000001: failed to resolve >> (Transport endpoint is not connected) >> [2019-05-22 00:15:17.015158] W >> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: >> SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-22 00:15:17.035636] I >> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount >> of /tmp/mntYGNbj9 >> [2019-05-22 00:15:17.035854] W >> [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) >> 0-: received signum (15), shutting down >> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntYGNbj9'. >> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: >> Closing fuse connection to '/tmp/mntYGNbj9'. 
>> >> Here are the processes running on the new arbiter server: >> # ps -ef | grep gluster >> root????? 3466???? 1? 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >> --process-name glustershd >> root????? 6832???? 1? 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root???? 17841???? 1? 0 May16 ? 00:00:58 /usr/sbin/glusterfs >> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 >> /mnt/glusterfs >> >> Here are the files created on the new arbiter server: >> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >> drwxr-xr-x 3 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0 >> drw------- 2 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0/.glusterfs >> >> Thank you for your help! >> >> >> On Tue, 21 May 2019 at 00:10, Sanju Rakonde > > wrote: >> >> David, >> >> can you please attach glusterd.logs? As the error message >> says, Commit failed on the arbitar node, we might be able to >> find some issue on that node. >> >> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >> > wrote: >> >> >> >> On Fri, 17 May 2019 at 06:01, David Cunningham >> > > wrote: >> >> Hello, >> >> We're adding an arbiter node to an existing volume >> and having an issue. Can anyone help? The root cause >> error appears to be >> "00000000-0000-0000-0000-000000000001: failed to >> resolve (Transport endpoint is not connected)", as below. >> >> We are running glusterfs 5.6.1. Thanks in advance for >> any assistance! >> >> On existing node gfs1, trying to add new arbiter node >> gfs3: >> >> # gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0 >> volume add-brick: failed: Commit failed on gfs3. >> Please check log file for details. >> >> >> This looks like a glusterd issue. Please check the >> glusterd logs for more info. >> Adding the glusterd dev to this thread. Sanju, can you >> take a look? 
>> Regards, >> Nithya >> >> >> On new node gfs3 in gvol0-add-brick-mount.log: >> >> [2019-05-17 01:20:22.689721] I >> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >> inited with protocol versions: glusterfs 7.24 kernel 7.22 >> [2019-05-17 01:20:22.689778] I >> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched >> to graph 0 >> [2019-05-17 01:20:22.694897] E >> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first >> lookup on root failed (Transport endpoint is not >> connected) >> [2019-05-17 01:20:22.699770] W >> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >> 00000000-0000-0000-0000-000000000001: failed to >> resolve (Transport endpoint is not connected) >> [2019-05-17 01:20:22.699834] W >> [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 2: SETXATTR >> 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-17 01:20:22.715656] I >> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >> initating unmount of /tmp/mntQAtu3f >> [2019-05-17 01:20:22.715865] W >> [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >> [0x560886581e75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >> [0x560886581ceb] ) 0-: received signum (15), shutting >> down >> [2019-05-17 01:20:22.715926] I >> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >> '/tmp/mntQAtu3f'. >> [2019-05-17 01:20:22.715953] I >> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >> connection to '/tmp/mntQAtu3f'. >> >> Processes running on new node gfs3: >> >> # ps -ef | grep gluster >> root????? 6832???? 1? 0 20:17 ???????? 00:00:00 >> /usr/sbin/glusterd -p /var/run/glusterd.pid >> --log-level INFO >> root???? 15799???? 1? 0 20:17 ???????? 00:00:00 >> /usr/sbin/glusterfs -s localhost --volfile-id >> gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket >> --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >> --process-name glustershd >> root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep >> --color=auto gluster >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Thanks, >> Sanju >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
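Put together as a single transcript, the remove/re-add cycle described
above looks roughly like this. It is a sketch of the steps rather than a
copy-paste script: the hostnames and paths are the ones from this thread,
and the output of each command should be checked before moving on.

# gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force
# gluster volume info gvol0       # should show "Number of Bricks: 1 x 2 = 2" again
# gluster peer status             # every peer should be "Peer in Cluster (Connected)"
# ssh gfs3 'rm -rf /nodirectwritedata/gluster/gvol0 && mkdir -p /nodirectwritedata/gluster/gvol0'
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# gluster volume heal gvol0 info  # pending entries should drain as the arbiter heals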
URL: From dcunningham at voisonics.com Wed May 22 05:53:11 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 17:53:11 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Message-ID: Hi Ravi, I'd already done exactly that before, where step 3 was a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the cleanup or reformat should be? Thank you. On Wed, 22 May 2019 at 13:56, Ravishankar N wrote: > Hmm, so the volume info seems to indicate that the add-brick was > successful but the gfid xattr is missing on the new brick (as are the > actual files, barring the .glusterfs folder, according to your previous > mail). > > Do you want to try removing and adding it again? > > 1. `gluster volume remove-brick gvol0 replica 2 > gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 > > 2. Check that gluster volume info is now back to a 1x2 volume on all nodes > and `gluster peer status` is connected on all nodes. > > 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. > > 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. > > 5. Check that the files are getting healed on to the new brick. > Thanks, > Ravi > On 22/05/19 6:50 AM, David Cunningham wrote: > > Hi Ravi, > > Certainly. On the existing two nodes: > > gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-0=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > On the new node: > > gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000001 > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > Output of "gluster volume info" is the same on all 3 nodes and is: > > # gluster volume info > > Volume Name: gvol0 > Type: Replicate > Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: gfs1:/nodirectwritedata/gluster/gvol0 > Brick2: gfs2:/nodirectwritedata/gluster/gvol0 > Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > > > On Wed, 22 May 2019 at 12:43, Ravishankar N > wrote: > >> Hi David, >> Could you provide the `getfattr -d -m. 
-e hex >> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >> `gluster volume info`? >> >> Thanks, >> Ravi >> On 22/05/19 4:57 AM, David Cunningham wrote: >> >> Hi Sanju, >> >> Here's what glusterd.log says on the new arbiter server when trying to >> add the node: >> >> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >> [0x7fe4ca9102cd] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7fe4d5ecc955] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=gvol0 --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >> replica-count is set 3 >> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >> arbiter-count is set 1 >> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >> type is set 0, need to change it >> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >> Failed to set extended attribute trusted.add-brick : Transport endpoint is >> not connected [Transport endpoint is not connected] >> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >> bricks >> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >> failed. >> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >> commit failed on operation Add brick >> >> As before gvol0-add-brick-mount.log said: >> >> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >> 7.22 >> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >> 0-fuse: switched to graph 0 >> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >> [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected) >> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >> 0-fuse: initating unmount of /tmp/mntYGNbj9 >> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >> received signum (15), shutting down >> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntYGNbj9'. 
>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >> fuse connection to '/tmp/mntYGNbj9'. >> >> Here are the processes running on the new arbiter server: >> # ps -ef | grep gluster >> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >> glustershd >> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >> >> Here are the files created on the new arbiter server: >> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 >> drw------- 2 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0/.glusterfs >> >> Thank you for your help! >> >> >> On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: >> >>> David, >>> >>> can you please attach glusterd.logs? As the error message says, Commit >>> failed on the arbitar node, we might be able to find some issue on that >>> node. >>> >>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>> nbalacha at redhat.com> wrote: >>> >>>> >>>> >>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>> dcunningham at voisonics.com> wrote: >>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an existing volume and having an >>>>> issue. Can anyone help? The root cause error appears to be >>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>> endpoint is not connected)", as below. >>>>> >>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>> >>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>> >>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>> volume add-brick: failed: Commit failed on gfs3. Please check log file >>>>> for details. >>>>> >>>> >>>> This looks like a glusterd issue. Please check the glusterd logs for >>>> more info. >>>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>>> >>>> Regards, >>>> Nithya >>>> >>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>> 7.22 >>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>> 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>> is not connected) >>>>> [2019-05-17 01:20:22.699834] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>> received signum (15), shutting down >>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>> Unmounting '/tmp/mntQAtu3f'. >>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>> >>>>> Processes running on new node gfs3: >>>>> >>>>> # ps -ef | grep gluster >>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>>> /var/run/glusterd.pid --log-level INFO >>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>>>> localhost --volfile-id gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>> /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>> glustershd >>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>> gluster >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> Thanks, >>> Sanju >>> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
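One extra check that may be worth doing before re-adding the brick:
removing the directory itself with 'rm -rf' discards its extended
attributes along with it, but if the path is only emptied, or was
partially initialised by the earlier failed add-brick, stale xattrs such
as trusted.glusterfs.volume-id can survive on the directory and get in
the way of the next attempt. A hedged pre-flight check, using the brick
path from this thread (the setfattr lines are only needed if getfattr
prints anything):

# getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0   # should print nothing
# ls -la /nodirectwritedata/gluster/gvol0                   # should be empty, no .glusterfs
# setfattr -x trusted.glusterfs.volume-id /nodirectwritedata/gluster/gvol0
# setfattr -x trusted.gfid /nodirectwritedata/gluster/gvol0

That matches the requirement in the next reply that the brick directory
be empty and carry no extended attributes before the add-brick.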
URL: From ravishankar at redhat.com Wed May 22 06:02:04 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 11:32:04 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Message-ID: <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> On 22/05/19 11:23 AM, David Cunningham wrote: > Hi Ravi, > > I'd already done exactly that before, where step 3 was a simple 'rm > -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on > what the cleanup or reformat should be? `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not have any extended attributes set on it. Why fuse_first_lookup() is failing is a bit of a mystery to me at this point. :-( Regards, Ravi > > Thank you. > > > On Wed, 22 May 2019 at 13:56, Ravishankar N > wrote: > > Hmm, so the volume info seems to indicate that the add-brick was > successful but the gfid xattr is missing on the new brick (as are > the actual files, barring the .glusterfs folder, according to your > previous mail). > > Do you want to try removing and adding it again? > > 1. `gluster volume remove-brick gvol0 replica 2 > gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 > > 2. Check that gluster volume info is now back to a 1x2 volume on > all nodes and `gluster peer status` is connected on all nodes. > > 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. > > 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. > > 5. Check that the files are getting healed on to the new brick. > > Thanks, > Ravi > On 22/05/19 6:50 AM, David Cunningham wrote: >> Hi Ravi, >> >> Certainly. On the existing two nodes: >> >> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-0=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> On the new node: >> >> gfs3 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000001 >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> Output of "gluster volume info" is the same on all 3 nodes and is: >> >> # gluster volume info >> >> Volume Name: gvol0 >> Type: Replicate >> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x (2 + 1) = 3 >> Transport-type: tcp >> Bricks: >> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> >> On Wed, 22 May 2019 at 12:43, Ravishankar N >> > wrote: >> >> Hi David, >> Could you provide the `getfattr -d -m. -e hex >> /nodirectwritedata/gluster/gvol0` output of all bricks and >> the output of `gluster volume info`? >> >> Thanks, >> Ravi >> On 22/05/19 4:57 AM, David Cunningham wrote: >>> Hi Sanju, >>> >>> Here's what glusterd.log says on the new arbiter server when >>> trying to add the node: >>> >>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>> [0x7fe4ca9102cd] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>> [0x7fe4ca9bbb85] >>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>> --volname=gvol0 --version=1 --volume-op=add-brick >>> --gd-workdir=/var/lib/glusterd >>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>> 0-management: replica-count is set 3 >>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>> 0-management: arbiter-count is set 1 >>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>> 0-management: type is set 0, need to change it >>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>> 0-management: Failed to set extended attribute >>> trusted.add-brick : Transport endpoint is not connected >>> [Transport endpoint is not connected] >>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>> 0-glusterd: Unable to add bricks >>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >>> Add-brick commit failed. 
>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>> 0-management: commit failed on operation Add brick >>> >>> As before gvol0-add-brick-mount.log said: >>> >>> [2019-05-22 00:15:17.005695] I >>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited >>> with protocol versions: glusterfs 7.24 kernel 7.22 >>> [2019-05-22 00:15:17.005749] I >>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>> [2019-05-22 00:15:17.010101] E >>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup >>> on root failed (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.014217] W >>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>> LOOKUP() / => -1 (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015097] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve >>> (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015158] W >>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: >>> 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-22 00:15:17.035636] I >>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating >>> unmount of /tmp/mntYGNbj9 >>> [2019-05-22 00:15:17.035854] W >>> [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>> [0x55c81b63de75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] >>> 0-fuse: Unmounting '/tmp/mntYGNbj9'. >>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] >>> 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. >>> >>> Here are the processes running on the new arbiter server: >>> # ps -ef | grep gluster >>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>> /usr/sbin/glusterfs -s localhost --volfile-id >>> gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>> --process-name glustershd >>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO >>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>> /usr/sbin/glusterfs --process-name fuse >>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>> >>> Here are the files created on the new arbiter server: >>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0 >>> drw------- 2 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0/.glusterfs >>> >>> Thank you for your help! >>> >>> >>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>> > wrote: >>> >>> David, >>> >>> can you please attach glusterd.logs? As the error >>> message says, Commit failed on the arbitar node, we >>> might be able to find some issue on that node. >>> >>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >>> > wrote: >>> >>> >>> >>> On Fri, 17 May 2019 at 06:01, David Cunningham >>> >> > wrote: >>> >>> Hello, >>> >>> We're adding an arbiter node to an existing >>> volume and having an issue. Can anyone help? 
The >>> root cause error appears to be >>> "00000000-0000-0000-0000-000000000001: failed to >>> resolve (Transport endpoint is not connected)", >>> as below. >>> >>> We are running glusterfs 5.6.1. Thanks in >>> advance for any assistance! >>> >>> On existing node gfs1, trying to add new arbiter >>> node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 >>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. >>> Please check log file for details. >>> >>> >>> This looks like a glusterd issue. Please check the >>> glusterd logs for more info. >>> Adding the glusterd dev to this thread. Sanju, can >>> you take a look? >>> Regards, >>> Nithya >>> >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I >>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: >>> FUSE inited with protocol versions: glusterfs >>> 7.24 kernel 7.22 >>> [2019-05-17 01:20:22.689778] I >>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: >>> switched to graph 0 >>> [2019-05-17 01:20:22.694897] E >>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: >>> first lookup on root failed (Transport endpoint >>> is not connected) >>> [2019-05-17 01:20:22.699770] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>> 0-fuse: 00000000-0000-0000-0000-000000000001: >>> failed to resolve (Transport endpoint is not >>> connected) >>> [2019-05-17 01:20:22.699834] W >>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 2: SETXATTR >>> 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I >>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>> initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W >>> [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) >>> [0x7fb223bf6dd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>> [0x560886581e75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>> [0x560886581ceb] ) 0-: received signum (15), >>> shutting down >>> [2019-05-17 01:20:22.715926] I >>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>> '/tmp/mntQAtu3f'. >>> [2019-05-17 01:20:22.715953] I >>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>> connection to '/tmp/mntQAtu3f'. >>> >>> Processes running on new node gfs3: >>> >>> # ps -ef | grep gluster >>> root????? 6832 1? 0 20:17 ???????? 00:00:00 >>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>> --log-level INFO >>> root???? 15799 1? 0 20:17 ???????? 00:00:00 >>> /usr/sbin/glusterfs -s localhost --volfile-id >>> gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket >>> --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>> --process-name glustershd >>> root???? 16856 16735? 0 21:21 pts/0??? 
00:00:00 >>> grep --color=auto gluster >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Thanks, >>> Sanju >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 22 06:06:36 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 11:36:36 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: If you are trying this again, please 'gluster volume set $volname client-log-level DEBUG`before attempting the add-brick and attach the gvol0-add-brick-mount.log here. After that, you can change the client-log-level back to INFO. -Ravi On 22/05/19 11:32 AM, Ravishankar N wrote: > > > On 22/05/19 11:23 AM, David Cunningham wrote: >> Hi Ravi, >> >> I'd already done exactly that before, where step 3 was a simple 'rm >> -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on >> what the cleanup or reformat should be? > `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. > Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must > not have any extended attributes set on it. Why fuse_first_lookup() is > failing is a bit of a mystery to me at this point. :-( > Regards, > Ravi >> >> Thank you. >> >> >> On Wed, 22 May 2019 at 13:56, Ravishankar N > > wrote: >> >> Hmm, so the volume info seems to indicate that the add-brick was >> successful but the gfid xattr is missing on the new brick (as are >> the actual files, barring the .glusterfs folder, according to >> your previous mail). >> >> Do you want to try removing and adding it again? >> >> 1. `gluster volume remove-brick gvol0 replica 2 >> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >> >> 2. Check that gluster volume info is now back to a 1x2 volume on >> all nodes and `gluster peer status` is connected on all nodes. >> >> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >> >> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >> >> 5. Check that the files are getting healed on to the new brick. >> >> Thanks, >> Ravi >> On 22/05/19 6:50 AM, David Cunningham wrote: >>> Hi Ravi, >>> >>> Certainly. On the existing two nodes: >>> >>> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> On the new node: >>> >>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000001 >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> Output of "gluster volume info" is the same on all 3 nodes and is: >>> >>> # gluster volume info >>> >>> Volume Name: gvol0 >>> Type: Replicate >>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x (2 + 1) = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>> Options Reconfigured: >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>> > wrote: >>> >>> Hi David, >>> Could you provide the `getfattr -d -m. -e hex >>> /nodirectwritedata/gluster/gvol0` output of all bricks and >>> the output of `gluster volume info`? 
>>> >>> Thanks, >>> Ravi >>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>> Hi Sanju, >>>> >>>> Here's what glusterd.log says on the new arbiter server >>>> when trying to add the node: >>>> >>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>> [0x7fe4ca9102cd] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>> [0x7fe4ca9bbb85] >>>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>> --gd-workdir=/var/lib/glusterd >>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>>> 0-management: replica-count is set 3 >>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>>> 0-management: arbiter-count is set 1 >>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>>> 0-management: type is set 0, need to change it >>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>>> 0-management: Failed to set extended attribute >>>> trusted.add-brick : Transport endpoint is not connected >>>> [Transport endpoint is not connected] >>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>>> 0-glusterd: Unable to add bricks >>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >>>> Add-brick commit failed. >>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>>> 0-management: commit failed on operation Add brick >>>> >>>> As before gvol0-add-brick-mount.log said: >>>> >>>> [2019-05-22 00:15:17.005695] I >>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >>>> inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>> [2019-05-22 00:15:17.005749] I >>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to >>>> graph 0 >>>> [2019-05-22 00:15:17.010101] E >>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup >>>> on root failed (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.014217] W >>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>>> LOOKUP() / => -1 (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015097] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve >>>> (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015158] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: >>>> 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>>> (trusted.add-brick) resolution failed >>>> [2019-05-22 00:15:17.035636] I >>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating >>>> unmount of /tmp/mntYGNbj9 >>>> [2019-05-22 00:15:17.035854] W >>>> [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>> [0x55c81b63de75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] >>>> 0-fuse: Unmounting '/tmp/mntYGNbj9'. 
>>>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] >>>> 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. >>>> >>>> Here are the processes running on the new arbiter server: >>>> # ps -ef | grep gluster >>>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>> gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>> --process-name glustershd >>>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>>> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO >>>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>>> /usr/sbin/glusterfs --process-name fuse >>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>> >>>> Here are the files created on the new arbiter server: >>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0 >>>> drw------- 2 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>> >>>> Thank you for your help! >>>> >>>> >>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>> > wrote: >>>> >>>> David, >>>> >>>> can you please attach glusterd.logs? As the error >>>> message says, Commit failed on the arbitar node, we >>>> might be able to find some issue on that node. >>>> >>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >>>> > wrote: >>>> >>>> >>>> >>>> On Fri, 17 May 2019 at 06:01, David Cunningham >>>> >>> > wrote: >>>> >>>> Hello, >>>> >>>> We're adding an arbiter node to an existing >>>> volume and having an issue. Can anyone help? >>>> The root cause error appears to be >>>> "00000000-0000-0000-0000-000000000001: failed >>>> to resolve (Transport endpoint is not >>>> connected)", as below. >>>> >>>> We are running glusterfs 5.6.1. Thanks in >>>> advance for any assistance! >>>> >>>> On existing node gfs1, trying to add new >>>> arbiter node gfs3: >>>> >>>> # gluster volume add-brick gvol0 replica 3 >>>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>>> volume add-brick: failed: Commit failed on >>>> gfs3. Please check log file for details. >>>> >>>> >>>> This looks like a glusterd issue. Please check the >>>> glusterd logs for more info. >>>> Adding the glusterd dev to this thread. Sanju, can >>>> you take a look? 
>>>> Regards, >>>> Nithya >>>> >>>> >>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>> >>>> [2019-05-17 01:20:22.689721] I >>>> [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol >>>> versions: glusterfs 7.24 kernel 7.22 >>>> [2019-05-17 01:20:22.689778] I >>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: >>>> switched to graph 0 >>>> [2019-05-17 01:20:22.694897] E >>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: >>>> first lookup on root failed (Transport endpoint >>>> is not connected) >>>> [2019-05-17 01:20:22.699770] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>>> 0-fuse: 00000000-0000-0000-0000-000000000001: >>>> failed to resolve (Transport endpoint is not >>>> connected) >>>> [2019-05-17 01:20:22.699834] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>> 0-glusterfs-fuse: 2: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 >>>> (trusted.add-brick) resolution failed >>>> [2019-05-17 01:20:22.715656] I >>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>>> initating unmount of /tmp/mntQAtu3f >>>> [2019-05-17 01:20:22.715865] W >>>> [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) >>>> [0x7fb223bf6dd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>> [0x560886581e75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>> [0x560886581ceb] ) 0-: received signum (15), >>>> shutting down >>>> [2019-05-17 01:20:22.715926] I >>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>>> '/tmp/mntQAtu3f'. >>>> [2019-05-17 01:20:22.715953] I >>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>>> connection to '/tmp/mntQAtu3f'. >>>> >>>> Processes running on new node gfs3: >>>> >>>> # ps -ef | grep gluster >>>> root 6832???? 1? 0 20:17 ? 00:00:00 >>>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>>> --log-level INFO >>>> root 15799???? 1? 0 20:17 ? 00:00:00 >>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>> gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket >>>> --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>> --process-name glustershd >>>> root???? 16856 16735? 0 21:21 pts/0 00:00:00 >>>> grep --color=auto gluster >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> Thanks, >>>> Sanju >>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
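A concrete sketch of that capture procedure, using the volume name and
brick path from this thread (the add-brick mount log is assumed to be in
its usual location under /var/log/glusterfs/ on the new node):

# on gfs1:
# gluster volume set gvol0 client-log-level DEBUG
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# on gfs3, collect the log to attach:
# cp /var/log/glusterfs/gvol0-add-brick-mount.log /tmp/
# back on gfs1, once the log is saved:
# gluster volume set gvol0 client-log-level INFO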
URL: From revirii at googlemail.com Wed May 22 07:09:46 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 22 May 2019 09:09:46 +0200 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected Message-ID: Hi @ll, today i updated and rebooted the 3 servers of my replicate 3 setup; after the 3rd one came up again i noticed this error: [2019-05-22 06:41:26.781165] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-workdata-replicate-0: Gfid mismatch detected for /120710351>, 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. [2019-05-22 06:41:27.069969] W [MSGID: 108027] [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: no read subvols for /staticmap/120/710/120710351 [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 (Transport endpoint is not connected) A simple 'gluster volume heal workdata' didn't help; 'gluster volume heal workdata info' says: Brick gluster1:/gluster/md4/workdata /staticmap/120/710 /staticmap/120/710/120710351 Status: Connected Number of entries: 3 Brick gluster2:/gluster/md4/workdata /staticmap/120/710 /staticmap/120/710/120710351 Status: Connected Number of entries: 3 Brick gluster3:/gluster/md4/workdata /staticmap/120/710/120710351 Status: Connected Number of entries: 1 There's a mismatch in one directory; I tried to follow these instructions: https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in split-brain. Volume heal failed. Is there any other documentation for gfid mismatch and how to resolve this? Thx, Hubert From ravishankar at redhat.com Wed May 22 07:32:38 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 13:02:38 +0530 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: References: Message-ID: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> On 22/05/19 12:39 PM, Hu Bert wrote: > Hi @ll, > > today i updated and rebooted the 3 servers of my replicate 3 setup; > after the 3rd one came up again i noticed this error: > > [2019-05-22 06:41:26.781165] E [MSGID: 108008] > [afr-self-heal-common.c:392:afr_gfid_split_brain_source] > 0-workdata-replicate-0: Gfid mismatch detected for > /120710351>, > 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. 120710351 seems to be the entry that is in split-brain. Is /staticmap/120/710/120710351 the complete path to that entry? (check if gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). 
You can then try "gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" -Ravi > [2019-05-22 06:41:27.069969] W [MSGID: 108027] > [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: > no read subvols for /staticmap/120/710/120710351 > [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] > 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 > (Transport endpoint is not connected) > > A simple 'gluster volume heal workdata' didn't help; 'gluster volume > heal workdata info' says: > > Brick gluster1:/gluster/md4/workdata > /staticmap/120/710 > /staticmap/120/710/120710351 > > Status: Connected > Number of entries: 3 > > Brick gluster2:/gluster/md4/workdata > /staticmap/120/710 > /staticmap/120/710/120710351 > > Status: Connected > Number of entries: 3 > > Brick gluster3:/gluster/md4/workdata > /staticmap/120/710/120710351 > Status: Connected > Number of entries: 1 > > There's a mismatch in one directory; I tried to follow these instructions: > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b > Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in > split-brain. > Volume heal failed. > > Is there any other documentation for gfid mismatch and how to resolve this? > > > Thx, > Hubert > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From revirii at googlemail.com Wed May 22 07:59:13 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 22 May 2019 09:59:13 +0200 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> References: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> Message-ID: Hi Ravi, mount path of the volume is /shared/public, so complete paths are /shared/public/staticmap/120/710/ and /shared/public/staticmap/120/710/120710351/ . getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/ getfattr: Removing leading '/' from absolute path names # file: shared/public/staticmap/120/710/ glusterfs.gfid.string="751233b0-7789-4550-bd95-4dd9c8f57c19" getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/120710351/ getfattr: Removing leading '/' from absolute path names # file: shared/public/staticmap/120/710/120710351/ glusterfs.gfid.string="eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace" So that fits. It somehow took a couple of attempts to resolve this, and none of the commands seem to have "officially" succeeded: gluster3 (host with the "fail"): gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /shared/public/staticmap/120/710/120710351/ Lookup failed on /shared/public/staticmap/120/710:No such file or directory Volume heal failed. gluster1 ("good" host): gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /shared/public/staticmap/120/710/120710351/ Lookup failed on /shared/public/staticmap/120/710:No such file or directory Volume heal failed. 
Only in the logs i see: [2019-05-22 07:42:22.004182] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-workdata-replicate-0: performing metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace [2019-05-22 07:42:22.008502] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-workdata-replicate-0: Completed metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace. sources=0 [1] sinks=2 And via "gluster volume heal workdata statistics heal-count" there are 0 entries left. Files/directories are there. Happened the first time with this setup, but everything ok now. Thx for your fast help :-) Hubert Am Mi., 22. Mai 2019 um 09:32 Uhr schrieb Ravishankar N : > > > On 22/05/19 12:39 PM, Hu Bert wrote: > > Hi @ll, > > > > today i updated and rebooted the 3 servers of my replicate 3 setup; > > after the 3rd one came up again i noticed this error: > > > > [2019-05-22 06:41:26.781165] E [MSGID: 108008] > > [afr-self-heal-common.c:392:afr_gfid_split_brain_source] > > 0-workdata-replicate-0: Gfid mismatch detected for > > /120710351>, > > 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and > > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. > > 120710351 seems to be the entry that is in split-brain. Is > /staticmap/120/710/120710351 the complete path to that entry? (check if > gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). > > You can then try "gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" > > -Ravi > > > [2019-05-22 06:41:27.069969] W [MSGID: 108027] > > [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: > > no read subvols for /staticmap/120/710/120710351 > > [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] > > 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 > > (Transport endpoint is not connected) > > > > A simple 'gluster volume heal workdata' didn't help; 'gluster volume > > heal workdata info' says: > > > > Brick gluster1:/gluster/md4/workdata > > /staticmap/120/710 > > /staticmap/120/710/120710351 > > > > Status: Connected > > Number of entries: 3 > > > > Brick gluster2:/gluster/md4/workdata > > /staticmap/120/710 > > /staticmap/120/710/120710351 > > > > Status: Connected > > Number of entries: 3 > > > > Brick gluster3:/gluster/md4/workdata > > /staticmap/120/710/120710351 > > Status: Connected > > Number of entries: 1 > > > > There's a mismatch in one directory; I tried to follow these instructions: > > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > > > > gluster volume heal workdata split-brain source-brick > > gluster1:/gluster/md4/workdata > > gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b > > Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in > > split-brain. > > Volume heal failed. > > > > > Is there any other documentation for gfid mismatch and how to resolve this? 
> > > > > > Thx, > > Hubert > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users From ravishankar at redhat.com Wed May 22 08:53:49 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 14:23:49 +0530 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: References: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> Message-ID: <36b7e1b1-6fb6-88c9-b8e3-341dc67a44c9@redhat.com> On 22/05/19 1:29 PM, Hu Bert wrote: > Hi Ravi, > > mount path of the volume is /shared/public, so complete paths are > /shared/public/staticmap/120/710/ and > /shared/public/staticmap/120/710/120710351/ . > > getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/ > getfattr: Removing leading '/' from absolute path names > # file: shared/public/staticmap/120/710/ > glusterfs.gfid.string="751233b0-7789-4550-bd95-4dd9c8f57c19" > > getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/120710351/ > getfattr: Removing leading '/' from absolute path names > # file: shared/public/staticmap/120/710/120710351/ > glusterfs.gfid.string="eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace" > > So that fits. It somehow took a couple of attempts to resolve this, > and none of the commands seem to have "officially" succeeded: > > gluster3 (host with the "fail"): > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > /shared/public/staticmap/120/710/120710351/ > Lookup failed on /shared/public/staticmap/120/710:No such file or directory The file path given to this command must be the absolute path /as seen from the root of the volume/. So the location where it is mounted (/shared/public) must be omitted. Only /staticmap/120/710/120710351/ is required. HTH, Ravi > Volume heal failed. > > gluster1 ("good" host): > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > /shared/public/staticmap/120/710/120710351/ > Lookup failed on /shared/public/staticmap/120/710:No such file or directory > Volume heal failed. > > Only in the logs i see: > > [2019-05-22 07:42:22.004182] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-workdata-replicate-0: performing metadata selfheal on > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace > [2019-05-22 07:42:22.008502] I [MSGID: 108026] > [afr-self-heal-common.c:1729:afr_log_selfheal] 0-workdata-replicate-0: > Completed metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace. > sources=0 [1] sinks=2 > > And via "gluster volume heal workdata statistics heal-count" there are > 0 entries left. Files/directories are there. Happened the first time > with this setup, but everything ok now. > > Thx for your fast help :-) > > > Hubert > > Am Mi., 22. Mai 2019 um 09:32 Uhr schrieb Ravishankar N > : >> >> On 22/05/19 12:39 PM, Hu Bert wrote: >>> Hi @ll, >>> >>> today i updated and rebooted the 3 servers of my replicate 3 setup; >>> after the 3rd one came up again i noticed this error: >>> >>> [2019-05-22 06:41:26.781165] E [MSGID: 108008] >>> [afr-self-heal-common.c:392:afr_gfid_split_brain_source] >>> 0-workdata-replicate-0: Gfid mismatch detected for >>> /120710351>, >>> 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and >>> eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. >> 120710351 seems to be the entry that is in split-brain. Is >> /staticmap/120/710/120710351 the complete path to that entry? 
(check if >> gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). >> >> You can then try "gluster volume heal workdata split-brain source-brick >> gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" >> >> -Ravi >> >>> [2019-05-22 06:41:27.069969] W [MSGID: 108027] >>> [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: >>> no read subvols for /staticmap/120/710/120710351 >>> [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] >>> 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 >>> (Transport endpoint is not connected) >>> >>> A simple 'gluster volume heal workdata' didn't help; 'gluster volume >>> heal workdata info' says: >>> >>> Brick gluster1:/gluster/md4/workdata >>> /staticmap/120/710 >>> /staticmap/120/710/120710351 >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick gluster2:/gluster/md4/workdata >>> /staticmap/120/710 >>> /staticmap/120/710/120710351 >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick gluster3:/gluster/md4/workdata >>> /staticmap/120/710/120710351 >>> Status: Connected >>> Number of entries: 1 >>> >>> There's a mismatch in one directory; I tried to follow these instructions: >>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>> >>> gluster volume heal workdata split-brain source-brick >>> gluster1:/gluster/md4/workdata >>> gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b >>> Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in >>> split-brain. >>> Volume heal failed. >>> Is there any other documentation for gfid mismatch and how to resolve this? >>> >>> >>> Thx, >>> Hubert >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Wed May 22 21:10:45 2019 From: alan.orth at gmail.com (Alan Orth) Date: Thu, 23 May 2019 00:10:45 +0300 Subject: [Gluster-users] Does replace-brick migrate data? Message-ID: Dear list, I seem to have gotten into a tricky situation. Today I brought up a shiny new server with new disk arrays and attempted to replace one brick of a replica 2 distribute/replicate volume on an older server using the `replace-brick` command: # gluster volume replace-brick homes wingu0:/mnt/gluster/homes wingu06:/data/glusterfs/sdb/homes commit force The command was successful and I see the new brick in the output of `gluster volume info`. The problem is that Gluster doesn't seem to be migrating the data, and now the original brick that I replaced is no longer part of the volume (and a few terabytes of data are just sitting on the old brick): # gluster volume info homes | grep -E "Brick[0-9]:" Brick1: wingu4:/mnt/gluster/homes Brick2: wingu3:/mnt/gluster/homes Brick3: wingu06:/data/glusterfs/sdb/homes Brick4: wingu05:/data/glusterfs/sdb/homes Brick5: wingu05:/data/glusterfs/sdc/homes Brick6: wingu06:/data/glusterfs/sdc/homes I see the Gluster docs have a more complicated procedure for replacing bricks that involves getfattr/setfattr?. How can I tell Gluster about the old brick? I see that I have a backup of the old volfile thanks to yum's rpmsave function if that helps. We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can give. ? 
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 22 22:24:28 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 23 May 2019 10:24:28 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: Hi Ravi, Please see the log attached. The output of "gluster volume status" is as follows. Should there be something listening on gfs3? I'm not sure whether it having TCP Port and Pid as N/A is a symptom or cause. Thank you. # gluster volume status Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7624 Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A N N/A Self-heal Daemon on localhost N/A N/A Y 19853 Self-heal Daemon on gfs1 N/A N/A Y 28600 Self-heal Daemon on gfs2 N/A N/A Y 17614 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume tasks On Wed, 22 May 2019 at 18:06, Ravishankar N wrote: > If you are trying this again, please 'gluster volume set $volname > client-log-level DEBUG`before attempting the add-brick and attach the > gvol0-add-brick-mount.log here. After that, you can change the > client-log-level back to INFO. > > -Ravi > On 22/05/19 11:32 AM, Ravishankar N wrote: > > > On 22/05/19 11:23 AM, David Cunningham wrote: > > Hi Ravi, > > I'd already done exactly that before, where step 3 was a simple 'rm -rf > /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the > cleanup or reformat should be? > > `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. > Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not > have any extended attributes set on it. Why fuse_first_lookup() is failing > is a bit of a mystery to me at this point. :-( > Regards, > Ravi > > > Thank you. > > > On Wed, 22 May 2019 at 13:56, Ravishankar N > wrote: > >> Hmm, so the volume info seems to indicate that the add-brick was >> successful but the gfid xattr is missing on the new brick (as are the >> actual files, barring the .glusterfs folder, according to your previous >> mail). >> >> Do you want to try removing and adding it again? >> >> 1. `gluster volume remove-brick gvol0 replica 2 >> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >> >> 2. Check that gluster volume info is now back to a 1x2 volume on all >> nodes and `gluster peer status` is connected on all nodes. >> >> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >> >> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >> >> 5. Check that the files are getting healed on to the new brick. >> Thanks, >> Ravi >> On 22/05/19 6:50 AM, David Cunningham wrote: >> >> Hi Ravi, >> >> Certainly. On the existing two nodes: >> >> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-0=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> On the new node: >> >> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000001 >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> Output of "gluster volume info" is the same on all 3 nodes and is: >> >> # gluster volume info >> >> Volume Name: gvol0 >> Type: Replicate >> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x (2 + 1) = 3 >> Transport-type: tcp >> Bricks: >> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> >> On Wed, 22 May 2019 at 12:43, Ravishankar N >> wrote: >> >>> Hi David, >>> Could you provide the `getfattr -d -m. -e hex >>> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >>> `gluster volume info`? 
>>> >>> Thanks, >>> Ravi >>> On 22/05/19 4:57 AM, David Cunningham wrote: >>> >>> Hi Sanju, >>> >>> Here's what glusterd.log says on the new arbiter server when trying to >>> add the node: >>> >>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>> [0x7fe4ca9102cd] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>> --volname=gvol0 --version=1 --volume-op=add-brick >>> --gd-workdir=/var/lib/glusterd >>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >>> replica-count is set 3 >>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >>> arbiter-count is set 1 >>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >>> type is set 0, need to change it >>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >>> Failed to set extended attribute trusted.add-brick : Transport endpoint is >>> not connected [Transport endpoint is not connected] >>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >>> bricks >>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >>> failed. >>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >>> commit failed on operation Add brick >>> >>> As before gvol0-add-brick-mount.log said: >>> >>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>> 7.22 >>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >>> 0-fuse: switched to graph 0 >>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >>> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015097] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>> is not connected) >>> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >>> 0-fuse: initating unmount of /tmp/mntYGNbj9 >>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >>> received signum (15), shutting down >>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >>> Unmounting '/tmp/mntYGNbj9'. >>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >>> fuse connection to '/tmp/mntYGNbj9'. 
>>> >>> Here are the processes running on the new arbiter server: >>> # ps -ef | grep gluster >>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >>> localhost --volfile-id gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>> glustershd >>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >>> /var/run/glusterd.pid --log-level INFO >>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >>> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>> >>> Here are the files created on the new arbiter server: >>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 >>> drw------- 2 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0/.glusterfs >>> >>> Thank you for your help! >>> >>> >>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: >>> >>>> David, >>>> >>>> can you please attach glusterd.logs? As the error message says, Commit >>>> failed on the arbitar node, we might be able to find some issue on that >>>> node. >>>> >>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>>> nbalacha at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>>> dcunningham at voisonics.com> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> We're adding an arbiter node to an existing volume and having an >>>>>> issue. Can anyone help? The root cause error appears to be >>>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>>> endpoint is not connected)", as below. >>>>>> >>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>>> >>>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>>> >>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log >>>>>> file for details. >>>>>> >>>>> >>>>> This looks like a glusterd issue. Please check the glusterd logs for >>>>> more info. >>>>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>>>> >>>>> Regards, >>>>> Nithya >>>>> >>>>>> >>>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>>> >>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>>> 7.22 >>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>>> 0-fuse: switched to graph 0 >>>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>>> [2019-05-17 01:20:22.699770] W >>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>>> is not connected) >>>>>> [2019-05-17 01:20:22.699834] W >>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>>> received signum (15), shutting down >>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>>> Unmounting '/tmp/mntQAtu3f'. >>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>>> >>>>>> Processes running on new node gfs3: >>>>>> >>>>>> # ps -ef | grep gluster >>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>>>> /var/run/glusterd.pid --log-level INFO >>>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs >>>>>> -s localhost --volfile-id gluster/glustershd -p >>>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>>> /var/log/glusterfs/glustershd.log -S >>>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>>> glustershd >>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>>> gluster >>>>>> >>>>>> -- >>>>>> David Cunningham, Voisonics Limited >>>>>> http://voisonics.com/ >>>>>> USA: +1 213 221 1092 >>>>>> New Zealand: +64 (0)28 2558 3782 >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>> >>>> -- >>>> Thanks, >>>> Sanju >>>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> _______________________________________________ >>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gvol0-add-brick-mount.log Type: text/x-log Size: 30154 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 23 10:10:33 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 23 May 2019 12:10:33 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID: Hello Kaleb, there is no rpcsvc-proto rpm for SLES15 according to this: https://software.opensuse.org/package/rpcsvc-proto?locale=fa It really seems that the SLES15 from OpenSUSE has a special setup. Removing the BuildRequires: rpcgen line and using the glibc-bundled rpcgen works. I have my packages now! Regards David On Mon, 13 May 2019 at 08:10, David Spisla wrote: > Hello Kaleb, > > thank you for the info. I'll try this out. > > Regards > David > > On Fri, 10 May 2019 at 16:24, Kaleb Keithley < > kkeithle at redhat.com> wrote: > >> Seems I accidentally omitted gluster-users in my first reply. >> >> On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley >> wrote: >> >>> On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: >>> >>>> Hello Kaleb, >>>> >>>> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am >>>> using a SLES15 system to create them. I got the following error message: >>>> >>>> rpmbuild --define '_topdir >>>>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>>>> rpmbuild/SPECS/glusterfs.spec >>>>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>>>> redhat.com >>>>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>>>> redhat.com >>>>> error: Failed build dependencies: >>>>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>>>> make: *** [Makefile:579: rpms] Error 1 >>>>> >>>>> >>>> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >>>> Repo glusterfs-suse) rpcgen is listed as a dependency. But >>>> unfortunately there is no rpcgen package provided on SLES15. Or in other >>>> words: >>>> I only found RPMs for other SUSE distributions, but not for SLES15. >>>> >>>> Do you know that issue? >>>> >>> >>> I'm afraid I don't. >>> >>> >>>> What is the name of the distribution which you are using to create >>>> Packages for SLES15? >>>> >>> >>> The community packages are built on the OpenSUSE OBS and they are built >>> on SLES15 (the one that OBS provides). I don't know any details beyond that. >>> It could be a real SLES15 system, or it could be a build in mock, or SUSE's >>> chroot build tool if they don't have mock. >>> >>> You can see the build logs from the community builds of glusterfs-5.5 >>> and glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a >>> completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. >>> Finding things in the OBS repos seems to be hit or miss sometimes. I can't >>> find the SLE_15 rpcgen package. >>> >>> (Back in SLES11 days I had a free eval license that let me update and >>> install add-on packages on my own system. I tried to get a similar license >>> for SLES12 and was advised to just use OBS. I haven't even bothered trying >>> to get one for SLES15. It makes it harder IMO to figure things out.) >>> >>> I recommend asking the OBS team on #opensuse-buildservice on (freenode) >>> IRC. They've always been very helpful to me. >>> >> >> Miuku on #opensuse-buildservice poked around and found that the unbundled >> rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it >> does in Fedora and RHEL8.)
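(For SLES15 itself I could not find an installable rpcsvc-proto package, which is why I dropped the build dependency from the spec instead. Roughly, assuming the spec still carries a plain "BuildRequires: rpcgen" line, something like:

sed -i '/^BuildRequires:[[:space:]]*rpcgen/d' rpmbuild/SPECS/glusterfs.spec   # untested one-liner; editing the spec by hand works just as well

so that the build picks up the glibc-bundled rpcgen instead.)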
>> >> All the gluster community packages for SLE_15 going back to glusterfs-5.0 >> in October 2018 have used the unbundled rpcgen. You can do the same, or >> remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. >> >> HTH >> >> -- >> >> Kaleb >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu May 23 10:34:37 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 23 May 2019 16:04:37 +0530 Subject: [Gluster-users] ./tests/basic/gfapi/gfapi-ssl-test.t is failing too often in regression Message-ID: I see a lot of patches are failing regressions due to the .t mentioned in the subject line. I've filed a bug[1] for the same. https://bugzilla.redhat.com/show_bug.cgi?id=1713284 -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu May 23 11:04:28 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 23 May 2019 16:34:28 +0530 Subject: [Gluster-users] ./tests/basic/gfapi/gfapi-ssl-test.t is failing too often in regression In-Reply-To: References: Message-ID: I apologize for the wrong mail. This .t failed only for one patch and I don't think it is spurious. Closing this bug as not a bug. On Thu, May 23, 2019 at 4:04 PM Sanju Rakonde wrote: > I see a lot of patches are failing regressions due to the .t mentioned in > the subject line. I've filed a bug[1] for the same. > > https://bugzilla.redhat.com/show_bug.cgi?id=1713284 > -- > Thanks, > Sanju > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From brandon at thinkhuge.net Thu May 23 17:45:40 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Thu, 23 May 2019 10:45:40 -0700 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 Message-ID: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> Does anyone know what should be done on a glusterfs v5.6 "gluster volume remove-brick" operation that fails? I'm trying to remove 1 of 8 distributed smaller nodes for replacement with larger node. The "gluster volume remove-brick ... status" command reports status failed and failures = "3" cat /var/log/glusterfs/volbackups-rebalance.log ... [2019-05-23 16:43:37.442283] I [MSGID: 109028] [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is failed. Time taken is 545.00 secs All servers are confirmed in good communications and updated and freshly rebooted and retried the remove-brick few times with fail each time -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 13:48:52 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:18:52 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> Hi David, On 23/05/19 3:54 AM, David Cunningham wrote: > Hi Ravi, > > Please see the log attached. When I grep -E "Connected to |disconnected from" gvol0-add-brick-mount.log,? I don't see a "Connected to gvol0-client-1". It looks like this temporary mount is not able to connect to the 2nd brick, which is why the lookup is failing due to lack of quorum. > The output of "gluster volume status" is as follows. Should there be > something listening on gfs3? 
I'm not sure whether it having TCP Port > and Pid as N/A is a symptom or cause. Thank you. > > # gluster volume status > Status of volume: gvol0 > Gluster process???????????????????????????? TCP Port? RDMA Port? > Online? Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7624 > Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A??????? N?????? N/A Can you see if the following steps help? 1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both* gfs1 and gfs2. 2. 'gluster volume start gvol0 force` 3. Check if Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why. Thanks, Ravi > Self-heal Daemon on localhost?????????????? N/A N/A??????? Y?????? 19853 > Self-heal Daemon on gfs1??????????????????? N/A N/A??????? Y?????? 28600 > Self-heal Daemon on gfs2??????????????????? N/A N/A??????? Y?????? 17614 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 22 May 2019 at 18:06, Ravishankar N > wrote: > > If you are trying this again, please 'gluster volume set $volname > client-log-level DEBUG`before attempting the add-brick and attach > the gvol0-add-brick-mount.log here. After that, you can change the > client-log-level back to INFO. > > -Ravi > > On 22/05/19 11:32 AM, Ravishankar N wrote: >> >> >> On 22/05/19 11:23 AM, David Cunningham wrote: >>> Hi Ravi, >>> >>> I'd already done exactly that before, where step 3 was a simple >>> 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another >>> suggestion on what the cleanup or reformat should be? >> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me >> David. Basically, '/nodirectwritedata/gluster/gvol0' must be >> empty and must not have any extended attributes set on it. Why >> fuse_first_lookup() is failing is a bit of a mystery to me at >> this point. :-( >> Regards, >> Ravi >>> >>> Thank you. >>> >>> >>> On Wed, 22 May 2019 at 13:56, Ravishankar N >>> > wrote: >>> >>> Hmm, so the volume info seems to indicate that the add-brick >>> was successful but the gfid xattr is missing on the new >>> brick (as are the actual files, barring the .glusterfs >>> folder, according to your previous mail). >>> >>> Do you want to try removing and adding it again? >>> >>> 1. `gluster volume remove-brick gvol0 replica 2 >>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >>> >>> 2. Check that gluster volume info is now back to a 1x2 >>> volume on all nodes and `gluster peer status` is? connected >>> on all nodes. >>> >>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on >>> gfs3. >>> >>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >>> >>> 5. Check that the files are getting healed on to the new brick. >>> >>> Thanks, >>> Ravi >>> On 22/05/19 6:50 AM, David Cunningham wrote: >>>> Hi Ravi, >>>> >>>> Certainly. On the existing two nodes: >>>> >>>> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> On the new node: >>>> >>>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000001 >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> Output of "gluster volume info" is the same on all 3 nodes >>>> and is: >>>> >>>> # gluster volume info >>>> >>>> Volume Name: gvol0 >>>> Type: Replicate >>>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x (2 + 1) = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>>> Options Reconfigured: >>>> performance.client-io-threads: off >>>> nfs.disable: on >>>> transport.address-family: inet >>>> >>>> >>>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>>> > wrote: >>>> >>>> Hi David, >>>> Could you provide the `getfattr -d -m. -e hex >>>> /nodirectwritedata/gluster/gvol0` output of all bricks >>>> and the output of `gluster volume info`? 
>>>> >>>> Thanks, >>>> Ravi >>>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>>> Hi Sanju, >>>>> >>>>> Here's what glusterd.log says on the new arbiter >>>>> server when trying to add the node: >>>>> >>>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>>> [0x7fe4ca9102cd] >>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>>> [0x7fe4ca9bbb85] >>>>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>>> --gd-workdir=/var/lib/glusterd >>>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>>>> 0-management: replica-count is set 3 >>>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>>>> 0-management: arbiter-count is set 1 >>>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>>>> 0-management: type is set 0, need to change it >>>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>>>> 0-management: Failed to set extended attribute >>>>> trusted.add-brick : Transport endpoint is not >>>>> connected [Transport endpoint is not connected] >>>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>>>> 0-glusterd: Unable to add bricks >>>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] >>>>> 0-management: Add-brick commit failed. 
>>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>>>> 0-management: commit failed on operation Add brick >>>>> >>>>> As before gvol0-add-brick-mount.log said: >>>>> >>>>> [2019-05-22 00:15:17.005695] I >>>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >>>>> inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-22 00:15:17.005749] I >>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched >>>>> to graph 0 >>>>> [2019-05-22 00:15:17.010101] E >>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first >>>>> lookup on root failed (Transport endpoint is not >>>>> connected) >>>>> [2019-05-22 00:15:17.014217] W >>>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>>>> LOOKUP() / => -1 (Transport endpoint is not connected) >>>>> [2019-05-22 00:15:17.015097] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: failed to >>>>> resolve (Transport endpoint is not connected) >>>>> [2019-05-22 00:15:17.015158] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>>> 0-glusterfs-fuse: 3: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 >>>>> (trusted.add-brick) resolution failed >>>>> [2019-05-22 00:15:17.035636] I >>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>>>> initating unmount of /tmp/mntYGNbj9 >>>>> [2019-05-22 00:15:17.035854] W >>>>> [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>>> [0x55c81b63de75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>>>> [2019-05-22 00:15:17.035942] I >>>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>>>> '/tmp/mntYGNbj9'. >>>>> [2019-05-22 00:15:17.035966] I >>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>>>> connection to '/tmp/mntYGNbj9'. >>>>> >>>>> Here are the processes running on the new arbiter server: >>>>> # ps -ef | grep gluster >>>>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>>> gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>> /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket >>>>> --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>>> --process-name glustershd >>>>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>>>> --log-level INFO >>>>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>>>> /usr/sbin/glusterfs --process-name fuse >>>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>>> >>>>> Here are the files created on the new arbiter server: >>>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>>> /nodirectwritedata/gluster/gvol0 >>>>> drw------- 2 root root 4096 May 21 20:15 >>>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>>> >>>>> Thank you for your help! >>>>> >>>>> >>>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>>> > wrote: >>>>> >>>>> David, >>>>> >>>>> can you please attach glusterd.logs? As the error >>>>> message says, Commit failed on the arbitar node, >>>>> we might be able to find some issue on that node. 
>>>>> >>>>> On Mon, May 20, 2019 at 10:10 AM Nithya >>>>> Balachandran >>>> > wrote: >>>>> >>>>> >>>>> >>>>> On Fri, 17 May 2019 at 06:01, David Cunningham >>>>> >>>> > wrote: >>>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an >>>>> existing volume and having an issue. Can >>>>> anyone help? The root cause error appears >>>>> to be >>>>> "00000000-0000-0000-0000-000000000001: >>>>> failed to resolve (Transport endpoint is >>>>> not connected)", as below. >>>>> >>>>> We are running glusterfs 5.6.1. Thanks in >>>>> advance for any assistance! >>>>> >>>>> On existing node gfs1, trying to add new >>>>> arbiter node gfs3: >>>>> >>>>> # gluster volume add-brick gvol0 replica 3 >>>>> arbiter 1 >>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>> volume add-brick: failed: Commit failed on >>>>> gfs3. Please check log file for details. >>>>> >>>>> >>>>> This looks like a glusterd issue. Please check >>>>> the glusterd logs for more info. >>>>> Adding the glusterd dev to this thread. Sanju, >>>>> can you take a look? >>>>> Regards, >>>>> Nithya >>>>> >>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I >>>>> [fuse-bridge.c:4267:fuse_init] >>>>> 0-glusterfs-fuse: FUSE inited with >>>>> protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-17 01:20:22.689778] I >>>>> [fuse-bridge.c:4878:fuse_graph_sync] >>>>> 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E >>>>> [fuse-bridge.c:4336:fuse_first_lookup] >>>>> 0-fuse: first lookup on root failed >>>>> (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>>>> 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: >>>>> failed to resolve (Transport endpoint is >>>>> not connected) >>>>> [2019-05-17 01:20:22.699834] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>>> 0-glusterfs-fuse: 2: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 >>>>> (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I >>>>> [fuse-bridge.c:5144:fuse_thread_proc] >>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W >>>>> [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) >>>>> [0x7fb223bf6dd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>>> [0x560886581e75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>>> [0x560886581ceb] ) 0-: received signum >>>>> (15), shutting down >>>>> [2019-05-17 01:20:22.715926] I >>>>> [fuse-bridge.c:5914:fini] 0-fuse: >>>>> Unmounting '/tmp/mntQAtu3f'. >>>>> [2019-05-17 01:20:22.715953] I >>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing >>>>> fuse connection to '/tmp/mntQAtu3f'. >>>>> >>>>> Processes running on new node gfs3: >>>>> >>>>> # ps -ef | grep gluster >>>>> root 6832???? 1? 0 20:17 ? 00:00:00 >>>>> /usr/sbin/glusterd -p >>>>> /var/run/glusterd.pid --log-level INFO >>>>> root 15799???? 1? 0 20:17 ? 00:00:00 >>>>> /usr/sbin/glusterfs -s localhost >>>>> --volfile-id gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid >>>>> -l /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket >>>>> --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>>> --process-name glustershd >>>>> root???? 16856 16735? 
0 21:21 pts/0 >>>>> 00:00:00 grep --color=auto gluster >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>>> -- >>>>> Thanks, >>>>> Sanju >>>>> >>>>> >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 13:59:55 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:29:55 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: Message-ID: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> On 23/05/19 2:40 AM, Alan Orth wrote: > Dear list, > > I seem to have gotten into a tricky situation. Today I brought up a > shiny new server with new disk arrays and attempted to replace one > brick of a replica 2 distribute/replicate volume on an older server > using the `replace-brick` command: > > # gluster volume replace-brick homes wingu0:/mnt/gluster/homes > wingu06:/data/glusterfs/sdb/homes commit force > > The command was successful and I see the new brick in the output of > `gluster volume info`. The problem is that Gluster doesn't seem to be > migrating the data, `replace-brick` definitely must heal (not migrate) the data. In your case, data must have been healed from Brick-4 to the replaced Brick-3. Are there any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of date. replace-brick command internally does all the setfattr steps that are mentioned in the doc. -Ravi > and now the original brick that I replaced is no longer part of the > volume (and a few terabytes of data are just sitting on the old brick): > > # gluster volume info homes | grep -E "Brick[0-9]:" > Brick1: wingu4:/mnt/gluster/homes > Brick2: wingu3:/mnt/gluster/homes > Brick3: wingu06:/data/glusterfs/sdb/homes > Brick4: wingu05:/data/glusterfs/sdb/homes > Brick5: wingu05:/data/glusterfs/sdc/homes > Brick6: wingu06:/data/glusterfs/sdc/homes > > I see the Gluster docs have a more complicated procedure for replacing > bricks that involves getfattr/setfattr?. How can I tell Gluster about > the old brick? I see that I have a backup of the old volfile thanks to > yum's rpmsave function if that helps. > > We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can > give. > > ? 
> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 14:03:03 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:33:03 +0530 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> Message-ID: <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Adding a few DHT folks for some possible suggestions. -Ravi On 23/05/19 11:15 PM, brandon at thinkhuge.net wrote: > > Does anyone know what should be done on a glusterfs v5.6 "gluster > volume remove-brick" operation that fails?? I'm trying to remove 1 of > 8 distributed smaller nodes for replacement with larger node. > > The "gluster volume remove-brick ... status" command reports status > failed and failures = "3" > > cat /var/log/glusterfs/volbackups-rebalance.log > > ... > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] > [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: > Rebalance is failed. Time taken is 545.00 secs > > All servers are confirmed in good communications and updated and > freshly rebooted and retried the remove-brick few times with fail each > time > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Sat May 25 05:00:13 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Sat, 25 May 2019 10:30:13 +0530 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Message-ID: Hi Brandon, Please send the following: 1. the gluster volume info 2. Information about which brick was removed 3. The rebalance log file for all nodes hosting removed bricks. Regards, Nithya On Fri, 24 May 2019 at 19:33, Ravishankar N wrote: > Adding a few DHT folks for some possible suggestions. > > -Ravi > On 23/05/19 11:15 PM, brandon at thinkhuge.net wrote: > > Does anyone know what should be done on a glusterfs v5.6 "gluster volume > remove-brick" operation that fails? I'm trying to remove 1 of 8 > distributed smaller nodes for replacement with larger node. > > > > The "gluster volume remove-brick ... status" command reports status failed > and failures = "3" > > > > cat /var/log/glusterfs/volbackups-rebalance.log > > ... > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] > [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is > failed. 
Time taken is 545.00 secs > > > > All servers are confirmed in good communications and updated and freshly > rebooted and retried the remove-brick few times with fail each time > > > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun May 26 13:38:13 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sun, 26 May 2019 13:38:13 +0000 (UTC) Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. In-Reply-To: References: Message-ID: <626088321.4969320.1558877893362@mail.yahoo.com> Yeah,it seems different from the docs.I'm adding the gluster users list ,as they are more experienced into that. @Gluster-users, can you provide some hint how to add aditional replicas to the below volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? Best Regards,Strahil Nikolov ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David ??????: Thank you Strahil,The engine and ssd-samsung are distributed...So these are the ones that I need to have replicated accross new nodes.I am not very sure about the procedure to accomplish this.Thanks, Leo On Sun, May 26, 2019, 13:04 Strahil wrote: Hi Leo, As you do not have a distributed volume , you can easily switch to replica 2 arbiter 1 or replica 3 volumes. You can use the following for adding the bricks: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html Best Regards, Strahil Nikoliv On May 26, 2019 10:54, Leo David wrote: Hi Stahil,Thank you so much for yout input ! ?gluster volume info Volume Name: engine Type: Distribute Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 192.168.80.191:/gluster_bricks/engine/engine Options Reconfigured: nfs.disable: on transport.address-family: inet storage.owner-uid: 36 storage.owner-gid: 36 features.shard: on performance.low-prio-threads: 32 performance.strict-o-direct: off network.remote-dio: off network.ping-timeout: 30 user.cifs: off performance.quick-read: off performance.read-ahead: off performance.io-cache: off cluster.eager-lock: enableVolume Name: ssd-samsung Type: Distribute Volume ID: 76576cc6-220b-4651-952d-99846178a19e Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 192.168.80.191:/gluster_bricks/sdc/data Options Reconfigured: cluster.eager-lock: enable performance.io-cache: off performance.read-ahead: off performance.quick-read: off user.cifs: off network.ping-timeout: 30 network.remote-dio: off performance.strict-o-direct: on performance.low-prio-threads: 32 features.shard: on storage.owner-gid: 36 storage.owner-uid: 36 transport.address-family: inet nfs.disable: on The other two hosts will be 192.168.80.192/193??- this is gluster dedicated network over 10GB sfp+ switch.- host 2?wil have identical harware configuration with host 1 ( each disk is actually a raid0 array )- host 3 has:?? -? 1 ssd for OS?? -??1 ssd - for adding to engine volume in a full replica 3?? -? 2 ssd's in a raid 1 array?to be added?as arbiter for the data volume ( ssd-samsung )So the plan is to have "engine"? scaled in a full replica 3,? and "ssd-samsung" scalled in a replica 3 arbitrated. 
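If I understand the add-brick syntax correctly, the expansion would look roughly like this (the brick paths on 192.168.80.192/193 are only placeholders for wherever the new bricks get mounted, so please correct me if this is the wrong approach):

# full replica 3 for the engine volume
gluster volume add-brick engine replica 3 192.168.80.192:/gluster_bricks/engine/engine 192.168.80.193:/gluster_bricks/engine/engine
# replica 3 arbiter 1 for the data volume; the last brick listed becomes the arbiter
gluster volume add-brick ssd-samsung replica 3 arbiter 1 192.168.80.192:/gluster_bricks/sdc/data 192.168.80.193:/gluster_bricks/arbiter/data

and then watching "gluster volume heal engine info" and "gluster volume heal ssd-samsung info" until the new bricks are fully populated.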
On Sun, May 26, 2019 at 10:34 AM Strahil wrote: Hi Leo, Gluster is quite smart, but in order to provide any hints, can you provide the output of 'gluster volume info'. If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary. What is your network and NICs? Based on my experience, I can recommend at least 10 gbit/s interface(s). Best Regards, Strahil Nikolov On May 26, 2019 07:52, Leo David wrote: Hello Everyone, Can someone help me to clarify this? I have a single-node 4.2.8 installation ( only two gluster storage domains - distributed single drive volumes ). Now I just got two identical servers and I would like to go for a 3 node bundle. Is it possible ( after joining the new nodes to the cluster ) to expand the existing volumes across the new nodes and change them to replica 3 arbitrated? If so, could you share with me what the procedure would be? Thank you very much! Leo -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Mon May 27 00:23:59 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 27 May 2019 12:23:59 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> Message-ID: Hi Ravi, Thank you, that seems to have resolved the issue. After doing this, "gluster volume status all" showed gfs3 as online with a port and pid, however "gluster volume status all" didn't show any sync activity happening. At this point we loaded gfs3 with new firewall rules which explicitly allowed access from gfs1 and gfs2, and then "gluster volume status all" showed the file syncing. The gfs3 server should have allowed access from gfs1 and gfs2 anyway by default, however I now believe that perhaps this wasn't the case, and maybe it was a firewall issue all along. Thanks for all your help. On Sat, 25 May 2019 at 01:49, Ravishankar N wrote: > Hi David, > On 23/05/19 3:54 AM, David Cunningham wrote: > > Hi Ravi, > > Please see the log attached. > > When I grep -E "Connected to |disconnected from" > gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1". > It looks like this temporary mount is not able to connect to the 2nd brick, > which is why the lookup is failing due to lack of quorum. > > The output of "gluster volume status" is as follows. Should there be > something listening on gfs3? I'm not sure whether it having TCP Port and > Pid as N/A is a symptom or cause. Thank you. > > # gluster volume status > Status of volume: gvol0 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7624 > Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A N > N/A > > Can you see if the following steps help? > > 1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v > 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both* > gfs1 and gfs2. > > 2. `gluster volume start gvol0 force` > > 3. Check if Brick-3 now comes online with a valid TCP port and PID.
If it > doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see > why. > > Thanks, > > Ravi > > > Self-heal Daemon on localhost N/A N/A Y > 19853 > Self-heal Daemon on gfs1 N/A N/A Y > 28600 > Self-heal Daemon on gfs2 N/A N/A Y > 17614 > > Task Status of Volume gvol0 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 22 May 2019 at 18:06, Ravishankar N > wrote: > >> If you are trying this again, please 'gluster volume set $volname >> client-log-level DEBUG`before attempting the add-brick and attach the >> gvol0-add-brick-mount.log here. After that, you can change the >> client-log-level back to INFO. >> >> -Ravi >> On 22/05/19 11:32 AM, Ravishankar N wrote: >> >> >> On 22/05/19 11:23 AM, David Cunningham wrote: >> >> Hi Ravi, >> >> I'd already done exactly that before, where step 3 was a simple 'rm -rf >> /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the >> cleanup or reformat should be? >> >> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. >> Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not >> have any extended attributes set on it. Why fuse_first_lookup() is failing >> is a bit of a mystery to me at this point. :-( >> Regards, >> Ravi >> >> >> Thank you. >> >> >> On Wed, 22 May 2019 at 13:56, Ravishankar N >> wrote: >> >>> Hmm, so the volume info seems to indicate that the add-brick was >>> successful but the gfid xattr is missing on the new brick (as are the >>> actual files, barring the .glusterfs folder, according to your previous >>> mail). >>> >>> Do you want to try removing and adding it again? >>> >>> 1. `gluster volume remove-brick gvol0 replica 2 >>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >>> >>> 2. Check that gluster volume info is now back to a 1x2 volume on all >>> nodes and `gluster peer status` is connected on all nodes. >>> >>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >>> >>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >>> >>> 5. Check that the files are getting healed on to the new brick. >>> Thanks, >>> Ravi >>> On 22/05/19 6:50 AM, David Cunningham wrote: >>> >>> Hi Ravi, >>> >>> Certainly. On the existing two nodes: >>> >>> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> On the new node: >>> >>> gfs3 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000001 >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> Output of "gluster volume info" is the same on all 3 nodes and is: >>> >>> # gluster volume info >>> >>> Volume Name: gvol0 >>> Type: Replicate >>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x (2 + 1) = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>> Options Reconfigured: >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>> wrote: >>> >>>> Hi David, >>>> Could you provide the `getfattr -d -m. -e hex >>>> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >>>> `gluster volume info`? >>>> >>>> Thanks, >>>> Ravi >>>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>> >>>> Hi Sanju, >>>> >>>> Here's what glusterd.log says on the new arbiter server when trying to >>>> add the node: >>>> >>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>> [0x7fe4ca9102cd] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>> --gd-workdir=/var/lib/glusterd >>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >>>> replica-count is set 3 >>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >>>> arbiter-count is set 1 >>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >>>> type is set 0, need to change it >>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >>>> Failed to set extended attribute trusted.add-brick : Transport endpoint is >>>> not connected [Transport endpoint is not connected] >>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >>>> bricks >>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >>>> failed. 
>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >>>> commit failed on operation Add brick >>>> >>>> As before gvol0-add-brick-mount.log said: >>>> >>>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>> 7.22 >>>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >>>> 0-fuse: switched to graph 0 >>>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >>>> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015097] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>> is not connected) >>>> [2019-05-22 00:15:17.015158] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >>>> 0-fuse: initating unmount of /tmp/mntYGNbj9 >>>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >>>> received signum (15), shutting down >>>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >>>> Unmounting '/tmp/mntYGNbj9'. >>>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: >>>> Closing fuse connection to '/tmp/mntYGNbj9'. >>>> >>>> Here are the processes running on the new arbiter server: >>>> # ps -ef | grep gluster >>>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >>>> localhost --volfile-id gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>> glustershd >>>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >>>> /var/run/glusterd.pid --log-level INFO >>>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >>>> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>> >>>> Here are the files created on the new arbiter server: >>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0 >>>> drw------- 2 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>> >>>> Thank you for your help! >>>> >>>> >>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>> wrote: >>>> >>>>> David, >>>>> >>>>> can you please attach glusterd.logs? As the error message says, Commit >>>>> failed on the arbitar node, we might be able to find some issue on that >>>>> node. >>>>> >>>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>>>> dcunningham at voisonics.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> We're adding an arbiter node to an existing volume and having an >>>>>>> issue. Can anyone help? 
The root cause error appears to be >>>>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>>>> endpoint is not connected)", as below. >>>>>>> >>>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>>>> >>>>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>>>> >>>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log >>>>>>> file for details. >>>>>>> >>>>>> >>>>>> This looks like a glusterd issue. Please check the glusterd logs for >>>>>> more info. >>>>>> Adding the glusterd dev to this thread. Sanju, can you take a look? >>>>>> >>>>>> Regards, >>>>>> Nithya >>>>>> >>>>>>> >>>>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>>>> >>>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>>>> 7.22 >>>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>>>> 0-fuse: switched to graph 0 >>>>>>> [2019-05-17 01:20:22.694897] E >>>>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed >>>>>>> (Transport endpoint is not connected) >>>>>>> [2019-05-17 01:20:22.699770] W >>>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>>>> is not connected) >>>>>>> [2019-05-17 01:20:22.699834] W >>>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>>>> received signum (15), shutting down >>>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>>>> Unmounting '/tmp/mntQAtu3f'. >>>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>>>> >>>>>>> Processes running on new node gfs3: >>>>>>> >>>>>>> # ps -ef | grep gluster >>>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd >>>>>>> -p /var/run/glusterd.pid --log-level INFO >>>>>>> root 15799 1 0 20:17 ? 
00:00:00 /usr/sbin/glusterfs >>>>>>> -s localhost --volfile-id gluster/glustershd -p >>>>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>>>> /var/log/glusterfs/glustershd.log -S >>>>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>>>> glustershd >>>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>>>> gluster >>>>>>> >>>>>>> -- >>>>>>> David Cunningham, Voisonics Limited >>>>>>> http://voisonics.com/ >>>>>>> USA: +1 213 221 1092 >>>>>>> New Zealand: +64 (0)28 2558 3782 >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Thanks, >>>>> Sanju >>>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> >>>> _______________________________________________ >>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauro.tridici at cmcc.it Mon May 27 08:53:22 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Mon, 27 May 2019 10:53:22 +0200 Subject: [Gluster-users] read-only glusterfs volume mount on a virtual machine Message-ID: <7A414EC3-54AD-4FAF-A680-5896A659AB5A@cmcc.it> Dear Users, anyone of us could help me to identify the right way to export a gluster volume (or some directories of it) in read-only access to a specific IP address (assigned to a virtual machine)? At this moment, the volume is already mounted on 3 other client servers (in RW mode) using glusterfs native client. Now, I would like to add a new read-only client but without specifying RO mode in /etc/fstab file on the client virutal machine: i would like to set RO access mode from gluster server side. Is it possible? Thank you in advance, Mauro -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Mon May 27 21:43:08 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Mon, 27 May 2019 21:43:08 +0000 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Message-ID: <42bb895028f4def62830fd1be0054b52e1b83f32@taste-of-it.de> Hi, i had similar problem. In my case the rebalance didnt finish because of not enough free space to migrate the space to other nodes. Reason was 1% Reservation Options which is setup by default in distributed, but i set it to 0%, which was ignored by gluster Greatings? Taste Am 25.05.2019 07:00:13, schrieb Nithya Balachandran: > Hi Brandon, > Please send the following: > > 1. the gluster volume info > > 2. 
Information about which brick was removed > > 3. The rebalance log file for all nodes hosting removed bricks. > > Regards, > > Nithya > > > On Fri, 24 May 2019 at 19:33, Ravishankar N <> ravishankar at redhat.com> > wrote: > > > > > > Adding a few DHT folks for some possible suggestions. > > > > -Ravi > > > > On 23/05/19 11:15 PM, > > brandon at thinkhuge.net> > wrote: > > > > > > > > > > > Does anyone know what should be done on a glusterfs v5.6 "gluster volume remove-brick" operation that fails?? I'm trying to remove 1 of 8 distributed smaller nodes for replacement with larger node. > > > > > > ?> > > > > > The "gluster volume remove-brick ... status" command reports status failed and failures = "3" > > > > > > ?> > > > > > cat /var/log/glusterfs/volbackups-rebalance.log > > > > > > ... > > > > > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is failed. Time taken is 545.00 secs > > > > > > ?> > > > > > All servers are confirmed in good communications and updated and freshly rebooted and retried the remove-brick few times with fail each time> > > > > > ?> > > > > > > > > > > > _______________________________________________ Gluster-users mailing list > > > Gluster-users at gluster.org> > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users> > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Tue May 28 22:29:59 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 29 May 2019 01:29:59 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> Message-ID: Dear Ravishankar, I'm not sure if Brick4 had pending AFRs because I don't know what that means and it's been a few days so I am not sure I would be able to find that information. Anyways, after wasting a few days rsyncing the old brick to a new host I decided to just try to add the old brick back into the volume instead of bringing it up on the new host. I created a new brick directory on the old host, moved the old brick's contents into that new directory (minus the .glusterfs directory), added the new brick to the volume, and then did Vlad's find/stat trick? from the brick to the FUSE mount point. The interesting problem I have now is that some files don't appear in the FUSE mount's directory listings, but I can actually list them directly and even read them. What could cause that? Thanks, ? https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html On Fri, May 24, 2019 at 4:59 PM Ravishankar N wrote: > > On 23/05/19 2:40 AM, Alan Orth wrote: > > Dear list, > > I seem to have gotten into a tricky situation. Today I brought up a shiny > new server with new disk arrays and attempted to replace one brick of a > replica 2 distribute/replicate volume on an older server using the > `replace-brick` command: > > # gluster volume replace-brick homes wingu0:/mnt/gluster/homes > wingu06:/data/glusterfs/sdb/homes commit force > > The command was successful and I see the new brick in the output of > `gluster volume info`. The problem is that Gluster doesn't seem to be > migrating the data, > > `replace-brick` definitely must heal (not migrate) the data. 
In your case, > data must have been healed from Brick-4 to the replaced Brick-3. Are there > any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4 > have pending AFR xattrs blaming Brick-3? The doc is a bit out of date. > replace-brick command internally does all the setfattr steps that are > mentioned in the doc. > > -Ravi > > > and now the original brick that I replaced is no longer part of the volume > (and a few terabytes of data are just sitting on the old brick): > > # gluster volume info homes | grep -E "Brick[0-9]:" > Brick1: wingu4:/mnt/gluster/homes > Brick2: wingu3:/mnt/gluster/homes > Brick3: wingu06:/data/glusterfs/sdb/homes > Brick4: wingu05:/data/glusterfs/sdb/homes > Brick5: wingu05:/data/glusterfs/sdc/homes > Brick6: wingu06:/data/glusterfs/sdc/homes > > I see the Gluster docs have a more complicated procedure for replacing > bricks that involves getfattr/setfattr?. How can I tell Gluster about the > old brick? I see that I have a backup of the old volfile thanks to yum's > rpmsave function if that helps. > > We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can > give. > > ? > https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 00:51:48 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 29 May 2019 12:51:48 +1200 Subject: [Gluster-users] Transport endpoint is not connected Message-ID: Hello all, We are seeing a strange issue where a new node gfs3 shows another node gfs2 as not connected on the "gluster volume heal" info: [root at gfs3 bricks]# gluster volume heal gvol0 info Brick gfs1:/nodirectwritedata/gluster/gvol0 Status: Connected Number of entries: 0 Brick gfs2:/nodirectwritedata/gluster/gvol0 Status: Transport endpoint is not connected Number of entries: - Brick gfs3:/nodirectwritedata/gluster/gvol0 Status: Connected Number of entries: 0 However it does show the same node connected on "gluster peer status". Does anyone know why this would be? 
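As a side note on why the two commands can disagree: "gluster peer status" only reflects the glusterd management connection on TCP port 24007, while "gluster volume heal ... info" starts a small client (glfsheal) that must also reach every brick's own listening port. A quick way to test this from gfs3, assuming the brick port that "gluster volume status gvol0" reports for gfs2 (49152 in this setup) and whichever of telnet/nc is installed:

telnet gfs2 24007
telnet gfs2 49152

If the management port answers but the brick port does not, heal info reports "Transport endpoint is not connected" for that brick while peer status stays Connected.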
[root at gfs3 bricks]# gluster peer status Number of Peers: 2 Hostname: gfs2 Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 State: Peer in Cluster (Connected) Hostname: gfs1 Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b State: Peer in Cluster (Connected) In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with regards to gfs2: [2019-05-29 00:17:50.646360] I [MSGID: 115029] [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 (version: 5.6) [2019-05-29 00:17:50.761120] I [MSGID: 115036] [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 [2019-05-29 00:17:50.761352] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 Thanks in advance for any assistance. -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:20:51 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:50:51 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> Message-ID: <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> On 29/05/19 3:59 AM, Alan Orth wrote: > Dear Ravishankar, > > I'm not sure if Brick4 had pending AFRs because I don't know what that > means and it's been a few days so I am not sure I would be able to > find that information. When you find some time, have a look at a blog series I wrote about AFR- I've tried to explain what one needs to know to debug replication related issues in it. > > Anyways, after wasting a few days rsyncing the old brick to a new host > I decided to just try to add the old brick back into the volume > instead of bringing it up on the new host. I created a new brick > directory on the old host, moved the old brick's contents into that > new directory (minus the .glusterfs directory), added the new brick to > the volume, and then did Vlad's find/stat trick? from the brick to the > FUSE mount point. > > The interesting problem I have now is that some files don't appear in > the FUSE mount's directory listings, but I can actually list them > directly and even read them. What could cause that? Not sure, too many variables in the hacks that you did to take a guess. You can check if the contents of the .glusterfs folder are in order on the new brick (example hardlink for files and symlinks for directories are present etc.) . Regards, Ravi > > Thanks, > > ? > https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html > > On Fri, May 24, 2019 at 4:59 PM Ravishankar N > wrote: > > > On 23/05/19 2:40 AM, Alan Orth wrote: >> Dear list, >> >> I seem to have gotten into a tricky situation. 
Today I brought up >> a shiny new server with new disk arrays and attempted to replace >> one brick of a replica 2 distribute/replicate volume on an older >> server using the `replace-brick` command: >> >> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >> wingu06:/data/glusterfs/sdb/homes commit force >> >> The command was successful and I see the new brick in the output >> of `gluster volume info`. The problem is that Gluster doesn't >> seem to be migrating the data, > > `replace-brick` definitely must heal (not migrate) the data. In > your case, data must have been healed from Brick-4 to the replaced > Brick-3. Are there any errors in the self-heal daemon logs of > Brick-4's node? Does Brick-4 have pending AFR xattrs blaming > Brick-3? The doc is a bit out of date. replace-brick command > internally does all the setfattr steps that are mentioned in the doc. > > -Ravi > > >> and now the original brick that I replaced is no longer part of >> the volume (and a few terabytes of data are just sitting on the >> old brick): >> >> # gluster volume info homes | grep -E "Brick[0-9]:" >> Brick1: wingu4:/mnt/gluster/homes >> Brick2: wingu3:/mnt/gluster/homes >> Brick3: wingu06:/data/glusterfs/sdb/homes >> Brick4: wingu05:/data/glusterfs/sdb/homes >> Brick5: wingu05:/data/glusterfs/sdc/homes >> Brick6: wingu06:/data/glusterfs/sdc/homes >> >> I see the Gluster docs have a more complicated procedure for >> replacing bricks that involves getfattr/setfattr?. How can I tell >> Gluster about the old brick? I see that I have a backup of the >> old volfile thanks to yum's rpmsave function if that helps. >> >> We are using Gluster 5.6 on CentOS 7. Thank you for any advice >> you can give. >> >> ? >> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich >> Nietzsche >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:24:33 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:54:33 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> Message-ID: <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> On 29/05/19 9:50 AM, Ravishankar N wrote: > > > On 29/05/19 3:59 AM, Alan Orth wrote: >> Dear Ravishankar, >> >> I'm not sure if Brick4 had pending AFRs because I don't know what >> that means and it's been a few days so I am not sure I would be able >> to find that information. > When you find some time, have a look at a blog series > I wrote about AFR- I've tried to explain what one needs to know to > debug replication related issues in it. Made a typo error. 
The URL for the blog is https://wp.me/peiBB-6b -Ravi >> >> Anyways, after wasting a few days rsyncing the old brick to a new >> host I decided to just try to add the old brick back into the volume >> instead of bringing it up on the new host. I created a new brick >> directory on the old host, moved the old brick's contents into that >> new directory (minus the .glusterfs directory), added the new brick >> to the volume, and then did Vlad's find/stat trick? from the brick to >> the FUSE mount point. >> >> The interesting problem I have now is that some files don't appear in >> the FUSE mount's directory listings, but I can actually list them >> directly and even read them. What could cause that? > Not sure, too many variables in the hacks that you did to take a > guess. You can check if the contents of the .glusterfs folder are in > order on the new brick (example hardlink for files and symlinks for > directories are present etc.) . > Regards, > Ravi >> >> Thanks, >> >> ? >> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >> >> On Fri, May 24, 2019 at 4:59 PM Ravishankar N > > wrote: >> >> >> On 23/05/19 2:40 AM, Alan Orth wrote: >>> Dear list, >>> >>> I seem to have gotten into a tricky situation. Today I brought >>> up a shiny new server with new disk arrays and attempted to >>> replace one brick of a replica 2 distribute/replicate volume on >>> an older server using the `replace-brick` command: >>> >>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>> wingu06:/data/glusterfs/sdb/homes commit force >>> >>> The command was successful and I see the new brick in the output >>> of `gluster volume info`. The problem is that Gluster doesn't >>> seem to be migrating the data, >> >> `replace-brick` definitely must heal (not migrate) the data. In >> your case, data must have been healed from Brick-4 to the >> replaced Brick-3. Are there any errors in the self-heal daemon >> logs of Brick-4's node? Does Brick-4 have pending AFR xattrs >> blaming Brick-3? The doc is a bit out of date. replace-brick >> command internally does all the setfattr steps that are mentioned >> in the doc. >> >> -Ravi >> >> >>> and now the original brick that I replaced is no longer part of >>> the volume (and a few terabytes of data are just sitting on the >>> old brick): >>> >>> # gluster volume info homes | grep -E "Brick[0-9]:" >>> Brick1: wingu4:/mnt/gluster/homes >>> Brick2: wingu3:/mnt/gluster/homes >>> Brick3: wingu06:/data/glusterfs/sdb/homes >>> Brick4: wingu05:/data/glusterfs/sdb/homes >>> Brick5: wingu05:/data/glusterfs/sdc/homes >>> Brick6: wingu06:/data/glusterfs/sdc/homes >>> >>> I see the Gluster docs have a more complicated procedure for >>> replacing bricks that involves getfattr/setfattr?. How can I >>> tell Gluster about the old brick? I see that I have a backup of >>> the old volfile thanks to yum's rpmsave function if that helps. >>> >>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice >>> you can give. >>> >>> ? >>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." 
?Friedrich >>> Nietzsche >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:26:31 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:56:31 +0530 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: On 29/05/19 6:21 AM, David Cunningham wrote: > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: You need to check glfsheal-$volname.log on the node where you ran the command and check for any connection related errors. -Ravi > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted > client from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting > connection from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down > connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joe at julianfamily.org Wed May 29 05:17:51 2019 From: joe at julianfamily.org (Joe Julian) Date: Tue, 28 May 2019 22:17:51 -0700 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: Check gluster volume status gvol0 and make sure your bricks are all running. On 5/29/19 2:51 AM, David Cunningham wrote: > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted > client from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting > connection from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down > connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 08:56:23 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 29 May 2019 20:56:23 +1200 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: Hi Ravi and Joe, The command "gluster volume status gvol0" shows all 3 nodes as being online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in which I can't see anything like a connection error. Would you have any further suggestions? Thank you. 
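One quick check on that attached log which may save a round trip: the same grep used earlier on the add-brick mount log applies here too. The path below assumes the default log directory and is only a sketch:

grep -E "Connected to |disconnected from" /var/log/glusterfs/glfsheal-gvol0.log

A healthy run prints a "Connected to gvol0-client-N" line for each of the three bricks; if the line for gvol0-client-1 (the gfs2 brick) never appears, the heal-info client on this node cannot reach that brick even though glusterd-level status looks fine.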
[root at gfs3 glusterfs]# gluster volume status gvol0 Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7625 Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0 Y 7307 Self-heal Daemon on localhost N/A N/A Y 7316 Self-heal Daemon on gfs1 N/A N/A Y 40591 Self-heal Daemon on gfs2 N/A N/A Y 7634 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume tasks On Wed, 29 May 2019 at 16:26, Ravishankar N wrote: > > On 29/05/19 6:21 AM, David Cunningham wrote: > > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: > > You need to check glfsheal-$volname.log on the node where you ran the > command and check for any connection related errors. > > -Ravi > > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client > from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection > from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: glfsheal-gvol0.log Type: text/x-log Size: 6160 bytes Desc: not available URL: From ravishankar at redhat.com Wed May 29 11:10:49 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 16:40:49 +0530 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> I don't see a "Connected to gvol0-client-1" in the log.? Perhaps a firewall issue like the last time? Even in the earlier add-brick log from the other email thread, connection to the 2nd brick was not established. -Ravi On 29/05/19 2:26 PM, David Cunningham wrote: > Hi Ravi and Joe, > > The command "gluster volume status gvol0" shows all 3 nodes as being > online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, > in which I can't see anything like a connection error. Would you have > any further suggestions? Thank you. > > [root at gfs3 glusterfs]# gluster volume status gvol0 > Status of volume: gvol0 > Gluster process???????????????????????????? TCP Port RDMA Port? > Online? Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7625 > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7307 > Self-heal Daemon on localhost?????????????? N/A N/A??????? Y?????? 7316 > Self-heal Daemon on gfs1??????????????????? N/A N/A??????? Y?????? 40591 > Self-heal Daemon on gfs2??????????????????? N/A N/A??????? Y?????? 7634 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 29 May 2019 at 16:26, Ravishankar N > wrote: > > > On 29/05/19 6:21 AM, David Cunningham wrote: >> Hello all, >> >> We are seeing a strange issue where a new node gfs3 shows another >> node gfs2 as not connected on the "gluster volume heal" info: >> >> [root at gfs3 bricks]# gluster volume heal gvol0 info >> Brick gfs1:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> Brick gfs2:/nodirectwritedata/gluster/gvol0 >> Status: Transport endpoint is not connected >> Number of entries: - >> >> Brick gfs3:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> >> However it does show the same node connected on "gluster peer >> status". Does anyone know why this would be? >> >> [root at gfs3 bricks]# gluster peer status >> Number of Peers: 2 >> >> Hostname: gfs2 >> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 >> State: Peer in Cluster (Connected) >> >> Hostname: gfs1 >> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b >> State: Peer in Cluster (Connected) >> >> >> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged >> with regards to gfs2: > > You need to check glfsheal-$volname.log on the node where you ran > the command and check for any connection related errors. 
> > -Ravi > >> >> [2019-05-29 00:17:50.646360] I [MSGID: 115029] >> [server-handshake.c:537:server_setvolume] 0-gvol0-server: >> accepted client from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> (version: 5.6) >> [2019-05-29 00:17:50.761120] I [MSGID: 115036] >> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting >> connection from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> [2019-05-29 00:17:50.761352] I [MSGID: 101055] >> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down >> connection >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> >> Thanks in advance for any assistance. >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Wed May 29 11:13:05 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Wed, 29 May 2019 16:43:05 +0530 Subject: [Gluster-users] Memory leak in gluster 5.4 In-Reply-To: References: Message-ID: Hi Christian, I see below errors when I try to unzip the file. [root at localhost Downloads]# unzip gluster_coredump.zip Archive: gluster_coredump.zip checkdir error: coredump exists but is not directory unable to process coredump/. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterd.0.ed02597e2d374210985795ab82dd48e7.2209.1557381154000000.lz4. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterfsd.0.ed02597e2d374210985795ab82dd48e7.2634.1557381672000000.lz4. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterfsd.0.ed02597e2d374210985795ab82dd48e7.2653.1557381626000000.lz4. [root at localhost Downloads]# Periodic statedumps will be much helpful in debugging memory leaks than coredumps. Thanks, Sanju On Thu, May 16, 2019 at 2:57 PM Christian Meyer wrote: > Hi everyone! > > I'm using a Gluster 5.4 Setup with three Nodes and three volumes > (one is the gluster shared storage). The other are replicated volumes. > Each node has 64GB of RAM. > Over the time of ~2 month the memory consumption of glusterd grow > linear. An the end glusterd used ~45% of RAM the brick processes > together ~43% of RAM. > I think this is a memory leak. > > I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), > hope this will help to find the problem. > > Could someone please have a look on it? > > Download Coredumps: > https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip > > Kind regards > > Christian > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.orth at gmail.com Wed May 29 14:20:20 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 29 May 2019 17:20:20 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: Dear Ravi, Thank you for the link to the blog post series?it is very informative and current! If I understand your blog post correctly then I think the answer to your previous question about pending AFRs is: no, there are no pending AFRs. I have identified one file that is a good test case to try to understand what happened after I issued the `gluster volume replace-brick ... commit force` a few days ago and then added the same original brick back to the volume later. This is the current state of the replica 2 distribute/replicate volume: [root at wingu0 ~]# gluster volume info apps Volume Name: apps Type: Distributed-Replicate Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda Status: Started Snapshot Count: 0 Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: wingu3:/mnt/gluster/apps Brick2: wingu4:/mnt/gluster/apps Brick3: wingu05:/data/glusterfs/sdb/apps Brick4: wingu06:/data/glusterfs/sdb/apps Brick5: wingu0:/mnt/gluster/apps Brick6: wingu05:/data/glusterfs/sdc/apps Options Reconfigured: diagnostics.client-log-level: DEBUG storage.health-check-interval: 10 nfs.disable: on I checked the xattrs of one file that is missing from the volume's FUSE mount (though I can read it if I access its full path explicitly), but is present in several of the volume's bricks (some with full size, others empty): [root at wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.apps-client-3=0x000000000000000000000000 trusted.afr.apps-client-5=0x000000000000000000000000 trusted.afr.dirty=0x000000000000000000000000 trusted.bit-rot.version=0x0200000000000000585a396f00046e15 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 [root at wingu06 ~]# getfattr -d -m. 
-e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 According to the trusted.afr.apps-client-xx xattrs this particular file should be on bricks with id "apps-client-3" and "apps-client-5". It took me a few hours to realize that the brick-id values are recorded in the volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id values with a volfile backup from before the replace-brick, I realized that the files are simply on the wrong brick now as far as Gluster is concerned. This particular file is now on the brick for "apps-client-4". As an experiment I copied this one file to the two bricks listed in the xattrs and I was then able to see the file from the FUSE mount (yay!). Other than replacing the brick, removing it, and then adding the old brick on the original server back, there has been no change in the data this entire time. Can I change the brick IDs in the volfiles so they reflect where the data actually is? Or perhaps script something to reset all the xattrs on the files/directories to point to the correct bricks? Thank you for any help or pointers, On Wed, May 29, 2019 at 7:24 AM Ravishankar N wrote: > > On 29/05/19 9:50 AM, Ravishankar N wrote: > > > On 29/05/19 3:59 AM, Alan Orth wrote: > > Dear Ravishankar, > > I'm not sure if Brick4 had pending AFRs because I don't know what that > means and it's been a few days so I am not sure I would be able to find > that information. > > When you find some time, have a look at a blog > series I wrote about AFR- I've tried to explain what one needs to know to > debug replication related issues in it. > > Made a typo error. The URL for the blog is https://wp.me/peiBB-6b > > -Ravi > > > Anyways, after wasting a few days rsyncing the old brick to a new host I > decided to just try to add the old brick back into the volume instead of > bringing it up on the new host. I created a new brick directory on the old > host, moved the old brick's contents into that new directory (minus the > .glusterfs directory), added the new brick to the volume, and then did > Vlad's find/stat trick? from the brick to the FUSE mount point. > > The interesting problem I have now is that some files don't appear in the > FUSE mount's directory listings, but I can actually list them directly and > even read them. What could cause that? > > Not sure, too many variables in the hacks that you did to take a guess. > You can check if the contents of the .glusterfs folder are in order on the > new brick (example hardlink for files and symlinks for directories are > present etc.) . > Regards, > Ravi > > > Thanks, > > ? > https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html > > On Fri, May 24, 2019 at 4:59 PM Ravishankar N > wrote: > >> >> On 23/05/19 2:40 AM, Alan Orth wrote: >> >> Dear list, >> >> I seem to have gotten into a tricky situation. 
Today I brought up a shiny >> new server with new disk arrays and attempted to replace one brick of a >> replica 2 distribute/replicate volume on an older server using the >> `replace-brick` command: >> >> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >> wingu06:/data/glusterfs/sdb/homes commit force >> >> The command was successful and I see the new brick in the output of >> `gluster volume info`. The problem is that Gluster doesn't seem to be >> migrating the data, >> >> `replace-brick` definitely must heal (not migrate) the data. In your >> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >> there any errors in the self-heal daemon logs of Brick-4's node? Does >> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >> date. replace-brick command internally does all the setfattr steps that are >> mentioned in the doc. >> >> -Ravi >> >> >> and now the original brick that I replaced is no longer part of the >> volume (and a few terabytes of data are just sitting on the old brick): >> >> # gluster volume info homes | grep -E "Brick[0-9]:" >> Brick1: wingu4:/mnt/gluster/homes >> Brick2: wingu3:/mnt/gluster/homes >> Brick3: wingu06:/data/glusterfs/sdb/homes >> Brick4: wingu05:/data/glusterfs/sdb/homes >> Brick5: wingu05:/data/glusterfs/sdc/homes >> Brick6: wingu06:/data/glusterfs/sdc/homes >> >> I see the Gluster docs have a more complicated procedure for replacing >> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >> old brick? I see that I have a backup of the old volfile thanks to yum's >> rpmsave function if that helps. >> >> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >> give. >> >> ? >> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 22:33:01 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 30 May 2019 10:33:01 +1200 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> References: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> Message-ID: Hi Ravi, I think it probably is a firewall issue with the network provider. I was hoping to see a specific connection failure message we could send to them, but will take it up with them anyway. Thanks for your help. On Wed, 29 May 2019 at 23:10, Ravishankar N wrote: > I don't see a "Connected to gvol0-client-1" in the log. 
Perhaps a > firewall issue like the last time? Even in the earlier add-brick log from > the other email thread, connection to the 2nd brick was not established. > > -Ravi > On 29/05/19 2:26 PM, David Cunningham wrote: > > Hi Ravi and Joe, > > The command "gluster volume status gvol0" shows all 3 nodes as being > online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in > which I can't see anything like a connection error. Would you have any > further suggestions? Thank you. > > [root at gfs3 glusterfs]# gluster volume status gvol0 > Status of volume: gvol0 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7625 > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7307 > Self-heal Daemon on localhost N/A N/A Y > 7316 > Self-heal Daemon on gfs1 N/A N/A Y > 40591 > Self-heal Daemon on gfs2 N/A N/A Y > 7634 > > Task Status of Volume gvol0 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 29 May 2019 at 16:26, Ravishankar N > wrote: > >> >> On 29/05/19 6:21 AM, David Cunningham wrote: >> >> Hello all, >> >> We are seeing a strange issue where a new node gfs3 shows another node >> gfs2 as not connected on the "gluster volume heal" info: >> >> [root at gfs3 bricks]# gluster volume heal gvol0 info >> Brick gfs1:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> Brick gfs2:/nodirectwritedata/gluster/gvol0 >> Status: Transport endpoint is not connected >> Number of entries: - >> >> Brick gfs3:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> >> However it does show the same node connected on "gluster peer status". >> Does anyone know why this would be? >> >> [root at gfs3 bricks]# gluster peer status >> Number of Peers: 2 >> >> Hostname: gfs2 >> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 >> State: Peer in Cluster (Connected) >> >> Hostname: gfs1 >> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b >> State: Peer in Cluster (Connected) >> >> >> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with >> regards to gfs2: >> >> You need to check glfsheal-$volname.log on the node where you ran the >> command and check for any connection related errors. >> >> -Ravi >> >> >> [2019-05-29 00:17:50.646360] I [MSGID: 115029] >> [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client >> from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> (version: 5.6) >> [2019-05-29 00:17:50.761120] I [MSGID: 115036] >> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection >> from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> [2019-05-29 00:17:50.761352] I [MSGID: 101055] >> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> >> Thanks in advance for any assistance. 
>> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > >
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hunter86_bg at yahoo.com Wed May 29 10:27:47 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Wed, 29 May 2019 13:27:47 +0300
Subject: [Gluster-users] Transport endpoint is not connected
Message-ID:

Check the brick TCP ports with "gluster volume status" and then try with telnet/ncat to connect from gfs3 to gfs2 on that TCP port.

Best Regards,
Strahil Nikolov

On May 29, 2019 03:51, David Cunningham wrote:
> > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with regards to gfs2: > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hunter86_bg at yahoo.com Thu May 30 04:11:51 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Thu, 30 May 2019 07:11:51 +0300
Subject: [Gluster-users] Transport endpoint is not connected
Message-ID: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com>

You can try to run ncat from gfs3:

ncat -z -v gfs1 49152
ncat -z -v gfs2 49152

If ncat fails to connect -> it's definitely a firewall.
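If the ncat test does fail, the usual suspect on CentOS 7 is firewalld on the brick hosts. A rough sketch of what to check and open (the port range below is only an example; take the real brick ports from "gluster volume status", and 24007/tcp is needed for glusterd itself):

firewall-cmd --list-ports
firewall-cmd --permanent --add-port=24007/tcp
firewall-cmd --permanent --add-port=49152-49251/tcp
firewall-cmd --reload

Then repeat the ncat test above.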
Best Regards, Strahil NikolovOn May 30, 2019 01:33, David Cunningham wrote: > > Hi Ravi, > > I think it probably is a firewall issue with the network provider. I was hoping to see a specific connection failure message we could send to them, but will take it up with them anyway. > > Thanks for your help. > > > On Wed, 29 May 2019 at 23:10, Ravishankar N wrote: >> >> I don't see a "Connected to gvol0-client-1" in the log.? Perhaps a firewall issue like the last time? Even in the earlier add-brick log from the other email thread, connection to the 2nd brick was not established. >> >> -Ravi >> >> On 29/05/19 2:26 PM, David Cunningham wrote: >>> >>> Hi Ravi and Joe, >>> >>> The command "gluster volume status gvol0" shows all 3 nodes as being online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in which I can't see anything like a connection error. Would you have any further suggestions? Thank you. >>> >>> [root at gfs3 glusterfs]# gluster volume status gvol0 >>> Status of volume: gvol0 >>> Gluster process???????????????????????????? TCP Port? RDMA Port? Online? Pid >>> ------------------------------------------------------------------------------ >>> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7706 >>> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7625 >>> Brick gfs3:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7307 >>> Self-heal Daemon on localhost?????????????? N/A?????? N/A??????? Y?????? 7316 >>> Self-heal Daemon on gfs1??????????????????? N/A?????? N/A??????? Y?????? 40591 >>> Self-heal Daemon on gfs2??????????????????? N/A?????? N/A??????? Y?????? 7634 >>> ? >>> Task Status of Volume gvol0 >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> >>> On Wed, 29 May 2019 at 16:26, Ravishankar N wrote: >>>> >>>> >>>> On 29/05/19 6:21 AM, David Cunningham wrote: -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgowtham at redhat.com Thu May 30 09:02:41 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Thu, 30 May 2019 14:32:41 +0530 Subject: [Gluster-users] Announcing Gluster release 6.2 Message-ID: The Gluster community is pleased to announce the release of Gluster 6.2 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features and limitations addressed in this release: None Thanks, Gluster community [1] Packages for 6.2: https://download.gluster.org/pub/gluster/glusterfs/6/6.2/ [2] Release notes for 6.2: https://docs.gluster.org/en/latest/release-notes/6.2/ -- Regards, Hari Gowtham. From alan.orth at gmail.com Thu May 30 21:50:51 2019 From: alan.orth at gmail.com (Alan Orth) Date: Fri, 31 May 2019 00:50:51 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: Dear Ravi, I spent a bit of time inspecting the xattrs on some files and directories on a few bricks for this volume and it looks a bit messy. Even if I could make sense of it for a few and potentially heal them manually, there are millions of files and directories in total so that's definitely not a scalable solution. After a few missteps with `replace-brick ... 
commit force` in the last week (one of which was on a brick that was dead/offline), as well as some premature `remove-brick` commands, I'm unsure how to proceed and I'm getting demotivated. It's scary how quickly things get out of hand in distributed systems...

I had hoped that bringing the old brick back up would help, but by the time I added it again a few days had passed and all the brick-id's had changed due to the replace/remove brick commands, not to mention that the trusted.afr.$volume-client-xx values were now probably pointing to the wrong bricks (?).

Anyways, a few hours ago I started a full heal on the volume and I see that there is a sustained 100MiB/sec of network traffic going from the old brick's host to the new one. The completed heals reported in the logs look promising too:

Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 281614 Completed data selfheal
     84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal

So that's good I guess, though I have no idea how long it will take or if it will fix the "missing files" issue on the FUSE mount. I've increased cluster.shd-max-threads to 8 to hopefully speed up the heal process.

I'd be happy for any advice or pointers,

On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote: > Dear Ravi, > > Thank you for the link to the blog post series?it is very informative and > current! If I understand your blog post correctly then I think the answer > to your previous question about pending AFRs is: no, there are no pending > AFRs. I have identified one file that is a good test case to try to > understand what happened after I issued the `gluster volume replace-brick > ... commit force` a few days ago and then added the same original brick > back to the volume later. This is the current state of the replica 2 > distribute/replicate volume: > > [root at wingu0 ~]# gluster volume info apps > > Volume Name: apps > Type: Distributed-Replicate > Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 x 2 = 6 > Transport-type: tcp > Bricks: > Brick1: wingu3:/mnt/gluster/apps > Brick2: wingu4:/mnt/gluster/apps > Brick3: wingu05:/data/glusterfs/sdb/apps > Brick4: wingu06:/data/glusterfs/sdb/apps > Brick5: wingu0:/mnt/gluster/apps > Brick6: wingu05:/data/glusterfs/sdc/apps > Options Reconfigured: > diagnostics.client-log-level: DEBUG > storage.health-check-interval: 10 > nfs.disable: on > > I checked the xattrs of one file that is missing from the volume's FUSE > mount (though I can read it if I access its full path explicitly), but is > present in several of the volume's bricks (some with full size, others > empty): > > [root at wingu0 ~]# getfattr -d -m.
-e hex > /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > > getfattr: Removing leading '/' from absolute path names > # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.apps-client-3=0x000000000000000000000000 > trusted.afr.apps-client-5=0x000000000000000000000000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.bit-rot.version=0x0200000000000000585a396f00046e15 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > > [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > > [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > According to the trusted.afr.apps-client-xx xattrs this particular file > should be on bricks with id "apps-client-3" and "apps-client-5". It took me > a few hours to realize that the brick-id values are recorded in the > volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing > those brick-id values with a volfile backup from before the replace-brick, > I realized that the files are simply on the wrong brick now as far as > Gluster is concerned. This particular file is now on the brick for > "apps-client-4". As an experiment I copied this one file to the two > bricks listed in the xattrs and I was then able to see the file from the > FUSE mount (yay!). > > Other than replacing the brick, removing it, and then adding the old brick > on the original server back, there has been no change in the data this > entire time. Can I change the brick IDs in the volfiles so they reflect > where the data actually is? Or perhaps script something to reset all the > xattrs on the files/directories to point to the correct bricks? 
> > Thank you for any help or pointers, > > On Wed, May 29, 2019 at 7:24 AM Ravishankar N > wrote: > >> >> On 29/05/19 9:50 AM, Ravishankar N wrote: >> >> >> On 29/05/19 3:59 AM, Alan Orth wrote: >> >> Dear Ravishankar, >> >> I'm not sure if Brick4 had pending AFRs because I don't know what that >> means and it's been a few days so I am not sure I would be able to find >> that information. >> >> When you find some time, have a look at a blog >> series I wrote about AFR- I've tried to explain what one needs to know to >> debug replication related issues in it. >> >> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >> >> -Ravi >> >> >> Anyways, after wasting a few days rsyncing the old brick to a new host I >> decided to just try to add the old brick back into the volume instead of >> bringing it up on the new host. I created a new brick directory on the old >> host, moved the old brick's contents into that new directory (minus the >> .glusterfs directory), added the new brick to the volume, and then did >> Vlad's find/stat trick? from the brick to the FUSE mount point. >> >> The interesting problem I have now is that some files don't appear in the >> FUSE mount's directory listings, but I can actually list them directly and >> even read them. What could cause that? >> >> Not sure, too many variables in the hacks that you did to take a guess. >> You can check if the contents of the .glusterfs folder are in order on the >> new brick (example hardlink for files and symlinks for directories are >> present etc.) . >> Regards, >> Ravi >> >> >> Thanks, >> >> ? >> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >> >> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >> wrote: >> >>> >>> On 23/05/19 2:40 AM, Alan Orth wrote: >>> >>> Dear list, >>> >>> I seem to have gotten into a tricky situation. Today I brought up a >>> shiny new server with new disk arrays and attempted to replace one brick of >>> a replica 2 distribute/replicate volume on an older server using the >>> `replace-brick` command: >>> >>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>> wingu06:/data/glusterfs/sdb/homes commit force >>> >>> The command was successful and I see the new brick in the output of >>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>> migrating the data, >>> >>> `replace-brick` definitely must heal (not migrate) the data. In your >>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>> date. replace-brick command internally does all the setfattr steps that are >>> mentioned in the doc. >>> >>> -Ravi >>> >>> >>> and now the original brick that I replaced is no longer part of the >>> volume (and a few terabytes of data are just sitting on the old brick): >>> >>> # gluster volume info homes | grep -E "Brick[0-9]:" >>> Brick1: wingu4:/mnt/gluster/homes >>> Brick2: wingu3:/mnt/gluster/homes >>> Brick3: wingu06:/data/glusterfs/sdb/homes >>> Brick4: wingu05:/data/glusterfs/sdb/homes >>> Brick5: wingu05:/data/glusterfs/sdc/homes >>> Brick6: wingu06:/data/glusterfs/sdc/homes >>> >>> I see the Gluster docs have a more complicated procedure for replacing >>> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >>> old brick? I see that I have a backup of the old volfile thanks to yum's >>> rpmsave function if that helps. 
>>> >>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >>> give. >>> >>> ? >>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>> >>> _______________________________________________ >>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 31 04:57:18 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 31 May 2019 10:27:18 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: On 31/05/19 3:20 AM, Alan Orth wrote: > Dear Ravi, > > I spent a bit of time inspecting the xattrs on some files and > directories on a few bricks for this volume and it looks a bit messy. > Even if I could make sense of it for a few and potentially heal them > manually, there are millions of files and directories in total so > that's definitely not a scalable solution. After a few missteps with > `replace-brick ... commit force` in the last week?one of which on a > brick that was dead/offline?as well as some premature `remove-brick` > commands, I'm unsure how how to proceed and I'm getting demotivated. > It's scary how quickly things get out of hand in distributed systems... Hi Alan, The one good thing about gluster is it that the data is always available directly on the backed bricks even if your volume has inconsistencies at the gluster level. So theoretically, if your cluster is FUBAR, you could just create a new volume and copy all data onto it via its mount from the old volume's bricks. > > I had hoped that bringing the old brick back up would help, but by the > time I added it again a few days had passed and all the brick-id's had > changed due to the replace/remove brick commands, not to mention that > the trusted.afr.$volume-client-xx values were now probably pointing to > the wrong bricks (?). > > Anyways, a few hours ago I started a full heal on the volume and I see > that there is a sustained 100MiB/sec of network traffic going from the > old brick's host to the new one. 
The completed heals reported in the > logs look promising too: > > Old brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > ?281614 Completed data selfheal > ? ? ?84 Completed entry selfheal > ?299648 Completed metadata selfheal > > New brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > ?198256 Completed data selfheal > ? 16829 Completed entry selfheal > ?229664 Completed metadata selfheal > > So that's good I guess, though I have no idea how long it will take or > if it will fix the "missing files" issue on the FUSE mount. I've > increased cluster.shd-max-threads to 8 to hopefully speed up the heal > process. The afr xattrs should not cause files to disappear from mount. If the xattr names do not match what each AFR subvol expects (for eg. in a replica 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd subvol and so on - ) for its children then it won't heal the data, that is all. But in your case I see some inconsistencies like one brick having the actual file (licenseserver.cfg) and the other having a linkto file (the one with thedht.linkto xattr) /in the same replica pair/. > > I'd be happy for any advice or pointers, Did you check if the .glusterfs hardlinks/symlinks exist and are in order for all bricks? -Ravi > > On Wed, May 29, 2019 at 5:20 PM Alan Orth > wrote: > > Dear Ravi, > > Thank you for the link to the blog post series?it is very > informative and current! If I understand your blog post correctly > then I think the answer to your previous question about pending > AFRs is: no, there are no pending AFRs. I have identified one file > that is a good test case to try to understand what happened after > I issued the `gluster volume replace-brick ... commit force` a few > days ago and then added the same original brick back to the volume > later. This is the current state of the replica 2 > distribute/replicate volume: > > [root at wingu0 ~]# gluster volume info apps > > Volume Name: apps > Type: Distributed-Replicate > Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 x 2 = 6 > Transport-type: tcp > Bricks: > Brick1: wingu3:/mnt/gluster/apps > Brick2: wingu4:/mnt/gluster/apps > Brick3: wingu05:/data/glusterfs/sdb/apps > Brick4: wingu06:/data/glusterfs/sdb/apps > Brick5: wingu0:/mnt/gluster/apps > Brick6: wingu05:/data/glusterfs/sdc/apps > Options Reconfigured: > diagnostics.client-log-level: DEBUG > storage.health-check-interval: 10 > nfs.disable: on > > I checked the xattrs of one file that is missing from the volume's > FUSE mount (though I can read it if I access its full path > explicitly), but is present in several of the volume's bricks > (some with full size, others empty): > > [root at wingu0 ~]# getfattr -d -m. -e hex > /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > > getfattr: Removing leading '/' from absolute path names # file: > mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.apps-client-3=0x000000000000000000000000 > trusted.afr.apps-client-5=0x000000000000000000000000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.bit-rot.version=0x0200000000000000585a396f00046e15 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd [root at wingu05 ~]# > getfattr -d -m. 
-e hex > /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > [root at wingu05 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > [root at wingu06 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > According to the trusted.afr.apps-client-xxxattrs this particular > file should be on bricks with id "apps-client-3" and > "apps-client-5". It took me a few hours to realize that the > brick-id values are recorded in the volume's volfiles in > /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id > values with a volfile backup from before the replace-brick, I > realized that the files are simply on the wrong brick now as far > as Gluster is concerned. This particular file is now on the brick > for "apps-client-4". As an experiment I copied this one file to > the two bricks listed in the xattrs and I was then able to see the > file from the FUSE mount (yay!). > > Other than replacing the brick, removing it, and then adding the > old brick on the original server back, there has been no change in > the data this entire time. Can I change the brick IDs in the > volfiles so they reflect where the data actually is? Or perhaps > script something to reset all the xattrs on the files/directories > to point to the correct bricks? > > Thank you for any help or pointers, > > On Wed, May 29, 2019 at 7:24 AM Ravishankar N > > wrote: > > > On 29/05/19 9:50 AM, Ravishankar N wrote: >> >> >> On 29/05/19 3:59 AM, Alan Orth wrote: >>> Dear Ravishankar, >>> >>> I'm not sure if Brick4 had pending AFRs because I don't know >>> what that means and it's been a few days so I am not sure I >>> would be able to find that information. >> When you find some time, have a look at a blog >> series I wrote about AFR- I've tried >> to explain what one needs to know to debug replication >> related issues in it. > > Made a typo error. 
The URL for the blog is https://wp.me/peiBB-6b > > -Ravi > >>> >>> Anyways, after wasting a few days rsyncing the old brick to >>> a new host I decided to just try to add the old brick back >>> into the volume instead of bringing it up on the new host. I >>> created a new brick directory on the old host, moved the old >>> brick's contents into that new directory (minus the >>> .glusterfs directory), added the new brick to the volume, >>> and then did Vlad's find/stat trick? from the brick to the >>> FUSE mount point. >>> >>> The interesting problem I have now is that some files don't >>> appear in the FUSE mount's directory listings, but I can >>> actually list them directly and even read them. What could >>> cause that? >> Not sure, too many variables in the hacks that you did to >> take a guess. You can check if the contents of the .glusterfs >> folder are in order on the new brick (example hardlink for >> files and symlinks for directories are present etc.) . >> Regards, >> Ravi >>> >>> Thanks, >>> >>> ? >>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>> >>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>> > wrote: >>> >>> >>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>> Dear list, >>>> >>>> I seem to have gotten into a tricky situation. Today I >>>> brought up a shiny new server with new disk arrays and >>>> attempted to replace one brick of a replica 2 >>>> distribute/replicate volume on an older server using >>>> the `replace-brick` command: >>>> >>>> # gluster volume replace-brick homes >>>> wingu0:/mnt/gluster/homes >>>> wingu06:/data/glusterfs/sdb/homes commit force >>>> >>>> The command was successful and I see the new brick in >>>> the output of `gluster volume info`. The problem is >>>> that Gluster doesn't seem to be migrating the data, >>> >>> `replace-brick` definitely must heal (not migrate) the >>> data. In your case, data must have been healed from >>> Brick-4 to the replaced Brick-3. Are there any errors in >>> the self-heal daemon logs of Brick-4's node? Does >>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc >>> is a bit out of date. replace-brick command internally >>> does all the setfattr steps that are mentioned in the doc. >>> >>> -Ravi >>> >>> >>>> and now the original brick that I replaced is no longer >>>> part of the volume (and a few terabytes of data are >>>> just sitting on the old brick): >>>> >>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>> Brick1: wingu4:/mnt/gluster/homes >>>> Brick2: wingu3:/mnt/gluster/homes >>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>> >>>> I see the Gluster docs have a more complicated >>>> procedure for replacing bricks that involves >>>> getfattr/setfattr?. How can I tell Gluster about the >>>> old brick? I see that I have a backup of the old >>>> volfile thanks to yum's rpmsave function if that helps. >>>> >>>> We are using Gluster 5.6 on CentOS 7. Thank you for any >>>> advice you can give. >>>> >>>> ? >>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." 
>>>> ?Friedrich Nietzsche >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." >>> ?Friedrich Nietzsche >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich > Nietzsche > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 05:34:39 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 11:04:39 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi, This looks like the hang because stderr buffer filled up with errors messages and no one reading it. I think this issue is fixed in latest releases. As a workaround, you can do following and check if it works. Prerequisite: rsync version should be > 3.1.0 Workaround: gluster volume geo-replication :: config rsync-options "--ignore-missing-args" Thanks, Kotresh HR On Thu, May 30, 2019 at 5:39 PM deepu srinivasan wrote: > Hi > We were evaluating Gluster geo Replication between two DCs one is in US > west and one is in US east. We took multiple trials for different file > size. > The Geo Replication tends to stop replicating but while checking the > status it appears to be in Active state. But the slave volume did not > increase in size. > So we have restarted the geo-replication session and checked the status. > The status was in an active state and it was in History Crawl for a long > time. We have enabled the DEBUG mode in logging and checked for any error. > There was around 2000 file appeared for syncing candidate. The Rsync > process starts but the rsync did not happen in the slave volume. Every time > the rsync process appears in the "ps auxxx" list but the replication did > not happen in the slave end. What would be the cause of this problem? Is > there anyway to debug it? > > We have also checked the strace of the rync program. > it displays something like this > > "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" > > > We are using the below specs > > Gluster version - 4.1.7 > Sync mode - rsync > Volume - 1x3 in each end (master and slave) > Intranet Bandwidth - 10 Gig > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 09:55:37 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 15:25:37 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi, Could you take the strace with with more string size? The argument strings are truncated. 
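For example (just a sketch; the grep pattern is illustrative), first find the PID of the rsync process that gsyncd spawned:

ps -ef | grep rsync

and then attach to that PID with a larger string limit: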
strace -s 500 -ttt -T -p On Fri, May 31, 2019 at 3:17 PM deepu srinivasan wrote: > Hi Kotresh > The above-mentioned work around did not work properly. > > On Fri, May 31, 2019 at 3:16 PM deepu srinivasan > wrote: > >> Hi Kotresh >> We have tried the above-mentioned rsync option and we are planning to >> have the version upgrade to 6.0. >> >> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >> khiremat at redhat.com> wrote: >> >>> Hi, >>> >>> This looks like the hang because stderr buffer filled up with errors >>> messages and no one reading it. >>> I think this issue is fixed in latest releases. As a workaround, you can >>> do following and check if it works. >>> >>> Prerequisite: >>> rsync version should be > 3.1.0 >>> >>> Workaround: >>> gluster volume geo-replication :: >>> config rsync-options "--ignore-missing-args" >>> >>> Thanks, >>> Kotresh HR >>> >>> >>> >>> >>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>> wrote: >>> >>>> Hi >>>> We were evaluating Gluster geo Replication between two DCs one is in US >>>> west and one is in US east. We took multiple trials for different file >>>> size. >>>> The Geo Replication tends to stop replicating but while checking the >>>> status it appears to be in Active state. But the slave volume did not >>>> increase in size. >>>> So we have restarted the geo-replication session and checked the >>>> status. The status was in an active state and it was in History Crawl for a >>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>> error. >>>> There was around 2000 file appeared for syncing candidate. The Rsync >>>> process starts but the rsync did not happen in the slave volume. Every time >>>> the rsync process appears in the "ps auxxx" list but the replication did >>>> not happen in the slave end. What would be the cause of this problem? Is >>>> there anyway to debug it? >>>> >>>> We have also checked the strace of the rync program. >>>> it displays something like this >>>> >>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>> >>>> >>>> We are using the below specs >>>> >>>> Gluster version - 4.1.7 >>>> Sync mode - rsync >>>> Volume - 1x3 in each end (master and slave) >>>> Intranet Bandwidth - 10 Gig >>>> >>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 10:52:52 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 16:22:52 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Yes, rsync config option should have fixed this issue. Could you share the output of the following? 1. gluster volume geo-replication :: config rsync-options 2. ps -ef | grep rsync On Fri, May 31, 2019 at 4:11 PM deepu srinivasan wrote: > Done. > We got the following result . > >> 1559298781.338234 write(2, "rsync: link_stat >> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >> failed: No such file or directory (2)", 128 > > seems like a file is missing ? > > On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> Hi, >> >> Could you take the strace with with more string size? The argument >> strings are truncated. >> >> strace -s 500 -ttt -T -p >> >> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >> wrote: >> >>> Hi Kotresh >>> The above-mentioned work around did not work properly. 
>>> >>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>> wrote: >>> >>>> Hi Kotresh >>>> We have tried the above-mentioned rsync option and we are planning to >>>> have the version upgrade to 6.0. >>>> >>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>> khiremat at redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> This looks like the hang because stderr buffer filled up with errors >>>>> messages and no one reading it. >>>>> I think this issue is fixed in latest releases. As a workaround, you >>>>> can do following and check if it works. >>>>> >>>>> Prerequisite: >>>>> rsync version should be > 3.1.0 >>>>> >>>>> Workaround: >>>>> gluster volume geo-replication :: >>>>> config rsync-options "--ignore-missing-args" >>>>> >>>>> Thanks, >>>>> Kotresh HR >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi >>>>>> We were evaluating Gluster geo Replication between two DCs one is in >>>>>> US west and one is in US east. We took multiple trials for different file >>>>>> size. >>>>>> The Geo Replication tends to stop replicating but while checking the >>>>>> status it appears to be in Active state. But the slave volume did not >>>>>> increase in size. >>>>>> So we have restarted the geo-replication session and checked the >>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>> error. >>>>>> There was around 2000 file appeared for syncing candidate. The Rsync >>>>>> process starts but the rsync did not happen in the slave volume. Every time >>>>>> the rsync process appears in the "ps auxxx" list but the replication did >>>>>> not happen in the slave end. What would be the cause of this problem? Is >>>>>> there anyway to debug it? >>>>>> >>>>>> We have also checked the strace of the rync program. >>>>>> it displays something like this >>>>>> >>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>> >>>>>> >>>>>> We are using the below specs >>>>>> >>>>>> Gluster version - 4.1.7 >>>>>> Sync mode - rsync >>>>>> Volume - 1x3 in each end (master and slave) >>>>>> Intranet Bandwidth - 10 Gig >>>>>> >>>>> >>>>> >>>>> -- >>>>> Thanks and Regards, >>>>> Kotresh H R >>>>> >>>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 11:05:50 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 16:35:50 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: That means it could be working and the defunct process might be some old zombie one. Could you check, that data progress ? On Fri, May 31, 2019 at 4:29 PM deepu srinivasan wrote: > Hi > When i change the rsync option the rsync process doesnt seem to start . > Only a defunt process is listed in ps aux. Only when i set rsync option to > " " and restart all the process the rsync process is listed in ps aux. > > > On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> Yes, rsync config option should have fixed this issue. >> >> Could you share the output of the following? >> >> 1. gluster volume geo-replication :: >> config rsync-options >> 2. ps -ef | grep rsync >> >> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >> wrote: >> >>> Done. >>> We got the following result . 
>>> >>>> 1559298781.338234 write(2, "rsync: link_stat >>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>> failed: No such file or directory (2)", 128 >>> >>> seems like a file is missing ? >>> >>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>> khiremat at redhat.com> wrote: >>> >>>> Hi, >>>> >>>> Could you take the strace with with more string size? The argument >>>> strings are truncated. >>>> >>>> strace -s 500 -ttt -T -p >>>> >>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >>>> wrote: >>>> >>>>> Hi Kotresh >>>>> The above-mentioned work around did not work properly. >>>>> >>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi Kotresh >>>>>> We have tried the above-mentioned rsync option and we are planning to >>>>>> have the version upgrade to 6.0. >>>>>> >>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>> khiremat at redhat.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> This looks like the hang because stderr buffer filled up with errors >>>>>>> messages and no one reading it. >>>>>>> I think this issue is fixed in latest releases. As a workaround, you >>>>>>> can do following and check if it works. >>>>>>> >>>>>>> Prerequisite: >>>>>>> rsync version should be > 3.1.0 >>>>>>> >>>>>>> Workaround: >>>>>>> gluster volume geo-replication :: >>>>>>> config rsync-options "--ignore-missing-args" >>>>>>> >>>>>>> Thanks, >>>>>>> Kotresh HR >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>>>>>> wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> We were evaluating Gluster geo Replication between two DCs one is >>>>>>>> in US west and one is in US east. We took multiple trials for different >>>>>>>> file size. >>>>>>>> The Geo Replication tends to stop replicating but while checking >>>>>>>> the status it appears to be in Active state. But the slave volume did not >>>>>>>> increase in size. >>>>>>>> So we have restarted the geo-replication session and checked the >>>>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>>>> error. >>>>>>>> There was around 2000 file appeared for syncing candidate. The >>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>> this problem? Is there anyway to debug it? >>>>>>>> >>>>>>>> We have also checked the strace of the rync program. >>>>>>>> it displays something like this >>>>>>>> >>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>> >>>>>>>> >>>>>>>> We are using the below specs >>>>>>>> >>>>>>>> Gluster version - 4.1.7 >>>>>>>> Sync mode - rsync >>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Thanks and Regards, >>>>>>> Kotresh H R >>>>>>> >>>>>> >>>> >>>> -- >>>> Thanks and Regards, >>>> Kotresh H R >>>> >>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spisla80 at gmail.com Fri May 31 12:32:22 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 31 May 2019 14:32:22 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello together, in order not to lose the focus for the topic, I make new date suggestions for next week June 03th ? 07th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) June 03th ? 06th at 16:30 - 18:30 IST or (13:00 - 15:00 CEST) Regards David Spisla Am Di., 21. Mai 2019 um 11:24 Uhr schrieb David Spisla : > Hello together, > > we are still seeking a day and time to talk about interesting Samba / > Glusterfs issues. Here is a new list of possible dates and time. > > May 22th ? 24th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) > > May 27th ? 29th and 31th at 12:30 - 14:30 IST (9:00 - 11:00 CEST) > > > On May 30th there is a holiday here in germany. > > @Poornima Gurusiddaiah If there is any problem > finding a date please contanct me. I will look for alternatives > > > Regards > > David Spisla > > > > Am Do., 16. Mai 2019 um 12:42 Uhr schrieb David Spisla >: > >> Hello Amar, >> >> thank you for the information. Of course, we should wait for Poornima >> because of her knowledge. >> >> Regards >> David Spisla >> >> Am Do., 16. Mai 2019 um 12:23 Uhr schrieb Amar Tumballi Suryanarayan < >> atumball at redhat.com>: >> >>> David, Poornima is on leave from today till 21st May. So having it after >>> she comes back is better. She has more experience in SMB integration than >>> many of us. >>> >>> -Amar >>> >>> On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: >>> >>>> Hello everyone, >>>> >>>> if there is any problem in finding a date and time, please contact me. >>>> It would be fine to have a meeting soon. >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Mo., 13. Mai 2019 um 12:38 Uhr schrieb David Spisla < >>>> david.spisla at iternity.com>: >>>> >>>>> Hi Poornima, >>>>> >>>>> >>>>> >>>>> thats fine. I would suggest this dates and times: >>>>> >>>>> >>>>> >>>>> May 15th ? 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>>> >>>>> May 20th ? 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>>> >>>>> >>>>> >>>>> I add Volker Lendecke from Sernet to the mail. He is the Samba Expert. >>>>> >>>>> Can someone of you provide a host via bluejeans.com? If not, I will >>>>> try it with GoToMeeting (https://www.gotomeeting.com). >>>>> >>>>> >>>>> >>>>> @all Please write your prefered dates and times. For me, all oft the >>>>> above dates and times are fine >>>>> >>>>> >>>>> >>>>> Regards >>>>> >>>>> David >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *Von:* Poornima Gurusiddaiah >>>>> *Gesendet:* Montag, 13. Mai 2019 07:22 >>>>> *An:* David Spisla ; Anoop C S ; >>>>> Gunther Deschner >>>>> *Cc:* Gluster Devel ; >>>>> gluster-users at gluster.org List >>>>> *Betreff:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>>>> Gluster (together with Samba Core Developer) >>>>> >>>>> >>>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>> We would be definitely interested in this. Thank you for contacting >>>>> us. For the starter we can have an online conference. Please suggest few >>>>> possible date and times for the week(preferably between IST 7.00AM - >>>>> 9.PM)? >>>>> >>>>> Adding Anoop and Gunther who are also the main contributors to the >>>>> Gluster-Samba integration. 
>>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Poornima >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, May 9, 2019 at 7:43 PM David Spisla >>>>> wrote: >>>>> >>>>> Dear Gluster Community, >>>>> >>>>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>>>> For this purpose we are working together with an advanced SAMBA Core >>>>> Developer. He did some debugging but needs more information about Gluster >>>>> Core Behaviour. >>>>> >>>>> >>>>> >>>>> *Would any of the Gluster Developer wants to have a online conference >>>>> with him and me?* >>>>> >>>>> >>>>> >>>>> I would organize everything. In my opinion this is a good chance to >>>>> improve stability of Glusterfs and this is at the moment one of the major >>>>> issues in the Community. >>>>> >>>>> >>>>> >>>>> Regards >>>>> >>>>> David Spisla >>>>> >>>>> _______________________________________________ >>>>> >>>>> Community Meeting Calendar: >>>>> >>>>> APAC Schedule - >>>>> Every 2nd and 4th Tuesday at 11:30 AM IST >>>>> Bridge: https://bluejeans.com/836554017 >>>>> >>>>> NA/EMEA Schedule - >>>>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>>>> Bridge: https://bluejeans.com/486278655 >>>>> >>>>> Gluster-devel mailing list >>>>> Gluster-devel at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>> >>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cody at platform9.com Wed May 1 17:42:40 2019 From: cody at platform9.com (Cody Hill) Date: Wed, 01 May 2019 17:42:40 -0000 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: References: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> Message-ID: <60CDF008-E11D-40C7-9D10-A30665FF8847@platform9.com> Thanks Amar. I?m going to see what kind of performance I get with just ZFS cache using Intel Optane and RaidZ10 with 12x drives. If this performs better than AWS GP2, I?m good. If not I?ll look into dmcache. Has anyone used bcache? Have any experience there? Thank you, Cody Hill | Director of Technology | Platform9 Direct: (650) 567-3107 cody at platform9.com | Platform9.com | Public Calendar > On May 1, 2019, at 7:34 AM, Amar Tumballi Suryanarayan wrote: > > > > On Tue, Apr 23, 2019 at 11:38 PM Cody Hill > wrote: > > Thanks for the info Karli, > > I wasn?t aware ZFS Dedup was such a dog. I guess I?ll leave that off. My data get?s 3.5:1 savings on compression alone. I was aware of stripped sets. I will be doing 6x Striped sets across 12x disks. > > On top of this design I?m going to try and test Intel Optane DIMM (512GB) as a ?Tier? for GlusterFS to try and get further write acceleration. And issues with GlusterFS ?Tier? functionality that anyone is aware of? > > > Hi Cody, I wanted to be honest about GlusterFS 'Tier' functionality. While it is functional and works, we had not seen the actual benefit we expected with the feature, and noticed it is better to use the tiering on each host machine (ie, on bricks) and use those bricks as glusterfs bricks. (like dmcache). > > Also note that from glusterfs-6.x releases, Tier feature is deprecated. > > -Amar > > Thank you, > Cody Hill > >> On Apr 18, 2019, at 2:32 AM, Karli Sj?berg > wrote: >> >> >> >> Den 17 apr. 2019 16:30 skrev Cody Hill >: >> Hey folks. >> >> I?m looking to deploy GlusterFS to host some VMs. 
I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. >> >> You _really_ don't want ZFS doing dedup for any reason. >> >> >> ZFS would give me the following benefits: >> 1. If a single disk fails rebuilds happen locally instead of over the network >> 2. Zil & L2Arc should add a slight performance increase >> >> Adding two really good NVME SSD's as a mirrored SLOG vdev does a huge deal for synchronous write performance, turning every random write into large streams that the spinning drives handle better. >> >> Don't know how picky Gluster is about synchronicity though, most "performance" tweaking suggests setting stuff to async, which I wouldn't recommend, but it's a huge boost for throughput obviously; not having to wait for stuff to actually get written, but it's dangerous. >> >> With mirrored NVME SLOG's, you could probably get that throughput without going asynchronous, which saves you from potential data corruption in a sudden power loss. >> >> L2ARC on the other hand does a bit for read latency, but for a general purpose file server- in practice- not a huge difference, the working set is just too large. Also keep in mind that L2ARC isn't "free". You need more RAM to know where you've cached stuff... >> >> 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) >> >> ZFS deduplication has terrible performance. Watch your throughput automatically drop from hundreds or thousands of MB/s down to, like 5. It's a feature;) >> >> 4. Automated Snapshotting >> >> I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. >> My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? >> >> While it could save a lot of space in some hypothetical instance, the drawbacks can never motivate it. E.g. if you want one node to suddenly die and never recover because of RAM exhaustion, go with ZFS dedup ;) >> >> I?d be very interested to hear your thoughts. >> >> Avoid ZFS dedup at all costs. LZ4 compression on the hand is awesome, definitely use that! It's basically a free performance enhancer the also saves space :) >> >> As another person has said, the best performance layout is RAID10- striped mirrors. I understand you'd want to get as much volume as possible with RAID-Z/RAID(5|6) since gluster also replicates/distributes, but it has a huge impact on IOPS. If performance is the main concern, do striped mirrors with replica 3 in Gluster. My advice is to test thoroughly with different pool layouts to see what gives acceptable performance against your volume requirements. >> >> /K >> >> >> Additional thoughts: >> I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) >> I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) >> I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? 
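Regarding the pool layout being weighed in this thread (12 disks as striped mirrors, Optane for write acceleration, lz4 instead of dedup), a hedged sketch of what one brick host could look like follows. Device names, pool/dataset names, the volume name and the three hostnames are placeholders, not anything confirmed in the thread:

# 6 striped mirrors over 12 data disks, a mirrored SLOG and an optional L2ARC device
zpool create -o ashift=12 tank \
  mirror sda sdb  mirror sdc sdd  mirror sde sdf \
  mirror sdg sdh  mirror sdi sdj  mirror sdk sdl \
  log mirror nvme0n1 nvme1n1 \
  cache nvme2n1

zfs create tank/brick1
zfs set compression=lz4 tank/brick1
zfs set atime=off tank/brick1
zfs set xattr=sa tank/brick1          # keeps Gluster's extended attributes fast on ZFS on Linux
zfs set acltype=posixacl tank/brick1

# use a subdirectory of the dataset as the brick, then build the volume across the hosts
mkdir -p /tank/brick1/brick
gluster volume create vmstore replica 3 \
  host1:/tank/brick1/brick host2:/tank/brick1/brick host3:/tank/brick1/brick

Whether the Optane devices pay off more as SLOG/L2ARC under ZFS, or as a dm-cache/bcache layer under plain XFS bricks, is exactly the comparison worth benchmarking before committing to either design.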
>> >> Thank you, >> Cody Hill >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Tim.Stalker at ucdenver.edu Thu May 2 19:09:07 2019 From: Tim.Stalker at ucdenver.edu (Stalker, Tim) Date: Thu, 02 May 2019 19:09:07 -0000 Subject: [Gluster-users] geo-replication create push-pem command failure Message-ID:
I'm running gluster 4.1.8 on a two-node replicated cluster. I'm trying to set up geo-replication from it to a single slave node that has a running volume on it. Everything else is in place: I have a passwordless connection between a geogrp user on the slave and the root user on master 1. When I run

gluster volume geo-replication mastervol slavevolumemngr at slave1::slavevol create push-pem

I get this error:

gluster command on slavevolumemngr at slave1 failed. Error: bash: gluster: command not found

I have passwordless login set up using the root user on master1's pub key and the key here: /var/lib/glusterd/geo-replication/secret.pem.pub. Is there a logfile for the attempt to run create push-pem? ssh is set up correctly. On master1 I've tried both of these commands and then the create push-pem command:

gluster-georep-sshkey generate --no-prefix
gluster-georep-sshkey generate

Any help anyone can provide would be great. Thanks, -------------- next part -------------- An HTML attachment was scrubbed... URL:
From sdeepugd at gmail.com Wed May 8 19:16:38 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Wed, 08 May 2019 19:16:38 -0000 Subject: [Gluster-users] No Variation in Gluster geo Replication when sync jobs value is increased. Message-ID:
Hi. We were evaluating Gluster geo-replication. We ran multiple trials with different file sizes and different sync-jobs values. Increasing or decreasing the sync-jobs value did not change the time geo-replication took to sync. Is this normal, or should we change some configuration other than the sync-jobs value? Here are the stats:

File Count                      8192         8192         8192
Folders                         174          174          174
Trials                          3            3            3
Trial 1 time taken              4min         4min20sec    4min56sec
Trial 2 time taken              4min16sec    4min9sec     4min
Trial 3 time taken              4min11sec    4min30sec    4min50sec
Sync-jobs (gluster geo config)  3            10           30
Tool                            Gluster Geo  Gluster Geo  Gluster Geo

-------------- next part -------------- An HTML attachment was scrubbed... URL:
From david.spisla at iternity.com Tue May 14 14:43:11 2019 From: david.spisla at iternity.com (David Spisla) Date: Tue, 14 May 2019 14:43:11 -0000 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID:
Hi Poornima, that's fine. I would suggest these dates and times:

May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)
May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)

I have added Volker Lendecke from Sernet to the mail. He is the Samba expert. Could one of you provide a host via bluejeans.com? If not, I will try it with GoToMeeting (https://www.gotomeeting.com). @all Please write your preferred dates and times.
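Coming back to the geo-replication "create push-pem" failure reported earlier in this digest ("bash: gluster: command not found"): in most such reports the non-interactive shell spawned on the slave simply cannot find the gluster binary (typically because /usr/sbin is not in its PATH), or, for a non-root session like this one, the mountbroker has not been set up on the slave. A rough checklist, using the user/host/volume names from that report; the mountbroker root and group are assumptions:

# On master1: see exactly what the geo-rep ssh session sees on the slave
ssh -i /var/lib/glusterd/geo-replication/secret.pem \
    slavevolumemngr@slave1 'echo $PATH; command -v gluster'

# Master-side geo-replication logs are usually under:
ls /var/log/glusterfs/geo-replication/

# For a non-root session, the slave additionally needs a mountbroker setup, roughly:
gluster-mountbroker setup /var/mountbroker-root geogrp
gluster-mountbroker add slavevol slavevolumemngr
systemctl restart glusterd     # mountbroker changes need a glusterd restart on the slave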
For me, all oft the above dates and times are fine Regards David David Spisla Software Engineer david.spisla at iternity.com +49 761 59034852 iTernity GmbH Heinrich-von-Stephan-Str. 21 79100 Freiburg Deutschland Website Newsletter Support Portal iTernity GmbH. Gesch?ftsf?hrer: Ralf Steinemann. ?Eingetragen beim Amtsgericht Freiburg: HRB-Nr. 701332. ?USt.Id DE242664311. [v01.023] Von: Poornima Gurusiddaiah Gesendet: Montag, 13. Mai 2019 07:22 An: David Spisla ; Anoop C S ; Gunther Deschner Cc: Gluster Devel ; gluster-users at gluster.org List Betreff: Re: [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) Hi, We would be definitely interested in this. Thank you for contacting us. For the starter we can have an online conference. Please suggest few possible date and times for the week(preferably between IST 7.00AM - 9.PM)? Adding Anoop and Gunther who are also the main contributors to the Gluster-Samba integration. Thanks, Poornima On Thu, May 9, 2019 at 7:43 PM David Spisla > wrote: Dear Gluster Community, at the moment we are improving the stability of SMB/CTDB and Gluster. For this purpose we are working together with an advanced SAMBA Core Developer. He did some debugging but needs more information about Gluster Core Behaviour. Would any of the Gluster Developer wants to have a online conference with him and me? I would organize everything. In my opinion this is a good chance to improve stability of Glusterfs and this is at the moment one of the major issues in the Community. Regards David Spisla _______________________________________________ Community Meeting Calendar: APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST Bridge: https://bluejeans.com/836554017 NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT Bridge: https://bluejeans.com/486278655 Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image860747.png Type: image/png Size: 382 bytes Desc: image860747.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image735814.png Type: image/png Size: 412 bytes Desc: image735814.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image116096.png Type: image/png Size: 6545 bytes Desc: image116096.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image142576.png Type: image/png Size: 37146 bytes Desc: image142576.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image714843.png Type: image/png Size: 522 bytes Desc: image714843.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image293410.png Type: image/png Size: 591 bytes Desc: image293410.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image570372.png Type: image/png Size: 775 bytes Desc: image570372.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image031225.png Type: image/png Size: 508 bytes Desc: image031225.png URL: From jeff.bischoff at turbonomic.com Wed May 15 01:45:57 2019 From: jeff.bischoff at turbonomic.com (Jeff Bischoff) Date: Wed, 15 May 2019 01:45:57 -0000 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts report a stale mount and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. Diagnostics Looking at the glusterd.log file, the error message at the time the problem starts looks something like this: got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick This message occurs once for each brick that stops responding. The brick does not recover on its own. Here is that same message again, with surrounding context included. [2019-05-07 11:53:38.663362] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f795f17f765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no [2019-05-07 11:53:38.905338] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3) [0x7f795f17f6c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no [2019-05-07 11:53:38.982785] I [MSGID: 106542] [glusterd-utils.c:8253:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 8951 [2019-05-07 11:53:39.983244] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick on port 49169 [2019-05-07 11:53:39.984656] W [glusterd-handler.c:6124:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick [2019-05-07 11:53:40.316466] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2019-05-07 11:53:40.316601] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped [2019-05-07 11:53:40.316644] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed [2019-05-07 11:53:40.319650] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped 
[2019-05-07 11:53:40.319708] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped [2019-05-07 11:53:40.321091] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped [2019-05-07 11:53:40.321132] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of gluster running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Full Gluster logs available if needed, just let me know how to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic This message and its attachments are intended only for the designated recipient(s). It may contain confidential or proprietary information and may be subject to legal or other confidentiality protections. If you are not a designated recipient, you may not review, copy or distribute this message. If you receive this in error, please notify the sender by reply e-mail and delete this message. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.bischoff at turbonomic.com Thu May 16 15:22:04 2019 From: jeff.bischoff at turbonomic.com (Jeff Bischoff) Date: Thu, 16 May 2019 15:22:04 -0000 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: <2110B5B6-B284-4DF4-A0AA-87863F8FC7BF@vmturbo.com> Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts have a problem and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. 
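As a stop-gap for the crash loop described above, the brick processes can usually be brought back without restarting the whole pod or node, since "start ... force" only respawns processes that are down. A sketch, assuming the commands are run inside the gluster server pod and using heketidbstorage as the example volume:

gluster volume status                          # bricks that died show "N" in the Online column
gluster volume start heketidbstorage force     # respawns only the missing brick processes

# or sweep every volume on the node
for v in $(gluster volume list); do
    gluster volume start "$v" force
done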
Diagnostics The tell-tale error message is seeing the following when describing a pod that is in a crash loop: Message: error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists We always see that "file exists" message when this error occurs. Looking at the glusterd.log file, there had been nothing in the log for over a day and then suddenly, at the time the crash loop started, this: [2019-05-08 13:49:04.733147] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a3cef78a5914a2808da0b5736e3daec7/brick on port 49168 [2019-05-08 13:49:04.733374] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_7614e5014a0e402630a0e1fd776acf0a/brick on port 49167 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) [2019-05-08 13:49:05.065420] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/85e9fb223aa121f2.socket failed (No data available) [2019-05-08 13:49:05.066479] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/e2a66e8cd8f5f606.socket failed (No data available) [2019-05-08 13:49:05.067444] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a0625e5b78d69bb8.socket failed (No data available) [2019-05-08 13:49:05.068471] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/770bc294526d0360.socket failed (No data available) [2019-05-08 13:49:05.074278] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/adbd37fe3e1eed36.socket failed (No data available) [2019-05-08 13:49:05.075497] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/17712138f3370e53.socket failed (No data available) [2019-05-08 13:49:05.076545] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a6cf1aca8b23f394.socket failed (No data available) [2019-05-08 13:49:05.077511] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d0f83b191213e877.socket failed (No data available) [2019-05-08 13:49:05.078447] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d5dd08945d4f7f6d.socket failed (No data available) [2019-05-08 13:49:05.079424] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/c8d7b10108758e2f.socket failed (No data available) [2019-05-08 13:49:14.778619] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_0ed4f7f941de388cda678fe273e9ceb4/brick on port 49166 ... (and more of the same) Nothing further has been printed to the gluster log since. The bricks do not come back on their own. 
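To narrow down why the bricks went away in the first place, the brick-side logs around the same timestamp are usually more telling than glusterd.log alone. A hedged list of things to collect; the log file glob follows the heketi brick paths visible in the logs above:

# glusterd's view: was the brick asked to stop, or did it just vanish?
grep -E 'signal 15|stale rpc|pmap_registry_remove' /var/log/glusterfs/glusterd.log | tail -n 50

# per-brick logs are named after the brick path
ls /var/log/glusterfs/bricks/
tail -n 100 /var/log/glusterfs/bricks/var-lib-heketi-mounts-*.log

# capture state while the problem is live
gluster volume statedump heketidbstorage      # dumps land under /var/run/gluster/ by default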
The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of glusterfs running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Our gluster settings/volume options: apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: gluster-heketi selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi parameters: gidMax: "50000" gidMin: "2000" resturl: http://10.233.35.158:8080 restuser: "null" restuserkey: "null" volumetype: "none" volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads off, performance.open-behind off, performance.readdir-ahead off, performance.read-ahead off, performance.stat-prefetch off, performance.write-behind off, performance.io-cache off, cluster.consistent-metadata on, performance.quick-read off, performance.strict-o-direct on provisioner: kubernetes.io/glusterfs reclaimPolicy: Delete Volume info for the heketi volume: gluster> volume info heketidbstorage Volume Name: heketidbstorage Type: Distribute Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick Options Reconfigured: user.heketi.id: 1d2400626dac780fce12e45a07494853 transport.address-family: inet nfs.disable: on Full Gluster logs available if needed, just let me know how best to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic This message and its attachments are intended only for the designated recipient(s). It may contain confidential or proprietary information and may be subject to legal or other confidentiality protections. If you are not a designated recipient, you may not review, copy or distribute this message. If you receive this in error, please notify the sender by reply e-mail and delete this message. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoalex at gmail.com Mon May 27 10:54:01 2019 From: leoalex at gmail.com (Leo David) Date: Mon, 27 May 2019 10:54:01 -0000 Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. 
In-Reply-To: <626088321.4969320.1558877893362@mail.yahoo.com> References: <626088321.4969320.1558877893362@mail.yahoo.com> Message-ID: Hi, Any suggestions ? Thank you very much ! Leo On Sun, May 26, 2019 at 4:38 PM Strahil Nikolov wrote: > Yeah, > it seems different from the docs. > I'm adding the gluster users list ,as they are more experienced into that. > > @Gluster-users, > > can you provide some hint how to add aditional replicas to the below > volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? > > > Best Regards, > Strahil Nikolov > > ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David < > leoalex at gmail.com> ??????: > > > Thank you Strahil, > The engine and ssd-samsung are distributed... > So these are the ones that I need to have replicated accross new nodes. > I am not very sure about the procedure to accomplish this. > Thanks, > > Leo > > On Sun, May 26, 2019, 13:04 Strahil wrote: > > Hi Leo, > As you do not have a distributed volume , you can easily switch to replica > 2 arbiter 1 or replica 3 volumes. > > You can use the following for adding the bricks: > > > https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html > > Best Regards, > Strahil Nikoliv > On May 26, 2019 10:54, Leo David wrote: > > Hi Stahil, > Thank you so much for yout input ! > > gluster volume info > > > Volume Name: engine > Type: Distribute > Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: 192.168.80.191:/gluster_bricks/engine/engine > Options Reconfigured: > nfs.disable: on > transport.address-family: inet > storage.owner-uid: 36 > storage.owner-gid: 36 > features.shard: on > performance.low-prio-threads: 32 > performance.strict-o-direct: off > network.remote-dio: off > network.ping-timeout: 30 > user.cifs: off > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > cluster.eager-lock: enable > Volume Name: ssd-samsung > Type: Distribute > Volume ID: 76576cc6-220b-4651-952d-99846178a19e > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: 192.168.80.191:/gluster_bricks/sdc/data > Options Reconfigured: > cluster.eager-lock: enable > performance.io-cache: off > performance.read-ahead: off > performance.quick-read: off > user.cifs: off > network.ping-timeout: 30 > network.remote-dio: off > performance.strict-o-direct: on > performance.low-prio-threads: 32 > features.shard: on > storage.owner-gid: 36 > storage.owner-uid: 36 > transport.address-family: inet > nfs.disable: on > > The other two hosts will be 192.168.80.192/193 - this is gluster > dedicated network over 10GB sfp+ switch. > - host 2 wil have identical harware configuration with host 1 ( each disk > is actually a raid0 array ) > - host 3 has: > - 1 ssd for OS > - 1 ssd - for adding to engine volume in a full replica 3 > - 2 ssd's in a raid 1 array to be added as arbiter for the data volume > ( ssd-samsung ) > So the plan is to have "engine" scaled in a full replica 3, and > "ssd-samsung" scalled in a replica 3 arbitrated. > > > > > On Sun, May 26, 2019 at 10:34 AM Strahil wrote: > > Hi Leo, > > Gluster is quite smart, but in order to provide any hints , can you > provide output of 'gluster volume info '. 
> If you have 2 more systems , keep in mind that it is best to mirror the > storage on the second replica (2 disks on 1 machine -> 2 disks on the new > machine), while for the arbiter this is not neccessary. > > What is your network and NICs ? Based on my experience , I can recommend > at least 10 gbit/s interfase(s). > > Best Regards, > Strahil Nikolov > On May 26, 2019 07:52, Leo David wrote: > > Hello Everyone, > Can someone help me to clarify this ? > I have a single-node 4.2.8 installation ( only two gluster storage domains > - distributed single drive volumes ). Now I just got two identintical > servers and I would like to go for a 3 nodes bundle. > Is it possible ( after joining the new nodes to the cluster ) to expand > the existing volumes across the new nodes and change them to replica 3 > arbitrated ? > If so, could you share with me what would it be the procedure ? > Thank you very much ! > > Leo > > > > -- > Best regards, Leo David > > -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunkumar at redhat.com Mon May 27 12:53:16 2019 From: sunkumar at redhat.com (sunkumar at redhat.com) Date: Mon, 27 May 2019 12:53:16 -0000 Subject: [Gluster-users] Gluster Community Meeting (APAC friendly hours) Message-ID: <0000000000001f378e0589de08c2@google.com> Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/B4vOpJumRgexzqeQiNPVOw Flash Talk : What is Thin Arbiter? (By Ashish Pandey) Previous Meeting notes: http://github.com/gluster/community Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017Meeting minutes: https://hackmd.io/B4vOpJumRgexzqeQiNPVOwFlash Talk : What is Thin Arbiter? (By Ashish Pandey)Previous Meeting notes: http://github.com/gluster/community When: Tue May 28, 2019 11:30am ? 12:30pm India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Who: * pgurusid at redhat.com - organizer * javico at paradigmadigital.com * spentaparthi at idirect.net * sstephen at redhat.com * brian.riddle at storagecraft.com * sthomas at rpstechnologysolutions.co.uk * kdhananj at redhat.com * rwareing at fb.com * david.spisla at iternity.com * khiremat at redhat.com * pkarampu at redhat.com * gluster-users at gluster.org * dcunningham at voisonics.com * m.vrgotic at activevideo.com * barchu02 at unm.edu * gluster-devel at gluster.org * sunkumar at redhat.com * jpark at dexyp.com * rouge2507 at gmail.com * dan at clough.xyz * Max de Graaf * mark.boulton at uwa.edu.au * hgowtham at redhat.com * gabriel.lindeborg at svenskaspel.se * maintainers at gluster.org * ranaraya at redhat.com * philip.ruenagel at gmail.com * spalai at redhat.com * m.ragusa at eurodata.de * pauyeung at connexity.com * duprel at email.sc.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoalex at gmail.com Tue May 28 14:08:49 2019 From: leoalex at gmail.com (Leo David) Date: Tue, 28 May 2019 14:08:49 -0000 Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. In-Reply-To: References: <626088321.4969320.1558877893362@mail.yahoo.com> Message-ID: Hi, Looks like the only way arround would be to create a brand-new volume as replicated on other disks, and start moving the vms all around the place between volumes ? Cheers, Leo On Mon, May 27, 2019 at 1:53 PM Leo David wrote: > Hi, > Any suggestions ? > Thank you very much ! > > Leo > > On Sun, May 26, 2019 at 4:38 PM Strahil Nikolov > wrote: > >> Yeah, >> it seems different from the docs. 
>> I'm adding the gluster users list ,as they are more experienced into that. >> >> @Gluster-users, >> >> can you provide some hint how to add aditional replicas to the below >> volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? >> >> >> Best Regards, >> Strahil Nikolov >> >> ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David < >> leoalex at gmail.com> ??????: >> >> >> Thank you Strahil, >> The engine and ssd-samsung are distributed... >> So these are the ones that I need to have replicated accross new nodes. >> I am not very sure about the procedure to accomplish this. >> Thanks, >> >> Leo >> >> On Sun, May 26, 2019, 13:04 Strahil wrote: >> >> Hi Leo, >> As you do not have a distributed volume , you can easily switch to >> replica 2 arbiter 1 or replica 3 volumes. >> >> You can use the following for adding the bricks: >> >> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html >> >> Best Regards, >> Strahil Nikoliv >> On May 26, 2019 10:54, Leo David wrote: >> >> Hi Stahil, >> Thank you so much for yout input ! >> >> gluster volume info >> >> >> Volume Name: engine >> Type: Distribute >> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 >> Transport-type: tcp >> Bricks: >> Brick1: 192.168.80.191:/gluster_bricks/engine/engine >> Options Reconfigured: >> nfs.disable: on >> transport.address-family: inet >> storage.owner-uid: 36 >> storage.owner-gid: 36 >> features.shard: on >> performance.low-prio-threads: 32 >> performance.strict-o-direct: off >> network.remote-dio: off >> network.ping-timeout: 30 >> user.cifs: off >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> cluster.eager-lock: enable >> Volume Name: ssd-samsung >> Type: Distribute >> Volume ID: 76576cc6-220b-4651-952d-99846178a19e >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 >> Transport-type: tcp >> Bricks: >> Brick1: 192.168.80.191:/gluster_bricks/sdc/data >> Options Reconfigured: >> cluster.eager-lock: enable >> performance.io-cache: off >> performance.read-ahead: off >> performance.quick-read: off >> user.cifs: off >> network.ping-timeout: 30 >> network.remote-dio: off >> performance.strict-o-direct: on >> performance.low-prio-threads: 32 >> features.shard: on >> storage.owner-gid: 36 >> storage.owner-uid: 36 >> transport.address-family: inet >> nfs.disable: on >> >> The other two hosts will be 192.168.80.192/193 - this is gluster >> dedicated network over 10GB sfp+ switch. >> - host 2 wil have identical harware configuration with host 1 ( each disk >> is actually a raid0 array ) >> - host 3 has: >> - 1 ssd for OS >> - 1 ssd - for adding to engine volume in a full replica 3 >> - 2 ssd's in a raid 1 array to be added as arbiter for the data >> volume ( ssd-samsung ) >> So the plan is to have "engine" scaled in a full replica 3, and >> "ssd-samsung" scalled in a replica 3 arbitrated. >> >> >> >> >> On Sun, May 26, 2019 at 10:34 AM Strahil wrote: >> >> Hi Leo, >> >> Gluster is quite smart, but in order to provide any hints , can you >> provide output of 'gluster volume info '. >> If you have 2 more systems , keep in mind that it is best to mirror the >> storage on the second replica (2 disks on 1 machine -> 2 disks on the new >> machine), while for the arbiter this is not neccessary. >> >> What is your network and NICs ? Based on my experience , I can recommend >> at least 10 gbit/s interfase(s). 
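To make the pointer above concrete: both volumes are single-brick distribute volumes, so the conversion is an add-brick that raises the replica count. A sketch using the hosts and brick paths mentioned in this thread; the arbiter brick path is an assumption, and on older releases the arbiter may have to be added in a second step after first going to replica 2:

gluster peer probe 192.168.80.192
gluster peer probe 192.168.80.193

# engine: 1-brick distribute -> full replica 3
gluster volume add-brick engine replica 3 \
  192.168.80.192:/gluster_bricks/engine/engine \
  192.168.80.193:/gluster_bricks/engine/engine

# ssd-samsung: 1-brick distribute -> replica 3 with an arbiter brick on host 3
gluster volume add-brick ssd-samsung replica 3 arbiter 1 \
  192.168.80.192:/gluster_bricks/sdc/data \
  192.168.80.193:/gluster_bricks/arbiter/data

# let self-heal copy the existing data onto the new bricks and watch it finish
gluster volume heal engine full
gluster volume heal engine info
gluster volume heal ssd-samsung full
gluster volume heal ssd-samsung info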
>> >> Best Regards, >> Strahil Nikolov >> On May 26, 2019 07:52, Leo David wrote: >> >> Hello Everyone, >> Can someone help me to clarify this ? >> I have a single-node 4.2.8 installation ( only two gluster storage >> domains - distributed single drive volumes ). Now I just got two >> identintical servers and I would like to go for a 3 nodes bundle. >> Is it possible ( after joining the new nodes to the cluster ) to expand >> the existing volumes across the new nodes and change them to replica 3 >> arbitrated ? >> If so, could you share with me what would it be the procedure ? >> Thank you very much ! >> >> Leo >> >> >> >> -- >> Best regards, Leo David >> >> > > -- > Best regards, Leo David > -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Thu May 30 12:09:53 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Thu, 30 May 2019 12:09:53 -0000 Subject: [Gluster-users] Geo Replication Stops Message-ID: Hi We were evaluating Gluster geo Replication between two DCs one is in US west and one is in US east. We took multiple trials for different file size. The Geo Replication tends to stop replicating but while checking the status it appears to be in Active state. But the slave volume did not increase in size. So we have restarted the geo-replication session and checked the status. The status was in an active state and it was in History Crawl for a long time. We have enabled the DEBUG mode in logging and checked for any error. There was around 2000 file appeared for syncing candidate. The Rsync process starts but the rsync did not happen in the slave volume. Every time the rsync process appears in the "ps auxxx" list but the replication did not happen in the slave end. What would be the cause of this problem? Is there anyway to debug it? We have also checked the strace of the rync program. it displays something like this "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" We are using the below specs Gluster version - 4.1.7 Sync mode - rsync Volume - 1x3 in each end (master and slave) Intranet Bandwidth - 10 Gig -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri May 31 09:46:36 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 31 May 2019 09:46:36 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi Kotresh We have tried the above-mentioned rsync option and we are planning to have the version upgrade to 6.0. On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > Hi, > > This looks like the hang because stderr buffer filled up with errors > messages and no one reading it. > I think this issue is fixed in latest releases. As a workaround, you can > do following and check if it works. > > Prerequisite: > rsync version should be > 3.1.0 > > Workaround: > gluster volume geo-replication :: config > rsync-options "--ignore-missing-args" > > Thanks, > Kotresh HR > > > > > On Thu, May 30, 2019 at 5:39 PM deepu srinivasan > wrote: > >> Hi >> We were evaluating Gluster geo Replication between two DCs one is in US >> west and one is in US east. We took multiple trials for different file >> size. >> The Geo Replication tends to stop replicating but while checking the >> status it appears to be in Active state. But the slave volume did not >> increase in size. >> So we have restarted the geo-replication session and checked the status. 
>> The status was in an active state and it was in History Crawl for a long >> time. We have enabled the DEBUG mode in logging and checked for any error. >> There was around 2000 file appeared for syncing candidate. The Rsync >> process starts but the rsync did not happen in the slave volume. Every time >> the rsync process appears in the "ps auxxx" list but the replication did >> not happen in the slave end. What would be the cause of this problem? Is >> there anyway to debug it? >> >> We have also checked the strace of the rync program. >> it displays something like this >> >> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >> >> >> We are using the below specs >> >> Gluster version - 4.1.7 >> Sync mode - rsync >> Volume - 1x3 in each end (master and slave) >> Intranet Bandwidth - 10 Gig >> > > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maybeonly at gmail.com Mon May 13 13:57:46 2019 From: maybeonly at gmail.com (Only Maybe) Date: Mon, 13 May 2019 13:57:46 -0000 Subject: [Gluster-users] File not updated on gluster fuse volume when written by another client Message-ID: I have a cluster with many replica volumes One of the node was rebooted for some reason Then I mounted all volumes from localhost, on the rebooted server I found some contents of file which was updated by other clients does not change. If I remount the volume, the file was updated, but stopped updating again from then on. Gluserfs version: 6.1 & 6.0 & 3.8 opversion: 3.8 Everything seems well in gluster volume status * is it related to dentry cache? what can I do? Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrmeyer at chrmeyer.de Thu May 16 07:54:02 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 07:54:02 -0000 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with 3 Nodes with three volumes (replicated) (one is the gluster shared storage). Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help you to find the problem. Could you have a look on it? Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From chrmeyer at chrmeyer.de Thu May 16 09:19:48 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 09:19:48 -0000 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with 3 Nodes with three volumes (replicated) (one is the gluster shared storage). Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help you to find the problem. Could you please have a look on it? 
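A side note on the memory-leak report above: besides raw core dumps, Gluster's own statedumps are usually what developers ask for, since they include per-translator memory accounting. A short sketch; VOLNAME is a placeholder for the affected volumes:

gluster volume status VOLNAME mem        # quick per-brick memory overview
gluster volume statedump VOLNAME         # brick statedumps, by default under /var/run/gluster/

# statedump of the glusterd process itself
kill -USR1 $(pidof glusterd)
ls /var/run/gluster/*dump*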
Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From saravana20july at gmail.com Wed May 22 13:15:33 2019 From: saravana20july at gmail.com (Saravana Kumar) Date: Wed, 22 May 2019 13:15:33 -0000 Subject: [Gluster-users] Geo Replication Hangs Message-ID: Hi We were evaluating Gluster geo Replication between two DataCenters, one is in US west and other in US east. We took multiple trials for different file size . The Geo Replication tend to stop replicating, While checking the status it appears to be in Active state, but the slave volume did not increase in size. So we have restarted the geo replication session and checked the status . The status was in active state and the it was in History Crawl for a long time. We have enabled the DEBUG mode in logging and checked for any error. There were around 2000 file appeared for syncing candidate. The Rsync process starts but the rsync did not happen in the slave volume. Every time the rsync process appears in the "ps auxxx" list but the replication did not happen in the slave end. What would be the cause for this problem? Is there anyways to debug it? We are using the below specs Gluster version - 4.1.7 Sync mode - rsync Volume - 1x3 in each end (master and slave) Intranet Bandwidth - 10 Gig -- Regards, Saravana Kumar.N [image: Picture] -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rene.Kucera at ontec.at Fri May 24 12:10:40 2019 From: Rene.Kucera at ontec.at (Rene Kucera) Date: Fri, 24 May 2019 12:10:40 -0000 Subject: [Gluster-users] Glusterfs Split-Brain problem Message-ID: <4E7878669989AC42976A9FD44C0FBF546CF0CB08@mail02.olymp.ontec.at> Hi Gluster Community We have a PVE Proxmox cluster with two nodes. These two nodes each have 4 HDDs over which we have a glusterfs to migrate VMs live. A few days ago we had the problem that some disk files in the glusterfs got into a split-brain condition. We were able to secure the corresponding logfiles and resolve the split brain condition, but don't know how it happened. In the appendix you can find the Glusterfs log files. Maybe one of you can tell us what caused the problem: Here is the network setup of the PVE Cluster 192.168.231.0/24 --> Serverlan (reach PVE Gui port 8006) 10.10.11.0 /24 --> Cluster Ha Lan 10.10.12.0 /24 --> Glusterfs Storage lan Glusterfs Lan .) PVEServer1 - 10.10.12.31 .) PVEServer2 - 10.10.12.32 What we've seen in the mnt-pve-GlusterVol01.log log file: Server1: [2019-05-13 04:25:01.509716] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server... [2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available) [2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available) [2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers [2019-05-13 09:47:50.926948] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7fe58a1eb494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55a8728115e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55a872811444] ) 0-: received signum (15), shutting down [2019-05-13 09:47:50.926977] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'. 
[2019-05-13 09:47:50.950381] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01 [2019-05-13 09:49:43.823117] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01) [2019-05-13 09:49:43.828117] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-05-13 09:49:43.869885] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1 [2019-05-13 09:49:43.871644] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2019-05-13 09:49:43.880208] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport [2019-05-13 09:49:43.880609] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport [2019-05-13 09:49:43.880816] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) Final graph: +------------------------------------------------------------------------------+ 1: volume vol0-client-0 2: type protocol/client 3: option ping-timeout 5 4: option remote-host pvetau01-storage 5: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0 6: option transport-type socket 7: option transport.address-family inet 8: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401 9: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252 10: option filter-O_DIRECT enable 11: option send-gids true 12: end-volume 13: 14: volume vol0-client-1 15: type protocol/client 16: option ping-timeout 5 17: option remote-host pvetau02-storage 18: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0 19: option transport-type socket 20: option transport.address-family inet 21: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401 22: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252 23: option filter-O_DIRECT enable 24: option send-gids true 25: end-volume 26: 27: volume vol0-replicate-0 28: type cluster/replicate 29: option eager-lock enable 30: option quorum-count 1 31: subvolumes vol0-client-0 vol0-client-1 32: end-volume 33: 34: volume vol0-dht 35: type cluster/distribute 36: option lock-migration off 37: subvolumes vol0-replicate-0 38: end-volume 39: 40: volume vol0-write-behind 41: type performance/write-behind 42: subvolumes vol0-dht 43: end-volume 44: 45: volume vol0-readdir-ahead 46: type performance/readdir-ahead 47: subvolumes vol0-write-behind 48: end-volume 49: 50: volume vol0-open-behind 51: type performance/open-behind 52: subvolumes vol0-readdir-ahead 53: end-volume 54: 55: volume vol0 56: type debug/io-stats 57: option log-level INFO 58: option latency-measurement off 59: option count-fop-hits off 60: subvolumes vol0-open-behind 61: end-volume 62: 63: volume meta-autoload 64: type meta 65: subvolumes vol0 66: end-volume 67: +------------------------------------------------------------------------------+ [2019-05-13 09:49:43.881243] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:43.881434] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0) [2019-05-13 09:49:43.881906] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:43.882213] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 09:49:43.882222] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:43.882249] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online. [2019-05-13 09:49:43.882360] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 09:49:43.886625] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 09:49:43.886633] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:43.890995] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 09:49:43.891049] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-05-13 09:49:43.891067] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-13 09:49:43.891625] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 10:20:38.998246] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-1: server 10.10.12.32:49154 has not responded in the last 5 seconds, disconnecting. [2019-05-13 10:20:38.998657] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2019-05-13 10:20:33.237111 (xid=0x492) [2019-05-13 10:20:38.998681] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-1: remote operation failed. 
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected] [2019-05-13 10:20:38.998829] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-05-13 10:20:33.237115 (xid=0x493) [2019-05-13 10:20:38.998843] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-1: socket disconnected [2019-05-13 10:20:38.998854] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 10:20:43.355917] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 10:21:20.850030] E [socket.c:2309:socket_connect_finish] 0-vol0-client-1: connection to 10.10.12.32:24007 failed (No route to host) [2019-05-13 10:22:07.026615] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2019-05-13 10:22:07.026663] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 10:22:10.010421] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0) [2019-05-13 10:22:10.011105] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:10.011558] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 10:22:10.011609] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:10.011622] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-1: 2 fds open - Delaying child_up until they are re-opened [2019-05-13 10:22:10.032258] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2019-05-13 10:22:10.032492] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 10:22:13.790586] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 11:12:57.300347] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 11:12:57.305284] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain) [2019-05-13 11:12:57.305712] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 11:12:57.306277] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 11:12:57.306938] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-0: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set. [2019-05-13 11:12:57.306973] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-1: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set. [2019-05-13 11:12:57.310052] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2698: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) [2019-05-13 11:12:57.310137] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2697: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) [2019-05-13 11:12:57.311543] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2699: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 2 times between [2019-05-13 11:12:57.305712] and [2019-05-13 11:12:57.310816] The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 11:12:57.306277] and [2019-05-13 11:12:57.311184] The message "W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)" repeated 6 times between [2019-05-13 11:12:57.305284] and [2019-05-13 11:12:57.311274] The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 5 times between [2019-05-13 11:12:57.300347] and [2019-05-13 11:12:57.311531] Server 2: [2019-05-13 04:25:01.338790] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server... [2019-05-13 09:47:59.443328] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.10.12.31:24007 failed (Connection refused) [2019-05-13 09:48:17.426580] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-0: server 10.10.12.31:49155 has not responded in the last 5 seconds, disconnecting. 
[2019-05-13 09:48:17.426872] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2019-05-13 09:48:12.180579 (xid=0x5663a4) [2019-05-13 09:48:17.426899] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected] [2019-05-13 09:48:17.427056] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-05-13 09:48:12.180591 (xid=0x5663a5) [2019-05-13 09:48:17.427067] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-0: socket disconnected [2019-05-13 09:48:17.427077] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 09:48:21.479100] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 09:48:59.219302] E [socket.c:2309:socket_connect_finish] 0-vol0-client-0: connection to 10.10.12.31:24007 failed (No route to host) [2019-05-13 09:49:41.468469] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing [2019-05-13 09:49:42.505174] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2019-05-13 09:49:42.505225] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 09:49:45.442003] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) [2019-05-13 09:49:45.442523] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:45.442802] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. 
[2019-05-13 09:49:45.442812] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:45.442820] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-0: 2 fds open - Delaying child_up until they are re-opened [2019-05-13 09:49:45.443244] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2019-05-13 09:49:45.443353] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 09:49:49.622255] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 10:20:06.060045] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7efebc254494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55dba7a3b5e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55dba7a3b444] ) 0-: received signum (15), shutting down [2019-05-13 10:20:06.068969] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'. [2019-05-13 10:20:06.103235] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01 [2019-05-13 10:22:08.842734] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01) [2019-05-13 10:22:08.853935] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-05-13 10:22:08.944855] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1 [2019-05-13 10:22:08.946502] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2019-05-13 10:22:08.972020] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport [2019-05-13 10:22:08.972395] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport [2019-05-13 10:22:08.972832] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) [2019-05-13 10:22:08.973142] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:08.973231] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. 
[2019-05-13 10:22:08.973566] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:08.973567] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:08.973616] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online. [2019-05-13 10:22:08.973639] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 10:22:08.977940] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 10:22:08.978055] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-05-13 10:22:08.978075] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-13 10:22:08.978603] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 10:53:46.573894] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:53:46.573992] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:53:46.574253] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:53:46.574949] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.575526] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1380: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.577820] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1381: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.596838] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.597759] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.598916] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 10:53:46.574949] and [2019-05-13 10:53:46.599257] [2019-05-13 10:53:46.599525] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain) [2019-05-13 10:53:46.599797] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.599825] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1389: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.599876] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.600149] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.600193] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.600417] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.600775] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.601071] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.601537] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.601577] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1390: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.619830] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.620701] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.621098] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.621455] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.621732] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.623509] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.624891] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error] [2019-05-13 10:53:46.625212] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.625314] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.625721] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.625754] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1399: READ => -1 gfid=79423c92-0338-4dc9-bafc-091172e8d845 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.576286] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.176786] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.177684] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:56:28.178782] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.179128] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:56:28.180634] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1533: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:56:28.179439] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:56:28.180620] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.278595] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.279517] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:59:25.280605] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.281649] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1685: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:59:25.281250] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)
-------------------------------------------------
What we can't explain is why server 1 does the following:

[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available)
[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available)
[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

after which the volume is unmounted and re-mounted on another port. Server 2 then behaves in exactly the same way, which results in a split-brain condition on the disk files of the VMs.

We would be glad if someone could explain this behavior to us.

BR
René
-------------- next part --------------
An HTML attachment was scrubbed...

From sdeepugd at gmail.com Fri May 31 09:47:09 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 09:47:09 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi Kotresh
The above-mentioned workaround did not work properly.

On Fri, May 31, 2019 at 3:16 PM deepu srinivasan wrote:

> Hi Kotresh
> We have tried the above-mentioned rsync option and we are planning to have
> the version upgrade to 6.0.
>
> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>> Hi,
>>
>> This looks like the hang because stderr buffer filled up with errors
>> messages and no one reading it.
>> I think this issue is fixed in latest releases. As a workaround, you can
>> do following and check if it works.
>>
>> Prerequisite:
>> rsync version should be > 3.1.0
>>
>> Workaround:
>> gluster volume geo-replication :: config
>> rsync-options "--ignore-missing-args"
>>
>> Thanks,
>> Kotresh HR
>>
>>
>>
>>
>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>> wrote:
>>
>>> Hi
>>> We were evaluating Gluster geo Replication between two DCs one is in US
>>> west and one is in US east. We took multiple trials for different file
>>> size.
>>> The Geo Replication tends to stop replicating but while checking the
>>> status it appears to be in Active state. But the slave volume did not
>>> increase in size.
>>> So we have restarted the geo-replication session and checked the status.
>>> The status was in an active state and it was in History Crawl for a long
>>> time. We have enabled the DEBUG mode in logging and checked for any error.
>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>> process starts but the rsync did not happen in the slave volume. Every time
>>> the rsync process appears in the "ps auxxx" list but the replication did
>>> not happen in the slave end. What would be the cause of this problem? Is
>>> there anyway to debug it?
>>>
>>> We have also checked the strace of the rync program.
>>> it displays something like this
>>>
>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>
>>>
>>> We are using the below specs
>>>
>>> Gluster version - 4.1.7
>>> Sync mode - rsync
>>> Volume - 1x3 in each end (master and slave)
>>> Intranet Bandwidth - 10 Gig
>>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdeepugd at gmail.com Fri May 31 10:41:41 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 10:41:41 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Done.
We got the following result:

> 1559298781.338234 write(2, "rsync: link_stat
> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
> failed: No such file or directory (2)", 128

Seems like a file is missing?

On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Hi,
>
> Could you take the strace with with more string size? The argument strings
> are truncated.
>
> strace -s 500 -ttt -T -p
>
> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan
> wrote:
>
>> Hi Kotresh
>> The above-mentioned work around did not work properly.
>>
>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan
>> wrote:
>>
>>> Hi Kotresh
>>> We have tried the above-mentioned rsync option and we are planning to
>>> have the version upgrade to 6.0.
>>>
>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>> khiremat at redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This looks like the hang because stderr buffer filled up with errors
>>>> messages and no one reading it.
>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>> can do following and check if it works.
>>>>
>>>> Prerequisite:
>>>> rsync version should be > 3.1.0
>>>>
>>>> Workaround:
>>>> gluster volume geo-replication ::
>>>> config rsync-options "--ignore-missing-args"
>>>>
>>>> Thanks,
>>>> Kotresh HR
>>>>
>>>>
>>>>
>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>>>> wrote:
>>>>
>>>>> Hi
>>>>> We were evaluating Gluster geo Replication between two DCs one is in
>>>>> US west and one is in US east. We took multiple trials for different file
>>>>> size.
>>>>> The Geo Replication tends to stop replicating but while checking the
>>>>> status it appears to be in Active state. But the slave volume did not
>>>>> increase in size.
>>>>> So we have restarted the geo-replication session and checked the
>>>>> status. The status was in an active state and it was in History Crawl for a
>>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>>> error.
>>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>>> process starts but the rsync did not happen in the slave volume. Every time
>>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>>> there anyway to debug it?
>>>>>
>>>>> We have also checked the strace of the rync program.
>>>>> it displays something like this
>>>>>
>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>
>>>>>
>>>>> We are using the below specs
>>>>>
>>>>> Gluster version - 4.1.7
>>>>> Sync mode - rsync
>>>>> Volume - 1x3 in each end (master and slave)
>>>>> Intranet Bandwidth - 10 Gig
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>
>
> --
> Thanks and Regards,
> Kotresh H R
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdeepugd at gmail.com Fri May 31 10:59:03 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 10:59:03 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi
When I change the rsync option, the rsync process doesn't seem to start.
Only a defunct process is listed in ps aux.
Only when I set the rsync option to " " and restart all the processes does
the rsync process show up in ps aux.

On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Yes, rsync config option should have fixed this issue.
>
> Could you share the output of the following?
>
> 1. gluster volume geo-replication ::
> config rsync-options
> 2. ps -ef | grep rsync
>
> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan
> wrote:
>
>> Done.
>> We got the following result .
>>
>>> 1559298781.338234 write(2, "rsync: link_stat
>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>> failed: No such file or directory (2)", 128
>>
>> seems like a file is missing ?
>>
>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>> khiremat at redhat.com> wrote:
>>
>>> Hi,
>>>
>>> Could you take the strace with with more string size? The argument
>>> strings are truncated.
>>>
>>> strace -s 500 -ttt -T -p
>>>
>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan
>>> wrote:
>>>
>>>> Hi Kotresh
>>>> The above-mentioned work around did not work properly.
>>>>
>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan
>>>> wrote:
>>>>
>>>>> Hi Kotresh
>>>>> We have tried the above-mentioned rsync option and we are planning to
>>>>> have the version upgrade to 6.0.
>>>>>
>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>> khiremat at redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This looks like the hang because stderr buffer filled up with errors
>>>>>> messages and no one reading it.
>>>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>>>> can do following and check if it works.
>>>>>>
>>>>>> Prerequisite:
>>>>>> rsync version should be > 3.1.0
>>>>>>
>>>>>> Workaround:
>>>>>> gluster volume geo-replication ::
>>>>>> config rsync-options "--ignore-missing-args"
>>>>>>
>>>>>> Thanks,
>>>>>> Kotresh HR
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> We were evaluating Gluster geo Replication between two DCs one is in
>>>>>>> US west and one is in US east. We took multiple trials for different file
>>>>>>> size.
>>>>>>> The Geo Replication tends to stop replicating but while checking the
>>>>>>> status it appears to be in Active state. But the slave volume did not
>>>>>>> increase in size.
>>>>>>> So we have restarted the geo-replication session and checked the
>>>>>>> status. The status was in an active state and it was in History Crawl for a
>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>>>>> error.
>>>>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>>>>> process starts but the rsync did not happen in the slave volume. Every time
>>>>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>>>>> there anyway to debug it?
>>>>>>>
>>>>>>> We have also checked the strace of the rync program.
>>>>>>> it displays something like this >>>>>>> >>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>> >>>>>>> >>>>>>> We are using the below specs >>>>>>> >>>>>>> Gluster version - 4.1.7 >>>>>>> Sync mode - rsync >>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>> Intranet Bandwidth - 10 Gig >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri May 31 12:02:35 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 31 May 2019 12:02:35 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Checked the data. It remains in 2708. No progress. On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > That means it could be working and the defunct process might be some old > zombie one. Could you check, that data progress ? > > On Fri, May 31, 2019 at 4:29 PM deepu srinivasan > wrote: > >> Hi >> When i change the rsync option the rsync process doesnt seem to start . >> Only a defunt process is listed in ps aux. Only when i set rsync option to >> " " and restart all the process the rsync process is listed in ps aux. >> >> >> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >> khiremat at redhat.com> wrote: >> >>> Yes, rsync config option should have fixed this issue. >>> >>> Could you share the output of the following? >>> >>> 1. gluster volume geo-replication :: >>> config rsync-options >>> 2. ps -ef | grep rsync >>> >>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >>> wrote: >>> >>>> Done. >>>> We got the following result . >>>> >>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>> failed: No such file or directory (2)", 128 >>>> >>>> seems like a file is missing ? >>>> >>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>> khiremat at redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> Could you take the strace with with more string size? The argument >>>>> strings are truncated. >>>>> >>>>> strace -s 500 -ttt -T -p >>>>> >>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi Kotresh >>>>>> The above-mentioned work around did not work properly. >>>>>> >>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Hi Kotresh >>>>>>> We have tried the above-mentioned rsync option and we are planning >>>>>>> to have the version upgrade to 6.0. >>>>>>> >>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>>> khiremat at redhat.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This looks like the hang because stderr buffer filled up with >>>>>>>> errors messages and no one reading it. >>>>>>>> I think this issue is fixed in latest releases. As a workaround, >>>>>>>> you can do following and check if it works. 
>>>>>>>> >>>>>>>> Prerequisite: >>>>>>>> rsync version should be > 3.1.0 >>>>>>>> >>>>>>>> Workaround: >>>>>>>> gluster volume geo-replication :: >>>>>>>> config rsync-options "--ignore-missing-args" >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kotresh HR >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi >>>>>>>>> We were evaluating Gluster geo Replication between two DCs one is >>>>>>>>> in US west and one is in US east. We took multiple trials for different >>>>>>>>> file size. >>>>>>>>> The Geo Replication tends to stop replicating but while checking >>>>>>>>> the status it appears to be in Active state. But the slave volume did not >>>>>>>>> increase in size. >>>>>>>>> So we have restarted the geo-replication session and checked the >>>>>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>>>>> error. >>>>>>>>> There was around 2000 file appeared for syncing candidate. The >>>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>>> this problem? Is there anyway to debug it? >>>>>>>>> >>>>>>>>> We have also checked the strace of the rync program. >>>>>>>>> it displays something like this >>>>>>>>> >>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>> >>>>>>>>> >>>>>>>>> We are using the below specs >>>>>>>>> >>>>>>>>> Gluster version - 4.1.7 >>>>>>>>> Sync mode - rsync >>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks and Regards, >>>>>>>> Kotresh H R >>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> Thanks and Regards, >>>>> Kotresh H R >>>>> >>>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.has.questions at gmail.com Thu May 23 18:10:32 2019 From: matthew.has.questions at gmail.com (Matthew B) Date: Thu, 23 May 2019 18:10:32 -0000 Subject: [Gluster-users] Geo-Replication faulty - Changelog register failed error=[Errno 21] Is a directory Message-ID: Hello - I am having a problem with geo-replication on glusterv5 that I hope someone can help me with. I have a 7-server distribute cluster as the primary volume, and a 2 server distribute cluster as the secondary volume. Both are running the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64 I was able to setup the replication keys, user, groups, etc and establish the session, but it goes faulty quickly after initializing. I ran into the missing libgfchangelog.so error and fixed with a symlink: [root at pcic-backup01 ~]# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so [root at pcic-backup01 ~]# ls -lh /usr/lib64/libgfchangelog.so* lrwxrwxrwx. 1 root root 30 May 16 13:16 /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.0 lrwxrwxrwx. 1 root root 23 May 16 08:58 /usr/lib64/libgfchangelog.so.0 -> libgfchangelog.so.0.0.1 -rwxr-xr-x. 
1 root root 62K Feb 25 04:02 /usr/lib64/libgfchangelog.so.0.0.1 But right now, when trying to start replication it goes faulty: [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful And the /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log log file contains the error: GLUSTER: Changelog register failed error=[Errno 21] Is a directory [root at gluster01 ~]# cat /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log [2019-05-23 17:07:23.500781] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:23.629298] I [gsyncd(status):308:main] : Using session config file 
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.354005] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.483582] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.863888] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.994895] I [gsyncd(monitor):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.133888] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... [2019-05-23 17:07:33.134301] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker brick=/mnt/raid6-storage/storage slave_node=10.0.231.81 [2019-05-23 17:07:33.214462] I [gsyncd(agent /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.216737] I [changelogagent(agent /mnt/raid6-storage/storage):72:__init__] ChangelogAgent: Agent listining... [2019-05-23 17:07:33.228072] I [gsyncd(worker /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.247236] I [resource(worker /mnt/raid6-storage/storage):1366:connect_remote] SSH: Initializing SSH connection between master and slave... [2019-05-23 17:07:34.948796] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.73339] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.232405] I [resource(worker /mnt/raid6-storage/storage):1413:connect_remote] SSH: SSH connection between master and slave established. duration=1.9849 [2019-05-23 17:07:35.232748] I [resource(worker /mnt/raid6-storage/storage):1085:connect] GLUSTER: Mounting gluster volume locally... [2019-05-23 17:07:36.359250] I [resource(worker /mnt/raid6-storage/storage):1108:connect] GLUSTER: Mounted gluster volume duration=1.1262 [2019-05-23 17:07:36.359639] I [subcmds(worker /mnt/raid6-storage/storage):80:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor [2019-05-23 17:07:36.380975] E [repce(agent /mnt/raid6-storage/storage):122:worker] : call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries) File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 29, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 21] Is a directory [2019-05-23 17:07:36.382556] E [repce(worker /mnt/raid6-storage/storage):214:__call__] RepceClient: call failed call=27412:140659114579776:1558631256.38 method=register error=ChangelogException [2019-05-23 17:07:36.382833] E [resource(worker /mnt/raid6-storage/storage):1266:service_loop] GLUSTER: Changelog register failed error=[Errno 21] Is a directory [2019-05-23 17:07:36.404313] I [repce(agent /mnt/raid6-storage/storage):97:service_loop] RepceServer: terminating on reaching EOF. [2019-05-23 17:07:37.361396] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/mnt/raid6-storage/storage [2019-05-23 17:07:37.370690] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-05-23 17:07:41.526408] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:41.643923] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.722193] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.817210] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.188499] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.258817] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.350276] I [gsyncd(monitor-status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.364751] I [subcmds(monitor-status):29:subcmd_monitor_status] : Monitor Status Change status=Stopped I'm not really sure where to go from here... [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup config | grep -i changelog change_detector:changelog changelog_archive_format:%Y%m changelog_batch_size:727040 changelog_log_file:/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/changes-${local_id}.log changelog_log_level:INFO Thanks, -Matthew -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.has.questions at gmail.com Tue May 28 15:03:11 2019 From: matthew.has.questions at gmail.com (Matthew B) Date: Tue, 28 May 2019 15:03:11 -0000 Subject: [Gluster-users] Geo-Replication faulty - changelog register failed - Is a directory Message-ID: Hello - I am having a problem with geo-replication on glusterv5 that I hope someone can help me with. I have a 7-server distribute cluster as the primary volume, and a 2 server distribute cluster as the secondary volume. Both are running the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64 I was able to setup the replication keys, user, groups, etc and establish the session, but it goes faulty quickly after initializing. I ran into the missing libgfchangelog.so error and fixed with a symlink: [root at pcic-backup01 ~]# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so [root at pcic-backup01 ~]# ls -lh /usr/lib64/libgfchangelog.so* lrwxrwxrwx. 1 root root 30 May 16 13:16 /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.0 lrwxrwxrwx. 1 root root 23 May 16 08:58 /usr/lib64/libgfchangelog.so.0 -> libgfchangelog.so.0.0.1 -rwxr-xr-x. 1 root root 62K Feb 25 04:02 /usr/lib64/libgfchangelog.so.0.0.1 But right now, when trying to start replication it goes faulty: [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... 
N/A N/A [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful And the /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log log file contains the error: GLUSTER: Changelog register failed error=[Errno 21] Is a directory [root at gluster01 ~]# cat /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log [2019-05-23 17:07:23.500781] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:23.629298] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.354005] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.483582] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.863888] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.994895] I [gsyncd(monitor):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.133888] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... [2019-05-23 17:07:33.134301] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker brick=/mnt/raid6-storage/storage slave_node=10.0.231.81 [2019-05-23 17:07:33.214462] I [gsyncd(agent /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.216737] I [changelogagent(agent /mnt/raid6-storage/storage):72:__init__] ChangelogAgent: Agent listining... 
[2019-05-23 17:07:33.228072] I [gsyncd(worker /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.247236] I [resource(worker /mnt/raid6-storage/storage):1366:connect_remote] SSH: Initializing SSH connection between master and slave... [2019-05-23 17:07:34.948796] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.73339] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.232405] I [resource(worker /mnt/raid6-storage/storage):1413:connect_remote] SSH: SSH connection between master and slave established. duration=1.9849 [2019-05-23 17:07:35.232748] I [resource(worker /mnt/raid6-storage/storage):1085:connect] GLUSTER: Mounting gluster volume locally... [2019-05-23 17:07:36.359250] I [resource(worker /mnt/raid6-storage/storage):1108:connect] GLUSTER: Mounted gluster volume duration=1.1262 [2019-05-23 17:07:36.359639] I [subcmds(worker /mnt/raid6-storage/storage):80:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor [2019-05-23 17:07:36.380975] E [repce(agent /mnt/raid6-storage/storage):122:worker] : call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries) File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 29, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 21] Is a directory [2019-05-23 17:07:36.382556] E [repce(worker /mnt/raid6-storage/storage):214:__call__] RepceClient: call failed call=27412:140659114579776:1558631256.38 method=register error=ChangelogException [2019-05-23 17:07:36.382833] E [resource(worker /mnt/raid6-storage/storage):1266:service_loop] GLUSTER: Changelog register failed error=[Errno 21] Is a directory [2019-05-23 17:07:36.404313] I [repce(agent /mnt/raid6-storage/storage):97:service_loop] RepceServer: terminating on reaching EOF. 
[2019-05-23 17:07:37.361396] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/mnt/raid6-storage/storage [2019-05-23 17:07:37.370690] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-05-23 17:07:41.526408] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:41.643923] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.722193] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.817210] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.188499] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.258817] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.350276] I [gsyncd(monitor-status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.364751] I [subcmds(monitor-status):29:subcmd_monitor_status] : Monitor Status Change status=Stopped I'm not really sure where to go from here... [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup config | grep -i changelog change_detector:changelog changelog_archive_format:%Y%m changelog_batch_size:727040 changelog_log_file:/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/changes-${local_id}.log changelog_log_level:INFO Thanks, -Matthew -------------- next part -------------- An HTML attachment was scrubbed... URL:
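
A quick sanity check that may help narrow down the "Changelog register failed ... [Errno 21] Is a directory" error above. This is only a minimal sketch, not a fix: the volume name "storage" and brick path /mnt/raid6-storage/storage are taken from the status output above, and the check does not pinpoint the root cause, it just rules out an obviously broken changelog setup on the master bricks. Errno 21 (EISDIR) generally means something tried to open a directory where it expected a regular file, so the changelog-related paths on the brick and the changelog_log_file shown in the config output are reasonable first places to look.

# confirm the changelog translator is enabled on the master volume
gluster volume get storage changelog.changelog

# on each master brick, the changelog store should be a directory of CHANGELOG.* files
ls -ld /mnt/raid6-storage/storage/.glusterfs/changelogs
ls /mnt/raid6-storage/storage/.glusterfs/changelogs | head

If either check looks wrong, that is worth fixing before digging further into the gsyncd traceback.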