From atumball at redhat.com Wed May 1 12:29:43 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 17:59:43 +0530 Subject: [Gluster-users] parallel-readdir prevents directories and files listing - Bug 1670382 In-Reply-To: References: Message-ID: On Mon, Apr 29, 2019 at 3:56 PM Jo?o Ba?to < joao.bauto at neuro.fchampalimaud.org> wrote: > Hi, > > I have an 8 brick distributed volume where Windows and Linux clients mount > the volume via samba and headless compute servers using gluster native > fuse. With parallel-readdir on, if a Windows client creates a new folder, > the folder is indeed created but invisible to the Windows client. Accessing > the same samba share in a Linux client, the folder is again visible and > with normal behavior. The same folder is also visible when mounting via > gluster native fuse. > > The Windows client can list existing directories and rename them while, > for files, everything seems to be working fine. > > Gluster servers: CentOS 7.5 with Gluster 5.3 and Samba 4.8.3-4.el7.0.1 > from @fasttrack > Clients tested: Windows 10, Ubuntu 18.10, CentOS 7.5 > > https://bugzilla.redhat.com/show_bug.cgi?id=1670382 > Thanks for the bug report. Will look into this, and get back. Last I knew, we recommended to avoid fuse and samba shares on same volume (Mainly as we couldn't spend a lot of effort on testing the configuration). Anyways, we would treat the behavior as bug for sure. One possible path looking at below volume info is to disable 'stat-prefetch' option and see if it helps. Next option I would try is to disable readdir-ahead. Regards, Amar > > > Volume Name: tank > Type: Distribute > Volume ID: 9582685f-07fa-41fd-b9fc-ebab3a6989cf > Status: Started > Snapshot Count: 0 > Number of Bricks: 8 > Transport-type: tcp > Bricks: > Brick1: swp-gluster-01:/tank/volume1/brick > Brick2: swp-gluster-02:/tank/volume1/brick > Brick3: swp-gluster-03:/tank/volume1/brick > Brick4: swp-gluster-04:/tank/volume1/brick > Brick5: swp-gluster-01:/tank/volume2/brick > Brick6: swp-gluster-02:/tank/volume2/brick > Brick7: swp-gluster-03:/tank/volume2/brick > Brick8: swp-gluster-04:/tank/volume2/brick > Options Reconfigured: > performance.parallel-readdir: on > performance.readdir-ahead: on > performance.cache-invalidation: on > performance.md-cache-timeout: 600 > storage.batch-fsync-delay-usec: 0 > performance.write-behind-window-size: 32MB > performance.stat-prefetch: on > performance.read-ahead: on > performance.read-ahead-page-count: 16 > performance.rda-request-size: 131072 > performance.quick-read: on > performance.open-behind: on > performance.nl-cache-timeout: 600 > performance.nl-cache: on > performance.io-thread-count: 64 > performance.io-cache: off > performance.flush-behind: on > performance.client-io-threads: off > performance.write-behind: off > performance.cache-samba-metadata: on > network.inode-lru-limit: 0 > features.cache-invalidation-timeout: 600 > features.cache-invalidation: on > cluster.readdir-optimize: on > cluster.lookup-optimize: on > client.event-threads: 4 > server.event-threads: 16 > features.quota-deem-statfs: on > nfs.disable: on > features.quota: on > features.inode-quota: on > cluster.enable-shared-storage: disable > > Cheers, > > Jo?o Ba?to > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... 
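For reference, the two options suggested in the reply above can be toggled at runtime from any of the gluster servers. This is only a sketch of those troubleshooting steps, assuming the volume name "tank" from the volume info quoted above; each option can be flipped back the same way (or reset to its default with "gluster volume reset") if it makes no difference:

    # try one option at a time and re-test directory listing from the Windows client
    gluster volume set tank performance.stat-prefetch off
    # parallel-readdir depends on readdir-ahead, so turn parallel-readdir off before (or instead of) readdir-ahead
    gluster volume set tank performance.parallel-readdir off
    gluster volume set tank performance.readdir-ahead off
    # revert any option that did not help
    gluster volume reset tank performance.stat-prefetch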
URL: From atumball at redhat.com Wed May 1 12:31:58 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:01:58 +0530 Subject: [Gluster-users] Gluster 5 Geo-replication Guide In-Reply-To: References: Message-ID: On Fri, Apr 26, 2019 at 7:00 PM Shon Stephens wrote: > Dear All, > Is there a good, step by step guide for setting up geo-replication > with Glusterfs 5? The docs are a difficult to decipher read, for me, and > seem more feature guide than actual instruction. > > Geo-Replication steps in glusterfs-5 is similar to the previous versions (and glusterfs-6.x too). If you are used to Ansible to setup gluster for you, we already have geo-replication setup automated with Ansible @ http://github.com/gluster/gluster-ansible -Amar > Thank you, > Shon > -- > > SHON STEPHENS > > SENIOR CONSULTANT > > Red Hat > > T: 571-781-0787 M: 703-297-0682 > > TRIED. TESTED. TRUSTED. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:34:52 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:04:52 +0530 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: References: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> Message-ID: On Tue, Apr 23, 2019 at 11:38 PM Cody Hill wrote: > > Thanks for the info Karli, > > I wasn?t aware ZFS Dedup was such a dog. I guess I?ll leave that off. My > data get?s 3.5:1 savings on compression alone. I was aware of stripped > sets. I will be doing 6x Striped sets across 12x disks. > > On top of this design I?m going to try and test Intel Optane DIMM (512GB) > as a ?Tier? for GlusterFS to try and get further write acceleration. And > issues with GlusterFS ?Tier? functionality that anyone is aware of? > > Hi Cody, I wanted to be honest about GlusterFS 'Tier' functionality. While it is functional and works, we had not seen the actual benefit we expected with the feature, and noticed it is better to use the tiering on each host machine (ie, on bricks) and use those bricks as glusterfs bricks. (like dmcache). Also note that from glusterfs-6.x releases, Tier feature is deprecated. -Amar > Thank you, > Cody Hill > > On Apr 18, 2019, at 2:32 AM, Karli Sj?berg wrote: > > > > Den 17 apr. 2019 16:30 skrev Cody Hill : > > Hey folks. > > I?m looking to deploy GlusterFS to host some VMs. I?ve done a lot of > reading and would like to implement Deduplication and Compression in this > setup. My thought would be to run ZFS to handle the Compression and > Deduplication. > > > You _really_ don't want ZFS doing dedup for any reason. > > > ZFS would give me the following benefits: > 1. If a single disk fails rebuilds happen locally instead of over the > network > 2. Zil & L2Arc should add a slight performance increase > > > Adding two really good NVME SSD's as a mirrored SLOG vdev does a huge deal > for synchronous write performance, turning every random write into large > streams that the spinning drives handle better. > > Don't know how picky Gluster is about synchronicity though, most > "performance" tweaking suggests setting stuff to async, which I wouldn't > recommend, but it's a huge boost for throughput obviously; not having to > wait for stuff to actually get written, but it's dangerous. 
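Since much of the advice in this thread comes down to a handful of ZFS dataset properties, here is a minimal sketch of how they are set and checked; the dataset name "tank/gluster" is a placeholder, and the values simply mirror the recommendations above (LZ4 on, dedup off, sync left at its default rather than forced to async):

    zfs set compression=lz4 tank/gluster     # near-free compression, as recommended above
    zfs set dedup=off tank/gluster           # ZFS dedup is the feature being warned against
    zfs get compressratio tank/gluster       # shows how much space compression actually saves
    zfs get sync tank/gluster                # 'standard' honours sync writes; 'disabled' trades safety for throughput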
> > With mirrored NVME SLOG's, you could probably get that throughput without > going asynchronous, which saves you from potential data corruption in a > sudden power loss. > > L2ARC on the other hand does a bit for read latency, but for a general > purpose file server- in practice- not a huge difference, the working set is > just too large. Also keep in mind that L2ARC isn't "free". You need more > RAM to know where you've cached stuff... > > 3. Deduplication and Compression are inline and have pretty good > performance with modern hardware (Intel Skylake) > > > ZFS deduplication has terrible performance. Watch your throughput > automatically drop from hundreds or thousands of MB/s down to, like 5. It's > a feature;) > > 4. Automated Snapshotting > > I can then layer GlusterFS on top to handle distribution to allow 3x > Replicas of my storage. > My question is? Why aren?t more people doing this? Is this a horrible idea > for some reason that I?m missing? > > > While it could save a lot of space in some hypothetical instance, the > drawbacks can never motivate it. E.g. if you want one node to suddenly die > and never recover because of RAM exhaustion, go with ZFS dedup ;) > > I?d be very interested to hear your thoughts. > > > Avoid ZFS dedup at all costs. LZ4 compression on the hand is awesome, > definitely use that! It's basically a free performance enhancer the also > saves space :) > > As another person has said, the best performance layout is RAID10- striped > mirrors. I understand you'd want to get as much volume as possible with > RAID-Z/RAID(5|6) since gluster also replicates/distributes, but it has a > huge impact on IOPS. If performance is the main concern, do striped mirrors > with replica 3 in Gluster. My advice is to test thoroughly with different > pool layouts to see what gives acceptable performance against your volume > requirements. > > /K > > > Additional thoughts: > I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) > I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB > (Is this correct?) > I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane > DIMM to really smooth out write latencies? Any issues here? > > Thank you, > Cody Hill > > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:36:45 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:06:45 +0530 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> Message-ID: On Tue, Apr 23, 2019 at 8:47 PM Darrell Budic wrote: > I was one of the folk who wanted a NA/EMEA scheduled meeting, and I?m > going to have to miss it due to some real life issues (clogged sewer I?m > going to have to be dealing with at the time). Apologies, I?ll work on > making the next one. > > No problem. We will continue to have these meetings every week (ie, bi-weekly in each timezone). 
Feel free to join when possible. We surely like to see more community participation for sure, but everyone would have their day jobs, so no pressure :-) -Amar > -Darrell > > On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath > wrote: > > > Hi, > > This is the agenda for tomorrow's community meeting for NA/EMEA timezone. > > https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both > ---- > > > > On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> Hi All, >> >> Below is the final details of our community meeting, and I will be >> sending invites to mailing list following this email. You can add Gluster >> Community Calendar so you can get notifications on the meetings. >> >> We are starting the meetings from next week. For the first meeting, we >> need 1 volunteer from users to discuss the use case / what went well, and >> what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. >> >> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >> ---- >> Gluster Community Meeting >> Previous >> Meeting minutes: >> >> - http://github.com/gluster/community >> >> >> Date/Time: >> Check the community calendar >> >> Bridge >> >> - APAC friendly hours >> - Bridge: https://bluejeans.com/836554017 >> - NA/EMEA >> - Bridge: https://bluejeans.com/486278655 >> >> ------------------------------ >> Attendance >> >> - Name, Company >> >> Host >> >> - Who will host next meeting? >> - Host will need to send out the agenda 24hr - 12hrs in advance to >> mailing list, and also make sure to send the meeting minutes. >> - Host will need to reach out to one user at least who can talk >> about their usecase, their experience, and their needs. >> - Host needs to send meeting minutes as PR to >> http://github.com/gluster/community >> >> User stories >> >> - Discuss 1 usecase from a user. >> - How was the architecture derived, what volume type used, >> options, etc? >> - What were the major issues faced ? How to improve them? >> - What worked good? >> - How can we all collaborate well, so it is win-win for the >> community and the user? How can we >> >> Community >> >> - >> >> Any release updates? >> - >> >> Blocker issues across the project? >> - >> >> Metrics >> - Number of new bugs since previous meeting. How many are not triaged? >> - Number of emails, anything unanswered? >> >> Conferences >> / Meetups >> >> - Any conference in next 1 month where gluster-developers are going? >> gluster-users are going? So we can meet and discuss. >> >> Developer >> focus >> >> - >> >> Any design specs to discuss? >> - >> >> Metrics of the week? >> - Coverity >> - Clang-Scan >> - Number of patches from new developers. >> - Did we increase test coverage? >> - [Atin] Also talk about most frequent test failures in the CI and >> carve out an AI to get them fixed. >> >> RoundTable >> >> - >> >> ---- >> >> Regards, >> Amar >> >> On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Thanks for the feedback Darrell, >>> >>> The new proposal is to have one in North America 'morning' time. (10AM >>> PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, >>> 9pm Newzealand, 5pm Tokyo, 4pm Beijing. >>> >>> For example, if we choose Every other Tuesday for meeting, and 1st of >>> the month is Tuesday, we would have North America time for 1st, and on 15th >>> it would be ASIA/Pacific time. 
>>> >>> Hopefully, this way, we can cover all the timezones, and meeting minutes >>> would be committed to github repo, so that way, it will be easier for >>> everyone to be aware of what is happening. >>> >>> Regards, >>> Amar >>> >>> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic >>> wrote: >>> >>>> As a user, I?d like to visit more of these, but the time slot is my >>>> 3AM. Any possibility for a rolling schedule (move meeting +6 hours each >>>> week with rolling attendance from maintainers?) or an occasional regional >>>> meeting 12 hours opposed to the one you?re proposing? >>>> >>>> -Darrell >>>> >>>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >>>> atumball at redhat.com> wrote: >>>> >>>> All, >>>> >>>> We currently have 3 meetings which are public: >>>> >>>> 1. Maintainer's Meeting >>>> >>>> - Runs once in 2 weeks (on Mondays), and current attendance is around >>>> 3-5 on an avg, and not much is discussed. >>>> - Without majority attendance, we can't take any decisions too. >>>> >>>> 2. Community meeting >>>> >>>> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the >>>> only meeting which is for 'Community/Users'. Others are for developers >>>> as of now. >>>> Sadly attendance is getting closer to 0 in recent times. >>>> >>>> 3. GCS meeting >>>> >>>> - We started it as an effort inside Red Hat gluster team, and opened it >>>> up for community from Jan 2019, but the attendance was always from RHT >>>> members, and haven't seen any traction from wider group. >>>> >>>> So, I have a proposal to call out for cancelling all these meeting, >>>> and keeping just 1 weekly 'Community' meeting, where even topics >>>> related to maintainers and GCS and other projects can be discussed. >>>> >>>> I have a template of a draft template @ >>>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>>> >>>> Please feel free to suggest improvements, both in agenda and in >>>> timings. So, we can have more participation from members of community, >>>> which allows more user - developer interactions, and hence quality of >>>> project. >>>> >>>> Waiting for feedbacks, >>>> >>>> Regards, >>>> Amar >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> >> >> -- >> Amar Tumballi (amarts) >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Wed May 1 12:39:44 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:09:44 +0530 Subject: [Gluster-users] Community Happy Hour at Red Hat Summit In-Reply-To: References: Message-ID: On Mon, Apr 22, 2019 at 8:14 PM Amye Scavarda wrote: > The Ceph and Gluster teams are joining forces to put on a Community > Happy Hour in Boston on Tuesday, May 7th as part of Red Hat Summit. > > I will be there at Gluster Booth in Red Hat Summit. 
If you or your colleagues/friends are attending, let me know. Would like to catch up for sure! -Amar > More details, including RSVP at: > https://cephandglusterhappyhour_rhsummit.eventbrite.com > -- amye > > -- > Amye Scavarda | amye at redhat.com | Gluster Community Lead > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:43:25 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:13:25 +0530 Subject: [Gluster-users] adding thin arbiter In-Reply-To: References: Message-ID: On Mon, Apr 22, 2019 at 3:12 PM Karthik Subrahmanya wrote: > Hi, > > Currently we do not have support for converting an existing volume to a > thin-arbiter volume. It is also not supported to replace the thin-arbiter > brick with a new one. > You can create a fresh thin arbiter volume using the GD2 framework and play > around with it. Feel free to share your experience with thin-arbiter. > The GD1 CLIs are being implemented. We will keep things posted on this > list as and when they are ready to consume. > > Effort on this can be found @ https://review.gluster.org/22612 > Regards, > Karthik > > On Fri, Apr 19, 2019 at 8:39 PM wrote: > >> Hi guys, >> >> I have an existing volume with 3 replicas. One of them is an >> arbiter. Is there a way to change the arbiter to a thin-arbiter? I tried >> removing the arbiter brick and adding it back, but the add-brick command >> doesn't take the --thin-arbiter option. >> >> xpk >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:46:23 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:16:23 +0530 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: On Wed, Apr 17, 2019 at 1:33 PM David Spisla wrote: > Dear Gluster Community, > > I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 to > access the volumes (each node has a VIP) > > I was testing this failover scenario: > > 1. Start writing 940 GB of small files (64K-100K) from a Win10 Client to > node1 > 2. During the write process I hard-shutdown node1 (where the client is > connected via VIP) by turning off the power > > My expectation is that the write process stops and after a while the > Win10 Client offers me a Retry, so I can continue the write on a different > node (which now has the VIP of node1). > I have observed exactly that in the past, but now the system shows a strange > behaviour: > > The Win10 Client does nothing and the Explorer freezes; in the backend CTDB > cannot perform the failover and throws errors.
The glusterd on node2 and > node3 logs these messages:
>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage
>> >> > *In my opinion Samba/CTDB cannot perform the failover correctly and > continue the write process because glusterfs didn't release the lock.* > What do you think? It seems to me like a bug, because in the past the > failover worked correctly. > > Thanks for the report, David. It surely looks like a bug, and I will let experts in this domain answer the question. One request for such issues is to file a bug (preferred) or a GitHub issue, so that it is tracked in the system. > Regards > David Spisla > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:49:21 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:19:21 +0530 Subject: [Gluster-users] gluster mountbroker failed after upgrade to gluster 6 In-Reply-To: References: Message-ID: Few questions inline. On Fri, Apr 12, 2019 at 1:09 PM Benedikt Kaleß wrote: > |Hi,| > > |I updated to gluster 6 and now the geo-replication remains > in status "Faulty". > From which version did you upgrade? And what does the volume info look like? (Helps us to understand if this is something we have already tested or not).
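For anyone hitting the same problem, the details being asked for here can be gathered with the standard CLI; this is only a sketch, with the volume name "mastervol" used as a placeholder:

    gluster --version                          # version currently running after the upgrade
    gluster volume info mastervol              # volume type and reconfigured options
    gluster volume geo-replication status      # state of all geo-replication sessions
    tail -n 100 /var/log/glusterfs/geo-replication/*/*.log   # usually shows why a session went Faulty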
> | > > |If I run a "gluster-mountbroker status" I get: > | > > |Traceback (most recent call last): > File "/usr/sbin/gluster-mountbroker", line 396, in > runcli() > File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", > line 225, in runcli > cls.run(args) > File "/usr/sbin/gluster-mountbroker", line 275, in run > out = execute_in_peers("node-status") > File "/usr/lib/python3/dist-packages/gluster/cliutils/cliutils.py", > line 127, in execute_in_peers > raise GlusterCmdException((rc, out, err, " ".join(cmd))) > gluster.cliutils.cliutils.GlusterCmdException: (1, '', 'Unable to end. > Error : Success\n', 'gluster system:: execute mountbroker.py node-status') > | > > |What can I do: set up the geo-replication again?| > > Sorry for the delay, and we will surely try to get you back to a normal state. Can you check the logs in /var/log/glusterfs/geo-replication/* and see if there is anything concerning there? That would help in understanding the situation. -Amar > |Best regards| > > |Benedikt > | > > | > | > > -- > forumZFD > Entschieden für Frieden|Committed to Peace > > Benedikt Kaleß > Leiter Team IT|Head team IT > > Forum Ziviler Friedensdienst e.V.|Forum Civil Peace Service > Am Kölner Brett 8 | 50825 Köln | Germany > > Tel 0221 91273233 | Fax 0221 91273299 | > http://www.forumZFD.de > > Vorstand nach § 26 BGB, einzelvertretungsberechtigt|Executive Board: > Oliver Knabe (Vorsitz|Chair), Sonja Wiekenberg-Mlalandle, Alexander Mauz > VR 17651 Amtsgericht Köln > > Spenden|Donations: IBAN DE37 3702 0500 0008 2401 01 BIC BFSWDE33XXX > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts)
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From atumball at redhat.com Wed May 1 12:55:01 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Wed, 1 May 2019 18:25:01 +0530 Subject: [Gluster-users] performance - what can I expect In-Reply-To: <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> Message-ID: Hi Pascal, Sorry for the complete delay on this one. And thanks for testing out in different scenarios. Few questions before others can have a look and advise you. 1. What is the volume info output? 2. Do you see any concerning logs in glusterfs log files? 3. Please use `gluster volume profile` while running the tests, and that gives a lot of information. 4. Considering you are using glusterfs-6.0, please take a statedump of the client process (on any node) before and after the test, so we can analyze the latency information of each translator. With this information, I hope we will be in a better state to answer the questions. On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter wrote: > i continued my testing with 5 clients, all attached over 100Gbit/s > omni-path via IP over IB. when i run the same iozone benchmark across > all 5 clients where gluster is mounted using the glusterfs client, i get > an aggregated write throughput of only about 400GB/s and an aggregated > read throughput of 1.5GB/s. Each node was writing a single 200Gb file in > 16MB chunks and the files were distributed across all three bricks on > the server. > > the connection was established over Omnipath for sure, as there is no > other link between the nodes and server. > > i have no clue what i'm doing wrong here.
i can't believe that this is a > normal performance people would expect to see from gluster. i guess > nobody would be using it if it was this slow. > > again, when written dreictly to the xfs filesystem on the bricks, i get > over 6GB/s read and write throughput using the same benchmark. > > any advise is appreciated > > cheers > > Pascal > > On 04.04.19 12:03, Pascal Suter wrote: > > I just noticed i left the most important parameters out :) > > > > here's the write command with filesize and recordsize in it as well :) > > > > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w > > -+S 0 -s 200G -r 16384k > > > > also i ran the benchmark without direct_io which resulted in an even > > worse performance. > > > > i also tried to mount the gluster volume via nfs-ganesha which further > > reduced throughput down to about 450MB/s > > > > if i run the iozone benchmark with 3 threads writing to all three > > bricks directly (from the xfs filesystem) i get throughputs of around > > 6GB/s .. if I run the same benchmark through gluster mounted locally > > using the fuse client and with enough threads so that each brick gets > > at least one file written to it, i end up seing throughputs around > > 1.5GB/s .. that's a 4x decrease in performance. at it actually is the > > same if i run the benchmark with less threads and files only get > > written to two out of three bricks. > > > > cpu load on the server is around 25% by the way, nicely distributed > > across all available cores. > > > > i can't believe that gluster should really be so slow and everybody is > > just happily using it. any hints on what i'm doing wrong are very > > welcome. > > > > i'm using gluster 6.0 by the way. > > > > regards > > > > Pascal > > > > On 03.04.19 12:28, Pascal Suter wrote: > >> Hi all > >> > >> I am currently testing gluster on a single server. I have three > >> bricks, each a hardware RAID6 volume with thin provisioned LVM that > >> was aligned to the RAID and then formatted with xfs. > >> > >> i've created a distributed volume so that entire files get > >> distributed across my three bricks. > >> > >> first I ran a iozone benchmark across each brick testing the read and > >> write perofrmance of a single large file per brick > >> > >> i then mounted my gluster volume locally and ran another iozone run > >> with the same parameters writing a single file. the file went to > >> brick 1 which, when used driectly, would write with 2.3GB/s and read > >> with 1.5GB/s. however, through gluster i got only 800MB/s read and > >> 750MB/s write throughput > >> > >> another run with two processes each writing a file, where one file > >> went to the first brick and the other file to the second brick (which > >> by itself when directly accessed wrote at 2.8GB/s and read at > >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated > >> read throughput. > >> > >> Is this a normal performance i can expect out of a glusterfs or is it > >> worth tuning in order to really get closer to the actual brick > >> filesystem performance? > >> > >> here are the iozone commands i use for writing and reading.. 
note > >> that i am using directIO in order to make sure i don't get fooled by > >> cache :) > >> > >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt > >> > >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt > >> > >> cheers > >> > >> Pascal > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sankarshan.mukhopadhyay at gmail.com Thu May 2 02:51:18 2019 From: sankarshan.mukhopadhyay at gmail.com (Sankarshan Mukhopadhyay) Date: Thu, 2 May 2019 08:21:18 +0530 Subject: [Gluster-users] Posting a set of conversations around troubleshooting Gluster Message-ID: is the link to the play list. We are at this point 2 episodes in. I'd like to (a) keep this first pass as basic introduction to troubleshooting (b) focus on the components which seem complicated enough. Requesting for feedback on which components we should cover next. Please reply to this thread. Additionally, as an administrator/user, if you have a set of scripts/tools which you use to troubleshoot and would like to talk about them, let me know and we will set up a conversation. From pascal.suter at dalco.ch Thu May 2 07:51:12 2019 From: pascal.suter at dalco.ch (Pascal Suter) Date: Thu, 2 May 2019 09:51:12 +0200 Subject: [Gluster-users] performance - what can I expect In-Reply-To: References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> Message-ID: <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> Hi Amar thanks for rolling this back up. Actually i have done some more benchmarking and fiddled with the config to finally reach a performance figure i could live with. I now can squeeze about 3GB/s out of that server which seems to be close to what i can get out of its network uplink (using IP over Omni-Path). The system is now set up and in production so i can't run any benchmarks on it anymore but i will get back at benchmarking in the near future to test some storage related hardware, and i will try it with gluster on top again. embarassingly the biggest performance issue was that the default installation of the server was running the "performance" profile of tuned. once i switched it to "throughput-performance" performance increased dramatically. the volume info now looks pretty unspectacular: Volume Name: storage Type: Distribute Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c Status: Started Snapshot Count: 0 Number of Bricks: 3 Transport-type: tcp Bricks: Brick1: themis01:/data/brick1/brick Brick2: themis01:/data/brick2/brick Brick3: themis01:/data/brick3/brick Options Reconfigured: transport.address-family: inet nfs.disable: on thanks for pointing out gluster volume profile, i'll have a go with it during my next benchmarking session. so far i was using iostat to track brick-level io performance during my benchmarks. 
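For anyone wanting to reproduce the two changes described above, plus the profiling data that was asked for, a rough sketch looks like this; it assumes the volume name "storage" from the info above, and the pgrep pattern is only a heuristic for finding the fuse client process:

    tuned-adm active                             # show the currently active tuned profile
    tuned-adm profile throughput-performance     # the profile switch that made the difference here
    gluster volume profile storage start         # start collecting per-brick FOP statistics
    # ... run the benchmark ...
    gluster volume profile storage info          # cumulative latency/throughput per FOP
    gluster volume profile storage stop
    kill -USR1 $(pgrep -f 'glusterfs.*storage')  # statedump of the fuse client; files usually land under /var/run/gluster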
the main question i wanted to ask was, if there is a general rule of thumb, how much throughput of the original bare brick throughput would be expected to be left over once gluster is added on top of it. to give you an example: when I use a parallel filesystem like Lustre or BeeGFS i usually expect to get at least about 85% of the raw storage target throughput as aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS setup. I consider any numbers below that to be too low and therefore will have to dig into performance tuning to find the bottle neck. i was hoping someone could give me a rule-of-thumb number for a simple distributed gluster setup, like that 85% number i've established for a parallel file system. so at the moment my takeaway is, in a simple distributed volume across 3 bricks with an aggregated bandwidth of 6GB/s i can expect to get about 3GB/s aggregated bandwith out of the gluster mount, given there are no bottle necks in the network. the 3GB/s is a number conducted under ideal circumstances, meaning, i primed the storage to make sure i could run a benchmark run using three nodes, with each node running a single thread writing to a single file and each file was located on another bricke. this yielded the maximum perfomance as this was pure streaming IO without any overlapping file writing to the bricks other than the overhead created by gluster's own internal mechanisms. Interestingly, the performance didn't drop much when i added nodes and threads and introduced more random-ish io by having several processes write to the same brick. So I assume, what "eats" up the 50% performance in the end is probably Gluster writing all these additional hidden files which I assume is some sort of Metadata. This causes additional IO on the disk that i'm streaming my one file to and therefore turns my streaming IO into a random io load for the raid controller and underlying harddisks which on spinning disks would have about the performance impact i was seing in my benchmarks. I have yet to try gluster on a Flash based brick and test its performance there.. i would expect to see a better "efficiency" than the 50% i've measured on this system here as random io vs. streaming io should not make such a difference (or acutally almost no difference at all) on a flash based storage. but that's? me guessing now. so for the moment i'm fine but i would still be interested in hearing ball-park figure "efficiency" numbers from others using gluster in a similar setup. cheers Pascal On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote: > Hi Pascal, > > Sorry for complete delay in this one. And thanks for testing out in > different scenarios.? Few questions before others can have a look and > advice you. > > 1. What is the volume info output ? > > 2. Do you see any concerning logs in glusterfs log files? > > 3. Please use `gluster volume profile` while running the tests, and > that gives a lot of information. > > 4. Considering you are using glusterfs-6.0, please take statedump of > client process (on any node) before and after the test, so we can > analyze the latency information of each translators. > > With these information, I hope we will be in a better state to answer > the questions. > > > On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter > wrote: > > i continued my testing with 5 clients, all attached over 100Gbit/s > omni-path via IP over IB. 
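To get a feel for the brick-side bookkeeping mentioned above, the bricks can be inspected directly (read-only; nothing under a brick should be edited by hand). A sketch, using one of the brick paths from the volume info above and a placeholder file name:

    ls -a /data/brick1/brick                              # every brick carries a hidden .glusterfs directory
    ls /data/brick1/brick/.glusterfs | head               # gfid-indexed links maintained by gluster
    getfattr -d -m . -e hex /data/brick1/brick/somefile   # per-file xattrs such as trusted.gfid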
when i run the same iozone benchmark across > all 5 clients where gluster is mounted using the glusterfs client, > i get > an aggretated write throughput of only about 400GB/s and an > aggregated > read throughput of 1.5GB/s. Each node was writing a single 200Gb > file in > 16MB chunks and the files where distributed across all three > bricks on > the server. > > the connection was established over Omnipath for sure, as there is no > other link between the nodes and server. > > i have no clue what i'm doing wrong here. i can't believe that > this is a > normal performance people would expect to see from gluster. i guess > nobody would be using it if it was this slow. > > again, when written dreictly to the xfs filesystem on the bricks, > i get > over 6GB/s read and write throughput using the same benchmark. > > any advise is appreciated > > cheers > > Pascal > > On 04.04.19 12:03, Pascal Suter wrote: > > I just noticed i left the most important parameters out :) > > > > here's the write command with filesize and recordsize in it as > well :) > > > > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e > -I -w > > -+S 0 -s 200G -r 16384k > > > > also i ran the benchmark without direct_io which resulted in an > even > > worse performance. > > > > i also tried to mount the gluster volume via nfs-ganesha which > further > > reduced throughput down to about 450MB/s > > > > if i run the iozone benchmark with 3 threads writing to all three > > bricks directly (from the xfs filesystem) i get throughputs of > around > > 6GB/s .. if I run the same benchmark through gluster mounted > locally > > using the fuse client and with enough threads so that each brick > gets > > at least one file written to it, i end up seing throughputs around > > 1.5GB/s .. that's a 4x decrease in performance. at it actually > is the > > same if i run the benchmark with less threads and files only get > > written to two out of three bricks. > > > > cpu load on the server is around 25% by the way, nicely distributed > > across all available cores. > > > > i can't believe that gluster should really be so slow and > everybody is > > just happily using it. any hints on what i'm doing wrong are very > > welcome. > > > > i'm using gluster 6.0 by the way. > > > > regards > > > > Pascal > > > > On 03.04.19 12:28, Pascal Suter wrote: > >> Hi all > >> > >> I am currently testing gluster on a single server. I have three > >> bricks, each a hardware RAID6 volume with thin provisioned LVM > that > >> was aligned to the RAID and then formatted with xfs. > >> > >> i've created a distributed volume so that entire files get > >> distributed across my three bricks. > >> > >> first I ran a iozone benchmark across each brick testing the > read and > >> write perofrmance of a single large file per brick > >> > >> i then mounted my gluster volume locally and ran another iozone > run > >> with the same parameters writing a single file. the file went to > >> brick 1 which, when used driectly, would write with 2.3GB/s and > read > >> with 1.5GB/s. however, through gluster i got only 800MB/s read and > >> 750MB/s write throughput > >> > >> another run with two processes each writing a file, where one file > >> went to the first brick and the other file to the second brick > (which > >> by itself when directly accessed wrote at 2.8GB/s and read at > >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also > aggregated > >> read throughput. 
> >> > >> Is this a normal performance i can expect out of a glusterfs or > is it > >> worth tuning in order to really get closer to the actual brick > >> filesystem performance? > >> > >> here are the iozone commands i use for writing and reading.. note > >> that i am using directIO in order to make sure i don't get > fooled by > >> cache :) > >> > >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w > -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt > >> > >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w > -+S 0 > >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt > >> > >> cheers > >> > >> Pascal > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Thu May 2 08:30:46 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Thu, 2 May 2019 14:00:46 +0530 Subject: [Gluster-users] performance - what can I expect In-Reply-To: <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> References: <381efa03-78b3-e244-9f52-054b357b5d57@dalco.ch> <8f150899-321b-f184-978c-9b7b01e6fb39@dalco.ch> <263c3d8d-d3ab-f052-85e1-ad7ade4073d7@dalco.ch> Message-ID: On Thu, May 2, 2019 at 1:21 PM Pascal Suter wrote: > Hi Amar > > thanks for rolling this back up. Actually i have done some more > benchmarking and fiddled with the config to finally reach a performance > figure i could live with. I now can squeeze about 3GB/s out of that server > which seems to be close to what i can get out of its network uplink (using > IP over Omni-Path). The system is now set up and in production so i can't > run any benchmarks on it anymore but i will get back at benchmarking in the > near future to test some storage related hardware, and i will try it with > gluster on top again. > > embarassingly the biggest performance issue was that the default > installation of the server was running the "performance" profile of tuned. > once i switched it to "throughput-performance" performance increased > dramatically. > > the volume info now looks pretty unspectacular: > > Volume Name: storage > Type: Distribute > Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 > Transport-type: tcp > Bricks: > Brick1: themis01:/data/brick1/brick > Brick2: themis01:/data/brick2/brick > Brick3: themis01:/data/brick3/brick > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > > thanks for pointing out gluster volume profile, i'll have a go with it > during my next benchmarking session. so far i was using iostat to track > brick-level io performance during my benchmarks. > > the main question i wanted to ask was, if there is a general rule of > thumb, how much throughput of the original bare brick throughput would be > expected to be left over once gluster is added on top of it. 
to give you an > example: when I use a parallel filesystem like Lustre or BeeGFS i usually > expect to get at least about 85% of the raw storage target throughput as > aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS > setup. I consider any numbers below that to be too low and therefore will > have to dig into performance tuning to find the bottle neck. > > i was hoping someone could give me a rule-of-thumb number for a simple > distributed gluster setup, like that 85% number i've established for a > parallel file system. > > so at the moment my takeaway is, in a simple distributed volume across 3 > bricks with an aggregated bandwidth of 6GB/s i can expect to get about > 3GB/s aggregated bandwith out of the gluster mount, given there are no > bottle necks in the network. the 3GB/s is a number conducted under ideal > circumstances, meaning, i primed the storage to make sure i could run a > benchmark run using three nodes, with each node running a single thread > writing to a single file and each file was located on another bricke. this > yielded the maximum perfomance as this was pure streaming IO without any > overlapping file writing to the bricks other than the overhead created by > gluster's own internal mechanisms. > > Interestingly, the performance didn't drop much when i added nodes and > threads and introduced more random-ish io by having several processes write > to the same brick. So I assume, what "eats" up the 50% performance in the > end is probably Gluster writing all these additional hidden files which I > assume is some sort of Metadata. This causes additional IO on the disk that > i'm streaming my one file to and therefore turns my streaming IO into a > random io load for the raid controller and underlying harddisks which on > spinning disks would have about the performance impact i was seing in my > benchmarks. > Thanks for all these details. I have yet to try gluster on a Flash based brick and test its performance > there.. i would expect to see a better "efficiency" than the 50% i've > measured on this system here as random io vs. streaming io should not make > such a difference (or acutally almost no difference at all) on a flash > based storage. but that's me guessing now. > > so for the moment i'm fine but i would still be interested in hearing > ball-park figure "efficiency" numbers from others using gluster in a > similar setup. > We couldn't get a single number on this yet. Mainly because of multiple reasons. * Gluster's volume type has different behavior (performance wise) * Network plays more significant role than that of disk performance. Mostly latency involved in n/w than the throughput. * Different work loads (like create heavy Vs read/write, sequential read/write Vs random read/write) needs different options (currently they are not auto-tuned). * If one has good n/w and disk speed, even back end filesystem configuration (because of the layout we have with gfid etc) too matter a bit. Best thing is to understand the workload first, and then tuning for it (at present). cheers > > Pascal > On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote: > > Hi Pascal, > > Sorry for complete delay in this one. And thanks for testing out in > different scenarios. Few questions before others can have a look and > advice you. > > 1. What is the volume info output ? > > 2. Do you see any concerning logs in glusterfs log files? > > 3. Please use `gluster volume profile` while running the tests, and that > gives a lot of information. > > 4. 
Considering you are using glusterfs-6.0, please take statedump of > client process (on any node) before and after the test, so we can analyze > the latency information of each translators. > > With these information, I hope we will be in a better state to answer the > questions. > > > On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter > wrote: > >> i continued my testing with 5 clients, all attached over 100Gbit/s >> omni-path via IP over IB. when i run the same iozone benchmark across >> all 5 clients where gluster is mounted using the glusterfs client, i get >> an aggretated write throughput of only about 400GB/s and an aggregated >> read throughput of 1.5GB/s. Each node was writing a single 200Gb file in >> 16MB chunks and the files where distributed across all three bricks on >> the server. >> >> the connection was established over Omnipath for sure, as there is no >> other link between the nodes and server. >> >> i have no clue what i'm doing wrong here. i can't believe that this is a >> normal performance people would expect to see from gluster. i guess >> nobody would be using it if it was this slow. >> >> again, when written dreictly to the xfs filesystem on the bricks, i get >> over 6GB/s read and write throughput using the same benchmark. >> >> any advise is appreciated >> >> cheers >> >> Pascal >> >> On 04.04.19 12:03, Pascal Suter wrote: >> > I just noticed i left the most important parameters out :) >> > >> > here's the write command with filesize and recordsize in it as well :) >> > >> > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w >> > -+S 0 -s 200G -r 16384k >> > >> > also i ran the benchmark without direct_io which resulted in an even >> > worse performance. >> > >> > i also tried to mount the gluster volume via nfs-ganesha which further >> > reduced throughput down to about 450MB/s >> > >> > if i run the iozone benchmark with 3 threads writing to all three >> > bricks directly (from the xfs filesystem) i get throughputs of around >> > 6GB/s .. if I run the same benchmark through gluster mounted locally >> > using the fuse client and with enough threads so that each brick gets >> > at least one file written to it, i end up seing throughputs around >> > 1.5GB/s .. that's a 4x decrease in performance. at it actually is the >> > same if i run the benchmark with less threads and files only get >> > written to two out of three bricks. >> > >> > cpu load on the server is around 25% by the way, nicely distributed >> > across all available cores. >> > >> > i can't believe that gluster should really be so slow and everybody is >> > just happily using it. any hints on what i'm doing wrong are very >> > welcome. >> > >> > i'm using gluster 6.0 by the way. >> > >> > regards >> > >> > Pascal >> > >> > On 03.04.19 12:28, Pascal Suter wrote: >> >> Hi all >> >> >> >> I am currently testing gluster on a single server. I have three >> >> bricks, each a hardware RAID6 volume with thin provisioned LVM that >> >> was aligned to the RAID and then formatted with xfs. >> >> >> >> i've created a distributed volume so that entire files get >> >> distributed across my three bricks. >> >> >> >> first I ran a iozone benchmark across each brick testing the read and >> >> write perofrmance of a single large file per brick >> >> >> >> i then mounted my gluster volume locally and ran another iozone run >> >> with the same parameters writing a single file. the file went to >> >> brick 1 which, when used driectly, would write with 2.3GB/s and read >> >> with 1.5GB/s. 
however, through gluster i got only 800MB/s read and >> >> 750MB/s write throughput >> >> >> >> another run with two processes each writing a file, where one file >> >> went to the first brick and the other file to the second brick (which >> >> by itself when directly accessed wrote at 2.8GB/s and read at >> >> 2.7GB/s) resulted in 1.2GB/s of aggregated write and also aggregated >> >> read throughput. >> >> >> >> Is this a normal performance i can expect out of a glusterfs or is it >> >> worth tuning in order to really get closer to the actual brick >> >> filesystem performance? >> >> >> >> here are the iozone commands i use for writing and reading.. note >> >> that i am using directIO in order to make sure i don't get fooled by >> >> cache :) >> >> >> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt >> >> >> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0 >> >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt >> >> >> >> cheers >> >> >> >> Pascal >> >> >> >> _______________________________________________ >> >> Gluster-users mailing list >> >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> > > -- > Amar Tumballi (amarts) > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joao.bauto at neuro.fchampalimaud.org Thu May 2 09:54:44 2019 From: joao.bauto at neuro.fchampalimaud.org (=?UTF-8?B?Sm/Do28gQmHDunRv?=) Date: Thu, 2 May 2019 10:54:44 +0100 Subject: [Gluster-users] parallel-readdir prevents directories and files listing - Bug 1670382 In-Reply-To: References: Message-ID: Thanks for the reply Amar. Last I knew, we recommended to avoid fuse and samba shares on same volume > (Mainly as we couldn't spend a lot of effort on testing the configuration). Does this also apply to samba shares when using vfs glusterfs? Anyways, we would treat the behavior as bug for sure. One possible path > looking at below volume info is to disable 'stat-prefetch' option and see > if it helps. Next option I would try is to disable readdir-ahead. I'll try and give feedback. Thanks, Jo?o Amar Tumballi Suryanarayan escreveu no dia quarta, 1/05/2019 ?(s) 13:30: > > > On Mon, Apr 29, 2019 at 3:56 PM Jo?o Ba?to < > joao.bauto at neuro.fchampalimaud.org> wrote: > >> Hi, >> >> I have an 8 brick distributed volume where Windows and Linux clients >> mount the volume via samba and headless compute servers using gluster >> native fuse. With parallel-readdir on, if a Windows client creates a new >> folder, the folder is indeed created but invisible to the Windows client. >> Accessing the same samba share in a Linux client, the folder is again >> visible and with normal behavior. The same folder is also visible when >> mounting via gluster native fuse. >> >> The Windows client can list existing directories and rename them while, >> for files, everything seems to be working fine. 
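The "vfs glusterfs" mentioned in the question above refers to exporting the volume through Samba's vfs_glusterfs module (libgfapi) rather than re-sharing a FUSE mount. A minimal share definition typically looks like the sketch below; the share/volume name "tank" is taken from this thread, while the log path and option values are illustrative only:

    [tank]
        path = /
        vfs objects = glusterfs
        glusterfs:volume = tank
        glusterfs:logfile = /var/log/samba/glusterfs-tank.%M.log
        glusterfs:loglevel = 7
        kernel share modes = no
        read only = no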
>> >> Gluster servers: CentOS 7.5 with Gluster 5.3 and Samba 4.8.3-4.el7.0.1 >> from @fasttrack >> Clients tested: Windows 10, Ubuntu 18.10, CentOS 7.5 >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1670382 >> > > Thanks for the bug report. Will look into this, and get back. > > Last I knew, we recommended to avoid fuse and samba shares on same volume > (Mainly as we couldn't spend a lot of effort on testing the configuration). > Anyways, we would treat the behavior as bug for sure. One possible path > looking at below volume info is to disable 'stat-prefetch' option and see > if it helps. Next option I would try is to disable readdir-ahead. > > Regards, > Amar > > >> >> >> Volume Name: tank >> Type: Distribute >> Volume ID: 9582685f-07fa-41fd-b9fc-ebab3a6989cf >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 8 >> Transport-type: tcp >> Bricks: >> Brick1: swp-gluster-01:/tank/volume1/brick >> Brick2: swp-gluster-02:/tank/volume1/brick >> Brick3: swp-gluster-03:/tank/volume1/brick >> Brick4: swp-gluster-04:/tank/volume1/brick >> Brick5: swp-gluster-01:/tank/volume2/brick >> Brick6: swp-gluster-02:/tank/volume2/brick >> Brick7: swp-gluster-03:/tank/volume2/brick >> Brick8: swp-gluster-04:/tank/volume2/brick >> Options Reconfigured: >> performance.parallel-readdir: on >> performance.readdir-ahead: on >> performance.cache-invalidation: on >> performance.md-cache-timeout: 600 >> storage.batch-fsync-delay-usec: 0 >> performance.write-behind-window-size: 32MB >> performance.stat-prefetch: on >> performance.read-ahead: on >> performance.read-ahead-page-count: 16 >> performance.rda-request-size: 131072 >> performance.quick-read: on >> performance.open-behind: on >> performance.nl-cache-timeout: 600 >> performance.nl-cache: on >> performance.io-thread-count: 64 >> performance.io-cache: off >> performance.flush-behind: on >> performance.client-io-threads: off >> performance.write-behind: off >> performance.cache-samba-metadata: on >> network.inode-lru-limit: 0 >> features.cache-invalidation-timeout: 600 >> features.cache-invalidation: on >> cluster.readdir-optimize: on >> cluster.lookup-optimize: on >> client.event-threads: 4 >> server.event-threads: 16 >> features.quota-deem-statfs: on >> nfs.disable: on >> features.quota: on >> features.inode-quota: on >> cluster.enable-shared-storage: disable >> >> Cheers, >> >> Jo?o Ba?to >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Thu May 2 17:34:41 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Thu, 2 May 2019 23:04:41 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! Message-ID: Hello Gluster folks, Gluster-block team is happy to announce the v0.4 release [1]. This is the new stable version of gluster-block, lots of new and exciting features and interesting bug fixes are made available as part of this release. Please find the big list of release highlights and notable fixes at [2]. Details about installation can be found in the easy install guide at [3]. Find the details about prerequisites and setup guide at [4]. If you are a new user, checkout the demo video attached in the README doc [5], which will be a good source of intro to the project. 
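As a quick taste of the CLI before digging into the references above, the basic block operations look roughly like the following; the volume name, block name, host addresses and size are all placeholders, and the man pages remain the authoritative syntax reference:

    gluster-block create block-test/sample-block ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB
    gluster-block list block-test
    gluster-block info block-test/sample-block
    gluster-block delete block-test/sample-block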
There are good examples about how to use gluster-block both in the man pages [6] and test file [7] (also in the README). gluster-block is part of fedora package collection, an updated package with release version v0.4 will be soon made available. And the community provided packages will be soon made available at [8]. Please spend a minute to report any kind of issue that comes to your notice with this handy link [9]. We look forward to your feedback, which will help gluster-block get better! We would like to thank all our users, contributors for bug filing and fixes, also the whole team who involved in the huge effort with pre-release testing. [1] https://github.com/gluster/gluster-block [2] https://github.com/gluster/gluster-block/releases [3] https://github.com/gluster/gluster-block/blob/master/INSTALL [4] https://github.com/gluster/gluster-block#usage [5] https://github.com/gluster/gluster-block/blob/master/README.md [6] https://github.com/gluster/gluster-block/tree/master/docs [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t [8] https://download.gluster.org/pub/gluster/gluster-block/ [9] https://github.com/gluster/gluster-block/issues/new Cheers, Team Gluster-Block! From dcunningham at voisonics.com Fri May 3 02:10:03 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 3 May 2019 14:10:03 +1200 Subject: [Gluster-users] Thin-arbiter questions Message-ID: Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Fri May 3 02:30:11 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 2 May 2019 22:30:11 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: Message-ID: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish ----- Original Message ----- From: "David Cunningham" To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? 
We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 3 02:34:04 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 3 May 2019 14:34:04 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Message-ID: Hi Ashish, Thanks very much for that reply. How stable is GD2? Is there even a vague ETA on when it might be supported in gluster? On Fri, 3 May 2019 at 14:30, Ashish Pandey wrote: > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The > command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, > it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 7:40:03 AM > *Subject: *[Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some > questions. We've been following the documentation from > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > . > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume > create" with the --thin-arbiter option on 5.5 and got an "unrecognized > option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. > How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the > thin-arbiter. Is there a service that can do this instead? I found a > mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but > it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From aspandey at redhat.com Fri May 3 02:43:25 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Thu, 2 May 2019 22:43:25 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> Message-ID: <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> David, I am adding members who are working on glusterd2 (Aravinda) and thin-arbiter support in glusterd (Vishal) and who can better reply on these questions. Patch for glusterd has been sent and it only requires reviews. I hope it should be completed in next 1 month or so. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish ----- Original Message ----- From: "David Cunningham" To: "Ashish Pandey" Cc: gluster-users at gluster.org Sent: Friday, May 3, 2019 8:04:04 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Thanks very much for that reply. How stable is GD2? Is there even a vague ETA on when it might be supported in gluster? On Fri, 3 May 2019 at 14:30, Ashish Pandey < aspandey at redhat.com > wrote: Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" < dcunningham at voisonics.com > To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Fri May 3 06:04:50 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Fri, 3 May 2019 11:34:50 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA cluster solution back to gluster code as gluster-7 feature In-Reply-To: <7d75b62f0eb0495782c46ef8521790d5@ul-exc-pr-mbx13.ulaval.ca> References: <9BE7F129-DE42-46A5-896B-81460E605E9E@gmail.com> <7d75b62f0eb0495782c46ef8521790d5@ul-exc-pr-mbx13.ulaval.ca> Message-ID: On 30/04/19 6:41 PM, Renaud Fortier wrote: > > IMO, you should keep storhaug and maintain it. At the beginning, we > were with pacemaker and corosync. Then we move to storhaug with the > upgrade to gluster 4.1.x. 
Now you are talking about going back like it > was. Maybe it will be better with pacemake and corosync but the > important is to have a solution that will be stable and maintained. > I agree it is very frustrating, there is no longer development planned for future unless someone pick it and work on for its stabilization and improvement. My plan is just to get back what gluster and nfs-ganesha had before -- Jiffin > thanks > > Renaud > > *De?:*gluster-users-bounces at gluster.org > [mailto:gluster-users-bounces at gluster.org] *De la part de* Jim Kinney > *Envoy??:* 30 avril 2019 08:20 > *??:* gluster-users at gluster.org; Jiffin Tony Thottan > ; gluster-users at gluster.org; Gluster Devel > ; gluster-maintainers at gluster.org; > nfs-ganesha ; devel at lists.nfs-ganesha.org > *Objet?:* Re: [Gluster-users] Proposing to previous ganesha HA cluster > solution back to gluster code as gluster-7 feature > > +1! > I'm using nfs-ganesha in my next upgrade so my client systems can use > NFS instead of fuse mounts. Having an integrated, designed in process > to coordinate multiple nodes into an HA cluster will very welcome. > > On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan > > wrote: > > Hi all, > > Some of you folks may be familiar with HA solution provided for > nfs-ganesha by gluster using pacemaker and corosync. > > That feature was removed in glusterfs 3.10 in favour for common HA > project "Storhaug". Even Storhaug was not progressed > > much from last two years and current development is in halt state, > hence planning to restore old HA ganesha solution back > > to gluster code repository with some improvement and targetting > for next gluster release 7. > > I have opened up an issue [1] with details and posted initial set > of patches [2] > > Please share your thoughts on the same > > Regards, > > Jiffin > > [1]https://github.com/gluster/glusterfs/issues/663 > > > [2] > https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > > > -- > Sent from my Android device with K-9 Mail. All tyopes are thumb > related and reflect authenticity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Fri May 3 06:08:07 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Fri, 3 May 2019 11:38:07 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature In-Reply-To: <1028413072.2343069.1556630991785@mail.yahoo.com> References: <1028413072.2343069.1556630991785@mail.yahoo.com> Message-ID: <84885b70-e6b0-6e9b-f43d-a13dbafc6b6a@redhat.com> On 30/04/19 6:59 PM, Strahil Nikolov wrote: > Hi, > > I'm posting this again as it got bounced. > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > I'm still trying to remediate the effects of poor configuration at work. > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. It do take care those, but need to follow certain prerequisite, but please fencing won't configured for this setup. May we think about in future. -- Jiffin > > Still, this will be a lot of work to achieve. 
> > Best Regards, > Strahil Nikolov > > On Apr 30, 2019 15:19, Jim Kinney wrote: >> >> +1! >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >> >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>> >>> Hi all, >>> >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>> >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>> >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>> >>> to gluster code repository with some improvement and targetting for next gluster release 7. >>> >>> ??I have opened up an issue [1] with details and posted initial set of patches [2] >>> >>> Please share your thoughts on the same >>> >>> >>> Regards, >>> >>> Jiffin >>> >>> [1] https://github.com/gluster/glusterfs/issues/663 >>> >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>> >>> >> -- >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > I'm still trying to remediate the effects of poor configuration at work. > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > Still, this will be a lot of work to achieve. > > Best Regards, > Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: >> +1! >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >> >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>> Hi all, >>> >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>> >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>> >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>> >>> to gluster code repository with some improvement and targetting for next gluster release 7. >>> >>> I have opened up an issue [1] with details and posted initial set of patches [2] >>> >>> Please share your thoughts on the same >>> >>> Regards, >>> >>> Jiffin >>> >>> [1] https://github.com/gluster/glusterfs/issues/663 >>> >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >> >> -- >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. From hunter86_bg at yahoo.com Fri May 3 18:40:01 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 03 May 2019 21:40:01 +0300 Subject: [Gluster-users] Thin-arbiter questions Message-ID: Hi Ashish, Can someone commit the doc change I have already proposed ? 
At least, the doc will clarify that fact . Best Regards, Strahil NikolovOn May 3, 2019 05:30, Ashish Pandey wrote: > > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ________________________________ > From: "David Cunningham" > To: gluster-users at gluster.org > Sent: Friday, May 3, 2019 7:40:03 AM > Subject: [Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/. > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 3 21:15:03 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Sat, 4 May 2019 09:15:03 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> References: <1272256010.16170135.1556850611144.JavaMail.zimbra@redhat.com> <1331654608.16170402.1556851405061.JavaMail.zimbra@redhat.com> Message-ID: OK, thank you Ashish. On Fri, 3 May 2019 at 14:43, Ashish Pandey wrote: > David, > > I am adding members who are working on glusterd2 (Aravinda) and > thin-arbiter support in glusterd (Vishal) and who can > better reply on these questions. > > Patch for glusterd has been sent and it only requires reviews. I hope it > should be completed in next 1 month or so. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *"Ashish Pandey" > *Cc: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 8:04:04 AM > *Subject: *Re: [Gluster-users] Thin-arbiter questions > > Hi Ashish, > > Thanks very much for that reply. How stable is GD2? Is there even a vague > ETA on when it might be supported in gluster? > > > On Fri, 3 May 2019 at 14:30, Ashish Pandey wrote: > >> Hi David, >> >> Creation of thin-arbiter volume is currently supported by GD2 only. The >> command "glustercli" is available when glusterd2 is running. >> We are also working on providing thin-arbiter support on glusted however, >> it is not available right now. 
>> https://review.gluster.org/#/c/glusterfs/+/22612/ >> >> --- >> Ashish >> >> ------------------------------ >> *From: *"David Cunningham" >> *To: *gluster-users at gluster.org >> *Sent: *Friday, May 3, 2019 7:40:03 AM >> *Subject: *[Gluster-users] Thin-arbiter questions >> >> Hello, >> >> We are setting up a thin-arbiter and hope someone can help with some >> questions. We've been following the documentation from >> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ >> . >> >> 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume >> create" with the --thin-arbiter option on 5.5 and got an "unrecognized >> option --thin-arbiter" error. >> >> 2. The instruction to create a new volume with a thin-arbiter is clear. >> How do you add a thin-arbiter to an already existing volume though? >> >> 3. The documentation suggests running glusterfsd manually to start the >> thin-arbiter. Is there a service that can do this instead? I found a >> mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 >> but it's not really documented. >> >> Thanks in advance for your help, >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From archon810 at gmail.com Fri May 3 22:44:33 2019 From: archon810 at gmail.com (Artem Russakovskii) Date: Fri, 3 May 2019 15:44:33 -0700 Subject: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP In-Reply-To: References: <2ed171d2-df68-ada3-e0de-53f19cb79520@redhat.com> Message-ID: Just to update everyone on the nasty crash one of our servers continued having even after 5.5/5.6, I posted a summary of the results here: https://bugzilla.redhat.com/show_bug.cgi?id=1690769#c4. Sincerely, Artem -- Founder, Android Police , APK Mirror , Illogical Robot LLC beerpla.net | +ArtemRussakovskii | @ArtemR On Wed, Mar 20, 2019 at 12:57 PM Artem Russakovskii wrote: > Amar, > > I see debuginfo packages now and have installed them. I'm available via > Skype as before, just ping me there. > > Sincerely, > Artem > > -- > Founder, Android Police , APK Mirror > , Illogical Robot LLC > beerpla.net | +ArtemRussakovskii > | @ArtemR > > > > On Tue, Mar 19, 2019 at 10:46 PM Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> >> >> On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii >> wrote: >> >>> Can I roll back performance.write-behind: off and lru-limit=0 then? I'm >>> waiting for the debug packages to be available for OpenSUSE, then I can >>> help Amar with another debug session. >>> >>> >> Yes, the write-behind issue is now fixed. You can enable write-behind. >> Also remove lru-limit=0, so you can also utilize the benefit of garbage >> collection introduced in 5.4 >> >> Lets get to fixing the problem once the debuginfo packages are available. >> >> >> >>> In the meantime, have you had time to set up 1x4 replicate testing? 
I was >>> told you were only testing 1x3, and it's the 4th brick that may be >>> causing >>> the crash, which is consistent with this whole time only 1 of 4 bricks >>> constantly crashing. The other 3 have been rock solid. I'm hoping you >>> could >>> find the issue without a debug session this way. >>> >>> >> That is my gut feeling still. Added a basic test case with 4 bricks, >> https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this >> particular issue is happening only on certain pattern of access for 1x4 >> setup. Lets get to the root of it once we have debuginfo packages for Suse >> builds. >> >> -Amar >> >> Sincerely, >>> Artem >>> >>> -- >>> Founder, Android Police , APK Mirror >>> , Illogical Robot LLC >>> beerpla.net | +ArtemRussakovskii >>> | @ArtemR >>> >>> >>> >>> On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran >> > >>> wrote: >>> >>> > Hi Artem, >>> > >>> > I think you are running into a different crash. The ones reported which >>> > were prevented by turning off write-behind are now fixed. >>> > We will need to look into the one you are seeing to see why it is >>> > happening. >>> > >>> > Regards, >>> > Nithya >>> > >>> > >>> > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii >>> > wrote: >>> > >>> >> The flood is indeed fixed for us on 5.5. However, the crashes are not. >>> >> >>> >> Sincerely, >>> >> Artem >>> >> >>> >> -- >>> >> Founder, Android Police , APK Mirror >>> >> , Illogical Robot LLC >>> >> beerpla.net | +ArtemRussakovskii >>> >> | @ArtemR >>> >> >>> >> >>> >> >>> >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert >>> wrote: >>> >> >>> >>> Hi Amar, >>> >>> >>> >>> if you refer to this bug: >>> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test >>> >>> setup i haven't seen those entries, while copying & deleting a few >>> GBs >>> >>> of data. For a final statement we have to wait until i updated our >>> >>> live gluster servers - could take place on tuesday or wednesday. >>> >>> >>> >>> Maybe other users can do an update to 5.4 as well and report back >>> here. >>> >>> >>> >>> >>> >>> Hubert >>> >>> >>> >>> >>> >>> >>> >>> Am Mo., 18. M?rz 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan >>> >>> : >>> >>> > >>> >>> > Hi Hu Bert, >>> >>> > >>> >>> > Appreciate the feedback. Also are the other boiling issues related >>> to >>> >>> logs fixed now? >>> >>> > >>> >>> > -Amar >>> >>> > >>> >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert >>> >>> wrote: >>> >>> >> >>> >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2 >>> >>> >> volumes done. In 'gluster peer status' the peers stay connected >>> during >>> >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in >>> the >>> >>> >> logs. Looks good :-) >>> >>> >> >>> >>> >> Am Mo., 18. M?rz 2019 um 09:54 Uhr schrieb Hu Bert < >>> >>> revirii at googlemail.com>: >>> >>> >> > >>> >>> >> > Good morning :-) >>> >>> >> > >>> >>> >> > for debian the packages are there: >>> >>> >> > >>> >>> >>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/ >>> >>> >> > >>> >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if >>> >>> there >>> >>> >> > are some errors etc. and report back. >>> >>> >> > >>> >>> >> > btw: no release notes for 5.4 and 5.5 so far? >>> >>> >> > https://docs.gluster.org/en/latest/release-notes/ ? >>> >>> >> > >>> >>> >> > Am Fr., 15. 
M?rz 2019 um 14:28 Uhr schrieb Shyam Ranganathan >>> >>> >> > : >>> >>> >> > > >>> >>> >> > > We created a 5.5 release tag, and it is under packaging now. >>> It >>> >>> should >>> >>> >> > > be packaged and ready for testing early next week and should >>> be >>> >>> released >>> >>> >> > > close to mid-week next week. >>> >>> >> > > >>> >>> >> > > Thanks, >>> >>> >> > > Shyam >>> >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote: >>> >>> >> > > > Wednesday now with no update :-/ >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police , APK >>> >>> Mirror >>> >>> >> > > > , Illogical Robot LLC >>> >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii < >>> >>> archon810 at gmail.com >>> >>> >> > > > > wrote: >>> >>> >> > > > >>> >>> >> > > > Hi Amar, >>> >>> >> > > > >>> >>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE >>> >>> build >>> >>> >> > > > repos. Maybe later today? >>> >>> >> > > > >>> >>> >> > > > Thanks. >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police , >>> >>> APK Mirror >>> >>> >> > > > , Illogical Robot LLC >>> >>> >> > > > beerpla.net | +ArtemRussakovskii >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi >>> Suryanarayan >>> >>> >> > > > > >>> wrote: >>> >>> >> > > > >>> >>> >> > > > We are talking days. Not weeks. Considering already >>> it >>> >>> is >>> >>> >> > > > Thursday here. 1 more day for tagging, and >>> packaging. >>> >>> May be ok >>> >>> >> > > > to expect it on Monday. >>> >>> >> > > > >>> >>> >> > > > -Amar >>> >>> >> > > > >>> >>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii >>> >>> >> > > > > >>> >>> wrote: >>> >>> >> > > > >>> >>> >> > > > Is the next release going to be an imminent >>> hotfix, >>> >>> i.e. >>> >>> >> > > > something like today/tomorrow, or are we talking >>> >>> weeks? >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police < >>> >>> http://www.androidpolice.com>, APK >>> >>> >> > > > Mirror , Illogical >>> >>> Robot LLC >>> >>> >> > > > beerpla.net | >>> >>> +ArtemRussakovskii >>> >>> >> > > > | >>> >>> @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem >>> Russakovskii >>> >>> >> > > > >> archon810 at gmail.com>> >>> >>> wrote: >>> >>> >> > > > >>> >>> >> > > > Ended up downgrading to 5.3 just in case. >>> Peer >>> >>> status >>> >>> >> > > > and volume status are OK now. >>> >>> >> > > > >>> >>> >> > > > zypper install --oldpackage >>> >>> glusterfs-5.3-lp150.100.1 >>> >>> >> > > > Loading repository data... >>> >>> >> > > > Reading installed packages... >>> >>> >> > > > Resolving package dependencies... 
>>> >>> >> > > > >>> >>> >> > > > Problem: glusterfs-5.3-lp150.100.1.x86_64 >>> >>> requires >>> >>> >> > > > libgfapi0 = 5.3, but this requirement >>> cannot be >>> >>> provided >>> >>> >> > > > not installable providers: >>> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64[glusterfs] >>> >>> >> > > > Solution 1: Following actions will be done: >>> >>> >> > > > downgrade of >>> libgfapi0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfapi0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> >>> libgfchangelog0-5.4-lp150.100.1.x86_64 to >>> >>> >> > > > libgfchangelog0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> libgfrpc0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfrpc0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> libgfxdr0-5.4-lp150.100.1.x86_64 >>> >>> to >>> >>> >> > > > libgfxdr0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > downgrade of >>> >>> libglusterfs0-5.4-lp150.100.1.x86_64 to >>> >>> >> > > > libglusterfs0-5.3-lp150.100.1.x86_64 >>> >>> >> > > > Solution 2: do not install >>> >>> glusterfs-5.3-lp150.100.1.x86_64 >>> >>> >> > > > Solution 3: break >>> >>> glusterfs-5.3-lp150.100.1.x86_64 by >>> >>> >> > > > ignoring some of its dependencies >>> >>> >> > > > >>> >>> >> > > > Choose from above solutions by number or >>> cancel >>> >>> >> > > > [1/2/3/c] (c): 1 >>> >>> >> > > > Resolving dependencies... >>> >>> >> > > > Resolving package dependencies... >>> >>> >> > > > >>> >>> >> > > > The following 6 packages are going to be >>> >>> downgraded: >>> >>> >> > > > glusterfs libgfapi0 libgfchangelog0 >>> libgfrpc0 >>> >>> >> > > > libgfxdr0 libglusterfs0 >>> >>> >> > > > >>> >>> >> > > > 6 packages to downgrade. >>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police >>> >>> >> > > > , APK Mirror >>> >>> >> > > > , Illogical >>> Robot >>> >>> LLC >>> >>> >> > > > beerpla.net | >>> >>> +ArtemRussakovskii >>> >>> >> > > > >>> | >>> >>> @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 10:57 AM Artem >>> >>> Russakovskii >>> >>> >> > > > >> >>> archon810 at gmail.com>> wrote: >>> >>> >> > > > >>> >>> >> > > > Noticed the same when upgrading from >>> 5.3 to >>> >>> 5.4, as >>> >>> >> > > > mentioned. >>> >>> >> > > > >>> >>> >> > > > I'm confused though. Is actual >>> replication >>> >>> affected, >>> >>> >> > > > because the 5.4 server and the 3x 5.3 >>> >>> servers still >>> >>> >> > > > show heal info as all 4 connected, and >>> the >>> >>> files >>> >>> >> > > > seem to be replicating correctly as >>> well. >>> >>> >> > > > >>> >>> >> > > > So what's actually affected - just the >>> >>> status >>> >>> >> > > > command, or leaving 5.4 on one of the >>> nodes >>> >>> is doing >>> >>> >> > > > some damage to the underlying fs? Is it >>> >>> fixable by >>> >>> >> > > > tweaking transport.socket.ssl-enabled? >>> Does >>> >>> >> > > > upgrading all servers to 5.4 resolve >>> it, or >>> >>> should >>> >>> >> > > > we revert back to 5.3? 
>>> >>> >> > > > >>> >>> >> > > > Sincerely, >>> >>> >> > > > Artem >>> >>> >> > > > >>> >>> >> > > > -- >>> >>> >> > > > Founder, Android Police >>> >>> >> > > > , APK >>> Mirror >>> >>> >> > > > , Illogical >>> >>> Robot LLC >>> >>> >> > > > beerpla.net | >>> >>> >> > > > +ArtemRussakovskii >>> >>> >> > > > < >>> https://plus.google.com/+ArtemRussakovskii >>> >>> > >>> >>> >> > > > | @ArtemR >>> >>> >> > > > >>> >>> >> > > > >>> >>> >> > > > On Tue, Mar 5, 2019 at 2:02 AM Hu Bert >>> >>> >> > > > >> >>> >> > > > > wrote: >>> >>> >> > > > >>> >>> >> > > > fyi: did a downgrade 5.4 -> 5.3 and >>> it >>> >>> worked. >>> >>> >> > > > all replicas are up and >>> >>> >> > > > running. Awaiting updated v5.4. >>> >>> >> > > > >>> >>> >> > > > thx :-) >>> >>> >> > > > >>> >>> >> > > > Am Di., 5. M?rz 2019 um 09:26 Uhr >>> >>> schrieb Hari >>> >>> >> > > > Gowtham >> >>> >> > > > >: >>> >>> >> > > > > >>> >>> >> > > > > There are plans to revert the >>> patch >>> >>> causing >>> >>> >> > > > this error and rebuilt 5.4. >>> >>> >> > > > > This should happen faster. the >>> >>> rebuilt 5.4 >>> >>> >> > > > should be void of this upgrade >>> issue. >>> >>> >> > > > > >>> >>> >> > > > > In the meantime, you can use 5.3 >>> for >>> >>> this cluster. >>> >>> >> > > > > Downgrading to 5.3 will work if it >>> >>> was just >>> >>> >> > > > one node that was upgrade to 5.4 >>> >>> >> > > > > and the other nodes are still in >>> 5.3 >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat May 4 06:34:56 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 04 May 2019 09:34:56 +0300 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature Message-ID: Hi Jiffin, No vendor will support your corosync/pacemaker stack if you do not have proper fencing. As Gluster is already a cluster of its own, it makes sense to control everything from there. Best Regards, Strahil NikolovOn May 3, 2019 09:08, Jiffin Tony Thottan wrote: > > > On 30/04/19 6:59 PM, Strahil Nikolov wrote: > > Hi, > > > > I'm posting this again as it got bounced. > > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > > > I'm still trying to remediate the effects of poor configuration at work. > > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > > It do take care those, but need to follow certain prerequisite, but > please fencing won't configured for this setup. May we think about in > future. > > -- > > Jiffin > > > > > Still, this will be a lot of work to achieve. > > > > Best Regards, > > Strahil Nikolov > > > > On Apr 30, 2019 15:19, Jim Kinney wrote: > >>??? > >> +1! > >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > >> > >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: > >>>??? 
> >>> Hi all, > >>> > >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. > >>> > >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed > >>> > >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back > >>> > >>> to gluster code repository with some improvement and targetting for next gluster release 7. > >>> > >>>? ??I have opened up an issue [1] with details and posted initial set of patches [2] > >>> > >>> Please share your thoughts on the same > >>> > >>> > >>> Regards, > >>> > >>> Jiffin > >>> > >>> [1] https://github.com/gluster/glusterfs/issues/663 > >>> > >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > >>> > >>> > >> -- > >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. > > Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. > > > > I'm still trying to remediate the effects of poor configuration at work. > > Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. > > Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. > > I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. > > > > Still, this will be a lot of work to achieve. > > > > Best Regards, > > Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: > >> +1! > >> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. > >> > >> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: > >>> Hi all, > >>> > >>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. > >>> > >>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed > >>> > >>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back > >>> > >>> to gluster code repository with some improvement and targetting for next gluster release 7. > >>> > >>> I have opened up an issue [1] with details and posted initial set of patches [2] > >>> > >>> Please share your thoughts on the same > >>> > >>> Regards, > >>> > >>> Jiffin > >>> > >>> [1] https://github.com/gluster/glusterfs/issues/663 > >>> > >>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) > >> > >> -- > >> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. 
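For readers who never ran the pre-3.10 integration being discussed in this thread, the old workflow was deliberately small: a shared-storage volume, one config file, and one command, with pacemaker/corosync and the VIP resources generated underneath. The sketch below is from memory of the old guide and only illustrative; node names and addresses are placeholders, and the restored implementation may change details.

# prerequisite on the trusted pool
gluster volume set all cluster.enable-shared-storage enable

# /etc/ganesha/ganesha-ha.conf, identical on every participating node:
#   HA_NAME="ganesha-ha-demo"
#   HA_CLUSTER_NODES="node1,node2,node3"
#   VIP_node1="192.168.1.201"
#   VIP_node2="192.168.1.202"
#   VIP_node3="192.168.1.203"

# bring the HA cluster up from any one node
gluster nfs-ganesha enable

Fencing is exactly the piece that recipe never set up, which is the gap Strahil points at; stonith resources can still be added afterwards with pcs, but that is left to the admin.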
From aspandey at redhat.com Mon May 6 06:21:22 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Mon, 6 May 2019 02:21:22 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: Message-ID: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Hi, I can see that Amar has already committed the changes and those are visible on https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ --- Ashish ----- Original Message ----- From: "Strahil" To: "Ashish" , "David" Cc: "gluster-users" Sent: Saturday, May 4, 2019 12:10:01 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Can someone commit the doc change I have already proposed ? At least, the doc will clarify that fact . Best Regards, Strahil Nikolov On May 3, 2019 05:30, Ashish Pandey wrote: Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Mon May 6 08:10:30 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 6 May 2019 20:10:30 +1200 Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Message-ID: Hi Ashish, Thank you for the update. Does that mean they're now in the regular Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS packages to be updated with the latest code? 
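In case it helps while waiting: both distributions have community channels that normally pick up new gluster releases fairly quickly, so checking them directly is a reasonable first step. The repository names below are the usual community ones and worth double-checking against download.gluster.org.

# Ubuntu: the Gluster community PPA, one series per major release
sudo add-apt-repository ppa:gluster/glusterfs-6
sudo apt-get update
apt-cache policy glusterfs-server

# CentOS: the Storage SIG release package
sudo yum install centos-release-gluster6
yum info glusterfs-server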
On Mon, 6 May 2019 at 18:21, Ashish Pandey wrote: > Hi, > > I can see that Amar has already committed the changes and those are > visible on > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > > --- > Ashish > > > > ------------------------------ > *From: *"Strahil" > *To: *"Ashish" , "David" > *Cc: *"gluster-users" > *Sent: *Saturday, May 4, 2019 12:10:01 AM > *Subject: *Re: [Gluster-users] Thin-arbiter questions > > Hi Ashish, > > Can someone commit the doc change I have already proposed ? > At least, the doc will clarify that fact . > > Best Regards, > Strahil Nikolov > On May 3, 2019 05:30, Ashish Pandey wrote: > > Hi David, > > Creation of thin-arbiter volume is currently supported by GD2 only. The > command "glustercli" is available when glusterd2 is running. > We are also working on providing thin-arbiter support on glusted however, > it is not available right now. > https://review.gluster.org/#/c/glusterfs/+/22612/ > > --- > Ashish > > ------------------------------ > *From: *"David Cunningham" > *To: *gluster-users at gluster.org > *Sent: *Friday, May 3, 2019 7:40:03 AM > *Subject: *[Gluster-users] Thin-arbiter questions > > Hello, > > We are setting up a thin-arbiter and hope someone can help with some > questions. We've been following the documentation from > https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ > . > > 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume > create" with the --thin-arbiter option on 5.5 and got an "unrecognized > option --thin-arbiter" error. > > 2. The instruction to create a new volume with a thin-arbiter is clear. > How do you add a thin-arbiter to an already existing volume though? > > 3. The documentation suggests running glusterfsd manually to start the > thin-arbiter. Is there a service that can do this instead? I found a > mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but > it's not really documented. > > Thanks in advance for your help, > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Mon May 6 08:34:07 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Mon, 6 May 2019 04:34:07 -0400 (EDT) Subject: [Gluster-users] Thin-arbiter questions In-Reply-To: References: <757816852.16925254.1557123682731.JavaMail.zimbra@redhat.com> Message-ID: <645227359.16980056.1557131647054.JavaMail.zimbra@redhat.com> ----- Original Message ----- From: "David Cunningham" To: "Ashish Pandey" Cc: "gluster-users" Sent: Monday, May 6, 2019 1:40:30 PM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Thank you for the update. Does that mean they're now in the regular Glusterfs? Any idea how long it typically takes the Ubuntu and CentOS packages to be updated with the latest code? No, for regular glusterd, work is still in progress. It will be done soon. I don't have answer for the next question. 
Maybe Amar has information regarding this. Adding him in CC. On Mon, 6 May 2019 at 18:21, Ashish Pandey < aspandey at redhat.com > wrote: Hi, I can see that Amar has already committed the changes and those are visible on https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ --- Ashish From: "Strahil" < hunter86_bg at yahoo.com > To: "Ashish" < aspandey at redhat.com >, "David" < dcunningham at voisonics.com > Cc: "gluster-users" < gluster-users at gluster.org > Sent: Saturday, May 4, 2019 12:10:01 AM Subject: Re: [Gluster-users] Thin-arbiter questions Hi Ashish, Can someone commit the doc change I have already proposed ? At least, the doc will clarify that fact . Best Regards, Strahil Nikolov On May 3, 2019 05:30, Ashish Pandey < aspandey at redhat.com > wrote:
Hi David, Creation of thin-arbiter volume is currently supported by GD2 only. The command " glustercli " is available when glusterd2 is running. We are also working on providing thin-arbiter support on glusted however, it is not available right now. https://review.gluster.org/#/c/glusterfs/+/22612/ --- Ashish From: "David Cunningham" < dcunningham at voisonics.com > To: gluster-users at gluster.org Sent: Friday, May 3, 2019 7:40:03 AM Subject: [Gluster-users] Thin-arbiter questions Hello, We are setting up a thin-arbiter and hope someone can help with some questions. We've been following the documentation from https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ . 1. What release of 5.x supports thin-arbiter? We tried a "gluster volume create" with the --thin-arbiter option on 5.5 and got an "unrecognized option --thin-arbiter" error. 2. The instruction to create a new volume with a thin-arbiter is clear. How do you add a thin-arbiter to an already existing volume though? 3. The documentation suggests running glusterfsd manually to start the thin-arbiter. Is there a service that can do this instead? I found a mention of one in https://bugzilla.redhat.com/show_bug.cgi?id=1579786 but it's not really documented. Thanks in advance for your help, -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users
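On question 3 in the quoted mail above: until a packaged unit ships, one way to avoid starting glusterfsd by hand is a small systemd service. This is purely a hypothetical sketch; the unit name is made up and the ExecStart line must be taken from the thin-arbiter admin guide for your build rather than copied from here.

cat > /etc/systemd/system/gluster-ta-volume.service <<'EOF'
[Unit]
Description=GlusterFS thin-arbiter process
After=network.target

[Service]
Type=simple
# replace with the exact glusterfsd invocation from the admin guide
ExecStart=/usr/sbin/glusterfsd -N --volfile-id ta-vol -f /var/lib/glusterd/vols/thin-arbiter.vol --brick-port 24007
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now gluster-ta-volume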
-- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon May 6 10:08:39 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 6 May 2019 12:08:39 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read Message-ID: Hello folks, we have a client application (runs on Win10) which does some FOPs on a gluster volume which is accessed by SMB. *Scenario 1* is a READ Operation which reads all files successively and checks if the files data was correctly copied. While doing this, all brick processes crashes and in the logs one have this crash report on every brick log: > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] > pending frames: > frame : type(0) op(27) > frame : type(0) op(40) > patchset: git://git.gluster.org/glusterfs.git > signal received: 11 > time of crash: > 2019-04-16 08:32:21 > configuration details: > argp 1 > backtrace 1 > dlfcn 1 > libpthread 1 > llistxattr 1 > setfsid 1 > spinlock 1 > epoll.h 1 > xattr.h 1 > st_atim.tv_nsec 1 > package-string: glusterfs 5.5 > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > *Scenario 2 *The application just SET Read-Only on each file sucessively. 
After the 70th file was set, all the bricks crashes and again, one can read this crash report in every brick log: > > > [2019-05-02 07:43:39.953591] I [MSGID: 139001] > [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: > client: > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > gfid: 00000000-0000-0000-0000-000000000001, > req(uid:2000,gid:2000,perm:1,ngrps:1), > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > denied] > > pending frames: > > frame : type(0) op(27) > > patchset: git://git.gluster.org/glusterfs.git > > signal received: 11 > > time of crash: > > 2019-05-02 07:43:39 > > configuration details: > > argp 1 > > backtrace 1 > > dlfcn 1 > > libpthread 1 > > llistxattr 1 > > setfsid 1 > > spinlock 1 > > epoll.h 1 > > xattr.h 1 > > st_atim.tv_nsec 1 > > package-string: glusterfs 5.5 > > /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
But both volumes has the same settings: > Volume Name: shortterm > Type: Replicate > Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > Options Reconfigured: > storage.reserve: 1 > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > user.smb: disable > features.read-only: off > features.worm: off > features.worm-file-level: on > features.retention-mode: enterprise > features.default-retention-period: 120 > network.ping-timeout: 10 > features.cache-invalidation: on > features.cache-invalidation-timeout: 600 > performance.nl-cache: on > performance.nl-cache-timeout: 600 > client.event-threads: 32 > server.event-threads: 32 > cluster.lookup-optimize: on > performance.stat-prefetch: on > performance.cache-invalidation: on > performance.md-cache-timeout: 600 > performance.cache-samba-metadata: on > performance.cache-ima-xattrs: on > performance.io-thread-count: 64 > cluster.use-compound-fops: on > performance.cache-size: 512MB > performance.cache-refresh-timeout: 10 > performance.read-ahead: off > performance.write-behind-window-size: 4MB > performance.write-behind: on > storage.build-pgfid: on > features.utime: on > storage.ctime: on > cluster.quorum-type: fixed > cluster.quorum-count: 2 > features.bitrot: on > features.scrub: Active > features.scrub-freq: daily > cluster.enable-shared-storage: enable > > Why can this happen to all Brick processes? I don't understand the crash report. The FOPs are nothing special and after restart brick processes everything works fine and our application was succeed. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Mon May 6 11:51:32 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 6 May 2019 13:51:32 +0200 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: Hello, I create a Bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1706842 Regards David Spisla Am Mi., 1. Mai 2019 um 14:46 Uhr schrieb Amar Tumballi Suryanarayan < atumball at redhat.com>: > > > On Wed, Apr 17, 2019 at 1:33 PM David Spisla wrote: > >> Dear Gluster Community, >> >> I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 >> to access the volumes (each node has a VIP) >> >> I was testing this failover scenario: >> >> 1. Start Writing 940 GB with small files (64K-100K)from a Win10 Client >> to node1 >> 2. During the write process I hardly shutdown node1 (where the client >> is connect via VIP) by turn off the power >> >> My expectation is, that the write process stops and after a while the >> Win10 Client offers me a Retry, so I can continue the write on different >> node (which has now the VIP of node1). >> In past time I did this observation, but now the system shows a strange >> bahaviour: >> >> The Win10 Client do nothing and the Explorer freezes, in the backend CTDB >> can not perform the failover and throws errors. 
The glusterd from node2 and node3 logs these messages:
>>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held
>>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1
>>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held
>>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2
>>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held
>>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage
>>>
>> *In my opinion Samba/CTDB cannot perform the failover correctly and
>> continue the write process because glusterfs didn't release the lock.*
>> What do you think? It seems to me like a bug because in the past the
>> failover worked correctly.
>>
> Thanks for the report David. It surely looks like a bug, and I would let
> some experts on this domain answer the question. One request on such things
> is to file a bug (preferred) or a GitHub issue, so it is tracked in the system.
>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> --
> Amar Tumballi (amarts)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lm at zork.pl Mon May 6 13:13:14 2019
From: lm at zork.pl (=?UTF-8?Q?=c5=81ukasz_Michalski?=)
Date: Mon, 6 May 2019 15:13:14 +0200
Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd
Message-ID: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl>

Hi,

I have a problem resolving split-brain in one of my installations.

CentOS 7, glusterfs 3.10.12, replica on two nodes:

[root at ixmed1 iscsi]# gluster volume status cluster
Status of volume: cluster
Gluster process                                TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ixmed2:/glusterfs-bricks/cluster/cluster  49153     0          Y       3028
Brick ixmed1:/glusterfs-bricks/cluster/cluster  49153     0          Y       2917
Self-heal Daemon on localhost                   N/A       N/A        Y       112929
Self-heal Daemon on ixmed2                      N/A       N/A        Y       
57774 Task Status of Volume cluster ------------------------------------------------------------------------------ There are no active volume tasks When I try to access one file glusterd reports split brain: [2019-05-06 12:36:43.785098] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: Failing READ on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain observed. [Input/output error] [2019-05-06 12:36:43.787952] E [MSGID: 108008] [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: Failing FGETXATTR on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain observed. [Input/output error] [2019-05-06 12:36:43.788778] W [MSGID: 108027] [afr-common.c:2722:afr_discover_done] 0-cluster-replicate-0: no read subvols for (null) [2019-05-06 12:36:43.790123] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352501: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde0803f390 (Input/output error) [2019-05-06 12:36:43.794979] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352506: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 (Input/output error) [2019-05-06 12:36:43.800468] W [fuse-bridge.c:2254:fuse_readv_cbk] 0-glusterfs-fuse: 3352508: READ => -1 gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 (Input/output error) The problem is that "gluster volume heal info" hangs for 10 seconds and returns: ??? Not able to fetch volfile from glusterd ??? Volume heal failed glfsheal.log contains: [2019-05-06 12:40:25.589879] I [afr.c:94:fix_quorum_options] 0-cluster-replicate-0: reindeer: incoming qtype = none [2019-05-06 12:40:25.589967] I [afr.c:116:fix_quorum_options] 0-cluster-replicate-0: reindeer: quorum_count = 0 [2019-05-06 12:40:25.593294] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-cluster-readdir-ahead: option 'parallel-readdir' is not recognized [2019-05-06 12:40:25.593895] I [MSGID: 104045] [glfs-master.c:91:notify] 0-gfapi: New graph 69786d65-6431-2d32-3037-3739322d3230 (0) coming up [2019-05-06 12:40:25.593972] I [MSGID: 114020] [client.c:2352:notify] 0-cluster-client-0: parent translators are ready, attempting connect on transport [2019-05-06 12:40:25.607836] I [MSGID: 114020] [client.c:2352:notify] 0-cluster-client-1: parent translators are ready, attempting connect on transport [2019-05-06 12:40:25.608556] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-cluster-client-0: changing port to 49153 (from 0) [2019-05-06 12:40:25.618167] I [rpc-clnt.c:2000:rpc_clnt_reconfig] 0-cluster-client-1: changing port to 49153 (from 0) [2019-05-06 12:40:25.629595] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-06 12:40:25.632031] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-0: Connected to cluster-client-0, attached to remote volume '/glusterfs-bricks/cluster/cluster'. [2019-05-06 12:40:25.632100] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-06 12:40:25.632263] I [MSGID: 108005] [afr-common.c:4817:afr_notify] 0-cluster-replicate-0: Subvolume 'cluster-client-0' came back up; going online. 
[2019-05-06 12:40:25.637707] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-06 12:40:25.639285] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-1: Connected to cluster-client-1, attached to remote volume '/glusterfs-bricks/cluster/cluster'. [2019-05-06 12:40:25.639341] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-06 12:40:31.564407] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-0: server 10.0.104.26:49153 has not responded in the last 5 seconds, disconnecting. [2019-05-06 12:40:31.565764] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-1: server 10.0.7.26:49153 has not responded in the last 5 seconds, disconnecting. [2019-05-06 12:40:35.645545] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-cluster-client-0: disconnected from cluster-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-06 12:40:35.645683] I [socket.c:3534:socket_submit_request] 0-cluster-client-0: not connected (priv->connected = -1) [2019-05-06 12:40:35.645755] W [rpc-clnt.c:1693:rpc_clnt_submit] 0-cluster-client-0: failed to submit rpc-request (XID: 0x7 Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (cluster-client-0) [2019-05-06 12:40:35.645807] W [MSGID: 114031] [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-0: remote operation failed [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.645887] I [socket.c:3534:socket_submit_request] 0-cluster-client-1: not connected (priv->connected = -1) [2019-05-06 12:40:35.645918] W [rpc-clnt.c:1693:rpc_clnt_submit] 0-cluster-client-1: failed to submit rpc-request (XID: 0x7 Program: GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport (cluster-client-1) [2019-05-06 12:40:35.645955] W [MSGID: 114031] [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-1: remote operation failed [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.646008] W [MSGID: 109075] [dht-diskusage.c:44:dht_du_info_cbk] 0-cluster-dht: failed to get disk info from cluster-replicate-0 [Drugi koniec nie jest po??czony] [2019-05-06 12:40:35.647846] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-cluster-client-1: disconnected from cluster-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-06 12:40:35.647895] E [MSGID: 108006] [afr-common.c:4842:afr_notify] 0-cluster-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up. 
[2019-05-06 12:40:35.647989] I [MSGID: 108006] [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no subvolumes up [2019-05-06 12:40:35.648051] I [MSGID: 108006] [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no subvolumes up [2019-05-06 12:40:35.648122] I [MSGID: 104039] [glfs-resolve.c:902:__glfs_active_subvol] 0-cluster: first lookup on graph 69786d65-6431-2d32-3037-3739322d3230 (0) failed (Drugi koniec nie jest po??czony) [Drugi koniec nie jest po??czony] "Drugi koniec nie jest po??czony" -> Transport endpoint not connected On brick process side there is an connection attempt: [2019-05-06 12:40:25.638032] I [addr.c:182:gf_auth] 0-/glusterfs-bricks/cluster/cluster: allowed = "*", received addr = "10.0.7.26" [2019-05-06 12:40:25.638080] I [login.c:111:gf_auth] 0-auth/login: allowed user names: e2f4c8f4-d040-4856-b6e3-62611fbab0ea [2019-05-06 12:40:25.638109] I [MSGID: 115029] [server-handshake.c:695:server_setvolume] 0-cluster-server: accepted client from ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 (version: 3.10.12) [2019-05-06 12:40:31.565931] I [MSGID: 115036] [server.c:559:server_rpc_notify] 0-cluster-server: disconnecting connection from ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 [2019-05-06 12:40:31.566420] I [MSGID: 101055] [client_t.c:436:gf_client_unref] 0-cluster-server: Shutting down connection ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 I am not able to use any heal command because of this problem. I have three volumes configured on that nodes. Configuration is identical and "gluster volume heal" command fails for all of them. Can anyone help? Thanks, ?ukasz From vbellur at redhat.com Mon May 6 17:48:22 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Mon, 6 May 2019 10:48:22 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Thank you for the report, David. Do you have core files available on any of the servers? If yes, would it be possible for you to provide a backtrace. Regards, Vijay On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: > Hello folks, > > we have a client application (runs on Win10) which does some FOPs on a > gluster volume which is accessed by SMB. > > *Scenario 1* is a READ Operation which reads all files successively and > checks if the files data was correctly copied. 
While doing this, all brick > processes crashes and in the logs one have this crash report on every brick > log: > >> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >> pending frames: >> frame : type(0) op(27) >> frame : type(0) op(40) >> patchset: git://git.gluster.org/glusterfs.git >> signal received: 11 >> time of crash: >> 2019-04-16 08:32:21 >> configuration details: >> argp 1 >> backtrace 1 >> dlfcn 1 >> libpthread 1 >> llistxattr 1 >> setfsid 1 >> spinlock 1 >> epoll.h 1 >> xattr.h 1 >> st_atim.tv_nsec 1 >> package-string: glusterfs 5.5 >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >> >> *Scenario 2 *The application just SET Read-Only on each file > sucessively. 
After the 70th file was set, all the bricks crashes and again, > one can read this crash report in every brick log: > >> >> >> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >> client: >> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >> gfid: 00000000-0000-0000-0000-000000000001, >> req(uid:2000,gid:2000,perm:1,ngrps:1), >> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >> denied] >> >> pending frames: >> >> frame : type(0) op(27) >> >> patchset: git://git.gluster.org/glusterfs.git >> >> signal received: 11 >> >> time of crash: >> >> 2019-05-02 07:43:39 >> >> configuration details: >> >> argp 1 >> >> backtrace 1 >> >> dlfcn 1 >> >> libpthread 1 >> >> llistxattr 1 >> >> setfsid 1 >> >> spinlock 1 >> >> epoll.h 1 >> >> xattr.h 1 >> >> st_atim.tv_nsec 1 >> >> package-string: glusterfs 5.5 >> >> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >> >> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >> >> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >> >> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >> >> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >> >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >> >> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >> >> >> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >> >> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >> >> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >> >> >> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >> >> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >> >> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >> > > This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
> But both volumes has the same settings: > >> Volume Name: shortterm >> Type: Replicate >> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x 3 = 3 >> Transport-type: tcp >> Bricks: >> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >> Options Reconfigured: >> storage.reserve: 1 >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> user.smb: disable >> features.read-only: off >> features.worm: off >> features.worm-file-level: on >> features.retention-mode: enterprise >> features.default-retention-period: 120 >> network.ping-timeout: 10 >> features.cache-invalidation: on >> features.cache-invalidation-timeout: 600 >> performance.nl-cache: on >> performance.nl-cache-timeout: 600 >> client.event-threads: 32 >> server.event-threads: 32 >> cluster.lookup-optimize: on >> performance.stat-prefetch: on >> performance.cache-invalidation: on >> performance.md-cache-timeout: 600 >> performance.cache-samba-metadata: on >> performance.cache-ima-xattrs: on >> performance.io-thread-count: 64 >> cluster.use-compound-fops: on >> performance.cache-size: 512MB >> performance.cache-refresh-timeout: 10 >> performance.read-ahead: off >> performance.write-behind-window-size: 4MB >> performance.write-behind: on >> storage.build-pgfid: on >> features.utime: on >> storage.ctime: on >> cluster.quorum-type: fixed >> cluster.quorum-count: 2 >> features.bitrot: on >> features.scrub: Active >> features.scrub-freq: daily >> cluster.enable-shared-storage: enable >> >> > Why can this happen to all Brick processes? I don't understand the crash > report. The FOPs are nothing special and after restart brick processes > everything works fine and our application was succeed. > > Regards > David Spisla > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon May 6 18:15:04 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 6 May 2019 14:15:04 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Good work Team (Prasanna and Xiubo Li to be precise)!! This was much needed release w.r.to gluster-block project, mainly because of the number of improvements done since last release. Also, gluster-block release 0.3 was not compatible with glusterfs-6.x series. All, feel free to use it if your deployment has any usecase for Block storage, and give us feedback. Happy to make sure gluster-block is stable for you. Regards, Amar > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. 
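To give a flavour of the CLI, a rough sketch of the typical life cycle of a block device (the volume name, hosts and size below are only placeholders; the man pages [6] and the basic test file [7] referenced below are the authoritative source for the exact syntax):

    # create a 1 GiB block device on an existing block-hosting volume,
    # exported from three nodes for multipath/HA
    gluster-block create block-test/sample-block ha 3 192.168.1.11,192.168.1.12,192.168.1.13 1GiB

    # inspect and clean up
    gluster-block list block-test
    gluster-block info block-test/sample-block
    gluster-block delete block-test/sample-block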
> There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. > > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From atumball at redhat.com Mon May 6 18:16:25 2019 From: atumball at redhat.com (Amar Tumballi Suryanarayan) Date: Mon, 6 May 2019 14:16:25 -0400 Subject: [Gluster-users] [External] Re: anyone using gluster-block? In-Reply-To: References: Message-ID: Davide, With release 0.4, gluster-block is now having more functionality, and we did many stability fixes. Feel free to try out, and let us know how you feel. -Amar On Fri, Nov 9, 2018 at 3:36 AM Davide Obbi wrote: > Hi Vijay, > > The Volume has been created using heketi-cli blockvolume create command. > The block config is the config applied by heketi out of the box and in my > case ended up to be: > - 3 nodes each with 1 brick > - the brick is carved from a VG with a single PV > - the PV consists of a 1.2TB SSD, not partitioned and no HW RAID behind > - the volume does not have any custom setting aside what configured in > /etc/glusterfs/group-gluster-block by default > performance.quick-read=off > performance.read-ahead=off > performance.io-cache=off > performance.stat-prefetch=off > performance.open-behind=off > performance.readdir-ahead=off > performance.strict-o-direct=on > network.remote-dio=disable > cluster.eager-lock=enable > cluster.quorum-type=auto > cluster.data-self-heal-algorithm=full > cluster.locking-scheme=granular > cluster.shd-max-threads=8 > cluster.shd-wait-qlength=10000 > features.shard=on > features.shard-block-size=64MB > user.cifs=off > server.allow-insecure=on > cluster.choose-local=off > > Kernel: 3.10.0-862.11.6.el7.x86_64 > OS: Centos 7.5.1804 > tcmu-runner: 0.2rc4.el7 > > Each node has 32 cores and 128GB RAM and 10Gb connection. > > What i am trying to understand is what should be performance expectations > with gluster-block since i couldnt find many benchmarks online. > > Regards > Davide > > > On Fri, Nov 9, 2018 at 7:07 AM Vijay Bellur wrote: > >> Hi Davide, >> >> Can you please share the block hosting volume configuration? 
>> >> Also, more details about the kernel and tcmu-runner versions could help >> in understanding the problem better. >> >> Thanks, >> Vijay >> >> On Tue, Nov 6, 2018 at 6:16 AM Davide Obbi >> wrote: >> >>> Hi, >>> >>> i am testing gluster-block and i am wondering if someone has used it and >>> have some feedback regarding its performance.. just to set some >>> expectations... for example: >>> - i have deployed a block volume using heketi on a 3 nodes gluster4.1 >>> cluster. it's a replica3 volume. >>> - i have mounted via iscsi using multipath config suggested, created >>> vg/lv and put xfs on it >>> - all done without touching any volume setting or customizing xfs >>> parameters etc.. >>> - all baremetal running on 10Gb, gluster has a single block device, SSD >>> in use by heketi >>> >>> so i tried a dd and i get a 4.7 MB/s? >>> - on the gluster nodes i have in write ~200iops, ~15MB/s, 75% util >>> steady and spiky await time up to 100ms alternating between the servers. >>> CPUs are mostly idle but there is some waiting... >>> - Glusterd and fsd utilization is below 1% >>> >>> The thing is that a gluster fuse mount on same platform does not have >>> this slowness so there must be something wrong with my understanding of >>> gluster-block? >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Davide Obbi > System Administrator > > Booking.com B.V. > Vijzelstraat 66-80 Amsterdam 1017HL Netherlands > Direct +31207031558 > [image: Booking.com] > Empowering People to experience the world since 1996 > 43 languages, 214+ offices worldwide, 141,000+ global destinations, 29 > million reported listings > Subsidiary of Booking Holdings Inc. (NASDAQ: BKNG) > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jthottan at redhat.com Tue May 7 04:10:11 2019 From: jthottan at redhat.com (Jiffin Tony Thottan) Date: Tue, 7 May 2019 09:40:11 +0530 Subject: [Gluster-users] Proposing to previous ganesha HA clustersolution back to gluster code as gluster-7 feature In-Reply-To: References: Message-ID: Hi On 04/05/19 12:04 PM, Strahil wrote: > Hi Jiffin, > > No vendor will support your corosync/pacemaker stack if you do not have proper fencing. > As Gluster is already a cluster of its own, it makes sense to control everything from there. > > Best Regards, Yeah I agree with your point. What I meant to say by default this feature won't provide any fencing mechanism, user need to manually configure fencing for the cluster. In future we can try to include to default fencing configuration for the ganesha cluster as part of the Ganesha HA configuration Regards, Jiffin > Strahil NikolovOn May 3, 2019 09:08, Jiffin Tony Thottan wrote: >> >> On 30/04/19 6:59 PM, Strahil Nikolov wrote: >>> Hi, >>> >>> I'm posting this again as it got bounced. >>> Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. >>> >>> I'm still trying to remediate the effects of poor configuration at work. >>> Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. 
>>> Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. >>> I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. >> >> It do take care those, but need to follow certain prerequisite, but >> please fencing won't configured for this setup. May we think about in >> future. >> >> -- >> >> Jiffin >> >>> Still, this will be a lot of work to achieve. >>> >>> Best Regards, >>> Strahil Nikolov >>> >>> On Apr 30, 2019 15:19, Jim Kinney wrote: >>>> >>>> +1! >>>> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >>>> >>>> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>>>> >>>>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>>>> >>>>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>>>> >>>>> to gluster code repository with some improvement and targetting for next gluster release 7. >>>>> >>>>> ? ??I have opened up an issue [1] with details and posted initial set of patches [2] >>>>> >>>>> Please share your thoughts on the same >>>>> >>>>> >>>>> Regards, >>>>> >>>>> Jiffin >>>>> >>>>> [1] https://github.com/gluster/glusterfs/issues/663 >>>>> >>>>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>>>> >>>>> >>>> -- >>>> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. >>> Keep in mind that corosync/pacemaker? is hard for proper setup by new admins/users. >>> >>> I'm still trying to remediate the effects of poor configuration at work. >>> Also, storhaug is nice for hyperconverged setups where the host is not only hosting bricks, but? other? workloads. >>> Corosync/pacemaker require proper fencing to be setup and most of the stonith resources 'shoot the other node in the head'. >>> I would be happy to see an easy to deploy (let say 'cluster.enable-ha-ganesha true') and gluster to be bringing up the Floating IPs and taking care of the NFS locks, so no disruption will be felt by the clients. >>> >>> Still, this will be a lot of work to achieve. >>> >>> Best Regards, >>> Strahil NikolovOn Apr 30, 2019 15:19, Jim Kinney wrote: >>>> +1! >>>> I'm using nfs-ganesha in my next upgrade so my client systems can use NFS instead of fuse mounts. Having an integrated, designed in process to coordinate multiple nodes into an HA cluster will very welcome. >>>> >>>> On April 30, 2019 3:20:11 AM EDT, Jiffin Tony Thottan wrote: >>>>> Hi all, >>>>> >>>>> Some of you folks may be familiar with HA solution provided for nfs-ganesha by gluster using pacemaker and corosync. >>>>> >>>>> That feature was removed in glusterfs 3.10 in favour for common HA project "Storhaug". Even Storhaug was not progressed >>>>> >>>>> much from last two years and current development is in halt state, hence planning to restore old HA ganesha solution back >>>>> >>>>> to gluster code repository with some improvement and targetting for next gluster release 7. 
>>>>> >>>>> I have opened up an issue [1] with details and posted initial set of patches [2] >>>>> >>>>> Please share your thoughts on the same >>>>> >>>>> Regards, >>>>> >>>>> Jiffin >>>>> >>>>> [1] https://github.com/gluster/glusterfs/issues/663 >>>>> >>>>> [2] https://review.gluster.org/#/q/topic:rfc-663+(status:open+OR+status:merged) >>>> -- >>>> Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity. From ndevos at redhat.com Tue May 7 05:35:34 2019 From: ndevos at redhat.com (Niels de Vos) Date: Tue, 7 May 2019 07:35:34 +0200 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: <20190507053534.GF5209@ndevos-x270> On Thu, May 02, 2019 at 11:04:41PM +0530, Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. > There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. Updates for Fedora are available in the testing repositories: Fedora 30: https://bodhi.fedoraproject.org/updates/FEDORA-2019-76730d7230 Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2019-cc7cdce2a4 Fedora 28: https://bodhi.fedoraproject.org/updates/FEDORA-2019-9e9a210110 Installation instructions can be found at the above links. Please leave testing feedback as comments on the Fedora Update pages. Thanks, Niels > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! 
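For anyone who wants to help test before these builds reach stable, the usual Bodhi workflow applies; a rough sketch for Fedora 30, using the advisory ID from the link above (the other releases work the same way with their own IDs):

    sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2019-76730d7230

The --advisory filter limits the transaction to the packages belonging to that update.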
> _______________________________________________ > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel From ravishankar at redhat.com Tue May 7 06:25:07 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Tue, 7 May 2019 11:55:07 +0530 Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd In-Reply-To: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> References: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> Message-ID: On 06/05/19 6:43 PM, ?ukasz Michalski wrote: > Hi, > > I have problem resolving split-brain in one of my installations. > > CenOS 7, glusterfs 3.10.12, replica on two nodes: > > [root at ixmed1 iscsi]# gluster volume status cluster > Status of volume: cluster > Gluster process???????????????????????????? TCP Port? RDMA Port > Online? Pid > ------------------------------------------------------------------------------ > > Brick ixmed2:/glusterfs-bricks/cluster/clus > ter???????????????????????????????????????? 49153???? 0 Y 3028 > Brick ixmed1:/glusterfs-bricks/cluster/clus > ter???????????????????????????????????????? 49153???? 0 Y 2917 > Self-heal Daemon on localhost?????????????? N/A?????? N/A Y 112929 > Self-heal Daemon on ixmed2????????????????? N/A?????? N/A Y 57774 > > Task Status of Volume cluster > ------------------------------------------------------------------------------ > > There are no active volume tasks > > When I try to access one file glusterd reports split brain: > > [2019-05-06 12:36:43.785098] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: > Failing READ on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: split-brain > observed. [Input/output error] > [2019-05-06 12:36:43.787952] E [MSGID: 108008] > [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-cluster-replicate-0: > Failing FGETXATTR on gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635: > split-brain observed. [Input/output error] > [2019-05-06 12:36:43.788778] W [MSGID: 108027] > [afr-common.c:2722:afr_discover_done] 0-cluster-replicate-0: no read > subvols for (null) > [2019-05-06 12:36:43.790123] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352501: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde0803f390 > (Input/output error) > [2019-05-06 12:36:43.794979] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352506: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 > (Input/output error) > [2019-05-06 12:36:43.800468] W [fuse-bridge.c:2254:fuse_readv_cbk] > 0-glusterfs-fuse: 3352508: READ => -1 > gfid=2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 fd=0x7fde08215ed0 > (Input/output error) > > The problem is that "gluster volume heal info" hangs for 10 seconds > and returns: > > ??? Not able to fetch volfile from glusterd > ??? 
Volume heal failed > > glfsheal.log contains: > > [2019-05-06 12:40:25.589879] I [afr.c:94:fix_quorum_options] > 0-cluster-replicate-0: reindeer: incoming qtype = none > [2019-05-06 12:40:25.589967] I [afr.c:116:fix_quorum_options] > 0-cluster-replicate-0: reindeer: quorum_count = 0 > [2019-05-06 12:40:25.593294] W [MSGID: 101174] > [graph.c:361:_log_if_unknown_option] 0-cluster-readdir-ahead: option > 'parallel-readdir' is not recognized > [2019-05-06 12:40:25.593895] I [MSGID: 104045] > [glfs-master.c:91:notify] 0-gfapi: New graph > 69786d65-6431-2d32-3037-3739322d3230 (0) coming up > [2019-05-06 12:40:25.593972] I [MSGID: 114020] [client.c:2352:notify] > 0-cluster-client-0: parent translators are ready, attempting connect > on transport > [2019-05-06 12:40:25.607836] I [MSGID: 114020] [client.c:2352:notify] > 0-cluster-client-1: parent translators are ready, attempting connect > on transport > [2019-05-06 12:40:25.608556] I [rpc-clnt.c:2000:rpc_clnt_reconfig] > 0-cluster-client-0: changing port to 49153 (from 0) > [2019-05-06 12:40:25.618167] I [rpc-clnt.c:2000:rpc_clnt_reconfig] > 0-cluster-client-1: changing port to 49153 (from 0) > [2019-05-06 12:40:25.629595] I [MSGID: 114057] > [client-handshake.c:1451:select_server_supported_programs] > 0-cluster-client-0: Using Program GlusterFS 3.3, Num (1298437), > Version (330) > [2019-05-06 12:40:25.632031] I [MSGID: 114046] > [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-0: > Connected to cluster-client-0, attached to remote volume > '/glusterfs-bricks/cluster/cluster'. > [2019-05-06 12:40:25.632100] I [MSGID: 114047] > [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-0: > Server and Client lk-version numbers are not same, reopening the fds > [2019-05-06 12:40:25.632263] I [MSGID: 108005] > [afr-common.c:4817:afr_notify] 0-cluster-replicate-0: Subvolume > 'cluster-client-0' came back up; going online. > [2019-05-06 12:40:25.637707] I [MSGID: 114057] > [client-handshake.c:1451:select_server_supported_programs] > 0-cluster-client-1: Using Program GlusterFS 3.3, Num (1298437), > Version (330) > [2019-05-06 12:40:25.639285] I [MSGID: 114046] > [client-handshake.c:1216:client_setvolume_cbk] 0-cluster-client-1: > Connected to cluster-client-1, attached to remote volume > '/glusterfs-bricks/cluster/cluster'. > [2019-05-06 12:40:25.639341] I [MSGID: 114047] > [client-handshake.c:1227:client_setvolume_cbk] 0-cluster-client-1: > Server and Client lk-version numbers are not same, reopening the fds > [2019-05-06 12:40:31.564407] C > [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-0: > server 10.0.104.26:49153 has not responded in the last 5 seconds, > disconnecting. > [2019-05-06 12:40:31.565764] C > [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-cluster-client-1: > server 10.0.7.26:49153 has not responded in the last 5 seconds, > disconnecting. This seems to be a problem.? Have you changed the value of ping-timeout ? Could you share the output of `gluster volume info`? Does the same issue occur if you try to resolve the split-brain on the gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 using the |gluster volume heal split-brain |CLI? -Ravi > [2019-05-06 12:40:35.645545] I [MSGID: 114018] > [client.c:2276:client_rpc_notify] 0-cluster-client-0: disconnected > from cluster-client-0. 
Client process will keep trying to connect to > glusterd until brick's port is available > [2019-05-06 12:40:35.645683] I [socket.c:3534:socket_submit_request] > 0-cluster-client-0: not connected (priv->connected = -1) > [2019-05-06 12:40:35.645755] W [rpc-clnt.c:1693:rpc_clnt_submit] > 0-cluster-client-0: failed to submit rpc-request (XID: 0x7 Program: > GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport > (cluster-client-0) > [2019-05-06 12:40:35.645807] W [MSGID: 114031] > [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-0: > remote operation failed [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.645887] I [socket.c:3534:socket_submit_request] > 0-cluster-client-1: not connected (priv->connected = -1) > [2019-05-06 12:40:35.645918] W [rpc-clnt.c:1693:rpc_clnt_submit] > 0-cluster-client-1: failed to submit rpc-request (XID: 0x7 Program: > GlusterFS 3.3, ProgVers: 330, Proc: 14) to rpc-transport > (cluster-client-1) > [2019-05-06 12:40:35.645955] W [MSGID: 114031] > [client-rpc-fops.c:797:client3_3_statfs_cbk] 0-cluster-client-1: > remote operation failed [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.646008] W [MSGID: 109075] > [dht-diskusage.c:44:dht_du_info_cbk] 0-cluster-dht: failed to get disk > info from cluster-replicate-0 [Drugi koniec nie jest po??czony] > [2019-05-06 12:40:35.647846] I [MSGID: 114018] > [client.c:2276:client_rpc_notify] 0-cluster-client-1: disconnected > from cluster-client-1. Client process will keep trying to connect to > glusterd until brick's port is available > [2019-05-06 12:40:35.647895] E [MSGID: 108006] > [afr-common.c:4842:afr_notify] 0-cluster-replicate-0: All subvolumes > are down. Going offline until atleast one of them comes back up. > [2019-05-06 12:40:35.647989] I [MSGID: 108006] > [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no > subvolumes up > [2019-05-06 12:40:35.648051] I [MSGID: 108006] > [afr-common.c:4984:afr_local_init] 0-cluster-replicate-0: no > subvolumes up > [2019-05-06 12:40:35.648122] I [MSGID: 104039] > [glfs-resolve.c:902:__glfs_active_subvol] 0-cluster: first lookup on > graph 69786d65-6431-2d32-3037-3739322d3230 (0) failed (Drugi koniec > nie jest po??czony) [Drugi koniec nie jest po??czony] > > "Drugi koniec nie jest po??czony" -> Transport endpoint not connected > > On brick process side there is an connection attempt: > > [2019-05-06 12:40:25.638032] I [addr.c:182:gf_auth] > 0-/glusterfs-bricks/cluster/cluster: allowed = "*", received addr = > "10.0.7.26" > [2019-05-06 12:40:25.638080] I [login.c:111:gf_auth] 0-auth/login: > allowed user names: e2f4c8f4-d040-4856-b6e3-62611fbab0ea > [2019-05-06 12:40:25.638109] I [MSGID: 115029] > [server-handshake.c:695:server_setvolume] 0-cluster-server: accepted > client from > ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > (version: 3.10.12) > [2019-05-06 12:40:31.565931] I [MSGID: 115036] > [server.c:559:server_rpc_notify] 0-cluster-server: disconnecting > connection from > ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > [2019-05-06 12:40:31.566420] I [MSGID: 101055] > [client_t.c:436:gf_client_unref] 0-cluster-server: Shutting down > connection ixmed1-207792-2019/05/06-12:40:25:562982-cluster-client-1-0-0 > > I am not able to use any heal command because of this problem. > > I have three volumes configured on that nodes. Configuration is > identical and "gluster volume heal" command fails for all of them. > > Can anyone help? 
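For reference, once `heal info` responds again, a split-brain like the one on the gfid above can usually be resolved straight from the CLI; a rough sketch using the gfid and brick names from these logs (which copy to keep is of course a judgement call):

    gluster volume heal cluster info split-brain

    # keep the copy with the newest modification time
    gluster volume heal cluster split-brain latest-mtime gfid:2584a0e2-c0fa-4fde-8537-5d5b6a5a4635

    # or explicitly keep the copy that lives on one brick
    gluster volume heal cluster split-brain source-brick ixmed1:/glusterfs-bricks/cluster/cluster gfid:2584a0e2-c0fa-4fde-8537-5d5b6a5a4635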
> > Thanks, > ?ukasz > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue May 7 09:15:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 7 May 2019 11:15:52 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, how can I create such a core file? Or will it be created automatically if a gluster process crashes? Maybe you can give me a hint and will try to get a backtrace. Unfortunately this bug is not easy to reproduce because it appears only sometimes. Regards David Spisla Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur : > Thank you for the report, David. Do you have core files available on any > of the servers? If yes, would it be possible for you to provide a backtrace. > > Regards, > Vijay > > On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: > >> Hello folks, >> >> we have a client application (runs on Win10) which does some FOPs on a >> gluster volume which is accessed by SMB. >> >> *Scenario 1* is a READ Operation which reads all files successively and >> checks if the files data was correctly copied. While doing this, all brick >> processes crashes and in the logs one have this crash report on every brick >> log: >> >>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>> pending frames: >>> frame : type(0) op(27) >>> frame : type(0) op(40) >>> patchset: git://git.gluster.org/glusterfs.git >>> signal received: 11 >>> time of crash: >>> 2019-04-16 08:32:21 >>> configuration details: >>> argp 1 >>> backtrace 1 >>> dlfcn 1 >>> libpthread 1 >>> llistxattr 1 >>> setfsid 1 >>> spinlock 1 >>> epoll.h 1 >>> xattr.h 1 >>> st_atim.tv_nsec 1 >>> package-string: glusterfs 5.5 >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>> >>> *Scenario 2 *The application just SET Read-Only on each file >> sucessively. 
After the 70th file was set, all the bricks crashes and again, >> one can read this crash report in every brick log: >> >>> >>> >>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>> client: >>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>> gfid: 00000000-0000-0000-0000-000000000001, >>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>> denied] >>> >>> pending frames: >>> >>> frame : type(0) op(27) >>> >>> patchset: git://git.gluster.org/glusterfs.git >>> >>> signal received: 11 >>> >>> time of crash: >>> >>> 2019-05-02 07:43:39 >>> >>> configuration details: >>> >>> argp 1 >>> >>> backtrace 1 >>> >>> dlfcn 1 >>> >>> libpthread 1 >>> >>> llistxattr 1 >>> >>> setfsid 1 >>> >>> spinlock 1 >>> >>> epoll.h 1 >>> >>> xattr.h 1 >>> >>> st_atim.tv_nsec 1 >>> >>> package-string: glusterfs 5.5 >>> >>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>> >>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>> >>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>> >>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>> >>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>> >>> >>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>> >>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>> >>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>> >> >> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>> But both volumes has the same settings: >> >>> Volume Name: shortterm >>> Type: Replicate >>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x 3 = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>> Options Reconfigured: >>> storage.reserve: 1 >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> user.smb: disable >>> features.read-only: off >>> features.worm: off >>> features.worm-file-level: on >>> features.retention-mode: enterprise >>> features.default-retention-period: 120 >>> network.ping-timeout: 10 >>> features.cache-invalidation: on >>> features.cache-invalidation-timeout: 600 >>> performance.nl-cache: on >>> performance.nl-cache-timeout: 600 >>> client.event-threads: 32 >>> server.event-threads: 32 >>> cluster.lookup-optimize: on >>> performance.stat-prefetch: on >>> performance.cache-invalidation: on >>> performance.md-cache-timeout: 600 >>> performance.cache-samba-metadata: on >>> performance.cache-ima-xattrs: on >>> performance.io-thread-count: 64 >>> cluster.use-compound-fops: on >>> performance.cache-size: 512MB >>> performance.cache-refresh-timeout: 10 >>> performance.read-ahead: off >>> performance.write-behind-window-size: 4MB >>> performance.write-behind: on >>> storage.build-pgfid: on >>> features.utime: on >>> storage.ctime: on >>> cluster.quorum-type: fixed >>> cluster.quorum-count: 2 >>> features.bitrot: on >>> features.scrub: Active >>> features.scrub-freq: daily >>> cluster.enable-shared-storage: enable >>> >>> >> Why can this happen to all Brick processes? I don't understand the crash >> report. The FOPs are nothing special and after restart brick processes >> everything works fine and our application was succeed. >> >> Regards >> David Spisla >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspandey at redhat.com Tue May 7 09:19:05 2019 From: aspandey at redhat.com (Ashish Pandey) Date: Tue, 7 May 2019 05:19:05 -0400 (EDT) Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> Message-ID: <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Hi, While we send a mail on gluster-devel or gluster-user mailing list, following content gets auto generated and placed at the end of mail. Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? Like this - Meeting schedule - * APAC friendly hours * Tuesday 14th May 2019 , 11:30AM IST * Bridge: https://bluejeans.com/836554017 * NA/EMEA * Tuesday 7th May 2019 , 01:00 PM EDT * Bridge: https://bluejeans.com/486278655 Or just a link to meeting minutes details?? https://github.com/gluster/community/tree/master/meetings This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
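Assuming the lists are still plain Mailman 2 (as the listinfo URLs suggest), this would presumably just mean extending each list's msg_footer template under the Non-digest options, for example by appending:

    Community Meetings: https://github.com/gluster/community/tree/master/meetings

so that only the link has to live in the footer and the actual schedule stays maintained in one place.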
--- Ashish -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Tue May 7 10:20:05 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 7 May 2019 12:20:05 +0200 Subject: [Gluster-users] Hard Failover with Samba and Glusterfs In-Reply-To: References: Message-ID: All answers to this questions are in this bugreport: https://bugzilla.redhat.com/show_bug.cgi?GoAheadAndLogIn=Log%20in&id=1706842 Am Do., 18. Apr. 2019 um 09:21 Uhr schrieb hgichon : > Hi. > > I have a some question about your testing. > > 1. What was the glusterfs version you used in past time? > 2. How about a volume configuration? > 3. Was CTDB vip failed over correctly? If so, Clould you attach > /var/log/samba/glusterfs-volname.win10.ip.log ? > > Best Regards > > - kpkim > > > 2019? 4? 17? (?) ?? 5:02, David Spisla ?? ??: > >> Dear Gluster Community, >> >> I have this setup: 4-Node Glusterfs v5.5 Cluster, using SAMBA/CTDB v4.8 >> to access the volumes (each node has a VIP) >> >> I was testing this failover scenario: >> >> 1. Start Writing 940 GB with small files (64K-100K)from a Win10 Client >> to node1 >> 2. During the write process I hardly shutdown node1 (where the client >> is connect via VIP) by turn off the power >> >> My expectation is, that the write process stops and after a while the >> Win10 Client offers me a Retry, so I can continue the write on different >> node (which has now the VIP of node1). >> In past time I did this observation, but now the system shows a strange >> bahaviour: >> >> The Win10 Client do nothing and the Explorer freezes, in the backend CTDB >> can not perform the failover and throws errors. The glusterd from node2 and >> node3 logs this messages: >> >>> [2019-04-16 14:47:31.828323] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive1 not held >>> [2019-04-16 14:47:31.828350] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive1 >>> [2019-04-16 14:47:31.828369] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol archive2 not held >>> [2019-04-16 14:47:31.828376] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for archive2 >>> [2019-04-16 14:47:31.828412] W [glusterd-locks.c:795:glusterd_mgmt_v3_unlock] (-->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x24349) [0x7f1a62fcb349] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0x2d950) [0x7f1a62fd4950] -->/usr/lib64/glusterfs/5.5/xlator/mgmt/glusterd.so(+0xe0359) [0x7f1a63087359] ) 0-management: Lock for vol gluster_shared_storage not held >>> [2019-04-16 14:47:31.828423] W [MSGID: 106117] [glusterd-handler.c:6451:__glusterd_peer_rpc_notify] 0-management: Lock not released for gluster_shared_storage >>> >>> >> *In my oponion Samba/CTDB can not perform the failover correctly and >> continue the write process because glusterfs didn't released the lock.* >> What do you think? 
It seems to me like a bug because in the past the
>> failover worked correctly.
>>
>> Regards
>> David Spisla
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.orth at gmail.com Tue May 7 13:12:06 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Tue, 7 May 2019 16:12:06 +0300
Subject: [Gluster-users] "No space left on device" during rebalance with failed brick on Gluster 4.1.7
Message-ID: 

Dear list,

We are using a Distributed-Replicate volume with replica 2 on Gluster 4.1.7 on CentOS 7. One of our nodes died recently and we will add new nodes and bricks to replace it soon. In preparation for the maintenance I wanted to rebalance the volume to make the disk thrashing less intense when we add/remove bricks, but after eight hours of scanning I see millions of "failures" in the rebalance status. The volume rebalance log shows many errors like:

[2019-05-07 06:06:02.310843] E [MSGID: 109023] [dht-rebalance.c:2907:gf_defrag_migrate_single_file] 0-data-dht: migrate-data failed for /ilri/miseq/MiSeq2/MiSeq2Output2018/180912_M03021_0002_000000000-BVM95/Thumbnail_Images/L001/C174.1/s_1_2103_c.jpg [No space left on device]

The bricks on the healthy nodes all have 1.5TB of free space so I'm not sure what this error means. Could it be because one of the replicas is unavailable? I saw a similar bug report [1] about that. I've started a simple fix-layout without data migration and it is working fine.

Thank you,

[1] https://access.redhat.com/solutions/456333
--
Alan Orth
alan.orth at gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." - Friedrich Nietzsche
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.orth at gmail.com Tue May 7 13:51:23 2019
From: alan.orth at gmail.com (Alan Orth)
Date: Tue, 7 May 2019 16:51:23 +0300
Subject: [Gluster-users] "No space left on device" during rebalance with failed brick on Gluster 4.1.7
In-Reply-To: 
References: 
Message-ID: 

Dear list,

After looking at my rebalance log more I saw another message that helped me solve the problem:

[2019-05-06 22:46:01.074035] W [MSGID: 0] [dht-rebalance.c:1075:__dht_check_free_space] 0-data-dht: Write will cross min-free-disk for file - /ilri/miseq/MiSeq1/MiSeq1Output_2014/140624_M01601_0035_000000000-A6L82/Data/TileStatus/TileStatusL1T1114.tpl on subvol - data-replicate-1. Looking for new subvol

I had not set cluster.min-free-disk, but it appears that its default value is 10%, and my bricks are at about 97% capacity, so the "No space" error in my previous message makes sense. I reduced cluster.min-free-disk to 2% and restarted the data rebalance, and now I see that it is already rebalancing files. The issue is solved. Sorry about that!

Thank you,

On Tue, May 7, 2019 at 4:12 PM Alan Orth wrote:

> Dear list,
>
> We are using a Distributed-Replicate volume with replica 2 on Gluster
> 4.1.7 on CentOS 7. One of our nodes died recently and we will add new nodes
> and bricks to replace it soon. In preparation for the maintenance I wanted
> to rebalance the volume to make the disk thrashing less intense when we
> add/remove bricks, but after eight hours of scanning I see millions of
> "failures" in the rebalance status.
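(For the archives, the knobs involved here are just the standard volume-option and rebalance commands; a sketch with the volume name from the logs:

    gluster volume set data cluster.min-free-disk 2%
    gluster volume rebalance data start
    gluster volume rebalance data status

cluster.min-free-disk exists to stop DHT from placing new files on nearly-full bricks, so lowering it this far is best treated as a temporary measure until the new bricks are added.)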
The volume rebalance log shows many > errors like: > > [2019-05-07 06:06:02.310843] E [MSGID: 109023] > [dht-rebalance.c:2907:gf_defrag_migrate_single_file] 0-data-dht: > migrate-data failed for > /ilri/miseq/MiSeq2/MiSeq2Output2018/180912_M03021_0002_000000000-BVM95/Thumbnail_Images/L001/C174.1/s_1_2103_c.jpg > [No space left on device] > > The bricks on the healthy nodes all have 1.5TB of free space so I'm not > sure what this error means. Could it be because one of the replicas is > unavailable? I saw a similar bug report? about that. I've started a simple > fix-layout without data migration and it is working fine. > > Thank you, > > ? https://access.redhat.com/solutions/456333 > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From lm at zork.pl Tue May 7 14:01:03 2019 From: lm at zork.pl (=?UTF-8?Q?=c5=81ukasz_Michalski?=) Date: Tue, 7 May 2019 16:01:03 +0200 Subject: [Gluster-users] heal: Not able to fetch volfile from glusterd In-Reply-To: References: <4376d725-a451-7b18-a7a1-c5285c3570b3@zork.pl> Message-ID: > > Does the same issue occur if you try to resolve the split-brain on the > gfid 2584a0e2-c0fa-4fde-8537-5d5b6a5a4635 using the |gluster volume heal > split-brain |CLI? > Many thanks for responding! gluster volume info: Volume Name: cluster Type: Replicate Volume ID: 8787d95e-8e66-4476-a990-4e27fc47c765 Status: Started Snapshot Count: 0 Number of Bricks: 1 x 2 = 2 Transport-type: tcp Bricks: Brick1: ixmed2:/glusterfs-bricks/cluster/cluster Brick2: ixmed1:/glusterfs-bricks/cluster/cluster Options Reconfigured: network.ping-timeout: 5 user.smb: disable transport.address-family: inet nfs.disable: on The problem was in network.ping-timeout set to 5 seconds. It is set for such a short value to prevent smb session from disconnecting when one node goes offline. It seems that for split-brain resolution and management I have to temporarily set this value to 30 seconds or more. Regards, ?ukasz From vbellur at redhat.com Tue May 7 18:08:13 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 7 May 2019 11:08:13 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: > Hello Vijay, > > how can I create such a core file? Or will it be created automatically if > a gluster process crashes? > Maybe you can give me a hint and will try to get a backtrace. > Generation of core file is dependent on the system configuration. `man 5 core` contains useful information to generate a core file in a directory. Once a core file is generated, you can use gdb to get a backtrace of all threads (using "thread apply all bt full"). > Unfortunately this bug is not easy to reproduce because it appears only > sometimes. > If the bug is not easy to reproduce, having a backtrace from the generated core would be very useful! Thanks, Vijay > > Regards > David Spisla > > Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur >: > >> Thank you for the report, David. Do you have core files available on any >> of the servers? If yes, would it be possible for you to provide a backtrace. 
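A minimal sketch of that core-plus-backtrace workflow, assuming the crashing process is a brick daemon (glusterfsd), that gdb and matching glusterfs debug symbols are installed, and that /var/crash is an acceptable place for cores (the path and file names here are assumptions, not Gluster defaults):

    # send cores to a predictable location (assumed path)
    mkdir -p /var/crash
    sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
    # lift the core size limit; daemons started by systemd may additionally
    # need LimitCORE=infinity in their unit file (assumption, distro-dependent)
    ulimit -c unlimited
    # after the next crash, dump every thread's stack from the core
    gdb /usr/sbin/glusterfsd /var/crash/core.glusterfsd.<pid> \
        -ex "set pagination off" -ex "thread apply all bt full" -ex quit > backtrace.txt

The backtrace.txt produced this way is the kind of output being asked for above.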
>> >> Regards, >> Vijay >> >> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >> >>> Hello folks, >>> >>> we have a client application (runs on Win10) which does some FOPs on a >>> gluster volume which is accessed by SMB. >>> >>> *Scenario 1* is a READ Operation which reads all files successively and >>> checks if the files data was correctly copied. While doing this, all brick >>> processes crashes and in the logs one have this crash report on every brick >>> log: >>> >>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>> pending frames: >>>> frame : type(0) op(27) >>>> frame : type(0) op(40) >>>> patchset: git://git.gluster.org/glusterfs.git >>>> signal received: 11 >>>> time of crash: >>>> 2019-04-16 08:32:21 >>>> configuration details: >>>> argp 1 >>>> backtrace 1 >>>> dlfcn 1 >>>> libpthread 1 >>>> llistxattr 1 >>>> setfsid 1 >>>> spinlock 1 >>>> epoll.h 1 >>>> xattr.h 1 >>>> st_atim.tv_nsec 1 >>>> package-string: glusterfs 5.5 >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>> >>>> *Scenario 2 *The application just SET Read-Only on each file >>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>> one can read this crash report in every brick log: >>> >>>> >>>> >>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>> client: >>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>> gfid: 00000000-0000-0000-0000-000000000001, >>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>> denied] >>>> >>>> pending frames: >>>> >>>> frame : type(0) op(27) >>>> >>>> patchset: git://git.gluster.org/glusterfs.git >>>> >>>> signal received: 11 >>>> >>>> time of crash: >>>> >>>> 2019-05-02 07:43:39 >>>> >>>> configuration details: >>>> >>>> argp 1 >>>> >>>> backtrace 1 >>>> >>>> dlfcn 1 >>>> >>>> libpthread 1 >>>> >>>> llistxattr 1 >>>> >>>> setfsid 1 >>>> >>>> spinlock 1 >>>> >>>> epoll.h 1 >>>> >>>> xattr.h 1 >>>> >>>> st_atim.tv_nsec 1 >>>> >>>> package-string: glusterfs 5.5 >>>> >>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>> >>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>> >>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>> >>>> >>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>> >>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>> >>>> >>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>> >>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>> >>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>> >>> >>> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>>> But both volumes has the same settings: >>> >>>> Volume Name: shortterm >>>> Type: Replicate >>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x 3 = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>> Options Reconfigured: >>>> storage.reserve: 1 >>>> performance.client-io-threads: off >>>> nfs.disable: on >>>> transport.address-family: inet >>>> user.smb: disable >>>> features.read-only: off >>>> features.worm: off >>>> features.worm-file-level: on >>>> features.retention-mode: enterprise >>>> features.default-retention-period: 120 >>>> network.ping-timeout: 10 >>>> features.cache-invalidation: on >>>> features.cache-invalidation-timeout: 600 >>>> performance.nl-cache: on >>>> performance.nl-cache-timeout: 600 >>>> client.event-threads: 32 >>>> server.event-threads: 32 >>>> cluster.lookup-optimize: on >>>> performance.stat-prefetch: on >>>> performance.cache-invalidation: on >>>> performance.md-cache-timeout: 600 >>>> performance.cache-samba-metadata: on >>>> performance.cache-ima-xattrs: on >>>> performance.io-thread-count: 64 >>>> cluster.use-compound-fops: on >>>> performance.cache-size: 512MB >>>> performance.cache-refresh-timeout: 10 >>>> performance.read-ahead: off >>>> performance.write-behind-window-size: 4MB >>>> performance.write-behind: on >>>> storage.build-pgfid: on >>>> features.utime: on >>>> storage.ctime: on >>>> cluster.quorum-type: fixed >>>> cluster.quorum-count: 2 >>>> features.bitrot: on >>>> features.scrub: Active >>>> features.scrub-freq: daily >>>> cluster.enable-shared-storage: enable >>>> >>>> >>> Why can this happen to all Brick processes? I don't understand the crash >>> report. The FOPs are nothing special and after restart brick processes >>> everything works fine and our application was succeed. >>> >>> Regards >>> David Spisla >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From rabhat at redhat.com Tue May 7 18:14:33 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Tue, 7 May 2019 14:14:33 -0400 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: + 1 to this. There is also one more thing. For some reason, the community meeting is not visible in my calendar (especially NA region). I am not sure if anyone else also facing this issue. Regards, Raghavendra On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > Hi, > > While we send a mail on gluster-devel or gluster-user mailing list, > following content gets auto generated and placed at the end of mail. > > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? 
> Like this - > > Meeting schedule - > > > - APAC friendly hours > - Tuesday 14th May 2019, 11:30AM IST > - Bridge: https://bluejeans.com/836554017 > - NA/EMEA > - Tuesday 7th May 2019, 01:00 PM EDT > - Bridge: https://bluejeans.com/486278655 > > Or just a link to meeting minutes details?? > https://github.com/gluster/community/tree/master/meetings > > This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. > > --- > Ashish > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Tue May 7 18:37:27 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Tue, 7 May 2019 11:37:27 -0700 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath wrote: > > + 1 to this. > I have updated the footer of gluster-devel. If that looks ok, we can extend it to gluster-users too. In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always stick to the first 4 Tuesdays of every month. That will help in describing the community meeting schedule better. If we want to keep the schedule running on alternate Tuesdays, please let me know and the mailing list footers can be updated accordingly :-). > There is also one more thing. For some reason, the community meeting is > not visible in my calendar (especially NA region). I am not sure if anyone > else also facing this issue. > I did face this issue. Realized that we had a meeting today and showed up at the meeting a while later but did not see many participants. Perhaps, the calendar invite has to be made a recurring one. Thanks, Vijay > > Regards, > Raghavendra > > On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > >> Hi, >> >> While we send a mail on gluster-devel or gluster-user mailing list, >> following content gets auto generated and placed at the end of mail. >> >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? >> Like this - >> >> Meeting schedule - >> >> >> - APAC friendly hours >> - Tuesday 14th May 2019, 11:30AM IST >> - Bridge: https://bluejeans.com/836554017 >> - NA/EMEA >> - Tuesday 7th May 2019, 01:00 PM EDT >> - Bridge: https://bluejeans.com/486278655 >> >> Or just a link to meeting minutes details?? >> https://github.com/gluster/community/tree/master/meetings >> >> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
>> >> --- >> Ashish >> >> >> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Wed May 8 04:15:10 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Wed, 8 May 2019 09:45:10 +0530 Subject: [Gluster-users] [Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Wed, May 8, 2019 at 12:08 AM Vijay Bellur wrote: > > > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < > rabhat at redhat.com> wrote: > >> >> + 1 to this. >> > > I have updated the footer of gluster-devel. If that looks ok, we can > extend it to gluster-users too. > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always > stick to the first 4 Tuesdays of every month. That will help in describing > the community meeting schedule better. If we want to keep the schedule > running on alternate Tuesdays, please let me know and the mailing list > footers can be updated accordingly :-). > > >> There is also one more thing. For some reason, the community meeting is >> not visible in my calendar (especially NA region). I am not sure if anyone >> else also facing this issue. >> > > I did face this issue. Realized that we had a meeting today and showed up > at the meeting a while later but did not see many participants. Perhaps, > the calendar invite has to be made a recurring one. > We'd need to explicitly import the invite and add it to our calendar, otherwise it doesn't reflect. > Thanks, > Vijay > > >> >> Regards, >> Raghavendra >> >> On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: >> >>> Hi, >>> >>> While we send a mail on gluster-devel or gluster-user mailing list, >>> following content gets auto generated and placed at the end of mail. >>> >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? >>> Like this - >>> >>> Meeting schedule - >>> >>> >>> - APAC friendly hours >>> - Tuesday 14th May 2019, 11:30AM IST >>> - Bridge: https://bluejeans.com/836554017 >>> - NA/EMEA >>> - Tuesday 7th May 2019, 01:00 PM EDT >>> - Bridge: https://bluejeans.com/486278655 >>> >>> Or just a link to meeting minutes details?? >>> https://github.com/gluster/community/tree/master/meetings >>> >>> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. 
>>> >>> --- >>> Ashish >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amukherj at redhat.com Wed May 8 04:16:47 2019 From: amukherj at redhat.com (Atin Mukherjee) Date: Wed, 8 May 2019 09:46:47 +0530 Subject: [Gluster-users] [Gluster-devel] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: On Wed, May 8, 2019 at 9:45 AM Atin Mukherjee wrote: > > > On Wed, May 8, 2019 at 12:08 AM Vijay Bellur wrote: > >> >> >> On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < >> rabhat at redhat.com> wrote: >> >>> >>> + 1 to this. >>> >> >> I have updated the footer of gluster-devel. If that looks ok, we can >> extend it to gluster-users too. >> >> In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and >> always stick to the first 4 Tuesdays of every month. That will help in >> describing the community meeting schedule better. If we want to keep the >> schedule running on alternate Tuesdays, please let me know and the mailing >> list footers can be updated accordingly :-). >> >> >>> There is also one more thing. For some reason, the community meeting is >>> not visible in my calendar (especially NA region). I am not sure if anyone >>> else also facing this issue. >>> >> >> I did face this issue. Realized that we had a meeting today and showed up >> at the meeting a while later but did not see many participants. Perhaps, >> the calendar invite has to be made a recurring one. >> > > We'd need to explicitly import the invite and add it to our calendar, > otherwise it doesn't reflect. > And you're right that the last series wasn't a recurring one either. > >> Thanks, >> Vijay >> >> >>> >>> Regards, >>> Raghavendra >>> >>> On Tue, May 7, 2019 at 5:19 AM Ashish Pandey >>> wrote: >>> >>>> Hi, >>>> >>>> While we send a mail on gluster-devel or gluster-user mailing list, >>>> following content gets auto generated and placed at the end of mail. >>>> >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>>> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? 
>>>> Like this - >>>> >>>> Meeting schedule - >>>> >>>> >>>> - APAC friendly hours >>>> - Tuesday 14th May 2019, 11:30AM IST >>>> - Bridge: https://bluejeans.com/836554017 >>>> - NA/EMEA >>>> - Tuesday 7th May 2019, 01:00 PM EDT >>>> - Bridge: https://bluejeans.com/486278655 >>>> >>>> Or just a link to meeting minutes details?? >>>> https://github.com/gluster/community/tree/master/meetings >>>> >>>> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. >>>> >>>> --- >>>> Ashish >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> >> Community Meeting Calendar: >> >> APAC Schedule - >> Every 2nd and 4th Tuesday at 11:30 AM IST >> Bridge: https://bluejeans.com/836554017 >> >> NA/EMEA Schedule - >> Every 1st and 3rd Tuesday at 01:00 PM EDT >> Bridge: https://bluejeans.com/486278655 >> >> Gluster-devel mailing list >> Gluster-devel at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-devel >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Wed May 8 07:08:08 2019 From: ndevos at redhat.com (Niels de Vos) Date: Wed, 8 May 2019 09:08:08 +0200 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> Message-ID: <20190508070808.GA22482@ndevos-x270> On Tue, May 07, 2019 at 11:37:27AM -0700, Vijay Bellur wrote: > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath > wrote: > > > > > + 1 to this. > > > > I have updated the footer of gluster-devel. If that looks ok, we can extend > it to gluster-users too. > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and always > stick to the first 4 Tuesdays of every month. That will help in describing > the community meeting schedule better. If we want to keep the schedule > running on alternate Tuesdays, please let me know and the mailing list > footers can be updated accordingly :-). > > > > There is also one more thing. For some reason, the community meeting is > > not visible in my calendar (especially NA region). I am not sure if anyone > > else also facing this issue. > > > > I did face this issue. Realized that we had a meeting today and showed up > at the meeting a while later but did not see many participants. Perhaps, > the calendar invite has to be made a recurring one. Maybe a new invite can be sent with the minutes after a meeting has finished. This makes it easier for people that recently subscribed to the list to add it to their calendar? Niels > > Thanks, > Vijay > > > > > > Regards, > > Raghavendra > > > > On Tue, May 7, 2019 at 5:19 AM Ashish Pandey wrote: > > > >> Hi, > >> > >> While we send a mail on gluster-devel or gluster-user mailing list, > >> following content gets auto generated and placed at the end of mail. 
> >> > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > >> > >> Gluster-devel mailing list > >> Gluster-devel at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-devel > >> > >> In the similar way, is it possible to attach meeting schedule and link at the end of every such mails? > >> Like this - > >> > >> Meeting schedule - > >> > >> > >> - APAC friendly hours > >> - Tuesday 14th May 2019, 11:30AM IST > >> - Bridge: https://bluejeans.com/836554017 > >> - NA/EMEA > >> - Tuesday 7th May 2019, 01:00 PM EDT > >> - Bridge: https://bluejeans.com/486278655 > >> > >> Or just a link to meeting minutes details?? > >> https://github.com/gluster/community/tree/master/meetings > >> > >> This will help developers and users of the community to know when and where meeting happens and how to attend those meetings. > >> > >> --- > >> Ashish > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From vbellur at redhat.com Wed May 8 07:31:37 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Wed, 8 May 2019 00:31:37 -0700 Subject: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list In-Reply-To: <20190508070808.GA22482@ndevos-x270> References: <2029030585.17155612.1557220163425.JavaMail.zimbra@redhat.com> <1839109616.17156274.1557220745006.JavaMail.zimbra@redhat.com> <20190508070808.GA22482@ndevos-x270> Message-ID: On Wed, May 8, 2019 at 12:08 AM Niels de Vos wrote: > On Tue, May 07, 2019 at 11:37:27AM -0700, Vijay Bellur wrote: > > On Tue, May 7, 2019 at 11:15 AM FNU Raghavendra Manjunath < > rabhat at redhat.com> > > wrote: > > > > > > > > + 1 to this. > > > > > > > I have updated the footer of gluster-devel. If that looks ok, we can > extend > > it to gluster-users too. > > > > In case of a month with 5 Tuesdays, we can skip the 5th Tuesday and > always > > stick to the first 4 Tuesdays of every month. That will help in > describing > > the community meeting schedule better. If we want to keep the schedule > > running on alternate Tuesdays, please let me know and the mailing list > > footers can be updated accordingly :-). > > > > > > > There is also one more thing. For some reason, the community meeting is > > > not visible in my calendar (especially NA region). I am not sure if > anyone > > > else also facing this issue. > > > > > > > I did face this issue. Realized that we had a meeting today and showed up > > at the meeting a while later but did not see many participants. Perhaps, > > the calendar invite has to be made a recurring one. > > Maybe a new invite can be sent with the minutes after a meeting has > finished. This makes it easier for people that recently subscribed to > the list to add it to their calendar? > > > That is a good point. I have observed in google groups based mailing lists that a calendar invite for a recurring event is sent automatically to people after they subscribe to the list. I don't think mailman has a similar feature yet. 
Thanks, Vijay -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Thu May 9 12:52:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 9 May 2019 14:52:52 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine Message-ID:
Hello Kaleb, I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using a SLES15 system to create them. I got the following error message:
rpmbuild --define '_topdir /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' > --with gnfs -bb rpmbuild/SPECS/glusterfs.spec > warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at redhat.com > warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at redhat.com > error: Failed build dependencies: > rpcgen is needed by glusterfs-5.5-100.x86_64 > make: *** [Makefile:579: rpms] Error 1 > >
In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in Repo glusterfs-suse) there is rpcgen listed as a dependency. But unfortunately there is no rpcgen package provided on SLES15. Or in other words: I only found RPMs for other SUSE distributions, but not for SLES15. Do you know that issue? What is the name of the distribution which you are using to create Packages for SLES15? Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Thu May 9 14:12:03 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 9 May 2019 16:12:03 +0200 Subject: [Gluster-users] Improve stability between SMB/CTDB and Glusterfs (together with Samba Core Developer) Message-ID:
Dear Gluster Community, at the moment we are improving the stability of SMB/CTDB and Gluster. For this purpose we are working together with an advanced SAMBA Core Developer. He did some debugging but needs more information about Gluster Core Behaviour. *Would any of the Gluster developers want to have an online conference with him and me?* I would organize everything. In my opinion this is a good chance to improve the stability of Glusterfs, and this is at the moment one of the major issues in the Community. Regards David Spisla -------------- next part -------------- An HTML attachment was scrubbed... URL:
From hunter86_bg at yahoo.com Fri May 10 10:36:38 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Fri, 10 May 2019 13:36:38 +0300 Subject: [Gluster-users] Advice needed for network change. Message-ID: <4ie49rarxdk9rx1mer4ol71w.1557484034416@email.android.com>
Hello Community, I'm making some changes and I would like to hear your opinion on the topic. First, let me share my setup. I have 3 systems in a replica 3 arbiter 1 hyperconverged setup (oVirt) which use 1 gbit networks for any connectivity. I have added 4 dual-port 1 gbit NICs (8 ports per machine in total) and connected them directly between ovirt1 and ovirt2 /data nodes/ with LACP aggregation (layer3+layer4 hashing). As ovirt1 & ovirt2 are directly connected /trying to reduce costs by avoiding the switch/ I have set up /etc/hosts for the arbiter /ovirt3/ to point to the old IPs.
So they look like: ovirt1 & ovirt2 /data nodes/ /etc/hosts: 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.1.90 ovirt1.localdomain ovirt1 192.168.1.64 ovirt2.localdomain ovirt2 192.168.1.41 ovirt3.localdomain ovirt3 10.10.10.1 gluster1.localdomain gluster1 10.10.10.2 gluster2.localdomain gluster2 ovirt3 /etc/hosts: 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 #As gluster1 & gluster2 are directly connected, we cannot reach them. 192.168.1.90 ovirt1.localdomain ovirt1 gluster1 192.168.1.64 ovirt2.localdomain ovirt2 gluster2 192.168.1.41 ovirt3.localdomain ovirt3 Do you see any obstacles to 'peer probe' and then 'replace brick' on the 2 data nodes? Downtime is not an issue, but I prefer not to wipe the setup. Thanks for reading this long post and don't hesitate to recommend any tunings. I am still considering what values to put for the client/server thread count. Best Regards, Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL:
From kkeithle at redhat.com Fri May 10 14:24:15 2019 From: kkeithle at redhat.com (Kaleb Keithley) Date: Fri, 10 May 2019 10:24:15 -0400 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID:
Seems I accidentally omitted gluster-users in my first reply. On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley wrote: > On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: > >> Hello Kaleb, >> >> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using >> a SLES15 system to create them. I got the following error message: >> >> rpmbuild --define '_topdir >>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>> rpmbuild/SPECS/glusterfs.spec >>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>> redhat.com >>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>> redhat.com >>> error: Failed build dependencies: >>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>> make: *** [Makefile:579: rpms] Error 1 >>> >>> >> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >> Repo glusterfs-suse) there is rpcgen listed as a dependency. But >> unfortunately there is no rpcgen package provided on SLES15. Or in other >> words: >> I only found RPMs for other SUSE distributions, but not for SLES15. >> >> Do you know that issue? >> > > I'm afraid I don't. > > >> What is the name of the distribution which you are using to create >> Packages for SLES15? >> > > The community packages are built on the OpenSUSE OBS and they are built on > SLES15 - the one that OBS provides. I don't know any details beyond that. It > could be a real SLES15 system, or it could be a build in mock, or SUSE's > chroot build tool if they don't have mock. > > You can see the build logs from the community builds of glusterfs-5.5 and > glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a > completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. > Finding things in the OBS repos seems to be hit or miss sometimes. I can't > find the SLE_15 rpcgen package. > > (Back in SLES11 days I had a free eval license that let me update and > install add-on packages on my own system. I tried to get a similar license > for SLES12 and was advised to just use OBS. I haven't even bothered trying > to get one for SLES15.
It makes it harder IMO to figure things out.) > > I recommend asking the OBS team on #opensuse-buildservice on (freenode) > IRC. They've always been very helpful to me. >
Miuku on #opensuse-buildservice poked around and found that the unbundled rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it does in Fedora and RHEL8.) All the gluster community packages for SLE_15 going back to glusterfs-5.0 in October 2018 have used the unbundled rpcgen. You can do the same, or remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. HTH -- Kaleb -------------- next part -------------- An HTML attachment was scrubbed... URL:
From pgurusid at redhat.com Mon May 13 05:22:06 2019 From: pgurusid at redhat.com (Poornima Gurusiddaiah) Date: Mon, 13 May 2019 10:52:06 +0530 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID:
Hi, We would definitely be interested in this. Thank you for contacting us. For a start we can have an online conference. Please suggest a few possible dates and times for the week (preferably between 7:00 AM and 9:00 PM IST). Adding Anoop and Gunther, who are also the main contributors to the Gluster-Samba integration. Thanks, Poornima
On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: > Dear Gluster Community, > at the moment we are improving the stability of SMB/CTDB and Gluster. For > this purpose we are working together with an advanced SAMBA Core Developer. > He did some debugging but needs more information about Gluster Core > Behaviour. > > *Would any of the Gluster developers want to have an online conference with > him and me?* > > I would organize everything. In my opinion this is a good chance to > improve stability of Glusterfs and this is at the moment one of the major > issues in the Community. > > Regards > David Spisla > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From spisla80 at gmail.com Mon May 13 06:10:35 2019 From: spisla80 at gmail.com (David Spisla) Date: Mon, 13 May 2019 08:10:35 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID:
Hello Kaleb, thank you for the info. I'll try this out. Regards David
Am Fr., 10. Mai 2019 um 16:24 Uhr schrieb Kaleb Keithley < kkeithle at redhat.com>: > Seems I accidentally omitted gluster-users in my first reply. > > On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley wrote: >> On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: >>> Hello Kaleb, >>> >>> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am using >>> a SLES15 system to create them.
I got the following error message: >>> >>> rpmbuild --define '_topdir >>>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>>> rpmbuild/SPECS/glusterfs.spec >>>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>>> redhat.com >>>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>>> redhat.com >>>> error: Failed build dependencies: >>>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>>> make: *** [Makefile:579: rpms] Error 1 >>>> >>>> >>> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >>> Repo glusterfs-suse) there is rpcgen listed as dependency. But >>> unfortunately there is no rpcgen package provided on SLES15. Or with other >>> words: >>> I did only find RPMs for other SUSE distributions, but not for SLES15. >>> >>> Do you know that issue? >>> >> >> I'm afraid I don't. >> >> >>> What is the name of the distribution which you are using to create >>> Packages for SLES15? >>> >> >> The community packages are built on the OpenSUSE OBS and they are built >> on SLES15 ?the one that OBS provides. I don't know any details beyond that. >> It could be a real SLES15 system, or it could be a build in mock, or SUSE's >> chroot build tool if they don't have mock. >> >> You can see the build logs from the community builds of glusterfs-5.5 and >> glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a >> completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. >> Finding things in the OBS repos seems to be hit or miss sometimes. I can't >> find the SLE_15 rpcgen package. >> >> (Back in SLES11 days I had a free eval license that let me update and >> install add-on packages on my own system. I tried to get a similar license >> for SLES12 and was advised to just use OBS. I haven't even bothered trying >> to get one for SLES15. It makes it harder IMO to figure things out.) >> >> I recommend asking the OBS team on #opensuse-buildservice on (freenode) >> IRC. They've always been very helpful to me. >> > > Miuku on #opensuse-buildservice poked around and found that the unbundled > rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it > does in Fedora and RHEL8.) > > All the gluster community packages for SLE_15 going back to glusterfs-5.0 > in October 2018 have used the unbundled rpcgen. You can do the same, or > remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. > > HTH > > -- > > Kaleb > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 13 06:47:45 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 08:47:45 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds Message-ID: Hi all, I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. 
BR, Martin These are the volume settings: Type: Replicate Volume ID: b021bbb6-fa99-4cc7-88f6-49152a22cb9e Status: Started Number of Bricks: 1 x 3 = 3 Transport-type: tcp Bricks: Brick1: node1:/imagestore/brick1 Brick2: node2:/imagestore/brick1 Brick3: node3:/imagestore/brick1 Options Reconfigured: performance.client-io-threads: on performance.quick-read: off performance.read-ahead: off performance.io-cache: off performance.stat-prefetch: on cluster.min-free-disk: 10% cluster.server-quorum-type: server cluster.quorum-type: auto cluster.eager-lock: enable cluster.data-self-heal-algorithm: full network.remote-dio: enable network.ping-timeout: 30 diagnostics.count-fop-hits: on diagnostics.latency-measurement: on client.event-threads: 4 server.event-threads: 4 storage.owner-gid: 9869 storage.owner-uid: 9869 server.allow-insecure: on nfs.disable: on performance.readdir-ahead: on -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2019-05-13 at 08.32.24.png Type: image/png Size: 144426 bytes Desc: not available URL:
From lemonnierk at ulrar.net Mon May 13 06:55:48 2019 From: lemonnierk at ulrar.net (lemonnierk at ulrar.net) Date: Mon, 13 May 2019 07:55:48 +0100 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: Message-ID: <20190513065548.GI25080@althea.ulrar.net>
On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > Hi all, Hi > > I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in the Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds". > The only solution is to power off (hard) the VM and then boot it up again. I am unable to SSH and also to log in with the console; it is probably stuck on some disk operation. No error/warning logs or messages are stored in the VM's logs. > As far as I know this should be unrelated, I get this during heals without any freezes, it just means the storage is slow I think. > KVM/Libvirt(qemu) uses libgfapi and a fuse mount to access VM disks on the replica volume. Can someone advise how to debug this problem or what can cause these issues? > It's really annoying; I've tried to google everything but nothing came up. I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not related. > Any chance your gluster goes readonly ? Have you checked your gluster logs to see if maybe they lose each other some times ? /var/log/glusterfs For libgfapi accesses you'd have its log on qemu's standard output, that might contain the actual error at the time of the freeze.
From snowmailer at gmail.com Mon May 13 07:03:40 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 09:03:40 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <20190513065548.GI25080@althea.ulrar.net> References: <20190513065548.GI25080@althea.ulrar.net> Message-ID:
Hi, there is no healing operation, no peer disconnects, no readonly filesystem. Yes, the storage is slow and unavailable for 120 seconds, but why? It is SSD with 10G and performance is good. > you'd have its log on qemu's standard output, If you mean /var/log/libvirt/qemu/vm.log there is nothing. I have been looking into this problem for more than a month and have tried everything. Can't find anything. Any more clues or leads?
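One low-impact way to gather more data on freezes like these is GlusterFS's built-in volume profiling, which Krutika also suggests further down in this thread. A minimal sketch, assuming the affected volume is named imagestore (the name that appears in the gluster:// disk URL later in the thread); note that diagnostics.latency-measurement and diagnostics.count-fop-hits are already on in the settings above, so some counters may already be populated:

    # enable per-brick FOP statistics on the volume (small extra overhead)
    gluster volume profile imagestore start
    # ...wait for a VM to freeze, then capture the counters...
    gluster volume profile imagestore info > /tmp/imagestore-profile.txt
    # 'info incremental' limits the output to the interval since the last 'info'
    gluster volume profile imagestore info incremental >> /tmp/imagestore-profile.txt
    gluster volume profile imagestore stop

Sharing the resulting file together with the exact Gluster version usually makes it much easier to spot which FOPs are stalling during a freeze.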
BR, Martin > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> Hi all, > > Hi > >> >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. >> Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. >> > > As far as I know this should be unrelated, I get this during heals > without any freezes, it just means the storage is slow I think. > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? >> It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. >> > > Any chance your gluster goes readonly ? Have you checked your gluster > logs to see if maybe they lose each other some times ? > /var/log/glusterfs > > For libgfapi accesses you'd have it's log on qemu's standard output, > that might contain the actual error at the time of the freez. > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From kdhananj at redhat.com Mon May 13 07:19:25 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 12:49:25 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: What version of gluster are you using? Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command Let me know if you have any questions. -Krutika On Mon, May 13, 2019 at 12:34 PM Martin Toth wrote: > Hi, > > there is no healing operation, not peer disconnects, no readonly > filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, > its SSD with 10G, performance is good. > > > you'd have it's log on qemu's standard output, > > If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking > for problem for more than month, tried everything. Can?t find anything. Any > more clues or leads? > > BR, > Martin > > > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > >> Hi all, > > > > Hi > > > >> > >> I am running replica 3 on SSDs with 10G networking, everything works OK > but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked > for more than 120 seconds?. > >> Only solution is to poweroff (hard) VM and than boot it up again. I am > unable to SSH and also login with console, its stuck probably on some disk > operation. No error/warning logs or messages are store in VMs logs. > >> > > > > As far as I know this should be unrelated, I get this during heals > > without any freezes, it just means the storage is slow I think. > > > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on > replica volume. Can someone advice how to debug this problem or what can > cause these issues? > >> It?s really annoying, I?ve tried to google everything but nothing came > up. 
I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but > its not related. > >> > > > > Any chance your gluster goes readonly ? Have you checked your gluster > > logs to see if maybe they lose each other some times ? > > /var/log/glusterfs > > > > For libgfapi accesses you'd have it's log on qemu's standard output, > > that might contain the actual error at the time of the freez. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Mon May 13 07:21:19 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 12:51:19 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: Also, what's the caching policy that qemu is using on the affected vms? Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. -Krutika On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay wrote: > What version of gluster are you using? > Also, can you capture and share volume-profile output for a run where you > manage to recreate this issue? > > https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command > Let me know if you have any questions. > > -Krutika > > On Mon, May 13, 2019 at 12:34 PM Martin Toth wrote: > >> Hi, >> >> there is no healing operation, not peer disconnects, no readonly >> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >> its SSD with 10G, performance is good. >> >> > you'd have it's log on qemu's standard output, >> >> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >> for problem for more than month, tried everything. Can?t find anything. Any >> more clues or leads? >> >> BR, >> Martin >> >> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >> > >> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> >> Hi all, >> > >> > Hi >> > >> >> >> >> I am running replica 3 on SSDs with 10G networking, everything works >> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >> blocked for more than 120 seconds?. >> >> Only solution is to poweroff (hard) VM and than boot it up again. I am >> unable to SSH and also login with console, its stuck probably on some disk >> operation. No error/warning logs or messages are store in VMs logs. >> >> >> > >> > As far as I know this should be unrelated, I get this during heals >> > without any freezes, it just means the storage is slow I think. >> > >> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >> replica volume. Can someone advice how to debug this problem or what can >> cause these issues? >> >> It?s really annoying, I?ve tried to google everything but nothing came >> up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but >> its not related. >> >> >> > >> > Any chance your gluster goes readonly ? Have you checked your gluster >> > logs to see if maybe they lose each other some times ? 
>> > /var/log/glusterfs >> > >> > For libgfapi accesses you'd have it's log on qemu's standard output, >> > that might contain the actual error at the time of the freez. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 13 07:31:51 2019 From: snowmailer at gmail.com (Martin Toth) Date: Mon, 13 May 2019 09:31:51 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> Message-ID: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Cache in qemu is none. That should be correct. This is full command : /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/var/lib/one//datastores/116/312/disk.0,format=raw,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=gluster://localhost:24007/imagestore/7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/var/lib/one//datastores/116/312/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=26,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on I?ve highlighted disks. First is VM context disk - Fuse used, second is SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. Krutika, I will start profiling on Gluster Volumes and wait for next VM to fail. Than I will attach/send profiling info after some VM will be failed. I suppose this is correct profiling strategy. Thanks, BR! Martin > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > What version of gluster are you using? 
> Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? > https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command > Let me know if you have any questions. > > -Krutika > > On Mon, May 13, 2019 at 12:34 PM Martin Toth > wrote: > Hi, > > there is no healing operation, not peer disconnects, no readonly filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, its SSD with 10G, performance is good. > > > you'd have it's log on qemu's standard output, > > If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking for problem for more than month, tried everything. Can?t find anything. Any more clues or leads? > > BR, > Martin > > > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: > > > > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: > >> Hi all, > > > > Hi > > > >> > >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in Gluster volume occasionally freeze with ?Task XY blocked for more than 120 seconds?. > >> Only solution is to poweroff (hard) VM and than boot it up again. I am unable to SSH and also login with console, its stuck probably on some disk operation. No error/warning logs or messages are store in VMs logs. > >> > > > > As far as I know this should be unrelated, I get this during heals > > without any freezes, it just means the storage is slow I think. > > > >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on replica volume. Can someone advice how to debug this problem or what can cause these issues? > >> It?s really annoying, I?ve tried to google everything but nothing came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but its not related. > >> > > > > Any chance your gluster goes readonly ? Have you checked your gluster > > logs to see if maybe they lose each other some times ? > > /var/log/glusterfs > > > > For libgfapi accesses you'd have it's log on qemu's standard output, > > that might contain the actual error at the time of the freez. > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Mon May 13 07:33:55 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Mon, 13 May 2019 07:33:55 +0000 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: as per https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. , the informational warning could be suppressed with : "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" Moreover, as per their website : "*This message is not an error*. It is an indication that a program has had to wait for a very long time, and what it was doing. " More reference: https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds Regards, Andrei On Mon, May 13, 2019 at 7:32 AM Martin Toth wrote: > Cache in qemu is none. 
That should be correct. This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine > pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp > 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 > -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime > -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/*disk.0* > ,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ > *7b64d6757acc47a39503f68731f89b8e* > ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/*disk.1* > ,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 > -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 > -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is > SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. > Than I will attach/send profiling info after some VM will be failed. I > suppose this is correct profiling strategy. > > Thanks, > BR! > Martin > > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the > command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you >> manage to recreate this issue? >> >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:34 PM Martin Toth >> wrote: >> >>> Hi, >>> >>> there is no healing operation, not peer disconnects, no readonly >>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>> its SSD with 10G, performance is good. >>> >>> > you'd have it's log on qemu's standard output, >>> >>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>> for problem for more than month, tried everything. Can?t find anything. Any >>> more clues or leads? 
>>> >>> BR, >>> Martin >>> >>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>> > >>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>> >> Hi all, >>> > >>> > Hi >>> > >>> >> >>> >> I am running replica 3 on SSDs with 10G networking, everything works >>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>> blocked for more than 120 seconds?. >>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>> am unable to SSH and also login with console, its stuck probably on some >>> disk operation. No error/warning logs or messages are store in VMs logs. >>> >> >>> > >>> > As far as I know this should be unrelated, I get this during heals >>> > without any freezes, it just means the storage is slow I think. >>> > >>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >>> replica volume. Can someone advice how to debug this problem or what can >>> cause these issues? >>> >> It?s really annoying, I?ve tried to google everything but nothing >>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>> drivers, but its not related. >>> >> >>> > >>> > Any chance your gluster goes readonly ? Have you checked your gluster >>> > logs to see if maybe they lose each other some times ? >>> > /var/log/glusterfs >>> > >>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>> > that might contain the actual error at the time of the freez. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrevolodin at gmail.com Mon May 13 07:37:15 2019 From: andrevolodin at gmail.com (Andrey Volodin) Date: Mon, 13 May 2019 07:37:15 +0000 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: what is the context from dmesg ? On Mon, May 13, 2019 at 7:33 AM Andrey Volodin wrote: > as per > https://helpful.knobs-dials.com/index.php/INFO:_task_blocked_for_more_than_120_seconds. , > the informational warning could be suppressed with : > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > Moreover, as per their website : "*This message is not an error*. > It is an indication that a program has had to wait for a very long time, > and what it was doing. " > More reference: > https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds > > Regards, > Andrei > > On Mon, May 13, 2019 at 7:32 AM Martin Toth wrote: > >> Cache in qemu is none. That should be correct. 
This is full command : >> >> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine >> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp >> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 >> -no-user-config -nodefaults -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait >> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime >> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 >> >> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 >> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 >> -drive file=/var/lib/one//datastores/116/312/*disk.0* >> ,format=raw,if=none,id=drive-virtio-disk1,cache=none >> -device >> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 >> -drive file=gluster://localhost:24007/imagestore/ >> *7b64d6757acc47a39503f68731f89b8e* >> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none >> -device >> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 >> -drive file=/var/lib/one//datastores/116/312/*disk.1* >> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on >> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 >> >> -netdev tap,fd=26,id=hostnet0 >> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 >> -chardev pty,id=charserial0 -device >> isa-serial,chardev=charserial0,id=serial0 >> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait >> -device >> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 >> -vnc 0.0.0.0:312,password -device >> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device >> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on >> >> I?ve highlighted disks. First is VM context disk - Fuse used, second is >> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. >> >> Krutika, >> I will start profiling on Gluster Volumes and wait for next VM to fail. >> Than I will attach/send profiling info after some VM will be failed. I >> suppose this is correct profiling strategy. >> >> Thanks, >> BR! >> Martin >> >> On 13 May 2019, at 09:21, Krutika Dhananjay wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the >> command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay >> wrote: >> >>> What version of gluster are you using? >>> Also, can you capture and share volume-profile output for a run where >>> you manage to recreate this issue? >>> >>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >>> Let me know if you have any questions. >>> >>> -Krutika >>> >>> On Mon, May 13, 2019 at 12:34 PM Martin Toth >>> wrote: >>> >>>> Hi, >>>> >>>> there is no healing operation, not peer disconnects, no readonly >>>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>>> its SSD with 10G, performance is good. >>>> >>>> > you'd have it's log on qemu's standard output, >>>> >>>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>>> for problem for more than month, tried everything. Can?t find anything. 
Any >>>> more clues or leads? >>>> >>>> BR, >>>> Martin >>>> >>>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>>> > >>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>>> >> Hi all, >>>> > >>>> > Hi >>>> > >>>> >> >>>> >> I am running replica 3 on SSDs with 10G networking, everything works >>>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>>> blocked for more than 120 seconds?. >>>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>>> am unable to SSH and also login with console, its stuck probably on some >>>> disk operation. No error/warning logs or messages are store in VMs logs. >>>> >> >>>> > >>>> > As far as I know this should be unrelated, I get this during heals >>>> > without any freezes, it just means the storage is slow I think. >>>> > >>>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks >>>> on replica volume. Can someone advice how to debug this problem or what >>>> can cause these issues? >>>> >> It?s really annoying, I?ve tried to google everything but nothing >>>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>>> drivers, but its not related. >>>> >> >>>> > >>>> > Any chance your gluster goes readonly ? Have you checked your gluster >>>> > logs to see if maybe they lose each other some times ? >>>> > /var/log/glusterfs >>>> > >>>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>>> > that might contain the actual error at the time of the freez. >>>> > _______________________________________________ >>>> > Gluster-users mailing list >>>> > Gluster-users at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon May 13 07:44:05 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 13 May 2019 10:44:05 +0300 Subject: [Gluster-users] Advice needed for network change. Message-ID: <78m66mxb8mlgq3hspi04g8vn.1557733445126@email.android.com> Hey All, I have managed to migrate but with some slight changes. I'm using teaming with balance runner (l3 + l4 for balance hashing) on 6 ports (2 dual port + 2 nics using only 1 port ) per machine. Running some tests (8 network connections in parallel) between the replica nodes shows an aggregated bandwidth of 400+ MB/s (megabytes). Now I need to 'force' gluster to open more connections in order to spread the load to as many ports as possible. I have tried with client & server event-threads set to 6 , but the performance is bellow my expectations . Any hints will be appreciated. Best Regards, Strahil Nikolov On May 10, 2019 13:36, Strahil wrote: > > Hello Community, > > I'm making some changes and I would like to hear? your opinion on the topic. > > First, let me share my setup. > I have 3 systems with in a replica 3 arbiter 1 hyperconverged setup (oVirt) which use 1 gbit networks for any connectivity. > > I have added 4 dual-port 1 gbit NICs ( 8 ports per machine in total) and connected them directly between ovirt1 and ovirt2 /data nodes/ with LACP aggregation (layer3+layer4 hashing). 
> > As ovirt1 & ovirt2 are directly connected /trying to reduce costs by avoiding the switch/ I have? setup /etc/hosts for the arbiter? /ovirt3/ to point tothe old IPs . > > So they look like: > ovirt1 & ovirt2 /data nodes/? /etc/hosts: > > 127.0.0.1?? localhost localhost.localdomain localhost4 localhost4.localdomain4 > ::1???????? localhost localhost.localdomain localhost6 localhost6.localdomain6 > 192.168.1.90 ovirt1.localdomain ovirt1 > 192.168.1.64 ovirt2.localdomain ovirt2 > 192.168.1.41 ovirt3.localdomain ovirt3 > 10.10.10.1?? gluster1.localdomain gluster1 > 10.10.10.2?? gluster2.localdomain gluster2 > > ovirt3 /etc/hosts: > > 127.0.0.1?? localhost localhost.localdomain localhost4 localhost4.localdomain4 > ::1???????? localhost localhost.localdomain localhost6 localhost6.localdomain6 > #As gluster1 & gluster2 are directly connected , we cannot reach them. > 192.168.1.90 ovirt1.localdomain ovirt1 gluster1 > 192.168.1.64 ovirt2.localdomain ovirt2 gluster2 > 192.168.1.41 ovirt3.localdomain ovirt3 > > Do you see any obstacles to 'peer probe' and then 'replace brick' the 2 data nodes. > Downtime is not an issue, but I preffer not to wipe the setup. > > Thanks for reading this long post and don't hesitate to recommend any tunings. > I am still considering what values to put for the? client/server thread count. > > Best Regards, > Strahil Nikolov -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Mon May 13 08:20:14 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Mon, 13 May 2019 13:50:14 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: OK. In that case, can you check if the following two changes help: # gluster volume set $VOL network.remote-dio off # gluster volume set $VOL performance.strict-o-direct on preferably one option changed at a time, its impact tested and then the next change applied and tested. Also, gluster version please? -Krutika On Mon, May 13, 2019 at 1:02 PM Martin Toth wrote: > Cache in qemu is none. That should be correct. 
This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine > pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp > 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 > -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime > -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/*disk.0* > ,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ > *7b64d6757acc47a39503f68731f89b8e* > ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/*disk.1* > ,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 > -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 > -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is > SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. > Than I will attach/send profiling info after some VM will be failed. I > suppose this is correct profiling strategy. > About this, how many vms do you need to recreate it? A single vm? Or multiple vms doing IO in parallel? > Thanks, > BR! > Martin > > On 13 May 2019, at 09:21, Krutika Dhananjay wrote: > > Also, what's the caching policy that qemu is using on the affected vms? > Is it cache=none? Or something else? You can get this information in the > command line of qemu-kvm process corresponding to your vm in the ps output. > > -Krutika > > On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: > >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you >> manage to recreate this issue? >> >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:34 PM Martin Toth >> wrote: >> >>> Hi, >>> >>> there is no healing operation, not peer disconnects, no readonly >>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>> its SSD with 10G, performance is good. >>> >>> > you'd have it's log on qemu's standard output, >>> >>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking >>> for problem for more than month, tried everything. Can?t find anything. 
Any >>> more clues or leads? >>> >>> BR, >>> Martin >>> >>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>> > >>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>> >> Hi all, >>> > >>> > Hi >>> > >>> >> >>> >> I am running replica 3 on SSDs with 10G networking, everything works >>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>> blocked for more than 120 seconds?. >>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>> am unable to SSH and also login with console, its stuck probably on some >>> disk operation. No error/warning logs or messages are store in VMs logs. >>> >> >>> > >>> > As far as I know this should be unrelated, I get this during heals >>> > without any freezes, it just means the storage is slow I think. >>> > >>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on >>> replica volume. Can someone advice how to debug this problem or what can >>> cause these issues? >>> >> It?s really annoying, I?ve tried to google everything but nothing >>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>> drivers, but its not related. >>> >> >>> > >>> > Any chance your gluster goes readonly ? Have you checked your gluster >>> > logs to see if maybe they lose each other some times ? >>> > /var/log/glusterfs >>> > >>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>> > that might contain the actual error at the time of the freez. >>> > _______________________________________________ >>> > Gluster-users mailing list >>> > Gluster-users at gluster.org >>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgurusid at redhat.com Tue May 14 04:36:21 2019 From: pgurusid at redhat.com (pgurusid at redhat.com) Date: Tue, 14 May 2019 04:36:21 +0000 Subject: [Gluster-users] Invitation: Gluster Community Meeting (APAC friendly hours) @ Every 2 weeks at 11:30am on Tuesday 15 times (IST) (gluster-users@gluster.org) Message-ID: <0000000000001e34cf0588d19307@google.com> You have been invited to the following event. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Every 2 weeks at 11:30am on Tuesday 15 times India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * pgurusid at redhat.com - organizer * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org Event details: https://www.google.com/calendar/event?action=VIEW&eid=NTEwOGJvMGZjMnRjN3Z0YzY0OGNmb3E4dXQgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjcGd1cnVzaWRAcmVkaGF0LmNvbTk4OTgxMGM4NWE4YjNlMjU0ZjM2YjAxNDBjNTlhMjdjYWY2ODA5Mjk&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. 
Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2142 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 2194 bytes Desc: not available URL: From pgurusid at redhat.com Tue May 14 04:47:10 2019 From: pgurusid at redhat.com (pgurusid at redhat.com) Date: Tue, 14 May 2019 04:47:10 +0000 Subject: [Gluster-users] Updated invitation: Gluster Community Meeting (APAC friendly hours) @ Every 2 weeks from 11:30am to 12:30pm on Tuesday 15 times (IST) (gluster-users@gluster.org) Message-ID: <000000000000d5a2c10588d1b9e2@google.com> This event has been changed. Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both Previous Meeting notes: http://github.com/gluster/community When: Every 2 weeks from 11:30am to 12:30pm on Tuesday 15 times India Standard Time - Kolkata (changed) Where: https://bluejeans.com/836554017 Calendar: gluster-users at gluster.org Who: * pgurusid at redhat.com - organizer * gluster-users at gluster.org * maintainers at gluster.org * gluster-devel at gluster.org * ranaraya at redhat.com * khiremat at redhat.com * dcunningham at voisonics.com Event details: https://www.google.com/calendar/event?action=VIEW&eid=NTEwOGJvMGZjMnRjN3Z0YzY0OGNmb3E4dXQgZ2x1c3Rlci11c2Vyc0BnbHVzdGVyLm9yZw&tok=MTkjcGd1cnVzaWRAcmVkaGF0LmNvbTk4OTgxMGM4NWE4YjNlMjU0ZjM2YjAxNDBjNTlhMjdjYWY2ODA5Mjk&ctz=Asia%2FKolkata&hl=en&es=0 Invitation from Google Calendar: https://www.google.com/calendar/ You are receiving this courtesy email at the account gluster-users at gluster.org because you are an attendee of this event. To stop receiving future updates for this event, decline this event. Alternatively you can sign up for a Google account at https://www.google.com/calendar/ and control your notification settings for your entire calendar. Forwarding this invitation could allow any recipient to send a response to the organizer and be added to the guest list, or invite others regardless of their own invitation status, or to modify your RSVP. Learn more at https://support.google.com/calendar/answer/37135#forwarding -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/calendar Size: 2585 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: invite.ics Type: application/ics Size: 2644 bytes Desc: not available URL: From paul at vandervlis.nl Wed May 15 11:24:41 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 13:24:41 +0200 Subject: [Gluster-users] Cannot see all data in mount Message-ID: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> Hello, I am the new sysadmin of an organization what uses Glusterfs. I did not set it up, and I don't know much about Glusterfs. What I do not understand is that I do not see all data in the mount. 
Not as root, not as a normal user who has privileges. When I do "ls" in one of the subdirectories I don't see any data, but this data exists at the server! In another subdirectory I see everything fine, the rights of the directories and files inside are the same. I mount with something like: /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. I don't see something special in /etc/exports or in /etc/glusterfs on the server. Is there maybe a mechanism in Glusterfs what can exclude data from export? Or is there a way to debug this problem? With regards, Paul van der Vlis ---- # file: VOORBEELD # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- # file: ALGEMEEN # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- ------ -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From hunter86_bg at yahoo.com Wed May 15 12:59:24 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 12:59:24 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> Message-ID: <1716249284.809654.1557925164742@mail.yahoo.com> Most probably you use sharding , which splits the files into smaller chunks so you can fit a 1TB file into gluster nodes with bricks of smaller size.So if you have 2 dispersed servers each having 500Gb brick->? without sharding you won't be able to store files larger than the brick size - no matter you have free space on the other server. When sharding is enabled - you will see on the brick the first shard as a file and the rest is in a hidden folder called ".shards" (or something like that). The benefit is also viewable when you need to do some maintenance on a gluster node, as you will need to heal only the shards containing modified by the customers' data. Best Regards,Strahil Nikolov ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis ??????: Hello, I am the new sysadmin of an organization what uses Glusterfs. I did not set it up, and I don't know much about Glusterfs. What I do not understand is that I do not see all data in the mount. Not as root, not as a normal user who has privileges. When I do "ls" in one of the subdirectories I don't see any data, but this data exists at the server! In another subdirectory I see everything fine, the rights of the directories and files inside are the same. I mount with something like: /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. I don't see something special in /etc/exports or in /etc/glusterfs on the server. Is there maybe a mechanism in Glusterfs what can exclude data from export?? Or is there a way to debug this problem? 
With regards, Paul van der Vlis ---- # file: VOORBEELD # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- # file: ALGEMEEN # owner: root # group: secretariaat # flags: -s- user::rwx group::rwx group:medewerkers:r-x mask::rwx other::--- default:user::rwx default:group::rwx default:group:medewerkers:r-x default:mask::rwx default:other::--- ------ -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Wed May 15 13:24:17 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 15:24:17 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <1716249284.809654.1557925164742@mail.yahoo.com> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> Message-ID: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Hello Strahil, Thanks for your answer. I don't find the word "sharding" in the configfiles. There is not much shared data (24GB), and only 1 brick: --- root at xxx:/etc/glusterfs# gluster volume info DATA Volume Name: DATA Type: Distribute Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: xxx-vpn:/DATA Options Reconfigured: transport.address-family: inet nfs.disable: on ---- (I have edited this a bit for privacy of my customer). I think they have used glusterfs because it can do ACLs. With regards, Paul van der Vlis Op 15-05-19 om 14:59 schreef Strahil Nikolov: > Most probably you use sharding , which splits the files into smaller > chunks so you can fit a 1TB file into gluster nodes with bricks of > smaller size. > So if you have 2 dispersed servers each having 500Gb brick->? without > sharding you won't be able to store files larger than the brick size - > no matter you have free space on the other server. > > When sharding is enabled - you will see on the brick the first shard as > a file and the rest is in a hidden folder called ".shards" (or something > like that). > > The benefit is also viewable when you need to do some maintenance on a > gluster node, as you will need to heal only the shards containing > modified by the customers' data. > > Best Regards, > Strahil Nikolov > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > ??????: > > > Hello, > > I am the new sysadmin of an organization what uses Glusterfs. > I did not set it up, and I don't know much about Glusterfs. > > What I do not understand is that I do not see all data in the mount. > Not as root, not as a normal user who has privileges. > > When I do "ls" in one of the subdirectories I don't see any data, but > this data exists at the server! > > In another subdirectory I see everything fine, the rights of the > directories and files inside are the same. > > I mount with something like: > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. > > I don't see something special in /etc/exports or in /etc/glusterfs on > the server. 
> > Is there maybe a mechanism in Glusterfs what can exclude data from > export?? Or is there a way to debug this problem? > > With regards, > Paul van der Vlis > > ---- > # file: VOORBEELD > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > > # file: ALGEMEEN > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > ------ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From nbalacha at redhat.com Wed May 15 13:45:10 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Wed, 15 May 2019 19:15:10 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: Hi Paul, A few questions: Which version of gluster are you using? Did this behaviour start recently? As in were the contents of that directory visible earlier? Regards, Nithya On Wed, 15 May 2019 at 18:55, Paul van der Vlis wrote: > Hello Strahil, > > Thanks for your answer. I don't find the word "sharding" in the > configfiles. There is not much shared data (24GB), and only 1 brick: > --- > root at xxx:/etc/glusterfs# gluster volume info DATA > > Volume Name: DATA > Type: Distribute > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: xxx-vpn:/DATA > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > ---- > (I have edited this a bit for privacy of my customer). > > I think they have used glusterfs because it can do ACLs. > > With regards, > Paul van der Vlis > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > Most probably you use sharding , which splits the files into smaller > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > smaller size. > > So if you have 2 dispersed servers each having 500Gb brick-> without > > sharding you won't be able to store files larger than the brick size - > > no matter you have free space on the other server. > > > > When sharding is enabled - you will see on the brick the first shard as > > a file and the rest is in a hidden folder called ".shards" (or something > > like that). > > > > The benefit is also viewable when you need to do some maintenance on a > > gluster node, as you will need to heal only the shards containing > > modified by the customers' data. > > > > Best Regards, > > Strahil Nikolov > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > ??????: > > > > > > Hello, > > > > I am the new sysadmin of an organization what uses Glusterfs. > > I did not set it up, and I don't know much about Glusterfs. > > > > What I do not understand is that I do not see all data in the mount. 
> > Not as root, not as a normal user who has privileges. > > > > When I do "ls" in one of the subdirectories I don't see any data, but > > this data exists at the server! > > > > In another subdirectory I see everything fine, the rights of the > > directories and files inside are the same. > > > > I mount with something like: > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > I see data in /data/VOORBEELD/, and I don't see any data in > /data/ALGEMEEN/. > > > > I don't see something special in /etc/exports or in /etc/glusterfs on > > the server. > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > export? Or is there a way to debug this problem? > > > > With regards, > > Paul van der Vlis > > > > ---- > > # file: VOORBEELD > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > > > # file: ALGEMEEN > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > ------ > > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Wed May 15 21:34:54 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Wed, 15 May 2019 23:34:54 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Op 15-05-19 om 15:45 schreef Nithya Balachandran: > Hi Paul, > > A few questions: > Which version of gluster are you using? On the server and some clients: glusterfs 4.1.2 On a new client: glusterfs 5.5 > Did this behaviour start recently? As in were the contents of that > directory visible earlier? This directory was normally used in the headoffice, and there is direct access to the files without Glusterfs. So I don't know. With regards, Paul van der Vlis > Regards, > Nithya > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > wrote: > > Hello Strahil, > > Thanks for your answer. I don't find the word "sharding" in the > configfiles. There is not much shared data (24GB), and only 1 brick: > --- > root at xxx:/etc/glusterfs# gluster volume info DATA > > Volume Name: DATA > Type: Distribute > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: xxx-vpn:/DATA > Options Reconfigured: > transport.address-family: inet > nfs.disable: on > ---- > (I have edited this a bit for privacy of my customer). 
> > I think they have used glusterfs because it can do ACLs. > > With regards, > Paul van der Vlis > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > Most probably you use sharding , which splits the files into smaller > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > smaller size. > > So if you have 2 dispersed servers each having 500Gb brick->? without > > sharding you won't be able to store files larger than the brick size - > > no matter you have free space on the other server. > > > > When sharding is enabled - you will see on the brick the first > shard as > > a file and the rest is in a hidden folder called ".shards" (or > something > > like that). > > > > The benefit is also viewable when you need to do some maintenance on a > > gluster node, as you will need to heal only the shards containing > > modified by the customers' data. > > > > Best Regards, > > Strahil Nikolov > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > > ??????: > > > > > > Hello, > > > > I am the new sysadmin of an organization what uses Glusterfs. > > I did not set it up, and I don't know much about Glusterfs. > > > > What I do not understand is that I do not see all data in the mount. > > Not as root, not as a normal user who has privileges. > > > > When I do "ls" in one of the subdirectories I don't see any data, but > > this data exists at the server! > > > > In another subdirectory I see everything fine, the rights of the > > directories and files inside are the same. > > > > I mount with something like: > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > I see data in /data/VOORBEELD/, and I don't see any data in > /data/ALGEMEEN/. > > > > I don't see something special in /etc/exports or in /etc/glusterfs on > > the server. > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > export?? Or is there a way to debug this problem? 
> > > > With regards, > > Paul van der Vlis > > > > ---- > > # file: VOORBEELD > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > > > # file: ALGEMEEN > > # owner: root > > # group: secretariaat > > # flags: -s- > > user::rwx > > group::rwx > > group:medewerkers:r-x > > mask::rwx > > other::--- > > default:user::rwx > > default:group::rwx > > default:group:medewerkers:r-x > > default:mask::rwx > > default:other::--- > > ------ > > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From hunter86_bg at yahoo.com Wed May 15 21:46:22 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 21:46:22 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> Message-ID: <1841695718.1050162.1557956782965@mail.yahoo.com> Check with 'gluster volume info | grep shard' If you have it enabled it should show:features.shard: on Keep in mind that disabling sharding is really bad, so if you really use it - do not disable sharding - will cause a real mess. Best Regards,Strahil Nikolov ? ?????, 15 ??? 2019 ?., 16:24:20 ?. ???????+3, Paul van der Vlis ??????: Hello Strahil, Thanks for your answer. I don't find the word "sharding" in the configfiles. There is not much shared data (24GB), and only 1 brick: --- root at xxx:/etc/glusterfs# gluster volume info DATA Volume Name: DATA Type: Distribute Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: xxx-vpn:/DATA Options Reconfigured: transport.address-family: inet nfs.disable: on ---- (I have edited this a bit for privacy of my customer). I think they have used glusterfs because it can do ACLs. With regards, Paul van der Vlis Op 15-05-19 om 14:59 schreef Strahil Nikolov: > Most probably you use sharding , which splits the files into smaller > chunks so you can fit a 1TB file into gluster nodes with bricks of > smaller size. > So if you have 2 dispersed servers each having 500Gb brick->? without > sharding you won't be able to store files larger than the brick size - > no matter you have free space on the other server. > > When sharding is enabled - you will see on the brick the first shard as > a file and the rest is in a hidden folder called ".shards" (or something > like that). > > The benefit is also viewable when you need to do some maintenance on a > gluster node, as you will need to heal only the shards containing > modified by the customers' data. > > Best Regards, > Strahil Nikolov > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. 
???????-4, Paul van der Vlis > ??????: > > > Hello, > > I am the new sysadmin of an organization what uses Glusterfs. > I did not set it up, and I don't know much about Glusterfs. > > What I do not understand is that I do not see all data in the mount. > Not as root, not as a normal user who has privileges. > > When I do "ls" in one of the subdirectories I don't see any data, but > this data exists at the server! > > In another subdirectory I see everything fine, the rights of the > directories and files inside are the same. > > I mount with something like: > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > I see data in /data/VOORBEELD/, and I don't see any data in /data/ALGEMEEN/. > > I don't see something special in /etc/exports or in /etc/glusterfs on > the server. > > Is there maybe a mechanism in Glusterfs what can exclude data from > export?? Or is there a way to debug this problem? > > With regards, > Paul van der Vlis > > ---- > # file: VOORBEELD > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > > # file: ALGEMEEN > # owner: root > # group: secretariaat > # flags: -s- > user::rwx > group::rwx > group:medewerkers:r-x > mask::rwx > other::--- > default:user::rwx > default:group::rwx > default:group:medewerkers:r-x > default:mask::rwx > default:other::--- > ------ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Wed May 15 21:48:20 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Wed, 15 May 2019 21:48:20 +0000 (UTC) Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: <1246047051.1043872.1557956900813@mail.yahoo.com> It seems that I got confused.So you see the files on the bricks (servers) , but not when you mount glusterfs on the clients ? If so - this is not the sharding feature as it works the opposite way. Best Regards,Strahil Nikolov ? ?????????, 16 ??? 2019 ?., 0:35:04 ?. ???????+3, Paul van der Vlis ??????: Op 15-05-19 om 15:45 schreef Nithya Balachandran: > Hi Paul, > > A few questions: > Which version of gluster are you using? On the server and some clients: glusterfs 4.1.2 On a new client: glusterfs 5.5 > Did this behaviour start recently? As in were the contents of that > directory visible earlier? This directory was normally used in the headoffice, and there is direct access to the files without Glusterfs. So I don't know. With regards, Paul van der Vlis > Regards, > Nithya > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > wrote: > >? ? Hello Strahil, > >? ? Thanks for your answer. I don't find the word "sharding" in the >? ? configfiles. There is not much shared data (24GB), and only 1 brick: >? ? --- >? ? root at xxx:/etc/glusterfs# gluster volume info DATA > >? ? 
Volume Name: DATA >? ? Type: Distribute >? ? Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a >? ? Status: Started >? ? Snapshot Count: 0 >? ? Number of Bricks: 1 >? ? Transport-type: tcp >? ? Bricks: >? ? Brick1: xxx-vpn:/DATA >? ? Options Reconfigured: >? ? transport.address-family: inet >? ? nfs.disable: on >? ? ---- >? ? (I have edited this a bit for privacy of my customer). > >? ? I think they have used glusterfs because it can do ACLs. > >? ? With regards, >? ? Paul van der Vlis > > >? ? Op 15-05-19 om 14:59 schreef Strahil Nikolov: >? ? > Most probably you use sharding , which splits the files into smaller >? ? > chunks so you can fit a 1TB file into gluster nodes with bricks of >? ? > smaller size. >? ? > So if you have 2 dispersed servers each having 500Gb brick->? without >? ? > sharding you won't be able to store files larger than the brick size - >? ? > no matter you have free space on the other server. >? ? > >? ? > When sharding is enabled - you will see on the brick the first >? ? shard as >? ? > a file and the rest is in a hidden folder called ".shards" (or >? ? something >? ? > like that). >? ? > >? ? > The benefit is also viewable when you need to do some maintenance on a >? ? > gluster node, as you will need to heal only the shards containing >? ? > modified by the customers' data. >? ? > >? ? > Best Regards, >? ? > Strahil Nikolov >? ? > >? ? > >? ? > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis >? ? > > ??????: >? ? > >? ? > >? ? > Hello, >? ? > >? ? > I am the new sysadmin of an organization what uses Glusterfs. >? ? > I did not set it up, and I don't know much about Glusterfs. >? ? > >? ? > What I do not understand is that I do not see all data in the mount. >? ? > Not as root, not as a normal user who has privileges. >? ? > >? ? > When I do "ls" in one of the subdirectories I don't see any data, but >? ? > this data exists at the server! >? ? > >? ? > In another subdirectory I see everything fine, the rights of the >? ? > directories and files inside are the same. >? ? > >? ? > I mount with something like: >? ? > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data >? ? > I see data in /data/VOORBEELD/, and I don't see any data in >? ? /data/ALGEMEEN/. >? ? > >? ? > I don't see something special in /etc/exports or in /etc/glusterfs on >? ? > the server. >? ? > >? ? > Is there maybe a mechanism in Glusterfs what can exclude data from >? ? > export?? Or is there a way to debug this problem? >? ? > >? ? > With regards, >? ? > Paul van der Vlis >? ? > >? ? > ---- >? ? > # file: VOORBEELD >? ? > # owner: root >? ? > # group: secretariaat >? ? > # flags: -s- >? ? > user::rwx >? ? > group::rwx >? ? > group:medewerkers:r-x >? ? > mask::rwx >? ? > other::--- >? ? > default:user::rwx >? ? > default:group::rwx >? ? > default:group:medewerkers:r-x >? ? > default:mask::rwx >? ? > default:other::--- >? ? > >? ? > # file: ALGEMEEN >? ? > # owner: root >? ? > # group: secretariaat >? ? > # flags: -s- >? ? > user::rwx >? ? > group::rwx >? ? > group:medewerkers:r-x >? ? > mask::rwx >? ? > other::--- >? ? > default:user::rwx >? ? > default:group::rwx >? ? > default:group:medewerkers:r-x >? ? > default:mask::rwx >? ? > default:other::--- >? ? > ------ >? ? > >? ? > >? ? > >? ? > >? ? > >? ? > -- >? ? > Paul van der Vlis Linux systeembeheer Groningen >? ? > https://www.vandervlis.nl/ >? ? > _______________________________________________ >? ? > Gluster-users mailing list >? ? > Gluster-users at gluster.org >? ? > >? ? 
> https://lists.gluster.org/mailman/listinfo/gluster-users > > > >? ? -- >? ? Paul van der Vlis Linux systeembeheer Groningen >? ? https://www.vandervlis.nl/ >? ? _______________________________________________ >? ? Gluster-users mailing list >? ? Gluster-users at gluster.org >? ? https://lists.gluster.org/mailman/listinfo/gluster-users > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ _______________________________________________ Gluster-users mailing list Gluster-users at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From order at rikus.com Thu May 16 02:18:51 2019 From: order at rikus.com (Jeff Bischoff) Date: Wed, 15 May 2019 22:18:51 -0400 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts have a problem and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. Diagnostics The tell-tale error message is seeing the following when describing a pod that is in a crash loop: Message: error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists We always see that "file exists" message when this error occurs. 
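For reference, "manually starting them" above means roughly the following, run from inside the Gluster pod on the affected node (heketidbstorage is just an example volume name; we repeat it for whichever volumes show offline bricks):

  gluster volume status heketidbstorage         # offline bricks show Online "N" and no port
  gluster volume start heketidbstorage force    # respawns only the dead brick processes
  gluster volume status heketidbstorage         # verify the bricks are back online

So far that has been enough to get the crash-looping pods mounting again without restarting the node.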
Looking at the glusterd.log file, there had been nothing in the log for over a day and then suddenly, at the time the crash loop started, this: [2019-05-08 13:49:04.733147] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a3cef78a5914a2808da0b5736e3daec7/brick on port 49168 [2019-05-08 13:49:04.733374] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_7614e5014a0e402630a0e1fd776acf0a/brick on port 49167 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) [2019-05-08 13:49:05.065420] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/85e9fb223aa121f2.socket failed (No data available) [2019-05-08 13:49:05.066479] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/e2a66e8cd8f5f606.socket failed (No data available) [2019-05-08 13:49:05.067444] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a0625e5b78d69bb8.socket failed (No data available) [2019-05-08 13:49:05.068471] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/770bc294526d0360.socket failed (No data available) [2019-05-08 13:49:05.074278] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/adbd37fe3e1eed36.socket failed (No data available) [2019-05-08 13:49:05.075497] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/17712138f3370e53.socket failed (No data available) [2019-05-08 13:49:05.076545] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a6cf1aca8b23f394.socket failed (No data available) [2019-05-08 13:49:05.077511] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d0f83b191213e877.socket failed (No data available) [2019-05-08 13:49:05.078447] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d5dd08945d4f7f6d.socket failed (No data available) [2019-05-08 13:49:05.079424] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/c8d7b10108758e2f.socket failed (No data available) [2019-05-08 13:49:14.778619] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_0ed4f7f941de388cda678fe273e9ceb4/brick on port 49166 ... (and more of the same) Nothing further has been printed to the gluster log since. The bricks do not come back on their own. 
The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of glusterfs running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Our gluster settings/volume options: apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: gluster-heketi selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi parameters: gidMax: "50000" gidMin: "2000" resturl: http://10.233.35.158:8080 restuser: "null" restuserkey: "null" volumetype: "none" volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads off, performance.open-behind off, performance.readdir-ahead off, performance.read-ahead off, performance.stat-prefetch off, performance.write-behind off, performance.io-cache off, cluster.consistent-metadata on, performance.quick-read off, performance.strict-o-direct on provisioner: kubernetes.io/glusterfs reclaimPolicy: Delete Volume info for the heketi volume: gluster> volume info heketidbstorage Volume Name: heketidbstorage Type: Distribute Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick Options Reconfigured: user.heketi.id: 1d2400626dac780fce12e45a07494853 transport.address-family: inet nfs.disable: on Full Gluster logs available if needed, just let me know how best to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Thu May 16 03:43:30 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 16 May 2019 09:13:30 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: On Thu, 16 May 2019 at 03:05, Paul van der Vlis wrote: > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > Hi Paul, > > > > A few questions: > > Which version of gluster are you using? 
> > On the server and some clients: glusterfs 4.1.2 > On a new client: glusterfs 5.5 > > Is the same behaviour seen on both client versions? > > Did this behaviour start recently? As in were the contents of that > > directory visible earlier? > > This directory was normally used in the headoffice, and there is direct > access to the files without Glusterfs. So I don't know. > Do you mean that they access the files on the gluster volume without using the client or that these files were stored elsewhere earlier (not on gluster)? Files on a gluster volume should never be accessed directly. To debug this further, please send the following: 1. The directory contents when the listing is performed directly on the brick. 2. The tcpdump of the gluster client when listing the directory using the following command: tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 You can send these directly to me in case you want to keep the information private. Regards, Nithya > > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > wrote: > > > > Hello Strahil, > > > > Thanks for your answer. I don't find the word "sharding" in the > > configfiles. There is not much shared data (24GB), and only 1 brick: > > --- > > root at xxx:/etc/glusterfs# gluster volume info DATA > > > > Volume Name: DATA > > Type: Distribute > > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 > > Transport-type: tcp > > Bricks: > > Brick1: xxx-vpn:/DATA > > Options Reconfigured: > > transport.address-family: inet > > nfs.disable: on > > ---- > > (I have edited this a bit for privacy of my customer). > > > > I think they have used glusterfs because it can do ACLs. > > > > With regards, > > Paul van der Vlis > > > > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > > Most probably you use sharding , which splits the files into > smaller > > > chunks so you can fit a 1TB file into gluster nodes with bricks of > > > smaller size. > > > So if you have 2 dispersed servers each having 500Gb brick-> > without > > > sharding you won't be able to store files larger than the brick > size - > > > no matter you have free space on the other server. > > > > > > When sharding is enabled - you will see on the brick the first > > shard as > > > a file and the rest is in a hidden folder called ".shards" (or > > something > > > like that). > > > > > > The benefit is also viewable when you need to do some maintenance > on a > > > gluster node, as you will need to heal only the shards containing > > > modified by the customers' data. > > > > > > Best Regards, > > > Strahil Nikolov > > > > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > > > > ??????: > > > > > > > > > Hello, > > > > > > I am the new sysadmin of an organization what uses Glusterfs. > > > I did not set it up, and I don't know much about Glusterfs. > > > > > > What I do not understand is that I do not see all data in the > mount. > > > Not as root, not as a normal user who has privileges. > > > > > > When I do "ls" in one of the subdirectories I don't see any data, > but > > > this data exists at the server! > > > > > > In another subdirectory I see everything fine, the rights of the > > > directories and files inside are the same. > > > > > > I mount with something like: > > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > > I see data in /data/VOORBEELD/, and I don't see any data in > > /data/ALGEMEEN/. 
> > > > > > I don't see something special in /etc/exports or in /etc/glusterfs > on > > > the server. > > > > > > Is there maybe a mechanism in Glusterfs what can exclude data from > > > export? Or is there a way to debug this problem? > > > > > > With regards, > > > Paul van der Vlis > > > > > > ---- > > > # file: VOORBEELD > > > # owner: root > > > # group: secretariaat > > > # flags: -s- > > > user::rwx > > > group::rwx > > > group:medewerkers:r-x > > > mask::rwx > > > other::--- > > > default:user::rwx > > > default:group::rwx > > > default:group:medewerkers:r-x > > > default:mask::rwx > > > default:other::--- > > > > > > # file: ALGEMEEN > > > # owner: root > > > # group: secretariaat > > > # flags: -s- > > > user::rwx > > > group::rwx > > > group:medewerkers:r-x > > > mask::rwx > > > other::--- > > > default:user::rwx > > > default:group::rwx > > > default:group:medewerkers:r-x > > > default:mask::rwx > > > default:other::--- > > > ------ > > > > > > > > > > > > > > > > > > -- > > > Paul van der Vlis Linux systeembeheer Groningen > > > https://www.vandervlis.nl/ > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abhishpaliwal at gmail.com Thu May 16 05:06:14 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Thu, 16 May 2019 10:36:14 +0530 Subject: [Gluster-users] Memory leak in glusterfs process Message-ID: Hi Team, I upload some valgrind logs from my gluster 5.4 setup. This is writing to the volume every 15 minutes. I stopped glusterd and then copy away the logs. The test was running for some simulated days. They are zipped in valgrind-54.zip. Lots of info in valgrind-2730.log. Lots of possibly lost bytes in glusterfs and even some definitely lost bytes. ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 of 391 ==2737== at 0x4C29C25: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2737== by 0xA22485E: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA217C94: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21D9F8: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21DED9: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21E685: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA1B9D8C: init (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8A2B8: ??? 
(in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) ==2737== ==2737== LEAK SUMMARY: ==2737== definitely lost: 1,053 bytes in 10 blocks ==2737== indirectly lost: 317 bytes in 3 blocks ==2737== possibly lost: 2,374,971 bytes in 524 blocks ==2737== still reachable: 53,277 bytes in 201 blocks ==2737== suppressed: 0 bytes in 0 blocks -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-54.zip Type: application/zip Size: 45897 bytes Desc: not available URL: From abhishpaliwal at gmail.com Thu May 16 05:19:49 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Thu, 16 May 2019 10:49:49 +0530 Subject: [Gluster-users] Memory leak in glusterfs Message-ID: Hi Team, I upload some valgrind logs from my gluster 5.4 setup. This is writing to the volume every 15 minutes. I stopped glusterd and then copy away the logs. The test was running for some simulated days. They are zipped in valgrind-54.zip. Lots of info in valgrind-2730.log. Lots of possibly lost bytes in glusterfs and even some definitely lost bytes. ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 of 391 ==2737== at 0x4C29C25: calloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==2737== by 0xA22485E: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA217C94: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21D9F8: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21DED9: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA21E685: ??? (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0xA1B9D8C: init (in /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in /usr/lib64/libglusterfs.so.0.0.1) ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) ==2737== ==2737== LEAK SUMMARY: ==2737== definitely lost: 1,053 bytes in 10 blocks ==2737== indirectly lost: 317 bytes in 3 blocks ==2737== possibly lost: 2,374,971 bytes in 524 blocks ==2737== still reachable: 53,277 bytes in 201 blocks ==2737== suppressed: 0 bytes in 0 blocks -- Regards Abhishek Paliwal -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-2748.log Type: text/x-log Size: 23721 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: valgrind-2746.log Type: text/x-log Size: 24526 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: valgrind-2730.log Type: text/x-log Size: 1239130 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 16 07:38:12 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 09:38:12 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello everyone, if there is any problem in finding a date and time, please contact me. It would be fine to have a meeting soon. Regards David Spisla Am Mo., 13. Mai 2019 um 12:38 Uhr schrieb David Spisla < david.spisla at iternity.com>: > Hi Poornima, > > > > thats fine. I would suggest this dates and times: > > > > May 15th ? 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) > > May 20th ? 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) > > > > I add Volker Lendecke from Sernet to the mail. He is the Samba Expert. > > Can someone of you provide a host via bluejeans.com? If not, I will try > it with GoToMeeting (https://www.gotomeeting.com). > > > > @all Please write your prefered dates and times. For me, all oft the above > dates and times are fine > > > > Regards > > David > > > > > > *Von:* Poornima Gurusiddaiah > *Gesendet:* Montag, 13. Mai 2019 07:22 > *An:* David Spisla ; Anoop C S ; > Gunther Deschner > *Cc:* Gluster Devel ; gluster-users at gluster.org > List > *Betreff:* Re: [Gluster-devel] Improve stability between SMB/CTDB and > Gluster (together with Samba Core Developer) > > > > Hi, > > > > We would be definitely interested in this. Thank you for contacting us. > For the starter we can have an online conference. Please suggest few > possible date and times for the week(preferably between IST 7.00AM - 9.PM > )? > > Adding Anoop and Gunther who are also the main contributors to the > Gluster-Samba integration. > > > > Thanks, > > Poornima > > > > > > > > On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: > > Dear Gluster Community, > > at the moment we are improving the stability of SMB/CTDB and Gluster. For > this purpose we are working together with an advanced SAMBA Core Developer. > He did some debugging but needs more information about Gluster Core > Behaviour. > > > > *Would any of the Gluster Developer wants to have a online conference with > him and me?* > > > > I would organize everything. In my opinion this is a good chance to > improve stability of Glusterfs and this is at the moment one of the major > issues in the Community. > > > > Regards > > David Spisla > > _______________________________________________ > > Community Meeting Calendar: > > APAC Schedule - > Every 2nd and 4th Tuesday at 11:30 AM IST > Bridge: https://bluejeans.com/836554017 > > NA/EMEA Schedule - > Every 1st and 3rd Tuesday at 01:00 PM EDT > Bridge: https://bluejeans.com/486278655 > > Gluster-devel mailing list > Gluster-devel at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image860747.png Type: image/png Size: 382 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image735814.png Type: image/png Size: 412 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image116096.png Type: image/png Size: 6545 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image142576.png Type: image/png Size: 37146 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image714843.png Type: image/png Size: 522 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image293410.png Type: image/png Size: 591 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image570372.png Type: image/png Size: 775 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image031225.png Type: image/png Size: 508 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 16 07:53:36 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 09:53:36 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, I could reproduce the issue. After doing a simple DIR Listing from Win10 powershell, all brick processes crashes. Its not the same scenario mentioned before but the crash report in the bricks log is the same. Attached you find the backtrace. Regards David Spisla Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur : > Hello David, > > On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: > >> Hello Vijay, >> >> how can I create such a core file? Or will it be created automatically if >> a gluster process crashes? >> Maybe you can give me a hint and will try to get a backtrace. >> > > Generation of core file is dependent on the system configuration. `man 5 > core` contains useful information to generate a core file in a directory. > Once a core file is generated, you can use gdb to get a backtrace of all > threads (using "thread apply all bt full"). > > >> Unfortunately this bug is not easy to reproduce because it appears only >> sometimes. >> > > If the bug is not easy to reproduce, having a backtrace from the generated > core would be very useful! > > Thanks, > Vijay > > >> >> Regards >> David Spisla >> >> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur > >: >> >>> Thank you for the report, David. Do you have core files available on any >>> of the servers? If yes, would it be possible for you to provide a backtrace. >>> >>> Regards, >>> Vijay >>> >>> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >>> >>>> Hello folks, >>>> >>>> we have a client application (runs on Win10) which does some FOPs on a >>>> gluster volume which is accessed by SMB. >>>> >>>> *Scenario 1* is a READ Operation which reads all files successively >>>> and checks if the files data was correctly copied. 
While doing this, all >>>> brick processes crashes and in the logs one have this crash report on every >>>> brick log: >>>> >>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>> pending frames: >>>>> frame : type(0) op(27) >>>>> frame : type(0) op(40) >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> signal received: 11 >>>>> time of crash: >>>>> 2019-04-16 08:32:21 >>>>> configuration details: >>>>> argp 1 >>>>> backtrace 1 >>>>> dlfcn 1 >>>>> libpthread 1 >>>>> llistxattr 1 >>>>> setfsid 1 >>>>> spinlock 1 >>>>> epoll.h 1 >>>>> xattr.h 1 >>>>> st_atim.tv_nsec 1 >>>>> package-string: glusterfs 5.5 >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>> >>>>> *Scenario 2 *The application just SET Read-Only on each file >>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>> one can read this crash report in every brick log: >>>> >>>>> >>>>> >>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>> client: >>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>> denied] >>>>> >>>>> pending frames: >>>>> >>>>> frame : type(0) op(27) >>>>> >>>>> patchset: git://git.gluster.org/glusterfs.git >>>>> >>>>> signal received: 11 >>>>> >>>>> time of crash: >>>>> >>>>> 2019-05-02 07:43:39 >>>>> >>>>> configuration details: >>>>> >>>>> argp 1 >>>>> >>>>> backtrace 1 >>>>> >>>>> dlfcn 1 >>>>> >>>>> libpthread 1 >>>>> >>>>> llistxattr 1 >>>>> >>>>> setfsid 1 >>>>> >>>>> spinlock 1 >>>>> >>>>> epoll.h 1 >>>>> >>>>> xattr.h 1 >>>>> >>>>> st_atim.tv_nsec 1 >>>>> >>>>> package-string: glusterfs 5.5 >>>>> >>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>> >>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>> >>>>> >>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>> >>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>> >>>>> >>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>> >>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>> >>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>> >>>> >>>> This happens on a 3-Node Gluster v5.5 Cluster on two different volumes. 
>>>> But both volumes has the same settings: >>>> >>>>> Volume Name: shortterm >>>>> Type: Replicate >>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>> Status: Started >>>>> Snapshot Count: 0 >>>>> Number of Bricks: 1 x 3 = 3 >>>>> Transport-type: tcp >>>>> Bricks: >>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>> Options Reconfigured: >>>>> storage.reserve: 1 >>>>> performance.client-io-threads: off >>>>> nfs.disable: on >>>>> transport.address-family: inet >>>>> user.smb: disable >>>>> features.read-only: off >>>>> features.worm: off >>>>> features.worm-file-level: on >>>>> features.retention-mode: enterprise >>>>> features.default-retention-period: 120 >>>>> network.ping-timeout: 10 >>>>> features.cache-invalidation: on >>>>> features.cache-invalidation-timeout: 600 >>>>> performance.nl-cache: on >>>>> performance.nl-cache-timeout: 600 >>>>> client.event-threads: 32 >>>>> server.event-threads: 32 >>>>> cluster.lookup-optimize: on >>>>> performance.stat-prefetch: on >>>>> performance.cache-invalidation: on >>>>> performance.md-cache-timeout: 600 >>>>> performance.cache-samba-metadata: on >>>>> performance.cache-ima-xattrs: on >>>>> performance.io-thread-count: 64 >>>>> cluster.use-compound-fops: on >>>>> performance.cache-size: 512MB >>>>> performance.cache-refresh-timeout: 10 >>>>> performance.read-ahead: off >>>>> performance.write-behind-window-size: 4MB >>>>> performance.write-behind: on >>>>> storage.build-pgfid: on >>>>> features.utime: on >>>>> storage.ctime: on >>>>> cluster.quorum-type: fixed >>>>> cluster.quorum-count: 2 >>>>> features.bitrot: on >>>>> features.scrub: Active >>>>> features.scrub-freq: daily >>>>> cluster.enable-shared-storage: enable >>>>> >>>>> >>>> Why can this happen to all Brick processes? I don't understand the >>>> crash report. The FOPs are nothing special and after restart brick >>>> processes everything works fine and our application was succeed. >>>> >>>> Regards >>>> David Spisla >>>> >>>> >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: backtrace.log Type: application/octet-stream Size: 36515 bytes Desc: not available URL: From vbellur at redhat.com Thu May 16 08:05:22 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 16 May 2019 01:05:22 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, Do you have any custom patches in your deployment? I looked up v5.5 but could not find the following functions referred to in the core: map_atime_from_server() worm_lookup_cbk() Neither do I see xlator_helper.c in the codebase. 
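(A rough sketch of how a brick core like this can be opened so that custom xlators show up with their symbols -- the binary and core paths are only examples:

    gdb /usr/sbin/glusterfsd /path/to/core
    (gdb) info sharedlibrary worm        # which worm.so was loaded and whether its symbols were read
    (gdb) thread apply all bt full       # full backtrace of every thread, as suggested earlier in this thread
)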
Thanks, Vijay #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at ../../../../xlators/lib/src/xlator_helper.c:21 __FUNCTION__ = "map_to_atime_from_server" #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, op_errno=op_errno at entry=13, inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at worm.c:531 priv = 0x7fdef4075378 ret = 0 __FUNCTION__ = "worm_lookup_cbk" On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: > Hello Vijay, > > I could reproduce the issue. After doing a simple DIR Listing from Win10 > powershell, all brick processes crashes. Its not the same scenario > mentioned before but the crash report in the bricks log is the same. > Attached you find the backtrace. > > Regards > David Spisla > > Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur >: > >> Hello David, >> >> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >> >>> Hello Vijay, >>> >>> how can I create such a core file? Or will it be created automatically >>> if a gluster process crashes? >>> Maybe you can give me a hint and will try to get a backtrace. >>> >> >> Generation of core file is dependent on the system configuration. `man 5 >> core` contains useful information to generate a core file in a directory. >> Once a core file is generated, you can use gdb to get a backtrace of all >> threads (using "thread apply all bt full"). >> >> >>> Unfortunately this bug is not easy to reproduce because it appears only >>> sometimes. >>> >> >> If the bug is not easy to reproduce, having a backtrace from the >> generated core would be very useful! >> >> Thanks, >> Vijay >> >> >>> >>> Regards >>> David Spisla >>> >>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>> vbellur at redhat.com>: >>> >>>> Thank you for the report, David. Do you have core files available on >>>> any of the servers? If yes, would it be possible for you to provide a >>>> backtrace. >>>> >>>> Regards, >>>> Vijay >>>> >>>> On Mon, May 6, 2019 at 3:09 AM David Spisla wrote: >>>> >>>>> Hello folks, >>>>> >>>>> we have a client application (runs on Win10) which does some FOPs on a >>>>> gluster volume which is accessed by SMB. >>>>> >>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>> brick processes crashes and in the logs one have this crash report on every >>>>> brick log: >>>>> >>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>> pending frames: >>>>>> frame : type(0) op(27) >>>>>> frame : type(0) op(40) >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> signal received: 11 >>>>>> time of crash: >>>>>> 2019-04-16 08:32:21 >>>>>> configuration details: >>>>>> argp 1 >>>>>> backtrace 1 >>>>>> dlfcn 1 >>>>>> libpthread 1 >>>>>> llistxattr 1 >>>>>> setfsid 1 >>>>>> spinlock 1 >>>>>> epoll.h 1 >>>>>> xattr.h 1 >>>>>> st_atim.tv_nsec 1 >>>>>> package-string: glusterfs 5.5 >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>> >>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>> one can read this crash report in every brick log: >>>>> >>>>>> >>>>>> >>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>> client: >>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>> denied] >>>>>> >>>>>> pending frames: >>>>>> >>>>>> frame : type(0) op(27) >>>>>> >>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>> >>>>>> signal received: 11 >>>>>> >>>>>> time of crash: >>>>>> >>>>>> 2019-05-02 07:43:39 >>>>>> >>>>>> configuration details: >>>>>> >>>>>> argp 1 >>>>>> >>>>>> backtrace 1 >>>>>> >>>>>> dlfcn 1 >>>>>> >>>>>> libpthread 1 >>>>>> >>>>>> llistxattr 1 >>>>>> >>>>>> setfsid 1 >>>>>> >>>>>> spinlock 1 >>>>>> >>>>>> epoll.h 1 >>>>>> >>>>>> xattr.h 1 >>>>>> >>>>>> st_atim.tv_nsec 1 >>>>>> >>>>>> package-string: glusterfs 5.5 >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>> >>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>> >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>> >>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>> >>>>>> >>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>> >>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>> >>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>> >>>>> >>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>> volumes. 
But both volumes has the same settings: >>>>> >>>>>> Volume Name: shortterm >>>>>> Type: Replicate >>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>> Status: Started >>>>>> Snapshot Count: 0 >>>>>> Number of Bricks: 1 x 3 = 3 >>>>>> Transport-type: tcp >>>>>> Bricks: >>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>> Options Reconfigured: >>>>>> storage.reserve: 1 >>>>>> performance.client-io-threads: off >>>>>> nfs.disable: on >>>>>> transport.address-family: inet >>>>>> user.smb: disable >>>>>> features.read-only: off >>>>>> features.worm: off >>>>>> features.worm-file-level: on >>>>>> features.retention-mode: enterprise >>>>>> features.default-retention-period: 120 >>>>>> network.ping-timeout: 10 >>>>>> features.cache-invalidation: on >>>>>> features.cache-invalidation-timeout: 600 >>>>>> performance.nl-cache: on >>>>>> performance.nl-cache-timeout: 600 >>>>>> client.event-threads: 32 >>>>>> server.event-threads: 32 >>>>>> cluster.lookup-optimize: on >>>>>> performance.stat-prefetch: on >>>>>> performance.cache-invalidation: on >>>>>> performance.md-cache-timeout: 600 >>>>>> performance.cache-samba-metadata: on >>>>>> performance.cache-ima-xattrs: on >>>>>> performance.io-thread-count: 64 >>>>>> cluster.use-compound-fops: on >>>>>> performance.cache-size: 512MB >>>>>> performance.cache-refresh-timeout: 10 >>>>>> performance.read-ahead: off >>>>>> performance.write-behind-window-size: 4MB >>>>>> performance.write-behind: on >>>>>> storage.build-pgfid: on >>>>>> features.utime: on >>>>>> storage.ctime: on >>>>>> cluster.quorum-type: fixed >>>>>> cluster.quorum-count: 2 >>>>>> features.bitrot: on >>>>>> features.scrub: Active >>>>>> features.scrub-freq: daily >>>>>> cluster.enable-shared-storage: enable >>>>>> >>>>>> >>>>> Why can this happen to all Brick processes? I don't understand the >>>>> crash report. The FOPs are nothing special and after restart brick >>>>> processes everything works fine and our application was succeed. >>>>> >>>>> Regards >>>>> David Spisla >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul at vandervlis.nl Thu May 16 08:47:46 2019 From: paul at vandervlis.nl (Paul van der Vlis) Date: Thu, 16 May 2019 10:47:46 +0200 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> Message-ID: <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> Op 16-05-19 om 05:43 schreef Nithya Balachandran: > > > On Thu, 16 May 2019 at 03:05, Paul van der Vlis > wrote: > > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > Hi Paul, > > > > A few questions: > > Which version of gluster are you using? > > On the server and some clients: glusterfs 4.1.2 > On a new client: glusterfs 5.5 > > Is the same behaviour seen on both client versions? Yes. > > Did this behaviour start recently? As in were the contents of that > > directory visible earlier? > > This directory was normally used in the headoffice, and there is direct > access to the files without Glusterfs. 
So I don't know. > > > Do you mean that they access the files on the gluster volume without > using the client or that these files were stored elsewhere > earlier (not on gluster)? Files on a gluster volume should never be > accessed directly. The central server (this is the only gluster-brick) is a thin-client server, people are working directly on the server using LTSP terminals: http://ltsp.org/). The data is exported using Gluster to some other machines in smaller offices. And to a new thin-client server what I am making (using X2go). The goal is that this server will replace all of the excisting machines in the future. X2go is something like "Citrix for Linux", you can use it over the internet. I did not setup Gluster and I have never met the old sysadmin. I guess it's also very strange to use Gluster with only one brick. So when I understand you right, the whole setup is wrong, and you may not access the files without client? > To debug this further, please send the following: > > 1. The directory contents when the listing is performed directly on the > brick. > 2. The tcpdump of the gluster client when listing the directory using > the following command: > > tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 > > > You can send these directly to me in case you want to keep the > information private. I have just heard (during writing this message) that the owner of the firm where I make this for, is in hospital in very critical condition. They've asked me to stop with the work at the moment. I did also hear that there where more problems with the filesystem. Especially when a directory was renamed. And this directory was renamed in the past. With regards, Paul van der Vlis > Regards, > Nithya > ? > > > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > > >> wrote: > > > >? ? ?Hello Strahil, > > > >? ? ?Thanks for your answer. I don't find the word "sharding" in the > >? ? ?configfiles. There is not much shared data (24GB), and only 1 > brick: > >? ? ?--- > >? ? ?root at xxx:/etc/glusterfs# gluster volume info DATA > > > >? ? ?Volume Name: DATA > >? ? ?Type: Distribute > >? ? ?Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > >? ? ?Status: Started > >? ? ?Snapshot Count: 0 > >? ? ?Number of Bricks: 1 > >? ? ?Transport-type: tcp > >? ? ?Bricks: > >? ? ?Brick1: xxx-vpn:/DATA > >? ? ?Options Reconfigured: > >? ? ?transport.address-family: inet > >? ? ?nfs.disable: on > >? ? ?---- > >? ? ?(I have edited this a bit for privacy of my customer). > > > >? ? ?I think they have used glusterfs because it can do ACLs. > > > >? ? ?With regards, > >? ? ?Paul van der Vlis > > > > > >? ? ?Op 15-05-19 om 14:59 schreef Strahil Nikolov: > >? ? ?> Most probably you use sharding , which splits the files into > smaller > >? ? ?> chunks so you can fit a 1TB file into gluster nodes with > bricks of > >? ? ?> smaller size. > >? ? ?> So if you have 2 dispersed servers each having 500Gb > brick->? without > >? ? ?> sharding you won't be able to store files larger than the > brick size - > >? ? ?> no matter you have free space on the other server. > >? ? ?> > >? ? ?> When sharding is enabled - you will see on the brick the first > >? ? ?shard as > >? ? ?> a file and the rest is in a hidden folder called ".shards" (or > >? ? ?something > >? ? ?> like that). > >? ? ?> > >? ? ?> The benefit is also viewable when you need to do some > maintenance on a > >? ? ?> gluster node, as you will need to heal only the shards > containing > >? 
? ?> modified by the customers' data. > >? ? ?> > >? ? ?> Best Regards, > >? ? ?> Strahil Nikolov > >? ? ?> > >? ? ?> > >? ? ?> ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der Vlis > >? ? ?> > >> ??????: > >? ? ?> > >? ? ?> > >? ? ?> Hello, > >? ? ?> > >? ? ?> I am the new sysadmin of an organization what uses Glusterfs. > >? ? ?> I did not set it up, and I don't know much about Glusterfs. > >? ? ?> > >? ? ?> What I do not understand is that I do not see all data in > the mount. > >? ? ?> Not as root, not as a normal user who has privileges. > >? ? ?> > >? ? ?> When I do "ls" in one of the subdirectories I don't see any > data, but > >? ? ?> this data exists at the server! > >? ? ?> > >? ? ?> In another subdirectory I see everything fine, the rights of the > >? ? ?> directories and files inside are the same. > >? ? ?> > >? ? ?> I mount with something like: > >? ? ?> /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > >? ? ?> I see data in /data/VOORBEELD/, and I don't see any data in > >? ? ?/data/ALGEMEEN/. > >? ? ?> > >? ? ?> I don't see something special in /etc/exports or in > /etc/glusterfs on > >? ? ?> the server. > >? ? ?> > >? ? ?> Is there maybe a mechanism in Glusterfs what can exclude > data from > >? ? ?> export?? Or is there a way to debug this problem? > >? ? ?> > >? ? ?> With regards, > >? ? ?> Paul van der Vlis > >? ? ?> > >? ? ?> ---- > >? ? ?> # file: VOORBEELD > >? ? ?> # owner: root > >? ? ?> # group: secretariaat > >? ? ?> # flags: -s- > >? ? ?> user::rwx > >? ? ?> group::rwx > >? ? ?> group:medewerkers:r-x > >? ? ?> mask::rwx > >? ? ?> other::--- > >? ? ?> default:user::rwx > >? ? ?> default:group::rwx > >? ? ?> default:group:medewerkers:r-x > >? ? ?> default:mask::rwx > >? ? ?> default:other::--- > >? ? ?> > >? ? ?> # file: ALGEMEEN > >? ? ?> # owner: root > >? ? ?> # group: secretariaat > >? ? ?> # flags: -s- > >? ? ?> user::rwx > >? ? ?> group::rwx > >? ? ?> group:medewerkers:r-x > >? ? ?> mask::rwx > >? ? ?> other::--- > >? ? ?> default:user::rwx > >? ? ?> default:group::rwx > >? ? ?> default:group:medewerkers:r-x > >? ? ?> default:mask::rwx > >? ? ?> default:other::--- > >? ? ?> ------ > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> > >? ? ?> -- > >? ? ?> Paul van der Vlis Linux systeembeheer Groningen > >? ? ?> https://www.vandervlis.nl/ > >? ? ?> _______________________________________________ > >? ? ?> Gluster-users mailing list > >? ? ?> Gluster-users at gluster.org > > > >? ? ? >> > >? ? ?> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > >? ? ?-- > >? ? ?Paul van der Vlis Linux systeembeheer Groningen > >? ? ?https://www.vandervlis.nl/ > >? ? ?_______________________________________________ > >? ? ?Gluster-users mailing list > >? ? ?Gluster-users at gluster.org > > > >? ? 
?https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -- Paul van der Vlis Linux systeembeheer Groningen https://www.vandervlis.nl/ From nbalacha at redhat.com Thu May 16 09:04:58 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Thu, 16 May 2019 14:34:58 +0530 Subject: [Gluster-users] Cannot see all data in mount In-Reply-To: <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> References: <9241cbaf-38ba-63e0-95f0-120bd9856bf5@vandervlis.nl> <1716249284.809654.1557925164742@mail.yahoo.com> <4e160ac2-002a-8ef8-7660-de7cff369882@vandervlis.nl> <5ca473e6-d2d4-3363-6a98-30667a644e05@vandervlis.nl> <6ef1edd2-7051-a7ad-a0c3-b59fa00aec03@vandervlis.nl> Message-ID: On Thu, 16 May 2019 at 14:17, Paul van der Vlis wrote: > Op 16-05-19 om 05:43 schreef Nithya Balachandran: > > > > > > On Thu, 16 May 2019 at 03:05, Paul van der Vlis > > wrote: > > > > Op 15-05-19 om 15:45 schreef Nithya Balachandran: > > > Hi Paul, > > > > > > A few questions: > > > Which version of gluster are you using? > > > > On the server and some clients: glusterfs 4.1.2 > > On a new client: glusterfs 5.5 > > > > Is the same behaviour seen on both client versions? > > Yes. > > > > Did this behaviour start recently? As in were the contents of that > > > directory visible earlier? > > > > This directory was normally used in the headoffice, and there is > direct > > access to the files without Glusterfs. So I don't know. > > > > > > Do you mean that they access the files on the gluster volume without > > using the client or that these files were stored elsewhere > > earlier (not on gluster)? Files on a gluster volume should never be > > accessed directly. > > The central server (this is the only gluster-brick) is a thin-client > server, people are working directly on the server using LTSP terminals: > http://ltsp.org/). > > The data is exported using Gluster to some other machines in smaller > offices. > > And to a new thin-client server what I am making (using X2go). The goal > is that this server will replace all of the excisting machines in the > future. X2go is something like "Citrix for Linux", you can use it over > the internet. > > I did not setup Gluster and I have never met the old sysadmin. I guess > it's also very strange to use Gluster with only one brick. So when I > understand you right, the whole setup is wrong, and you may not access > the files without client? > > That is correct - any files on a gluster volume should be accessed only via a gluster client (if using fuse). > > To debug this further, please send the following: > > > > 1. The directory contents when the listing is performed directly on the > > brick. > > 2. The tcpdump of the gluster client when listing the directory using > > the following command: > > > > tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22 > > > > > > You can send these directly to me in case you want to keep the > > information private. > > I have just heard (during writing this message) that the owner of the > firm where I make this for, is in hospital in very critical condition. > They've asked me to stop with the work at the moment. > > I did also hear that there where more problems with the filesystem. > Especially when a directory was renamed. > And this directory was renamed in the past. > > Let me know when you plan to continue with this . We can take a look. 
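(A sketch of the comparison being asked for -- the brick path /DATA and the mount point /data are the ones from this thread; adjust as needed:

    # on the server, directly on the brick (inspection only, do not modify anything here)
    ls -la /DATA/ALGEMEEN
    getfattr -d -m . -e hex /DATA/ALGEMEEN    # gfid / layout xattrs of the directory

    # on a client, through the fuse mount, while the capture below is running
    ls -la /data/ALGEMEEN
    tcpdump -i any -s 0 -w /var/tmp/dirls.pcap tcp and not port 22
)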
Regards, Nithya > With regards, > Paul van der Vlis > > > Regards, > > Nithya > > > > > > > > With regards, > > Paul van der Vlis > > > > > Regards, > > > Nithya > > > > > > > > > On Wed, 15 May 2019 at 18:55, Paul van der Vlis > > > > > >> wrote: > > > > > > Hello Strahil, > > > > > > Thanks for your answer. I don't find the word "sharding" in the > > > configfiles. There is not much shared data (24GB), and only 1 > > brick: > > > --- > > > root at xxx:/etc/glusterfs# gluster volume info DATA > > > > > > Volume Name: DATA > > > Type: Distribute > > > Volume ID: db53ece1-5def-4f7c-b59d-3a230824032a > > > Status: Started > > > Snapshot Count: 0 > > > Number of Bricks: 1 > > > Transport-type: tcp > > > Bricks: > > > Brick1: xxx-vpn:/DATA > > > Options Reconfigured: > > > transport.address-family: inet > > > nfs.disable: on > > > ---- > > > (I have edited this a bit for privacy of my customer). > > > > > > I think they have used glusterfs because it can do ACLs. > > > > > > With regards, > > > Paul van der Vlis > > > > > > > > > Op 15-05-19 om 14:59 schreef Strahil Nikolov: > > > > Most probably you use sharding , which splits the files into > > smaller > > > > chunks so you can fit a 1TB file into gluster nodes with > > bricks of > > > > smaller size. > > > > So if you have 2 dispersed servers each having 500Gb > > brick-> without > > > > sharding you won't be able to store files larger than the > > brick size - > > > > no matter you have free space on the other server. > > > > > > > > When sharding is enabled - you will see on the brick the > first > > > shard as > > > > a file and the rest is in a hidden folder called ".shards" > (or > > > something > > > > like that). > > > > > > > > The benefit is also viewable when you need to do some > > maintenance on a > > > > gluster node, as you will need to heal only the shards > > containing > > > > modified by the customers' data. > > > > > > > > Best Regards, > > > > Strahil Nikolov > > > > > > > > > > > > ? ?????, 15 ??? 2019 ?., 7:31:39 ?. ???????-4, Paul van der > Vlis > > > > > > >> ??????: > > > > > > > > > > > > Hello, > > > > > > > > I am the new sysadmin of an organization what uses Glusterfs. > > > > I did not set it up, and I don't know much about Glusterfs. > > > > > > > > What I do not understand is that I do not see all data in > > the mount. > > > > Not as root, not as a normal user who has privileges. > > > > > > > > When I do "ls" in one of the subdirectories I don't see any > > data, but > > > > this data exists at the server! > > > > > > > > In another subdirectory I see everything fine, the rights of > the > > > > directories and files inside are the same. > > > > > > > > I mount with something like: > > > > /bin/mount -t glusterfs -o acl 10.8.0.1:/data /data > > > > I see data in /data/VOORBEELD/, and I don't see any data in > > > /data/ALGEMEEN/. > > > > > > > > I don't see something special in /etc/exports or in > > /etc/glusterfs on > > > > the server. > > > > > > > > Is there maybe a mechanism in Glusterfs what can exclude > > data from > > > > export? Or is there a way to debug this problem? 
> > > > > > > > With regards, > > > > Paul van der Vlis > > > > > > > > ---- > > > > # file: VOORBEELD > > > > # owner: root > > > > # group: secretariaat > > > > # flags: -s- > > > > user::rwx > > > > group::rwx > > > > group:medewerkers:r-x > > > > mask::rwx > > > > other::--- > > > > default:user::rwx > > > > default:group::rwx > > > > default:group:medewerkers:r-x > > > > default:mask::rwx > > > > default:other::--- > > > > > > > > # file: ALGEMEEN > > > > # owner: root > > > > # group: secretariaat > > > > # flags: -s- > > > > user::rwx > > > > group::rwx > > > > group:medewerkers:r-x > > > > mask::rwx > > > > other::--- > > > > default:user::rwx > > > > default:group::rwx > > > > default:group:medewerkers:r-x > > > > default:mask::rwx > > > > default:other::--- > > > > ------ > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Paul van der Vlis Linux systeembeheer Groningen > > > > https://www.vandervlis.nl/ > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > >> > > > > > >> > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > -- > > > Paul van der Vlis Linux systeembeheer Groningen > > > https://www.vandervlis.nl/ > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > >> > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > -- > > Paul van der Vlis Linux systeembeheer Groningen > > https://www.vandervlis.nl/ > > > > > > -- > Paul van der Vlis Linux systeembeheer Groningen > https://www.vandervlis.nl/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrmeyer at chrmeyer.de Thu May 16 09:27:15 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 11:27:15 +0200 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with three Nodes and three volumes (one is the gluster shared storage). The other are replicated volumes. Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help to find the problem. Could someone please have a look on it? Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From spisla80 at gmail.com Thu May 16 09:36:04 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 11:36:04 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, yes, we are using custom patches. It s a helper function, which is defined in xlator_helper.c and used in worm_lookup_cbk. Do you think this could be the problem? The functions only manipulates the atime in struct iattr Regards David Spisla Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur : > Hello David, > > Do you have any custom patches in your deployment? I looked up v5.5 but > could not find the following functions referred to in the core: > > map_atime_from_server() > worm_lookup_cbk() > > Neither do I see xlator_helper.c in the codebase. 
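(Judging from the frames quoted below, worm_lookup_cbk reaches the helper with stbuf == NULL whenever the LOOKUP itself fails, e.g. with the EACCES reported by posix-acl. A purely hypothetical sketch of the kind of guard that would avoid the crash -- the names are taken from the backtrace, the body is an assumption and not the actual patch:

    /* hypothetical: only touch the stat buffer when the lookup succeeded */
    int32_t
    worm_lookup_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                     int32_t op_ret, int32_t op_errno, inode_t *inode,
                     struct iatt *buf, dict_t *xdata, struct iatt *postparent)
    {
            if (op_ret == 0 && buf != NULL)
                    map_atime_from_server (this, buf);  /* custom helper from xlator_helper.c */

            STACK_UNWIND_STRICT (lookup, frame, op_ret, op_errno, inode, buf,
                                 xdata, postparent);
            return 0;
    }
)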
> > Thanks, > Vijay > > > #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > ../../../../xlators/lib/src/xlator_helper.c:21 > __FUNCTION__ = "map_to_atime_from_server" > #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, > cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > op_errno=op_errno at entry=13, > inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > worm.c:531 > priv = 0x7fdef4075378 > ret = 0 > __FUNCTION__ = "worm_lookup_cbk" > > On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: > >> Hello Vijay, >> >> I could reproduce the issue. After doing a simple DIR Listing from Win10 >> powershell, all brick processes crashes. Its not the same scenario >> mentioned before but the crash report in the bricks log is the same. >> Attached you find the backtrace. >> >> Regards >> David Spisla >> >> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur > >: >> >>> Hello David, >>> >>> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >>> >>>> Hello Vijay, >>>> >>>> how can I create such a core file? Or will it be created automatically >>>> if a gluster process crashes? >>>> Maybe you can give me a hint and will try to get a backtrace. >>>> >>> >>> Generation of core file is dependent on the system configuration. `man >>> 5 core` contains useful information to generate a core file in a directory. >>> Once a core file is generated, you can use gdb to get a backtrace of all >>> threads (using "thread apply all bt full"). >>> >>> >>>> Unfortunately this bug is not easy to reproduce because it appears only >>>> sometimes. >>>> >>> >>> If the bug is not easy to reproduce, having a backtrace from the >>> generated core would be very useful! >>> >>> Thanks, >>> Vijay >>> >>> >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>> vbellur at redhat.com>: >>>> >>>>> Thank you for the report, David. Do you have core files available on >>>>> any of the servers? If yes, would it be possible for you to provide a >>>>> backtrace. >>>>> >>>>> Regards, >>>>> Vijay >>>>> >>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>> wrote: >>>>> >>>>>> Hello folks, >>>>>> >>>>>> we have a client application (runs on Win10) which does some FOPs on >>>>>> a gluster volume which is accessed by SMB. >>>>>> >>>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>>> brick processes crashes and in the logs one have this crash report on every >>>>>> brick log: >>>>>> >>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>> pending frames: >>>>>>> frame : type(0) op(27) >>>>>>> frame : type(0) op(40) >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> signal received: 11 >>>>>>> time of crash: >>>>>>> 2019-04-16 08:32:21 >>>>>>> configuration details: >>>>>>> argp 1 >>>>>>> backtrace 1 >>>>>>> dlfcn 1 >>>>>>> libpthread 1 >>>>>>> llistxattr 1 >>>>>>> setfsid 1 >>>>>>> spinlock 1 >>>>>>> epoll.h 1 >>>>>>> xattr.h 1 >>>>>>> st_atim.tv_nsec 1 >>>>>>> package-string: glusterfs 5.5 >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>> >>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>> one can read this crash report in every brick log: >>>>>> >>>>>>> >>>>>>> >>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>> client: >>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>> denied] >>>>>>> >>>>>>> pending frames: >>>>>>> >>>>>>> frame : type(0) op(27) >>>>>>> >>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>> >>>>>>> signal received: 11 >>>>>>> >>>>>>> time of crash: >>>>>>> >>>>>>> 2019-05-02 07:43:39 >>>>>>> >>>>>>> configuration details: >>>>>>> >>>>>>> argp 1 >>>>>>> >>>>>>> backtrace 1 >>>>>>> >>>>>>> dlfcn 1 >>>>>>> >>>>>>> libpthread 1 >>>>>>> >>>>>>> llistxattr 1 >>>>>>> >>>>>>> setfsid 1 >>>>>>> >>>>>>> spinlock 1 >>>>>>> >>>>>>> epoll.h 1 >>>>>>> >>>>>>> xattr.h 1 >>>>>>> >>>>>>> st_atim.tv_nsec 1 >>>>>>> >>>>>>> package-string: glusterfs 5.5 >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>> >>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>> >>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>> >>>>>>> >>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>> >>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>> >>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>> >>>>>> >>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>> volumes. 
But both volumes has the same settings: >>>>>> >>>>>>> Volume Name: shortterm >>>>>>> Type: Replicate >>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>> Status: Started >>>>>>> Snapshot Count: 0 >>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>> Transport-type: tcp >>>>>>> Bricks: >>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>> Options Reconfigured: >>>>>>> storage.reserve: 1 >>>>>>> performance.client-io-threads: off >>>>>>> nfs.disable: on >>>>>>> transport.address-family: inet >>>>>>> user.smb: disable >>>>>>> features.read-only: off >>>>>>> features.worm: off >>>>>>> features.worm-file-level: on >>>>>>> features.retention-mode: enterprise >>>>>>> features.default-retention-period: 120 >>>>>>> network.ping-timeout: 10 >>>>>>> features.cache-invalidation: on >>>>>>> features.cache-invalidation-timeout: 600 >>>>>>> performance.nl-cache: on >>>>>>> performance.nl-cache-timeout: 600 >>>>>>> client.event-threads: 32 >>>>>>> server.event-threads: 32 >>>>>>> cluster.lookup-optimize: on >>>>>>> performance.stat-prefetch: on >>>>>>> performance.cache-invalidation: on >>>>>>> performance.md-cache-timeout: 600 >>>>>>> performance.cache-samba-metadata: on >>>>>>> performance.cache-ima-xattrs: on >>>>>>> performance.io-thread-count: 64 >>>>>>> cluster.use-compound-fops: on >>>>>>> performance.cache-size: 512MB >>>>>>> performance.cache-refresh-timeout: 10 >>>>>>> performance.read-ahead: off >>>>>>> performance.write-behind-window-size: 4MB >>>>>>> performance.write-behind: on >>>>>>> storage.build-pgfid: on >>>>>>> features.utime: on >>>>>>> storage.ctime: on >>>>>>> cluster.quorum-type: fixed >>>>>>> cluster.quorum-count: 2 >>>>>>> features.bitrot: on >>>>>>> features.scrub: Active >>>>>>> features.scrub-freq: daily >>>>>>> cluster.enable-shared-storage: enable >>>>>>> >>>>>>> >>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>> processes everything works fine and our application was succeed. >>>>>> >>>>>> Regards >>>>>> David Spisla >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Thu May 16 10:42:27 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 16 May 2019 12:42:27 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello Amar, thank you for the information. Of course, we should wait for Poornima because of her knowledge. Regards David Spisla Am Do., 16. Mai 2019 um 12:23 Uhr schrieb Amar Tumballi Suryanarayan < atumball at redhat.com>: > David, Poornima is on leave from today till 21st May. So having it after > she comes back is better. She has more experience in SMB integration than > many of us. > > -Amar > > On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: > >> Hello everyone, >> >> if there is any problem in finding a date and time, please contact me. It >> would be fine to have a meeting soon. >> >> Regards >> David Spisla >> >> Am Mo., 13. 
Mai 2019 um 12:38 Uhr schrieb David Spisla < >> david.spisla at iternity.com>: >> >>> Hi Poornima, >>> >>> >>> >>> that's fine. I would suggest these dates and times: >>> >>> >>> >>> May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>> >>> May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>> >>> >>> >>> I have added Volker Lendecke from Sernet to the mail. He is the Samba expert. >>> >>> Can one of you provide a host via bluejeans.com? If not, I will try >>> it with GoToMeeting (https://www.gotomeeting.com). >>> >>> >>> >>> @all Please write your preferred dates and times. For me, all of the >>> above dates and times are fine. >>> >>> >>> >>> Regards >>> >>> David >>> >>> >>> >>> >>> >>> *From:* Poornima Gurusiddaiah >>> *Sent:* Monday, May 13, 2019 07:22 >>> *To:* David Spisla ; Anoop C S ; >>> Gunther Deschner >>> *Cc:* Gluster Devel ; >>> gluster-users at gluster.org List >>> *Subject:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>> Gluster (together with Samba Core Developer) >>> >>> >>> >>> Hi, >>> >>> >>> >>> We would definitely be interested in this. Thank you for contacting us. >>> To start with, we can have an online conference. Please suggest a few >>> possible dates and times for the week (preferably between IST 7.00 AM - >>> 9.00 PM)? >>> >>> Adding Anoop and Gunther, who are also the main contributors to the >>> Gluster-Samba integration. >>> >>> >>> >>> Thanks, >>> >>> Poornima >>> >>> >>> >>> >>> >>> >>> >>> On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: >>> >>> Dear Gluster Community, >>> >>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>> For this purpose we are working together with an advanced Samba core >>> developer. He did some debugging but needs more information about Gluster >>> core behaviour. >>> >>> >>> >>> *Would any of the Gluster developers want to have an online conference >>> with him and me?* >>> >>> >>> >>> I would organize everything. In my opinion this is a good chance to >>> improve the stability of Glusterfs, and this is at the moment one of the major >>> issues in the community. >>> >>> >>> >>> Regards >>> >>> David Spisla >>> >>> _______________________________________________ >>> >>> Community Meeting Calendar: >>> >>> APAC Schedule - >>> Every 2nd and 4th Tuesday at 11:30 AM IST >>> Bridge: https://bluejeans.com/836554017 >>> >>> NA/EMEA Schedule - >>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>> Bridge: https://bluejeans.com/486278655 >>> >>> Gluster-devel mailing list >>> Gluster-devel at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>> >>> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: 
From order at rikus.com Thu May 16 20:50:16 2019 From: order at rikus.com (Jeff Bischoff) Date: Thu, 16 May 2019 16:50:16 -0400 Subject: [Gluster-users] How to prevent Brick terminated by socket temporarily unavailable Message-ID: <0F6B141E-3903-4A8E-8BA5-F2925C782905@rikus.com> I'm having a frequent problem where some temporary condition causes bricks to be shut down. The health-check feature is shutting them down, and according to https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/brick-failure-detection/ the brick will stay off and not be restarted (by design). 
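For reference, bricks taken down by the health-check can be brought back without restarting the whole node, and the check itself is tunable per volume. A minimal sketch, with the volume name as a placeholder and 30 seconds only as an example value:

# gluster volume start <volname> force
# gluster volume get <volname> storage.health-check-interval
# gluster volume set <volname> storage.health-check-interval 30

The "force" start only starts brick processes that are currently offline, the interval is in seconds, and 0 disables the health-check entirely.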
What I don't understand is: What is causing this "Resource temporarily unavailable" in the first place. From searching the web, it sounds like a socket timeout. Have you guys seen this before? If this is truly a temporary failure, why do we shut down the brick indefinitely? Should I try any of the following: Increase 'network.ping-timeout' or 'client.grace-timeout' Disable the health check feature by setting: # gluster volume set storage.health-check-interval 0 The brick log looks like this at the time it is shut down: ------------------ [2019-05-08 13:48:33.642605] W [MSGID: 113075] [posix-helpers.c:1895:posix_fs_health_check] 0-heketidbstorage-posix: aio_write() on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick/.glusterfs/health_check returned [Resource temporarily unavailable] [2019-05-08 13:48:33.749246] M [MSGID: 113075] [posix-helpers.c:1962:posix_health_check_thread_proc] 0-heketidbstorage-posix: health-check failed, going down [2019-05-08 13:48:34.000428] M [MSGID: 113075] [posix-helpers.c:1981:posix_health_check_thread_proc] 0-heketidbstorage-posix: still alive! -> SIGTERM [2019-05-08 13:49:04.597061] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f16fdd94dd5] -->/usr/sbin/glusterfsd(glusterfs_sigwaiter+0xe5) [0x556e53da2d65] -->/usr/sbin/glusterfsd(cleanup_and_exit+0x6b) [0x556e53da2b8b] ) 0-: received signum (15), shutting down ------------------ The GlusterD log shows this shortly after: ------------------ [2019-05-08 13:49:04.673536] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick on port 49152 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) ------------------ Any guidance would be greatly appreciated! Best, Jeff Bischoff -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbellur at redhat.com Thu May 16 23:50:50 2019 From: vbellur at redhat.com (Vijay Bellur) Date: Thu, 16 May 2019 16:50:50 -0700 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello David, >From the backtrace it looks like stbuf is NULL in map_atime_from_server() as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you please check if there is an unconditional dereference of stbuf in map_atime_from_server()? Regards, Vijay On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > Hello Vijay, > > yes, we are using custom patches. It s a helper function, which is defined > in xlator_helper.c and used in worm_lookup_cbk. > Do you think this could be the problem? The functions only manipulates the > atime in struct iattr > > Regards > David Spisla > > Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur >: > >> Hello David, >> >> Do you have any custom patches in your deployment? I looked up v5.5 but >> could not find the following functions referred to in the core: >> >> map_atime_from_server() >> worm_lookup_cbk() >> >> Neither do I see xlator_helper.c in the codebase. 
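As an aside on the op_errno of 13: the posix-acl messages in the quoted logs show "Permission denied" on gfid 00000000-0000-0000-0000-000000000001, which is the root of the volume, for a request with uid/gid 2000 while the inode context has uid 0 and mode 700. Whether the on-disk ownership/ACL of the brick root is really the trigger is an assumption that would need checking; read-only checks like these, using the brick path from the volume info quoted in this thread, would confirm it:

# ls -ld /gluster/brick4/glusterbrick
# getfacl /gluster/brick4/glusterbrick
# getfattr -d -m . -e hex /gluster/brick4/glusterbrick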
>> >> Thanks, >> Vijay >> >> >> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at >> ../../../../xlators/lib/src/xlator_helper.c:21 >> __FUNCTION__ = "map_to_atime_from_server" >> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, >> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, >> op_errno=op_errno at entry=13, >> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at >> worm.c:531 >> priv = 0x7fdef4075378 >> ret = 0 >> __FUNCTION__ = "worm_lookup_cbk" >> >> On Thu, May 16, 2019 at 12:53 AM David Spisla wrote: >> >>> Hello Vijay, >>> >>> I could reproduce the issue. After doing a simple DIR Listing from Win10 >>> powershell, all brick processes crashes. Its not the same scenario >>> mentioned before but the crash report in the bricks log is the same. >>> Attached you find the backtrace. >>> >>> Regards >>> David Spisla >>> >>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < >>> vbellur at redhat.com>: >>> >>>> Hello David, >>>> >>>> On Tue, May 7, 2019 at 2:16 AM David Spisla wrote: >>>> >>>>> Hello Vijay, >>>>> >>>>> how can I create such a core file? Or will it be created automatically >>>>> if a gluster process crashes? >>>>> Maybe you can give me a hint and will try to get a backtrace. >>>>> >>>> >>>> Generation of core file is dependent on the system configuration. `man >>>> 5 core` contains useful information to generate a core file in a directory. >>>> Once a core file is generated, you can use gdb to get a backtrace of all >>>> threads (using "thread apply all bt full"). >>>> >>>> >>>>> Unfortunately this bug is not easy to reproduce because it appears >>>>> only sometimes. >>>>> >>>> >>>> If the bug is not easy to reproduce, having a backtrace from the >>>> generated core would be very useful! >>>> >>>> Thanks, >>>> Vijay >>>> >>>> >>>>> >>>>> Regards >>>>> David Spisla >>>>> >>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>>> vbellur at redhat.com>: >>>>> >>>>>> Thank you for the report, David. Do you have core files available on >>>>>> any of the servers? If yes, would it be possible for you to provide a >>>>>> backtrace. >>>>>> >>>>>> Regards, >>>>>> Vijay >>>>>> >>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>>> wrote: >>>>>> >>>>>>> Hello folks, >>>>>>> >>>>>>> we have a client application (runs on Win10) which does some FOPs on >>>>>>> a gluster volume which is accessed by SMB. >>>>>>> >>>>>>> *Scenario 1* is a READ Operation which reads all files successively >>>>>>> and checks if the files data was correctly copied. 
While doing this, all >>>>>>> brick processes crashes and in the logs one have this crash report on every >>>>>>> brick log: >>>>>>> >>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>>> pending frames: >>>>>>>> frame : type(0) op(27) >>>>>>>> frame : type(0) op(40) >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> signal received: 11 >>>>>>>> time of crash: >>>>>>>> 2019-04-16 08:32:21 >>>>>>>> configuration details: >>>>>>>> argp 1 >>>>>>>> backtrace 1 >>>>>>>> dlfcn 1 >>>>>>>> libpthread 1 >>>>>>>> llistxattr 1 >>>>>>>> setfsid 1 >>>>>>>> spinlock 1 >>>>>>>> epoll.h 1 >>>>>>>> xattr.h 1 >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> package-string: glusterfs 5.5 >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>>> >>>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>>> one can read this crash report in every brick log: >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>>> client: >>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>>> denied] >>>>>>>> >>>>>>>> pending frames: >>>>>>>> >>>>>>>> frame : type(0) op(27) >>>>>>>> >>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>> >>>>>>>> signal received: 11 >>>>>>>> >>>>>>>> time of crash: >>>>>>>> >>>>>>>> 2019-05-02 07:43:39 >>>>>>>> >>>>>>>> configuration details: >>>>>>>> >>>>>>>> argp 1 >>>>>>>> >>>>>>>> backtrace 1 >>>>>>>> >>>>>>>> dlfcn 1 >>>>>>>> >>>>>>>> libpthread 1 >>>>>>>> >>>>>>>> llistxattr 1 >>>>>>>> >>>>>>>> setfsid 1 >>>>>>>> >>>>>>>> spinlock 1 >>>>>>>> >>>>>>>> epoll.h 1 >>>>>>>> >>>>>>>> xattr.h 1 >>>>>>>> >>>>>>>> st_atim.tv_nsec 1 >>>>>>>> >>>>>>>> package-string: glusterfs 5.5 >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>>> >>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>>> >>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>>> >>>>>>>> >>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>>> >>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>>> >>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>>> >>>>>>> >>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>>> volumes. 
But both volumes has the same settings: >>>>>>> >>>>>>>> Volume Name: shortterm >>>>>>>> Type: Replicate >>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>>> Status: Started >>>>>>>> Snapshot Count: 0 >>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>> Transport-type: tcp >>>>>>>> Bricks: >>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>>> Options Reconfigured: >>>>>>>> storage.reserve: 1 >>>>>>>> performance.client-io-threads: off >>>>>>>> nfs.disable: on >>>>>>>> transport.address-family: inet >>>>>>>> user.smb: disable >>>>>>>> features.read-only: off >>>>>>>> features.worm: off >>>>>>>> features.worm-file-level: on >>>>>>>> features.retention-mode: enterprise >>>>>>>> features.default-retention-period: 120 >>>>>>>> network.ping-timeout: 10 >>>>>>>> features.cache-invalidation: on >>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>> performance.nl-cache: on >>>>>>>> performance.nl-cache-timeout: 600 >>>>>>>> client.event-threads: 32 >>>>>>>> server.event-threads: 32 >>>>>>>> cluster.lookup-optimize: on >>>>>>>> performance.stat-prefetch: on >>>>>>>> performance.cache-invalidation: on >>>>>>>> performance.md-cache-timeout: 600 >>>>>>>> performance.cache-samba-metadata: on >>>>>>>> performance.cache-ima-xattrs: on >>>>>>>> performance.io-thread-count: 64 >>>>>>>> cluster.use-compound-fops: on >>>>>>>> performance.cache-size: 512MB >>>>>>>> performance.cache-refresh-timeout: 10 >>>>>>>> performance.read-ahead: off >>>>>>>> performance.write-behind-window-size: 4MB >>>>>>>> performance.write-behind: on >>>>>>>> storage.build-pgfid: on >>>>>>>> features.utime: on >>>>>>>> storage.ctime: on >>>>>>>> cluster.quorum-type: fixed >>>>>>>> cluster.quorum-count: 2 >>>>>>>> features.bitrot: on >>>>>>>> features.scrub: Active >>>>>>>> features.scrub-freq: daily >>>>>>>> cluster.enable-shared-storage: enable >>>>>>>> >>>>>>>> >>>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>>> processes everything works fine and our application was succeed. >>>>>>> >>>>>>> Regards >>>>>>> David Spisla >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 17 00:29:58 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Fri, 17 May 2019 12:29:58 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: Hello, We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. We are running glusterfs 5.6.1. Thanks in advance for any assistance! On existing node gfs1, trying to add new arbiter node gfs3: # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 volume add-brick: failed: Commit failed on gfs3. Please check log file for details. 
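A commit failure like this usually means the temporary mount that glusterd creates on the new node could not reach glusterd or the existing bricks, so a few reachability checks from gfs3 are worth running first (hostnames as used in this report; 49152 is only the typical first brick port, the real one is shown by volume status):

# gluster peer status
# gluster volume status gvol0
# nc -zv gfs1 24007
# nc -zv gfs1 49152

Peer status should show the other nodes as "Peer in Cluster (Connected)", and 24007 is the glusterd management port.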
On new node gfs3 in gvol0-add-brick-mount.log: [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: received signum (15), shutting down [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntQAtu3f'. [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'. Processes running on new node gfs3: # ps -ef | grep gluster root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Fri May 17 07:50:28 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 09:50:28 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: Hello Vijay, thank you for the clarification. Yes, there is an unconditional dereference in stbuf. It seems plausible that this causes the crash. I think a check like this should help: if (buf == NULL) { goto out; } map_atime_from_server(this, buf); Is there a reason why buf can be NULL? Regards David Spisla Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur : > Hello David, > > From the backtrace it looks like stbuf is NULL in map_atime_from_server() > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you > please check if there is an unconditional dereference of stbuf in > map_atime_from_server()? > > Regards, > Vijay > > On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > >> Hello Vijay, >> >> yes, we are using custom patches. It s a helper function, which is >> defined in xlator_helper.c and used in worm_lookup_cbk. >> Do you think this could be the problem? The functions only manipulates >> the atime in struct iattr >> >> Regards >> David Spisla >> >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < >> vbellur at redhat.com>: >> >>> Hello David, >>> >>> Do you have any custom patches in your deployment? 
I looked up v5.5 but >>> could not find the following functions referred to in the core: >>> >>> map_atime_from_server() >>> worm_lookup_cbk() >>> >>> Neither do I see xlator_helper.c in the codebase. >>> >>> Thanks, >>> Vijay >>> >>> >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at >>> ../../../../xlators/lib/src/xlator_helper.c:21 >>> __FUNCTION__ = "map_to_atime_from_server" >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, >>> op_errno=op_errno at entry=13, >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at >>> worm.c:531 >>> priv = 0x7fdef4075378 >>> ret = 0 >>> __FUNCTION__ = "worm_lookup_cbk" >>> >>> On Thu, May 16, 2019 at 12:53 AM David Spisla >>> wrote: >>> >>>> Hello Vijay, >>>> >>>> I could reproduce the issue. After doing a simple DIR Listing from >>>> Win10 powershell, all brick processes crashes. Its not the same scenario >>>> mentioned before but the crash report in the bricks log is the same. >>>> Attached you find the backtrace. >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < >>>> vbellur at redhat.com>: >>>> >>>>> Hello David, >>>>> >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla >>>>> wrote: >>>>> >>>>>> Hello Vijay, >>>>>> >>>>>> how can I create such a core file? Or will it be created >>>>>> automatically if a gluster process crashes? >>>>>> Maybe you can give me a hint and will try to get a backtrace. >>>>>> >>>>> >>>>> Generation of core file is dependent on the system configuration. >>>>> `man 5 core` contains useful information to generate a core file in a >>>>> directory. Once a core file is generated, you can use gdb to get a >>>>> backtrace of all threads (using "thread apply all bt full"). >>>>> >>>>> >>>>>> Unfortunately this bug is not easy to reproduce because it appears >>>>>> only sometimes. >>>>>> >>>>> >>>>> If the bug is not easy to reproduce, having a backtrace from the >>>>> generated core would be very useful! >>>>> >>>>> Thanks, >>>>> Vijay >>>>> >>>>> >>>>>> >>>>>> Regards >>>>>> David Spisla >>>>>> >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < >>>>>> vbellur at redhat.com>: >>>>>> >>>>>>> Thank you for the report, David. Do you have core files available on >>>>>>> any of the servers? If yes, would it be possible for you to provide a >>>>>>> backtrace. >>>>>>> >>>>>>> Regards, >>>>>>> Vijay >>>>>>> >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla >>>>>>> wrote: >>>>>>> >>>>>>>> Hello folks, >>>>>>>> >>>>>>>> we have a client application (runs on Win10) which does some FOPs >>>>>>>> on a gluster volume which is accessed by SMB. >>>>>>>> >>>>>>>> *Scenario 1* is a READ Operation which reads all files >>>>>>>> successively and checks if the files data was correctly copied. 
While doing >>>>>>>> this, all brick processes crashes and in the logs one have this crash >>>>>>>> report on every brick log: >>>>>>>> >>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] >>>>>>>>> pending frames: >>>>>>>>> frame : type(0) op(27) >>>>>>>>> frame : type(0) op(40) >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> signal received: 11 >>>>>>>>> time of crash: >>>>>>>>> 2019-04-16 08:32:21 >>>>>>>>> configuration details: >>>>>>>>> argp 1 >>>>>>>>> backtrace 1 >>>>>>>>> dlfcn 1 >>>>>>>>> libpthread 1 >>>>>>>>> llistxattr 1 >>>>>>>>> setfsid 1 >>>>>>>>> spinlock 1 >>>>>>>>> epoll.h 1 >>>>>>>>> xattr.h 1 >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> package-string: glusterfs 5.5 >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] >>>>>>>>> >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file >>>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, >>>>>>>> one can read this crash report in every brick log: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: >>>>>>>>> client: >>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission >>>>>>>>> denied] >>>>>>>>> >>>>>>>>> pending frames: >>>>>>>>> >>>>>>>>> frame : type(0) op(27) >>>>>>>>> >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git >>>>>>>>> >>>>>>>>> signal received: 11 >>>>>>>>> >>>>>>>>> time of crash: >>>>>>>>> >>>>>>>>> 2019-05-02 07:43:39 >>>>>>>>> >>>>>>>>> configuration details: >>>>>>>>> >>>>>>>>> argp 1 >>>>>>>>> >>>>>>>>> backtrace 1 >>>>>>>>> >>>>>>>>> dlfcn 1 >>>>>>>>> >>>>>>>>> libpthread 1 >>>>>>>>> >>>>>>>>> llistxattr 1 >>>>>>>>> >>>>>>>>> setfsid 1 >>>>>>>>> >>>>>>>>> spinlock 1 >>>>>>>>> >>>>>>>>> epoll.h 1 >>>>>>>>> >>>>>>>>> xattr.h 1 >>>>>>>>> >>>>>>>>> st_atim.tv_nsec 1 >>>>>>>>> >>>>>>>>> package-string: glusterfs 5.5 >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] >>>>>>>>> >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] >>>>>>>>> >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] >>>>>>>>> >>>>>>>>> >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] >>>>>>>>> >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] >>>>>>>>> >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] >>>>>>>>> >>>>>>>> >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different >>>>>>>> volumes. 
But both volumes has the same settings: >>>>>>>> >>>>>>>>> Volume Name: shortterm >>>>>>>>> Type: Replicate >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee >>>>>>>>> Status: Started >>>>>>>>> Snapshot Count: 0 >>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>> Transport-type: tcp >>>>>>>>> Bricks: >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick >>>>>>>>> Options Reconfigured: >>>>>>>>> storage.reserve: 1 >>>>>>>>> performance.client-io-threads: off >>>>>>>>> nfs.disable: on >>>>>>>>> transport.address-family: inet >>>>>>>>> user.smb: disable >>>>>>>>> features.read-only: off >>>>>>>>> features.worm: off >>>>>>>>> features.worm-file-level: on >>>>>>>>> features.retention-mode: enterprise >>>>>>>>> features.default-retention-period: 120 >>>>>>>>> network.ping-timeout: 10 >>>>>>>>> features.cache-invalidation: on >>>>>>>>> features.cache-invalidation-timeout: 600 >>>>>>>>> performance.nl-cache: on >>>>>>>>> performance.nl-cache-timeout: 600 >>>>>>>>> client.event-threads: 32 >>>>>>>>> server.event-threads: 32 >>>>>>>>> cluster.lookup-optimize: on >>>>>>>>> performance.stat-prefetch: on >>>>>>>>> performance.cache-invalidation: on >>>>>>>>> performance.md-cache-timeout: 600 >>>>>>>>> performance.cache-samba-metadata: on >>>>>>>>> performance.cache-ima-xattrs: on >>>>>>>>> performance.io-thread-count: 64 >>>>>>>>> cluster.use-compound-fops: on >>>>>>>>> performance.cache-size: 512MB >>>>>>>>> performance.cache-refresh-timeout: 10 >>>>>>>>> performance.read-ahead: off >>>>>>>>> performance.write-behind-window-size: 4MB >>>>>>>>> performance.write-behind: on >>>>>>>>> storage.build-pgfid: on >>>>>>>>> features.utime: on >>>>>>>>> storage.ctime: on >>>>>>>>> cluster.quorum-type: fixed >>>>>>>>> cluster.quorum-count: 2 >>>>>>>>> features.bitrot: on >>>>>>>>> features.scrub: Active >>>>>>>>> features.scrub-freq: daily >>>>>>>>> cluster.enable-shared-storage: enable >>>>>>>>> >>>>>>>>> >>>>>>>> Why can this happen to all Brick processes? I don't understand the >>>>>>>> crash report. The FOPs are nothing special and after restart brick >>>>>>>> processes everything works fine and our application was succeed. >>>>>>>> >>>>>>>> Regards >>>>>>>> David Spisla >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Gluster-users mailing list >>>>>>>> Gluster-users at gluster.org >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>>> >>>>>>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 08:21:41 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 10:21:41 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: Message-ID: <20190517082141.GA24535@ndevos-x270> On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > Hello Vijay, > thank you for the clarification. Yes, there is an unconditional dereference > in stbuf. It seems plausible that this causes the crash. I think a check > like this should help: > > if (buf == NULL) { > goto out; > } > map_atime_from_server(this, buf); > > Is there a reason why buf can be NULL? It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). This is probably something you need to handle in worm_lookup_cbk. 
There can be many reasons for a FOP to return an error, why it happened in this case is a little difficult to say without (much) more details. HTH, Niels > > Regards > David Spisla > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur : > > > Hello David, > > > > From the backtrace it looks like stbuf is NULL in map_atime_from_server() > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can you > > please check if there is an unconditional dereference of stbuf in > > map_atime_from_server()? > > > > Regards, > > Vijay > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla wrote: > > > >> Hello Vijay, > >> > >> yes, we are using custom patches. It s a helper function, which is > >> defined in xlator_helper.c and used in worm_lookup_cbk. > >> Do you think this could be the problem? The functions only manipulates > >> the atime in struct iattr > >> > >> Regards > >> David Spisla > >> > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > >> vbellur at redhat.com>: > >> > >>> Hello David, > >>> > >>> Do you have any custom patches in your deployment? I looked up v5.5 but > >>> could not find the following functions referred to in the core: > >>> > >>> map_atime_from_server() > >>> worm_lookup_cbk() > >>> > >>> Neither do I see xlator_helper.c in the codebase. > >>> > >>> Thanks, > >>> Vijay > >>> > >>> > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > >>> __FUNCTION__ = "map_to_atime_from_server" > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry=0x7fdeac0015c8, > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > >>> op_errno=op_errno at entry=13, > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > >>> worm.c:531 > >>> priv = 0x7fdef4075378 > >>> ret = 0 > >>> __FUNCTION__ = "worm_lookup_cbk" > >>> > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > >>> wrote: > >>> > >>>> Hello Vijay, > >>>> > >>>> I could reproduce the issue. After doing a simple DIR Listing from > >>>> Win10 powershell, all brick processes crashes. Its not the same scenario > >>>> mentioned before but the crash report in the bricks log is the same. > >>>> Attached you find the backtrace. > >>>> > >>>> Regards > >>>> David Spisla > >>>> > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > >>>> vbellur at redhat.com>: > >>>> > >>>>> Hello David, > >>>>> > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > >>>>> wrote: > >>>>> > >>>>>> Hello Vijay, > >>>>>> > >>>>>> how can I create such a core file? Or will it be created > >>>>>> automatically if a gluster process crashes? > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > >>>>>> > >>>>> > >>>>> Generation of core file is dependent on the system configuration. > >>>>> `man 5 core` contains useful information to generate a core file in a > >>>>> directory. Once a core file is generated, you can use gdb to get a > >>>>> backtrace of all threads (using "thread apply all bt full"). > >>>>> > >>>>> > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > >>>>>> only sometimes. > >>>>>> > >>>>> > >>>>> If the bug is not easy to reproduce, having a backtrace from the > >>>>> generated core would be very useful! > >>>>> > >>>>> Thanks, > >>>>> Vijay > >>>>> > >>>>> > >>>>>> > >>>>>> Regards > >>>>>> David Spisla > >>>>>> > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > >>>>>> vbellur at redhat.com>: > >>>>>> > >>>>>>> Thank you for the report, David. 
Do you have core files available on > >>>>>>> any of the servers? If yes, would it be possible for you to provide a > >>>>>>> backtrace. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Vijay > >>>>>>> > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hello folks, > >>>>>>>> > >>>>>>>> we have a client application (runs on Win10) which does some FOPs > >>>>>>>> on a gluster volume which is accessed by SMB. > >>>>>>>> > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > >>>>>>>> successively and checks if the files data was correctly copied. While doing > >>>>>>>> this, all brick processes crashes and in the logs one have this crash > >>>>>>>> report on every brick log: > >>>>>>>> > >>>>>>>>> CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, gfid: 00000000-0000-0000-0000-000000000001, req(uid:2000,gid:2000,perm:1,ngrps:1), ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission denied] > >>>>>>>>> pending frames: > >>>>>>>>> frame : type(0) op(27) > >>>>>>>>> frame : type(0) op(40) > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > >>>>>>>>> signal received: 11 > >>>>>>>>> time of crash: > >>>>>>>>> 2019-04-16 08:32:21 > >>>>>>>>> configuration details: > >>>>>>>>> argp 1 > >>>>>>>>> backtrace 1 > >>>>>>>>> dlfcn 1 > >>>>>>>>> libpthread 1 > >>>>>>>>> llistxattr 1 > >>>>>>>>> setfsid 1 > >>>>>>>>> spinlock 1 > >>>>>>>>> epoll.h 1 > >>>>>>>>> xattr.h 1 > >>>>>>>>> st_atim.tv_nsec 1 > >>>>>>>>> package-string: glusterfs 5.5 > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > >>>>>>>>> > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks crashes and again, > >>>>>>>> one can read this crash report in every brick log: > >>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] 0-longterm-access-control: > >>>>>>>>> client: > >>>>>>>>> CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > >>>>>>>>> denied] > >>>>>>>>> > >>>>>>>>> pending frames: > >>>>>>>>> > >>>>>>>>> frame : type(0) op(27) > >>>>>>>>> > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > >>>>>>>>> > >>>>>>>>> signal received: 11 > >>>>>>>>> > >>>>>>>>> time of crash: > >>>>>>>>> > >>>>>>>>> 2019-05-02 07:43:39 > >>>>>>>>> > >>>>>>>>> configuration details: > >>>>>>>>> > >>>>>>>>> argp 1 > >>>>>>>>> > >>>>>>>>> backtrace 1 > >>>>>>>>> > >>>>>>>>> dlfcn 1 > >>>>>>>>> > >>>>>>>>> libpthread 1 > >>>>>>>>> > >>>>>>>>> llistxattr 1 > >>>>>>>>> > >>>>>>>>> setfsid 1 > >>>>>>>>> > >>>>>>>>> spinlock 1 > >>>>>>>>> > >>>>>>>>> epoll.h 1 > >>>>>>>>> > >>>>>>>>> xattr.h 1 > >>>>>>>>> > >>>>>>>>> st_atim.tv_nsec 1 > >>>>>>>>> > >>>>>>>>> package-string: glusterfs 5.5 > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > >>>>>>>>> > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > >>>>>>>>> > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > >>>>>>>>> > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > >>>>>>>>> > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > >>>>>>>>> > >>>>>>>> > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different > >>>>>>>> volumes. 
But both volumes has the same settings: > >>>>>>>> > >>>>>>>>> Volume Name: shortterm > >>>>>>>>> Type: Replicate > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > >>>>>>>>> Status: Started > >>>>>>>>> Snapshot Count: 0 > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > >>>>>>>>> Transport-type: tcp > >>>>>>>>> Bricks: > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > >>>>>>>>> Options Reconfigured: > >>>>>>>>> storage.reserve: 1 > >>>>>>>>> performance.client-io-threads: off > >>>>>>>>> nfs.disable: on > >>>>>>>>> transport.address-family: inet > >>>>>>>>> user.smb: disable > >>>>>>>>> features.read-only: off > >>>>>>>>> features.worm: off > >>>>>>>>> features.worm-file-level: on > >>>>>>>>> features.retention-mode: enterprise > >>>>>>>>> features.default-retention-period: 120 > >>>>>>>>> network.ping-timeout: 10 > >>>>>>>>> features.cache-invalidation: on > >>>>>>>>> features.cache-invalidation-timeout: 600 > >>>>>>>>> performance.nl-cache: on > >>>>>>>>> performance.nl-cache-timeout: 600 > >>>>>>>>> client.event-threads: 32 > >>>>>>>>> server.event-threads: 32 > >>>>>>>>> cluster.lookup-optimize: on > >>>>>>>>> performance.stat-prefetch: on > >>>>>>>>> performance.cache-invalidation: on > >>>>>>>>> performance.md-cache-timeout: 600 > >>>>>>>>> performance.cache-samba-metadata: on > >>>>>>>>> performance.cache-ima-xattrs: on > >>>>>>>>> performance.io-thread-count: 64 > >>>>>>>>> cluster.use-compound-fops: on > >>>>>>>>> performance.cache-size: 512MB > >>>>>>>>> performance.cache-refresh-timeout: 10 > >>>>>>>>> performance.read-ahead: off > >>>>>>>>> performance.write-behind-window-size: 4MB > >>>>>>>>> performance.write-behind: on > >>>>>>>>> storage.build-pgfid: on > >>>>>>>>> features.utime: on > >>>>>>>>> storage.ctime: on > >>>>>>>>> cluster.quorum-type: fixed > >>>>>>>>> cluster.quorum-count: 2 > >>>>>>>>> features.bitrot: on > >>>>>>>>> features.scrub: Active > >>>>>>>>> features.scrub-freq: daily > >>>>>>>>> cluster.enable-shared-storage: enable > >>>>>>>>> > >>>>>>>>> > >>>>>>>> Why can this happen to all Brick processes? I don't understand the > >>>>>>>> crash report. The FOPs are nothing special and after restart brick > >>>>>>>> processes everything works fine and our application was succeed. > >>>>>>>> > >>>>>>>> Regards > >>>>>>>> David Spisla > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Gluster-users mailing list > >>>>>>>> Gluster-users at gluster.org > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > >>>>>>> > >>>>>>> > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From abhishpaliwal at gmail.com Fri May 17 09:16:20 2019 From: abhishpaliwal at gmail.com (ABHISHEK PALIWAL) Date: Fri, 17 May 2019 14:46:20 +0530 Subject: [Gluster-users] Memory leak in glusterfs In-Reply-To: References: Message-ID: Anyone please reply.... On Thu, May 16, 2019, 10:49 ABHISHEK PALIWAL wrote: > Hi Team, > > I upload some valgrind logs from my gluster 5.4 setup. This is writing to > the volume every 15 minutes. I stopped glusterd and then copy away the > logs. The test was running for some simulated days. They are zipped in > valgrind-54.zip. > > Lots of info in valgrind-2730.log. 
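Alongside valgrind, gluster statedumps are the usual way to see which allocation pools keep growing between two points in time. A minimal sketch, with the volume name as a placeholder (dumps land in /var/run/gluster by default):

# gluster volume statedump <volname>
# kill -USR1 $(pidof glusterd)

Taking one dump now and another after a few hours of the 15-minute write workload, then comparing the mem-pool sections, should narrow down where the growth is.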
Lots of possibly lost bytes in > glusterfs and even some definitely lost bytes. > > ==2737== 1,572,880 bytes in 1 blocks are possibly lost in loss record 391 > of 391 > ==2737== at 0x4C29C25: calloc (in > /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) > ==2737== by 0xA22485E: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA217C94: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21D9F8: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21DED9: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA21E685: ??? (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0xA1B9D8C: init (in > /usr/lib64/glusterfs/5.4/xlator/mgmt/glusterd.so) > ==2737== by 0x4E511CE: xlator_init (in /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x4E8A2B8: ??? (in /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x4E8AAB3: glusterfs_graph_activate (in > /usr/lib64/libglusterfs.so.0.0.1) > ==2737== by 0x409C35: glusterfs_process_volfp (in /usr/sbin/glusterfsd) > ==2737== by 0x409D99: glusterfs_volumes_init (in /usr/sbin/glusterfsd) > ==2737== > ==2737== LEAK SUMMARY: > ==2737== definitely lost: 1,053 bytes in 10 blocks > ==2737== indirectly lost: 317 bytes in 3 blocks > ==2737== possibly lost: 2,374,971 bytes in 524 blocks > ==2737== still reachable: 53,277 bytes in 201 blocks > ==2737== suppressed: 0 bytes in 0 blocks > > -- > > > > > Regards > Abhishek Paliwal > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spisla80 at gmail.com Fri May 17 09:17:52 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 11:17:52 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517082141.GA24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> Message-ID: Hello Niels, Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > Hello Vijay, > > thank you for the clarification. Yes, there is an unconditional > dereference > > in stbuf. It seems plausible that this causes the crash. I think a check > > like this should help: > > > > if (buf == NULL) { > > goto out; > > } > > map_atime_from_server(this, buf); > > > > Is there a reason why buf can be NULL? > > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). > This is probably something you need to handle in worm_lookup_cbk. There > can be many reasons for a FOP to return an error, why it happened in > this case is a little difficult to say without (much) more details. > Yes, I will look for a way to handle that case. It is intended, that the struct stbuf ist NULL when an error happens? Regards David Spisla > HTH, > Niels > > > > > > Regards > > David Spisla > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > vbellur at redhat.com>: > > > > > Hello David, > > > > > > From the backtrace it looks like stbuf is NULL in > map_atime_from_server() > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can > you > > > please check if there is an unconditional dereference of stbuf in > > > map_atime_from_server()? > > > > > > Regards, > > > Vijay > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > wrote: > > > > > >> Hello Vijay, > > >> > > >> yes, we are using custom patches. It s a helper function, which is > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > >> Do you think this could be the problem? 
The functions only manipulates > > >> the atime in struct iattr > > >> > > >> Regards > > >> David Spisla > > >> > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > >> vbellur at redhat.com>: > > >> > > >>> Hello David, > > >>> > > >>> Do you have any custom patches in your deployment? I looked up v5.5 > but > > >>> could not find the following functions referred to in the core: > > >>> > > >>> map_atime_from_server() > > >>> worm_lookup_cbk() > > >>> > > >>> Neither do I see xlator_helper.c in the codebase. > > >>> > > >>> Thanks, > > >>> Vijay > > >>> > > >>> > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > >>> __FUNCTION__ = "map_to_atime_from_server" > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > =0x7fdeac0015c8, > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > > >>> op_errno=op_errno at entry=13, > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > > >>> worm.c:531 > > >>> priv = 0x7fdef4075378 > > >>> ret = 0 > > >>> __FUNCTION__ = "worm_lookup_cbk" > > >>> > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > > >>> wrote: > > >>> > > >>>> Hello Vijay, > > >>>> > > >>>> I could reproduce the issue. After doing a simple DIR Listing from > > >>>> Win10 powershell, all brick processes crashes. Its not the same > scenario > > >>>> mentioned before but the crash report in the bricks log is the same. > > >>>> Attached you find the backtrace. > > >>>> > > >>>> Regards > > >>>> David Spisla > > >>>> > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > >>>> vbellur at redhat.com>: > > >>>> > > >>>>> Hello David, > > >>>>> > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > > >>>>> wrote: > > >>>>> > > >>>>>> Hello Vijay, > > >>>>>> > > >>>>>> how can I create such a core file? Or will it be created > > >>>>>> automatically if a gluster process crashes? > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > >>>>>> > > >>>>> > > >>>>> Generation of core file is dependent on the system configuration. > > >>>>> `man 5 core` contains useful information to generate a core file > in a > > >>>>> directory. Once a core file is generated, you can use gdb to get a > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > >>>>> > > >>>>> > > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > > >>>>>> only sometimes. > > >>>>>> > > >>>>> > > >>>>> If the bug is not easy to reproduce, having a backtrace from the > > >>>>> generated core would be very useful! > > >>>>> > > >>>>> Thanks, > > >>>>> Vijay > > >>>>> > > >>>>> > > >>>>>> > > >>>>>> Regards > > >>>>>> David Spisla > > >>>>>> > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > >>>>>> vbellur at redhat.com>: > > >>>>>> > > >>>>>>> Thank you for the report, David. Do you have core files > available on > > >>>>>>> any of the servers? If yes, would it be possible for you to > provide a > > >>>>>>> backtrace. > > >>>>>>> > > >>>>>>> Regards, > > >>>>>>> Vijay > > >>>>>>> > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hello folks, > > >>>>>>>> > > >>>>>>>> we have a client application (runs on Win10) which does some > FOPs > > >>>>>>>> on a gluster volume which is accessed by SMB. > > >>>>>>>> > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > >>>>>>>> successively and checks if the files data was correctly copied. 
> While doing > > >>>>>>>> this, all brick processes crashes and in the logs one have this > crash > > >>>>>>>> report on every brick log: > > >>>>>>>> > > >>>>>>>>> > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > gfid: 00000000-0000-0000-0000-000000000001, > req(uid:2000,gid:2000,perm:1,ngrps:1), > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > denied] > > >>>>>>>>> pending frames: > > >>>>>>>>> frame : type(0) op(27) > > >>>>>>>>> frame : type(0) op(40) > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > >>>>>>>>> signal received: 11 > > >>>>>>>>> time of crash: > > >>>>>>>>> 2019-04-16 08:32:21 > > >>>>>>>>> configuration details: > > >>>>>>>>> argp 1 > > >>>>>>>>> backtrace 1 > > >>>>>>>>> dlfcn 1 > > >>>>>>>>> libpthread 1 > > >>>>>>>>> llistxattr 1 > > >>>>>>>>> setfsid 1 > > >>>>>>>>> spinlock 1 > > >>>>>>>>> epoll.h 1 > > >>>>>>>>> xattr.h 1 > > >>>>>>>>> st_atim.tv_nsec 1 > > >>>>>>>>> package-string: glusterfs 5.5 > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > >>>>>>>>> > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > crashes and again, > > >>>>>>>> one can read this crash report in every brick log: > > >>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > 0-longterm-access-control: > > >>>>>>>>> client: > > >>>>>>>>> > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > acl:-) [Permission > > >>>>>>>>> denied] > > >>>>>>>>> > > >>>>>>>>> pending frames: > > >>>>>>>>> > > >>>>>>>>> frame : type(0) op(27) > > >>>>>>>>> > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > >>>>>>>>> > > >>>>>>>>> signal received: 11 > > >>>>>>>>> > > >>>>>>>>> time of crash: > > >>>>>>>>> > > >>>>>>>>> 2019-05-02 07:43:39 > > >>>>>>>>> > > >>>>>>>>> configuration details: > > >>>>>>>>> > > >>>>>>>>> argp 1 > > >>>>>>>>> > > >>>>>>>>> backtrace 1 > > >>>>>>>>> > > >>>>>>>>> dlfcn 1 > > >>>>>>>>> > > >>>>>>>>> libpthread 1 > > >>>>>>>>> > > >>>>>>>>> llistxattr 1 > > >>>>>>>>> > > >>>>>>>>> setfsid 1 > > >>>>>>>>> > > >>>>>>>>> spinlock 1 > > >>>>>>>>> > > >>>>>>>>> epoll.h 1 > > >>>>>>>>> > > >>>>>>>>> xattr.h 1 > > >>>>>>>>> > > >>>>>>>>> st_atim.tv_nsec 1 > > >>>>>>>>> > > >>>>>>>>> package-string: glusterfs 5.5 > > >>>>>>>>> > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > >>>>>>>>> > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > >>>>>>>>> > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > >>>>>>>>> > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > >>>>>>>>> > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two different > > >>>>>>>> volumes. 
But both volumes has the same settings: > > >>>>>>>> > > >>>>>>>>> Volume Name: shortterm > > >>>>>>>>> Type: Replicate > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > >>>>>>>>> Status: Started > > >>>>>>>>> Snapshot Count: 0 > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > >>>>>>>>> Transport-type: tcp > > >>>>>>>>> Bricks: > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > >>>>>>>>> Options Reconfigured: > > >>>>>>>>> storage.reserve: 1 > > >>>>>>>>> performance.client-io-threads: off > > >>>>>>>>> nfs.disable: on > > >>>>>>>>> transport.address-family: inet > > >>>>>>>>> user.smb: disable > > >>>>>>>>> features.read-only: off > > >>>>>>>>> features.worm: off > > >>>>>>>>> features.worm-file-level: on > > >>>>>>>>> features.retention-mode: enterprise > > >>>>>>>>> features.default-retention-period: 120 > > >>>>>>>>> network.ping-timeout: 10 > > >>>>>>>>> features.cache-invalidation: on > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > >>>>>>>>> performance.nl-cache: on > > >>>>>>>>> performance.nl-cache-timeout: 600 > > >>>>>>>>> client.event-threads: 32 > > >>>>>>>>> server.event-threads: 32 > > >>>>>>>>> cluster.lookup-optimize: on > > >>>>>>>>> performance.stat-prefetch: on > > >>>>>>>>> performance.cache-invalidation: on > > >>>>>>>>> performance.md-cache-timeout: 600 > > >>>>>>>>> performance.cache-samba-metadata: on > > >>>>>>>>> performance.cache-ima-xattrs: on > > >>>>>>>>> performance.io-thread-count: 64 > > >>>>>>>>> cluster.use-compound-fops: on > > >>>>>>>>> performance.cache-size: 512MB > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > >>>>>>>>> performance.read-ahead: off > > >>>>>>>>> performance.write-behind-window-size: 4MB > > >>>>>>>>> performance.write-behind: on > > >>>>>>>>> storage.build-pgfid: on > > >>>>>>>>> features.utime: on > > >>>>>>>>> storage.ctime: on > > >>>>>>>>> cluster.quorum-type: fixed > > >>>>>>>>> cluster.quorum-count: 2 > > >>>>>>>>> features.bitrot: on > > >>>>>>>>> features.scrub: Active > > >>>>>>>>> features.scrub-freq: daily > > >>>>>>>>> cluster.enable-shared-storage: enable > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>> Why can this happen to all Brick processes? I don't understand > the > > >>>>>>>> crash report. The FOPs are nothing special and after restart > brick > > >>>>>>>> processes everything works fine and our application was succeed. > > >>>>>>>> > > >>>>>>>> Regards > > >>>>>>>> David Spisla > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> _______________________________________________ > > >>>>>>>> Gluster-users mailing list > > >>>>>>>> Gluster-users at gluster.org > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > >>>>>>> > > >>>>>>> > > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 09:35:02 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 11:35:02 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: <20190517082141.GA24535@ndevos-x270> Message-ID: <20190517093502.GB24535@ndevos-x270> On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > Hello Niels, > > Am Fr., 17. 
Mai 2019 um 10:21 Uhr schrieb Niels de Vos : > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > Hello Vijay, > > > thank you for the clarification. Yes, there is an unconditional > > dereference > > > in stbuf. It seems plausible that this causes the crash. I think a check > > > like this should help: > > > > > > if (buf == NULL) { > > > goto out; > > > } > > > map_atime_from_server(this, buf); > > > > > > Is there a reason why buf can be NULL? > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission denied). > > This is probably something you need to handle in worm_lookup_cbk. There > > can be many reasons for a FOP to return an error, why it happened in > > this case is a little difficult to say without (much) more details. > > > Yes, I will look for a way to handle that case. > It is intended, that the struct stbuf ist NULL when an error happens? Yes, in most error occasions it will not be possible to get a valid stbuf. Niels > > Regards > David Spisla > > > > HTH, > > Niels > > > > > > > > > > Regards > > > David Spisla > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > vbellur at redhat.com>: > > > > > > > Hello David, > > > > > > > > From the backtrace it looks like stbuf is NULL in > > map_atime_from_server() > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). Can > > you > > > > please check if there is an unconditional dereference of stbuf in > > > > map_atime_from_server()? > > > > > > > > Regards, > > > > Vijay > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > wrote: > > > > > > > >> Hello Vijay, > > > >> > > > >> yes, we are using custom patches. It s a helper function, which is > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > >> Do you think this could be the problem? The functions only manipulates > > > >> the atime in struct iattr > > > >> > > > >> Regards > > > >> David Spisla > > > >> > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > >> vbellur at redhat.com>: > > > >> > > > >>> Hello David, > > > >>> > > > >>> Do you have any custom patches in your deployment? I looked up v5.5 > > but > > > >>> could not find the following functions referred to in the core: > > > >>> > > > >>> map_atime_from_server() > > > >>> worm_lookup_cbk() > > > >>> > > > >>> Neither do I see xlator_helper.c in the codebase. > > > >>> > > > >>> Thanks, > > > >>> Vijay > > > >>> > > > >>> > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > =0x7fdeac0015c8, > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry=-1, > > > >>> op_errno=op_errno at entry=13, > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) at > > > >>> worm.c:531 > > > >>> priv = 0x7fdef4075378 > > > >>> ret = 0 > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > >>> > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla > > > >>> wrote: > > > >>> > > > >>>> Hello Vijay, > > > >>>> > > > >>>> I could reproduce the issue. After doing a simple DIR Listing from > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > scenario > > > >>>> mentioned before but the crash report in the bricks log is the same. > > > >>>> Attached you find the backtrace. > > > >>>> > > > >>>> Regards > > > >>>> David Spisla > > > >>>> > > > >>>> Am Di., 7. 
Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > >>>> vbellur at redhat.com>: > > > >>>> > > > >>>>> Hello David, > > > >>>>> > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla > > > >>>>> wrote: > > > >>>>> > > > >>>>>> Hello Vijay, > > > >>>>>> > > > >>>>>> how can I create such a core file? Or will it be created > > > >>>>>> automatically if a gluster process crashes? > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > >>>>>> > > > >>>>> > > > >>>>> Generation of core file is dependent on the system configuration. > > > >>>>> `man 5 core` contains useful information to generate a core file > > in a > > > >>>>> directory. Once a core file is generated, you can use gdb to get a > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > >>>>> > > > >>>>> > > > >>>>>> Unfortunately this bug is not easy to reproduce because it appears > > > >>>>>> only sometimes. > > > >>>>>> > > > >>>>> > > > >>>>> If the bug is not easy to reproduce, having a backtrace from the > > > >>>>> generated core would be very useful! > > > >>>>> > > > >>>>> Thanks, > > > >>>>> Vijay > > > >>>>> > > > >>>>> > > > >>>>>> > > > >>>>>> Regards > > > >>>>>> David Spisla > > > >>>>>> > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > >>>>>> vbellur at redhat.com>: > > > >>>>>> > > > >>>>>>> Thank you for the report, David. Do you have core files > > available on > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > provide a > > > >>>>>>> backtrace. > > > >>>>>>> > > > >>>>>>> Regards, > > > >>>>>>> Vijay > > > >>>>>>> > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla > > > >>>>>>> wrote: > > > >>>>>>> > > > >>>>>>>> Hello folks, > > > >>>>>>>> > > > >>>>>>>> we have a client application (runs on Win10) which does some > > FOPs > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > >>>>>>>> > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > >>>>>>>> successively and checks if the files data was correctly copied. 
> > While doing > > > >>>>>>>> this, all brick processes crashes and in the logs one have this > > crash > > > >>>>>>>> report on every brick log: > > > >>>>>>>> > > > >>>>>>>>> > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > gfid: 00000000-0000-0000-0000-000000000001, > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) [Permission > > denied] > > > >>>>>>>>> pending frames: > > > >>>>>>>>> frame : type(0) op(27) > > > >>>>>>>>> frame : type(0) op(40) > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > >>>>>>>>> signal received: 11 > > > >>>>>>>>> time of crash: > > > >>>>>>>>> 2019-04-16 08:32:21 > > > >>>>>>>>> configuration details: > > > >>>>>>>>> argp 1 > > > >>>>>>>>> backtrace 1 > > > >>>>>>>>> dlfcn 1 > > > >>>>>>>>> libpthread 1 > > > >>>>>>>>> llistxattr 1 > > > >>>>>>>>> setfsid 1 > > > >>>>>>>>> spinlock 1 > > > >>>>>>>>> epoll.h 1 > > > >>>>>>>>> xattr.h 1 > > > >>>>>>>>> st_atim.tv_nsec 1 > > > >>>>>>>>> package-string: glusterfs 5.5 > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > >>>>>>>>> > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each file > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > crashes and again, > > > >>>>>>>> one can read this crash report in every brick log: > > > >>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > 0-longterm-access-control: > > > >>>>>>>>> client: > > > >>>>>>>>> > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > acl:-) [Permission > > > >>>>>>>>> denied] > > > >>>>>>>>> > > > >>>>>>>>> pending frames: > > > >>>>>>>>> > > > >>>>>>>>> frame : type(0) op(27) > > > >>>>>>>>> > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > >>>>>>>>> > > > >>>>>>>>> signal received: 11 > > > >>>>>>>>> > > > >>>>>>>>> time of crash: > > > >>>>>>>>> > > > >>>>>>>>> 2019-05-02 07:43:39 > > > >>>>>>>>> > > > >>>>>>>>> configuration details: > > > >>>>>>>>> > > > >>>>>>>>> argp 1 > > > >>>>>>>>> > > > >>>>>>>>> backtrace 1 > > > >>>>>>>>> > > > >>>>>>>>> dlfcn 1 > > > >>>>>>>>> > > > >>>>>>>>> libpthread 1 > > > >>>>>>>>> > > > >>>>>>>>> llistxattr 1 > > > >>>>>>>>> > > > >>>>>>>>> setfsid 1 > > > >>>>>>>>> > > > >>>>>>>>> spinlock 1 > > > >>>>>>>>> > > > >>>>>>>>> epoll.h 1 > > > >>>>>>>>> > > > >>>>>>>>> xattr.h 1 > > > >>>>>>>>> > > > >>>>>>>>> st_atim.tv_nsec 1 > > > >>>>>>>>> > > > >>>>>>>>> package-string: glusterfs 5.5 > > > >>>>>>>>> > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > >>>>>>>>> > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > >>>>>>>>> > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>>> This happens 
on a 3-Node Gluster v5.5 Cluster on two different > > > >>>>>>>> volumes. But both volumes has the same settings: > > > >>>>>>>> > > > >>>>>>>>> Volume Name: shortterm > > > >>>>>>>>> Type: Replicate > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > >>>>>>>>> Status: Started > > > >>>>>>>>> Snapshot Count: 0 > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > >>>>>>>>> Transport-type: tcp > > > >>>>>>>>> Bricks: > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > >>>>>>>>> Options Reconfigured: > > > >>>>>>>>> storage.reserve: 1 > > > >>>>>>>>> performance.client-io-threads: off > > > >>>>>>>>> nfs.disable: on > > > >>>>>>>>> transport.address-family: inet > > > >>>>>>>>> user.smb: disable > > > >>>>>>>>> features.read-only: off > > > >>>>>>>>> features.worm: off > > > >>>>>>>>> features.worm-file-level: on > > > >>>>>>>>> features.retention-mode: enterprise > > > >>>>>>>>> features.default-retention-period: 120 > > > >>>>>>>>> network.ping-timeout: 10 > > > >>>>>>>>> features.cache-invalidation: on > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > >>>>>>>>> performance.nl-cache: on > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > >>>>>>>>> client.event-threads: 32 > > > >>>>>>>>> server.event-threads: 32 > > > >>>>>>>>> cluster.lookup-optimize: on > > > >>>>>>>>> performance.stat-prefetch: on > > > >>>>>>>>> performance.cache-invalidation: on > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > >>>>>>>>> performance.cache-samba-metadata: on > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > >>>>>>>>> performance.io-thread-count: 64 > > > >>>>>>>>> cluster.use-compound-fops: on > > > >>>>>>>>> performance.cache-size: 512MB > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > >>>>>>>>> performance.read-ahead: off > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > >>>>>>>>> performance.write-behind: on > > > >>>>>>>>> storage.build-pgfid: on > > > >>>>>>>>> features.utime: on > > > >>>>>>>>> storage.ctime: on > > > >>>>>>>>> cluster.quorum-type: fixed > > > >>>>>>>>> cluster.quorum-count: 2 > > > >>>>>>>>> features.bitrot: on > > > >>>>>>>>> features.scrub: Active > > > >>>>>>>>> features.scrub-freq: daily > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>> Why can this happen to all Brick processes? I don't understand > > the > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > brick > > > >>>>>>>> processes everything works fine and our application was succeed. 
> > > >>>>>>>> > > > >>>>>>>> Regards > > > >>>>>>>> David Spisla > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> _______________________________________________ > > > >>>>>>>> Gluster-users mailing list > > > >>>>>>>> Gluster-users at gluster.org > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > >>>>>>> > > > >>>>>>> > > > > > _______________________________________________ > > > Gluster-users mailing list > > > Gluster-users at gluster.org > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > From spisla80 at gmail.com Fri May 17 09:57:47 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 11:57:47 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517093502.GB24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> Message-ID: Hello Niels, Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > Hello Niels, > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > ndevos at redhat.com>: > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > Hello Vijay, > > > > thank you for the clarification. Yes, there is an unconditional > > > dereference > > > > in stbuf. It seems plausible that this causes the crash. I think a > check > > > > like this should help: > > > > > > > > if (buf == NULL) { > > > > goto out; > > > > } > > > > map_atime_from_server(this, buf); > > > > > > > > Is there a reason why buf can be NULL? > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > denied). > > > This is probably something you need to handle in worm_lookup_cbk. There > > > can be many reasons for a FOP to return an error, why it happened in > > > this case is a little difficult to say without (much) more details. > > > > > Yes, I will look for a way to handle that case. > > It is intended, that the struct stbuf ist NULL when an error happens? > > Yes, in most error occasions it will not be possible to get a valid > stbuf. > I will do a check like this assuming that in case of an error op_errno != 0 and ret = -1 if (buf == NULL || op_errno != 0 || ret = -1) { goto out; } map_atime_from_server(this, buf); Does this fit? Regards David > > Niels > > > > > > Regards > > David Spisla > > > > > > > HTH, > > > Niels > > > > > > > > > > > > > > Regards > > > > David Spisla > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > vbellur at redhat.com>: > > > > > > > > > Hello David, > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > map_atime_from_server() > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). > Can > > > you > > > > > please check if there is an unconditional dereference of stbuf in > > > > > map_atime_from_server()? > > > > > > > > > > Regards, > > > > > Vijay > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > > wrote: > > > > > > > > > >> Hello Vijay, > > > > >> > > > > >> yes, we are using custom patches. It s a helper function, which is > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > >> Do you think this could be the problem? The functions only > manipulates > > > > >> the atime in struct iattr > > > > >> > > > > >> Regards > > > > >> David Spisla > > > > >> > > > > >> Am Do., 16. 
Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > >> vbellur at redhat.com>: > > > > >> > > > > >>> Hello David, > > > > >>> > > > > >>> Do you have any custom patches in your deployment? I looked up > v5.5 > > > but > > > > >>> could not find the following functions referred to in the core: > > > > >>> > > > > >>> map_atime_from_server() > > > > >>> worm_lookup_cbk() > > > > >>> > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > >>> > > > > >>> Thanks, > > > > >>> Vijay > > > > >>> > > > > >>> > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > =0x7fdeac0015c8, > > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry > =-1, > > > > >>> op_errno=op_errno at entry=13, > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) > at > > > > >>> worm.c:531 > > > > >>> priv = 0x7fdef4075378 > > > > >>> ret = 0 > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > >>> > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > spisla80 at gmail.com> > > > > >>> wrote: > > > > >>> > > > > >>>> Hello Vijay, > > > > >>>> > > > > >>>> I could reproduce the issue. After doing a simple DIR Listing > from > > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > > scenario > > > > >>>> mentioned before but the crash report in the bricks log is the > same. > > > > >>>> Attached you find the backtrace. > > > > >>>> > > > > >>>> Regards > > > > >>>> David Spisla > > > > >>>> > > > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > >>>> vbellur at redhat.com>: > > > > >>>> > > > > >>>>> Hello David, > > > > >>>>> > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > spisla80 at gmail.com> > > > > >>>>> wrote: > > > > >>>>> > > > > >>>>>> Hello Vijay, > > > > >>>>>> > > > > >>>>>> how can I create such a core file? Or will it be created > > > > >>>>>> automatically if a gluster process crashes? > > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > > >>>>>> > > > > >>>>> > > > > >>>>> Generation of core file is dependent on the system > configuration. > > > > >>>>> `man 5 core` contains useful information to generate a core > file > > > in a > > > > >>>>> directory. Once a core file is generated, you can use gdb to > get a > > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > > >>>>> > > > > >>>>> > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > appears > > > > >>>>>> only sometimes. > > > > >>>>>> > > > > >>>>> > > > > >>>>> If the bug is not easy to reproduce, having a backtrace from > the > > > > >>>>> generated core would be very useful! > > > > >>>>> > > > > >>>>> Thanks, > > > > >>>>> Vijay > > > > >>>>> > > > > >>>>> > > > > >>>>>> > > > > >>>>>> Regards > > > > >>>>>> David Spisla > > > > >>>>>> > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > >>>>>> vbellur at redhat.com>: > > > > >>>>>> > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > available on > > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > > provide a > > > > >>>>>>> backtrace. 
> > > > >>>>>>> > > > > >>>>>>> Regards, > > > > >>>>>>> Vijay > > > > >>>>>>> > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > spisla80 at gmail.com> > > > > >>>>>>> wrote: > > > > >>>>>>> > > > > >>>>>>>> Hello folks, > > > > >>>>>>>> > > > > >>>>>>>> we have a client application (runs on Win10) which does some > > > FOPs > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > >>>>>>>> > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > >>>>>>>> successively and checks if the files data was correctly > copied. > > > While doing > > > > >>>>>>>> this, all brick processes crashes and in the logs one have > this > > > crash > > > > >>>>>>>> report on every brick log: > > > > >>>>>>>> > > > > >>>>>>>>> > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > gfid: 00000000-0000-0000-0000-000000000001, > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > [Permission > > > denied] > > > > >>>>>>>>> pending frames: > > > > >>>>>>>>> frame : type(0) op(27) > > > > >>>>>>>>> frame : type(0) op(40) > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > >>>>>>>>> signal received: 11 > > > > >>>>>>>>> time of crash: > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > >>>>>>>>> configuration details: > > > > >>>>>>>>> argp 1 > > > > >>>>>>>>> backtrace 1 > > > > >>>>>>>>> dlfcn 1 > > > > >>>>>>>>> libpthread 1 > > > > >>>>>>>>> llistxattr 1 > > > > >>>>>>>>> setfsid 1 > > > > >>>>>>>>> spinlock 1 > > > > >>>>>>>>> epoll.h 1 > > > > >>>>>>>>> xattr.h 1 > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > >>>>>>>>> > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > file > > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > > crashes and again, > > > > >>>>>>>> one can read this crash report in every brick log: > > > > >>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > 0-longterm-access-control: > > > > >>>>>>>>> client: > > > > >>>>>>>>> > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > acl:-) [Permission > > > > >>>>>>>>> denied] > > > > >>>>>>>>> > > > > >>>>>>>>> pending frames: > > > > >>>>>>>>> > > > > >>>>>>>>> frame : type(0) op(27) > > > > >>>>>>>>> > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > >>>>>>>>> > > > > >>>>>>>>> signal received: 11 > > > > >>>>>>>>> > > > > >>>>>>>>> time of crash: > > > > >>>>>>>>> > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > >>>>>>>>> > > > > >>>>>>>>> configuration details: > > > > >>>>>>>>> > > > > >>>>>>>>> argp 1 > > > > >>>>>>>>> > > > > >>>>>>>>> backtrace 1 > > > > >>>>>>>>> > > > > >>>>>>>>> dlfcn 1 > > > > >>>>>>>>> > > > > >>>>>>>>> libpthread 1 > > > > >>>>>>>>> > > > > >>>>>>>>> llistxattr 1 > > > > >>>>>>>>> > > > > >>>>>>>>> setfsid 1 > > > > >>>>>>>>> > > > > >>>>>>>>> spinlock 1 > > > > >>>>>>>>> > > > > >>>>>>>>> epoll.h 1 > > > > >>>>>>>>> > > > > >>>>>>>>> xattr.h 1 > > > > >>>>>>>>> > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > >>>>>>>>> > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > >>>>>>>>> > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > >>>>>>>>> > > > > >>>>>>>>> > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > 
/usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > >>>>>>>>> > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > >>>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > different > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > >>>>>>>> > > > > >>>>>>>>> Volume Name: shortterm > > > > >>>>>>>>> Type: Replicate > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > >>>>>>>>> Status: Started > > > > >>>>>>>>> Snapshot Count: 0 > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > >>>>>>>>> Transport-type: tcp > > > > >>>>>>>>> Bricks: > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > >>>>>>>>> Options Reconfigured: > > > > >>>>>>>>> storage.reserve: 1 > > > > >>>>>>>>> performance.client-io-threads: off > > > > >>>>>>>>> nfs.disable: on > > > > >>>>>>>>> transport.address-family: inet > > > > >>>>>>>>> user.smb: disable > > > > >>>>>>>>> features.read-only: off > > > > >>>>>>>>> features.worm: off > > > > >>>>>>>>> features.worm-file-level: on > > > > >>>>>>>>> features.retention-mode: enterprise > > > > >>>>>>>>> features.default-retention-period: 120 > > > > >>>>>>>>> network.ping-timeout: 10 > > > > >>>>>>>>> features.cache-invalidation: on > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > >>>>>>>>> performance.nl-cache: on > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > >>>>>>>>> client.event-threads: 32 > > > > >>>>>>>>> server.event-threads: 32 > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > >>>>>>>>> performance.stat-prefetch: on > > > > >>>>>>>>> performance.cache-invalidation: on > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > >>>>>>>>> performance.cache-size: 512MB > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > >>>>>>>>> performance.read-ahead: off > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > >>>>>>>>> performance.write-behind: on > > > > >>>>>>>>> storage.build-pgfid: on > > > > >>>>>>>>> features.utime: on > > > > >>>>>>>>> storage.ctime: on > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > >>>>>>>>> features.bitrot: on > > > > >>>>>>>>> features.scrub: Active > > > > >>>>>>>>> features.scrub-freq: daily > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > >>>>>>>>> > > > > >>>>>>>>> > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > understand > > > the > > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > > brick > > > > >>>>>>>> processes everything works fine and our application was > succeed. 
> > > > >>>>>>>> > > > > >>>>>>>> Regards > > > > >>>>>>>> David Spisla > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>>>>>>> _______________________________________________ > > > > >>>>>>>> Gluster-users mailing list > > > > >>>>>>>> Gluster-users at gluster.org > > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > >>>>>>> > > > > >>>>>>> > > > > > > > _______________________________________________ > > > > Gluster-users mailing list > > > > Gluster-users at gluster.org > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndevos at redhat.com Fri May 17 10:15:54 2019 From: ndevos at redhat.com (Niels de Vos) Date: Fri, 17 May 2019 12:15:54 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> Message-ID: <20190517101328.GC24535@ndevos-x270> On Fri, May 17, 2019 at 11:57:47AM +0200, David Spisla wrote: > Hello Niels, > > Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos : > > > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > > Hello Niels, > > > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > > ndevos at redhat.com>: > > > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > > Hello Vijay, > > > > > thank you for the clarification. Yes, there is an unconditional > > > > dereference > > > > > in stbuf. It seems plausible that this causes the crash. I think a > > check > > > > > like this should help: > > > > > > > > > > if (buf == NULL) { > > > > > goto out; > > > > > } > > > > > map_atime_from_server(this, buf); > > > > > > > > > > Is there a reason why buf can be NULL? > > > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > > denied). > > > > This is probably something you need to handle in worm_lookup_cbk. There > > > > can be many reasons for a FOP to return an error, why it happened in > > > > this case is a little difficult to say without (much) more details. > > > > > > > Yes, I will look for a way to handle that case. > > > It is intended, that the struct stbuf ist NULL when an error happens? > > > > Yes, in most error occasions it will not be possible to get a valid > > stbuf. > > > I will do a check like this assuming that in case of an error op_errno != 0 > and ret = -1 > > if (buf == NULL || op_errno != 0 || ret = -1) { > goto out; > } > map_atime_from_server(this, buf); > > Does this fit? I think it is more common to do if (ret == -1) { /* error handling and unwind */ goto out; } map_atime_from_server(this, buf); Niels > Regards > David > > > > > Niels > > > > > > > > > > Regards > > > David Spisla > > > > > > > > > > HTH, > > > > Niels > > > > > > > > > > > > > > > > > > Regards > > > > > David Spisla > > > > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > > vbellur at redhat.com>: > > > > > > > > > > > Hello David, > > > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > > map_atime_from_server() > > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = 13). > > Can > > > > you > > > > > > please check if there is an unconditional dereference of stbuf in > > > > > > map_atime_from_server()? 
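For readers following this exchange, a minimal sketch of the guard being discussed, assuming the custom worm_lookup_cbk() signature visible in the gdb frame earlier in the thread; map_atime_from_server() and xlator_helper.c belong to David's local patch set, not to upstream GlusterFS, so the names below are only illustrative:

/* Sketch only: on a failed LOOKUP (op_ret < 0, e.g. op_errno == EACCES as
 * logged by posix-acl) there is no valid iatt, so skip the atime mapping
 * and unwind with the error unchanged. */
int32_t
worm_lookup_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
                int32_t op_ret, int32_t op_errno, inode_t *inode,
                struct iatt *buf, dict_t *xdata, struct iatt *postparent)
{
        if (op_ret < 0 || buf == NULL)
                goto unwind;              /* nothing to map on error */

        map_atime_from_server(this, buf); /* custom helper from xlator_helper.c */

unwind:
        STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, inode, buf,
                            xdata, postparent);
        return 0;
}

Two small details worth noting about the condition proposed above: inside the callback the error indicator is the op_ret parameter rather than a local ret (and "ret = -1" would be an assignment, not a comparison), and op_errno is only meaningful when op_ret is negative, so testing op_ret < 0 or buf == NULL is sufficient before touching the iatt.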
> > > > > > > > > > > > Regards, > > > > > > Vijay > > > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla > > > > wrote: > > > > > > > > > > > >> Hello Vijay, > > > > > >> > > > > > >> yes, we are using custom patches. It s a helper function, which is > > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > > >> Do you think this could be the problem? The functions only > > manipulates > > > > > >> the atime in struct iattr > > > > > >> > > > > > >> Regards > > > > > >> David Spisla > > > > > >> > > > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > > >> vbellur at redhat.com>: > > > > > >> > > > > > >>> Hello David, > > > > > >>> > > > > > >>> Do you have any custom patches in your deployment? I looked up > > v5.5 > > > > but > > > > > >>> could not find the following functions referred to in the core: > > > > > >>> > > > > > >>> map_atime_from_server() > > > > > >>> worm_lookup_cbk() > > > > > >>> > > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > > >>> > > > > > >>> Thanks, > > > > > >>> Vijay > > > > > >>> > > > > > >>> > > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > > =0x7fdeac0015c8, > > > > > >>> cookie=, this=0x7fdef401af00, op_ret=op_ret at entry > > =-1, > > > > > >>> op_errno=op_errno at entry=13, > > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, postparent=0x0) > > at > > > > > >>> worm.c:531 > > > > > >>> priv = 0x7fdef4075378 > > > > > >>> ret = 0 > > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > > >>> > > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>> wrote: > > > > > >>> > > > > > >>>> Hello Vijay, > > > > > >>>> > > > > > >>>> I could reproduce the issue. After doing a simple DIR Listing > > from > > > > > >>>> Win10 powershell, all brick processes crashes. Its not the same > > > > scenario > > > > > >>>> mentioned before but the crash report in the bricks log is the > > same. > > > > > >>>> Attached you find the backtrace. > > > > > >>>> > > > > > >>>> Regards > > > > > >>>> David Spisla > > > > > >>>> > > > > > >>>> Am Di., 7. Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > > >>>> vbellur at redhat.com>: > > > > > >>>> > > > > > >>>>> Hello David, > > > > > >>>>> > > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>>>> wrote: > > > > > >>>>> > > > > > >>>>>> Hello Vijay, > > > > > >>>>>> > > > > > >>>>>> how can I create such a core file? Or will it be created > > > > > >>>>>> automatically if a gluster process crashes? > > > > > >>>>>> Maybe you can give me a hint and will try to get a backtrace. > > > > > >>>>>> > > > > > >>>>> > > > > > >>>>> Generation of core file is dependent on the system > > configuration. > > > > > >>>>> `man 5 core` contains useful information to generate a core > > file > > > > in a > > > > > >>>>> directory. Once a core file is generated, you can use gdb to > > get a > > > > > >>>>> backtrace of all threads (using "thread apply all bt full"). > > > > > >>>>> > > > > > >>>>> > > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > > appears > > > > > >>>>>> only sometimes. 
> > > > > >>>>>> > > > > > >>>>> > > > > > >>>>> If the bug is not easy to reproduce, having a backtrace from > > the > > > > > >>>>> generated core would be very useful! > > > > > >>>>> > > > > > >>>>> Thanks, > > > > > >>>>> Vijay > > > > > >>>>> > > > > > >>>>> > > > > > >>>>>> > > > > > >>>>>> Regards > > > > > >>>>>> David Spisla > > > > > >>>>>> > > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > > >>>>>> vbellur at redhat.com>: > > > > > >>>>>> > > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > > available on > > > > > >>>>>>> any of the servers? If yes, would it be possible for you to > > > > provide a > > > > > >>>>>>> backtrace. > > > > > >>>>>>> > > > > > >>>>>>> Regards, > > > > > >>>>>>> Vijay > > > > > >>>>>>> > > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > > spisla80 at gmail.com> > > > > > >>>>>>> wrote: > > > > > >>>>>>> > > > > > >>>>>>>> Hello folks, > > > > > >>>>>>>> > > > > > >>>>>>>> we have a client application (runs on Win10) which does some > > > > FOPs > > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > > >>>>>>>> > > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > > >>>>>>>> successively and checks if the files data was correctly > > copied. > > > > While doing > > > > > >>>>>>>> this, all brick processes crashes and in the logs one have > > this > > > > crash > > > > > >>>>>>>> report on every brick log: > > > > > >>>>>>>> > > > > > >>>>>>>>> > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > > gfid: 00000000-0000-0000-0000-000000000001, > > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > > [Permission > > > > denied] > > > > > >>>>>>>>> pending frames: > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > >>>>>>>>> frame : type(0) op(40) > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > >>>>>>>>> signal received: 11 > > > > > >>>>>>>>> time of crash: > > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > > >>>>>>>>> configuration details: > > > > > >>>>>>>>> argp 1 > > > > > >>>>>>>>> backtrace 1 > > > > > >>>>>>>>> dlfcn 1 > > > > > >>>>>>>>> libpthread 1 > > > > > >>>>>>>>> llistxattr 1 > > > > > >>>>>>>>> setfsid 1 > > > > > >>>>>>>>> spinlock 1 > > > > > >>>>>>>>> epoll.h 1 > > > > > >>>>>>>>> xattr.h 1 > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > 
>>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > > >>>>>>>>> > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > > >>>>>>>>> > > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > > file > > > > > >>>>>>>> sucessively. After the 70th file was set, all the bricks > > > > crashes and again, > > > > > >>>>>>>> one can read this crash report in every brick log: > > > > > >>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > > 0-longterm-access-control: > > > > > >>>>>>>>> client: > > > > > >>>>>>>>> > > > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > >>>>>>>>> ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > > acl:-) [Permission > > > > > >>>>>>>>> denied] > > > > > >>>>>>>>> > > > > > >>>>>>>>> pending frames: > > > > > >>>>>>>>> > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > >>>>>>>>> > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > >>>>>>>>> > > > > > >>>>>>>>> signal received: 11 > > > > > >>>>>>>>> > > > > > >>>>>>>>> time of crash: > > > > > >>>>>>>>> > > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > > >>>>>>>>> > > > > > >>>>>>>>> configuration details: > > > > > >>>>>>>>> > > > > > >>>>>>>>> argp 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> backtrace 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> dlfcn 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> libpthread 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> llistxattr 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> setfsid 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> spinlock 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> epoll.h 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> xattr.h 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > >>>>>>>>> > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > >>>>>>>>> > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > 
> > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > > >>>>>>>>> > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > > >>>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > > different > > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > > >>>>>>>> > > > > > >>>>>>>>> Volume Name: shortterm > > > > > >>>>>>>>> Type: Replicate > > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > > >>>>>>>>> Status: Started > > > > > >>>>>>>>> Snapshot Count: 0 > > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > > >>>>>>>>> Transport-type: tcp > > > > > >>>>>>>>> Bricks: > > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > > >>>>>>>>> Options Reconfigured: > > > > > >>>>>>>>> storage.reserve: 1 > > > > > >>>>>>>>> performance.client-io-threads: off > > > > > >>>>>>>>> nfs.disable: on > > > > > >>>>>>>>> transport.address-family: inet > > > > > >>>>>>>>> user.smb: disable > > > > > >>>>>>>>> features.read-only: off > > > > > >>>>>>>>> features.worm: off > > > > > >>>>>>>>> features.worm-file-level: on > > > > > >>>>>>>>> features.retention-mode: enterprise > > > > > >>>>>>>>> features.default-retention-period: 120 > > > > > >>>>>>>>> network.ping-timeout: 10 > > > > > >>>>>>>>> features.cache-invalidation: on > > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > > >>>>>>>>> performance.nl-cache: on > > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > > >>>>>>>>> client.event-threads: 32 > > > > > >>>>>>>>> server.event-threads: 32 > > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > > >>>>>>>>> performance.stat-prefetch: on > > > > > >>>>>>>>> performance.cache-invalidation: on > > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > > >>>>>>>>> performance.cache-size: 512MB > > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > > >>>>>>>>> performance.read-ahead: off > > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > > >>>>>>>>> performance.write-behind: on > > > > > >>>>>>>>> storage.build-pgfid: on > > > > > >>>>>>>>> 
features.utime: on > > > > > >>>>>>>>> storage.ctime: on > > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > > >>>>>>>>> features.bitrot: on > > > > > >>>>>>>>> features.scrub: Active > > > > > >>>>>>>>> features.scrub-freq: daily > > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > > >>>>>>>>> > > > > > >>>>>>>>> > > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > > understand > > > > the > > > > > >>>>>>>> crash report. The FOPs are nothing special and after restart > > > > brick > > > > > >>>>>>>> processes everything works fine and our application was > > successful. > > > > > >>>>>>>> > > > > > >>>>>>>> Regards > > > > > >>>>>>>> David Spisla > > > > > >>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> > > > > > >>>>>>>> _______________________________________________ > > > > > >>>>>>>> Gluster-users mailing list > > > > > >>>>>>>> Gluster-users at gluster.org > > > > > >>>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > >>>>>>> > > > > > >>>>>>> > > > > > > > > > _______________________________________________ > > > > > Gluster-users mailing list > > > > > Gluster-users at gluster.org > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > From spisla80 at gmail.com Fri May 17 10:34:19 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 17 May 2019 12:34:19 +0200 Subject: [Gluster-users] Brick-Xlators crashes after Set-RO and Read In-Reply-To: <20190517101328.GC24535@ndevos-x270> References: <20190517082141.GA24535@ndevos-x270> <20190517093502.GB24535@ndevos-x270> <20190517101328.GC24535@ndevos-x270> Message-ID: Thank you all for the clarification. This is very helpful! Regards David Spisla Am Fr., 17. Mai 2019 um 12:15 Uhr schrieb Niels de Vos : > On Fri, May 17, 2019 at 11:57:47AM +0200, David Spisla wrote: > > Hello Niels, > > > > Am Fr., 17. Mai 2019 um 11:35 Uhr schrieb Niels de Vos < > ndevos at redhat.com>: > > > > > On Fri, May 17, 2019 at 11:17:52AM +0200, David Spisla wrote: > > > > Hello Niels, > > > > > > > > Am Fr., 17. Mai 2019 um 10:21 Uhr schrieb Niels de Vos < > > > ndevos at redhat.com>: > > > > > > > > > On Fri, May 17, 2019 at 09:50:28AM +0200, David Spisla wrote: > > > > > > Hello Vijay, > > > > > > thank you for the clarification. Yes, there is an unconditional > > > > > dereference > > > > > > in stbuf. It seems plausible that this causes the crash. I think > a > > > check > > > > > > like this should help: > > > > > > > > > > > > if (buf == NULL) { > > > > > > goto out; > > > > > > } > > > > > > map_atime_from_server(this, buf); > > > > > > > > > > > > Is there a reason why buf can be NULL? > > > > > > > > > > It seems LOOKUP returned an error (errno=13: EACCES: Permission > > > denied). > > > > > This is probably something you need to handle in worm_lookup_cbk. > There > > > > > can be many reasons for a FOP to return an error, why it happened > in > > > > > this case is a little difficult to say without (much) more details. > > > > > > > > > Yes, I will look for a way to handle that case. > > > > Is it intended that the struct stbuf is NULL when an error happens? > > > > > > Yes, in most error occasions it will not be possible to get a valid > > > stbuf. > > > > > I will do a check like this, assuming that in case of an error op_errno > != 0 > > and ret == -1: > > > > if (buf == NULL || op_errno != 0 || ret == -1) { > > goto out; > > } > > map_atime_from_server(this, buf); > > > > Does this fit?
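A minimal, self-contained sketch of the guard pattern being discussed follows. The callback name, the extern prototype and the header path are illustrative assumptions for this sketch, not the actual worm.c/xlator_helper.c code; only the op_ret/NULL check mirrors what the thread proposes.

/* Hypothetical example: only call the atime-mapping helper when the
 * LOOKUP succeeded and a valid iatt was returned. On failure (op_ret
 * == -1, e.g. op_errno == EACCES as in the brick logs above) the brick
 * cannot supply a valid stat, so buf may be NULL and dereferencing it
 * unconditionally is what produces the SIGSEGV. */
#include <glusterfs/xlator.h>  /* header location may differ per GlusterFS version */

/* assumed signature, based on the backtrace (this, stbuf) */
extern void map_atime_from_server(xlator_t *this, struct iatt *stbuf);

int32_t
my_lookup_cbk(call_frame_t *frame, void *cookie, xlator_t *this,
              int32_t op_ret, int32_t op_errno, inode_t *inode,
              struct iatt *buf, dict_t *xdata, struct iatt *postparent)
{
    if (op_ret == -1 || buf == NULL)
        goto out;  /* error path: skip the helper entirely */

    map_atime_from_server(this, buf);  /* safe: buf is non-NULL here */

out:
    /* pass the result (including any error) up the graph unchanged */
    STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, inode, buf,
                        xdata, postparent);
    return 0;
}

Checking op_ret alone is normally sufficient, since a valid buf is only supplied when the lookup succeeded; the extra buf == NULL test simply makes the guard defensive.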
> > I think it is more common to do > > if (ret == -1) { > /* error handling and unwind */ > goto out; > } > > map_atime_from_server(this, buf); > > Niels > > > Regards > > David > > > > > > > > Niels > > > > > > > > > > > > > > Regards > > > > David Spisla > > > > > > > > > > > > > HTH, > > > > > Niels > > > > > > > > > > > > > > > > > > > > > > Regards > > > > > > David Spisla > > > > > > > > > > > > > > > > > > Am Fr., 17. Mai 2019 um 01:51 Uhr schrieb Vijay Bellur < > > > > > vbellur at redhat.com>: > > > > > > > > > > > > > Hello David, > > > > > > > > > > > > > > From the backtrace it looks like stbuf is NULL in > > > > > map_atime_from_server() > > > > > > > as worm_lookup_cbk has got an error (op_ret = -1, op_errno = > 13). > > > Can > > > > > you > > > > > > > please check if there is an unconditional dereference of stbuf > in > > > > > > > map_atime_from_server()? > > > > > > > > > > > > > > Regards, > > > > > > > Vijay > > > > > > > > > > > > > > On Thu, May 16, 2019 at 2:36 AM David Spisla < > spisla80 at gmail.com> > > > > > wrote: > > > > > > > > > > > > > >> Hello Vijay, > > > > > > >> > > > > > > >> yes, we are using custom patches. It s a helper function, > which is > > > > > > >> defined in xlator_helper.c and used in worm_lookup_cbk. > > > > > > >> Do you think this could be the problem? The functions only > > > manipulates > > > > > > >> the atime in struct iattr > > > > > > >> > > > > > > >> Regards > > > > > > >> David Spisla > > > > > > >> > > > > > > >> Am Do., 16. Mai 2019 um 10:05 Uhr schrieb Vijay Bellur < > > > > > > >> vbellur at redhat.com>: > > > > > > >> > > > > > > >>> Hello David, > > > > > > >>> > > > > > > >>> Do you have any custom patches in your deployment? I looked > up > > > v5.5 > > > > > but > > > > > > >>> could not find the following functions referred to in the > core: > > > > > > >>> > > > > > > >>> map_atime_from_server() > > > > > > >>> worm_lookup_cbk() > > > > > > >>> > > > > > > >>> Neither do I see xlator_helper.c in the codebase. > > > > > > >>> > > > > > > >>> Thanks, > > > > > > >>> Vijay > > > > > > >>> > > > > > > >>> > > > > > > >>> #0 map_atime_from_server (this=0x7fdef401af00, stbuf=0x0) at > > > > > > >>> ../../../../xlators/lib/src/xlator_helper.c:21 > > > > > > >>> __FUNCTION__ = "map_to_atime_from_server" > > > > > > >>> #1 0x00007fdef39a0382 in worm_lookup_cbk (frame=frame at entry > > > > > =0x7fdeac0015c8, > > > > > > >>> cookie=, this=0x7fdef401af00, > op_ret=op_ret at entry > > > =-1, > > > > > > >>> op_errno=op_errno at entry=13, > > > > > > >>> inode=inode at entry=0x0, buf=0x0, xdata=0x0, > postparent=0x0) > > > at > > > > > > >>> worm.c:531 > > > > > > >>> priv = 0x7fdef4075378 > > > > > > >>> ret = 0 > > > > > > >>> __FUNCTION__ = "worm_lookup_cbk" > > > > > > >>> > > > > > > >>> On Thu, May 16, 2019 at 12:53 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>> wrote: > > > > > > >>> > > > > > > >>>> Hello Vijay, > > > > > > >>>> > > > > > > >>>> I could reproduce the issue. After doing a simple DIR > Listing > > > from > > > > > > >>>> Win10 powershell, all brick processes crashes. Its not the > same > > > > > scenario > > > > > > >>>> mentioned before but the crash report in the bricks log is > the > > > same. > > > > > > >>>> Attached you find the backtrace. > > > > > > >>>> > > > > > > >>>> Regards > > > > > > >>>> David Spisla > > > > > > >>>> > > > > > > >>>> Am Di., 7. 
Mai 2019 um 20:08 Uhr schrieb Vijay Bellur < > > > > > > >>>> vbellur at redhat.com>: > > > > > > >>>> > > > > > > >>>>> Hello David, > > > > > > >>>>> > > > > > > >>>>> On Tue, May 7, 2019 at 2:16 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>>>> wrote: > > > > > > >>>>> > > > > > > >>>>>> Hello Vijay, > > > > > > >>>>>> > > > > > > >>>>>> how can I create such a core file? Or will it be created > > > > > > >>>>>> automatically if a gluster process crashes? > > > > > > >>>>>> Maybe you can give me a hint and will try to get a > backtrace. > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> Generation of core file is dependent on the system > > > configuration. > > > > > > >>>>> `man 5 core` contains useful information to generate a core > > > file > > > > > in a > > > > > > >>>>> directory. Once a core file is generated, you can use gdb > to > > > get a > > > > > > >>>>> backtrace of all threads (using "thread apply all bt > full"). > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>>> Unfortunately this bug is not easy to reproduce because it > > > appears > > > > > > >>>>>> only sometimes. > > > > > > >>>>>> > > > > > > >>>>> > > > > > > >>>>> If the bug is not easy to reproduce, having a backtrace > from > > > the > > > > > > >>>>> generated core would be very useful! > > > > > > >>>>> > > > > > > >>>>> Thanks, > > > > > > >>>>> Vijay > > > > > > >>>>> > > > > > > >>>>> > > > > > > >>>>>> > > > > > > >>>>>> Regards > > > > > > >>>>>> David Spisla > > > > > > >>>>>> > > > > > > >>>>>> Am Mo., 6. Mai 2019 um 19:48 Uhr schrieb Vijay Bellur < > > > > > > >>>>>> vbellur at redhat.com>: > > > > > > >>>>>> > > > > > > >>>>>>> Thank you for the report, David. Do you have core files > > > > > available on > > > > > > >>>>>>> any of the servers? If yes, would it be possible for you > to > > > > > provide a > > > > > > >>>>>>> backtrace. > > > > > > >>>>>>> > > > > > > >>>>>>> Regards, > > > > > > >>>>>>> Vijay > > > > > > >>>>>>> > > > > > > >>>>>>> On Mon, May 6, 2019 at 3:09 AM David Spisla < > > > spisla80 at gmail.com> > > > > > > >>>>>>> wrote: > > > > > > >>>>>>> > > > > > > >>>>>>>> Hello folks, > > > > > > >>>>>>>> > > > > > > >>>>>>>> we have a client application (runs on Win10) which does > some > > > > > FOPs > > > > > > >>>>>>>> on a gluster volume which is accessed by SMB. > > > > > > >>>>>>>> > > > > > > >>>>>>>> *Scenario 1* is a READ Operation which reads all files > > > > > > >>>>>>>> successively and checks if the files data was correctly > > > copied. 
> > > > > While doing > > > > > > >>>>>>>> this, all brick processes crashes and in the logs one > have > > > this > > > > > crash > > > > > > >>>>>>>> report on every brick log: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > CTX_ID:a0359502-2c76-4fee-8cb9-365679dc690e-GRAPH_ID:0-PID:32934-HOST:XX-XXXXX-XX-XX-PC_NAME:shortterm-client-2-RECON_NO:-0, > > > > > gfid: 00000000-0000-0000-0000-000000000001, > > > > > req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-) > > > [Permission > > > > > denied] > > > > > > >>>>>>>>> pending frames: > > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > > >>>>>>>>> frame : type(0) op(40) > > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > > >>>>>>>>> signal received: 11 > > > > > > >>>>>>>>> time of crash: > > > > > > >>>>>>>>> 2019-04-16 08:32:21 > > > > > > >>>>>>>>> configuration details: > > > > > > >>>>>>>>> argp 1 > > > > > > >>>>>>>>> backtrace 1 > > > > > > >>>>>>>>> dlfcn 1 > > > > > > >>>>>>>>> libpthread 1 > > > > > > >>>>>>>>> llistxattr 1 > > > > > > >>>>>>>>> setfsid 1 > > > > > > >>>>>>>>> spinlock 1 > > > > > > >>>>>>>>> epoll.h 1 > > > > > > >>>>>>>>> xattr.h 1 > > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f9a5bd4d64c] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f9a5bd57d26] > > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361a0)[0x7f9a5af141a0] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7f9a4ef0e910] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7f9a4ef0b118] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7f9a4f1278d6] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7f9a4f35975b] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7f9a4f1203b3] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7f9a4ef0b5b2] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7f9a5bdd7b6c] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7f9a4e8cf548] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7f9a5bdefc22] > > > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7f9a5bd733a5] > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7f9a4e6b7088] > > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7f9a5b29f569] > > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7f9a5afd69af] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> *Scenario 2 *The application just SET Read-Only on each > > > file > > > > > > >>>>>>>> sucessively. 
After the 70th file was set, all the bricks > > > > > crashes and again, > > > > > > >>>>>>>> one can read this crash report in every brick log: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> [2019-05-02 07:43:39.953591] I [MSGID: 139001] > > > > > > >>>>>>>>> [posix-acl.c:263:posix_acl_log_permit_denied] > > > > > 0-longterm-access-control: > > > > > > >>>>>>>>> client: > > > > > > >>>>>>>>> > > > > > > > > > CTX_ID:21aa9c75-3a5f-41f9-925b-48e4c80bd24a-GRAPH_ID:0-PID:16325-HOST:XXX-X-X-XXX-PC_NAME:longterm-client-0-RECON_NO:-0, > > > > > > >>>>>>>>> gfid: 00000000-0000-0000-0000-000000000001, > > > > > > >>>>>>>>> req(uid:2000,gid:2000,perm:1,ngrps:1), > > > > > > >>>>>>>>> > ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, > > > > > acl:-) [Permission > > > > > > >>>>>>>>> denied] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> pending frames: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> frame : type(0) op(27) > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> patchset: git://git.gluster.org/glusterfs.git > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> signal received: 11 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> time of crash: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> 2019-05-02 07:43:39 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> configuration details: > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> argp 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> backtrace 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> dlfcn 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> libpthread 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> llistxattr 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> setfsid 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> spinlock 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> epoll.h 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> xattr.h 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> st_atim.tv_nsec 1 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> package-string: glusterfs 5.5 > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fbb3f0b364c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fbb3f0bdd26] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libc.so.6(+0x361e0)[0x7fbb3e27a1e0] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0xb910)[0x7fbb32257910] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x8118)[0x7fbb32254118] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0x128d6)[0x7fbb324708d6] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/access-control.so(+0x575b)[0x7fbb326a275b] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/locks.so(+0xb3b3)[0x7fbb324693b3] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/features/worm.so(+0x85b2)[0x7fbb322545b2] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > /usr/lib64/libglusterfs.so.0(default_lookup+0xbc)[0x7fbb3f13db6c] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > 
/usr/lib64/glusterfs/5.5/xlator/features/upcall.so(+0xf548)[0x7fbb31c18548] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/libglusterfs.so.0(default_lookup_resume+0x1e2)[0x7fbb3f155c22] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > /usr/lib64/libglusterfs.so.0(call_resume+0x75)[0x7fbb3f0d93a5] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> > > > > > > > > > /usr/lib64/glusterfs/5.5/xlator/performance/io-threads.so(+0x6088)[0x7fbb31a00088] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libpthread.so.0(+0x7569)[0x7fbb3e605569] > > > > > > >>>>>>>>> > > > > > > >>>>>>>>> /lib64/libc.so.6(clone+0x3f)[0x7fbb3e33c9ef] > > > > > > >>>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> This happens on a 3-Node Gluster v5.5 Cluster on two > > > different > > > > > > >>>>>>>> volumes. But both volumes has the same settings: > > > > > > >>>>>>>> > > > > > > >>>>>>>>> Volume Name: shortterm > > > > > > >>>>>>>>> Type: Replicate > > > > > > >>>>>>>>> Volume ID: 5307e5c5-e8a1-493a-a846-342fb0195dee > > > > > > >>>>>>>>> Status: Started > > > > > > >>>>>>>>> Snapshot Count: 0 > > > > > > >>>>>>>>> Number of Bricks: 1 x 3 = 3 > > > > > > >>>>>>>>> Transport-type: tcp > > > > > > >>>>>>>>> Bricks: > > > > > > >>>>>>>>> Brick1: fs-xxxxx-c1-n1:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Brick2: fs-xxxxx-c1-n2:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Brick3: fs-xxxxx-c1-n3:/gluster/brick4/glusterbrick > > > > > > >>>>>>>>> Options Reconfigured: > > > > > > >>>>>>>>> storage.reserve: 1 > > > > > > >>>>>>>>> performance.client-io-threads: off > > > > > > >>>>>>>>> nfs.disable: on > > > > > > >>>>>>>>> transport.address-family: inet > > > > > > >>>>>>>>> user.smb: disable > > > > > > >>>>>>>>> features.read-only: off > > > > > > >>>>>>>>> features.worm: off > > > > > > >>>>>>>>> features.worm-file-level: on > > > > > > >>>>>>>>> features.retention-mode: enterprise > > > > > > >>>>>>>>> features.default-retention-period: 120 > > > > > > >>>>>>>>> network.ping-timeout: 10 > > > > > > >>>>>>>>> features.cache-invalidation: on > > > > > > >>>>>>>>> features.cache-invalidation-timeout: 600 > > > > > > >>>>>>>>> performance.nl-cache: on > > > > > > >>>>>>>>> performance.nl-cache-timeout: 600 > > > > > > >>>>>>>>> client.event-threads: 32 > > > > > > >>>>>>>>> server.event-threads: 32 > > > > > > >>>>>>>>> cluster.lookup-optimize: on > > > > > > >>>>>>>>> performance.stat-prefetch: on > > > > > > >>>>>>>>> performance.cache-invalidation: on > > > > > > >>>>>>>>> performance.md-cache-timeout: 600 > > > > > > >>>>>>>>> performance.cache-samba-metadata: on > > > > > > >>>>>>>>> performance.cache-ima-xattrs: on > > > > > > >>>>>>>>> performance.io-thread-count: 64 > > > > > > >>>>>>>>> cluster.use-compound-fops: on > > > > > > >>>>>>>>> performance.cache-size: 512MB > > > > > > >>>>>>>>> performance.cache-refresh-timeout: 10 > > > > > > >>>>>>>>> performance.read-ahead: off > > > > > > >>>>>>>>> performance.write-behind-window-size: 4MB > > > > > > >>>>>>>>> performance.write-behind: on > > > > > > >>>>>>>>> storage.build-pgfid: on > > > > > > >>>>>>>>> features.utime: on > > > > > > >>>>>>>>> storage.ctime: on > > > > > > >>>>>>>>> cluster.quorum-type: fixed > > > > > > >>>>>>>>> cluster.quorum-count: 2 > > > > > > >>>>>>>>> features.bitrot: on > > > > > > >>>>>>>>> features.scrub: Active > > > > > > >>>>>>>>> features.scrub-freq: daily > > > > > > >>>>>>>>> cluster.enable-shared-storage: enable > > > > > > >>>>>>>>> > > > > > > 
>>>>>>>>> > > > > > > >>>>>>>> Why can this happen to all Brick processes? I don't > > > understand > > > > > the > > > > > > >>>>>>>> crash report. The FOPs are nothing special and after > restart > > > > > brick > > > > > > >>>>>>>> processes everything works fine and our application was > > > succeed. > > > > > > >>>>>>>> > > > > > > >>>>>>>> Regards > > > > > > >>>>>>>> David Spisla > > > > > > >>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> > > > > > > >>>>>>>> _______________________________________________ > > > > > > >>>>>>>> Gluster-users mailing list > > > > > > >>>>>>>> Gluster-users at gluster.org > > > > > > >>>>>>>> > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > >>>>>>> > > > > > > >>>>>>> > > > > > > > > > > > _______________________________________________ > > > > > > Gluster-users mailing list > > > > > > Gluster-users at gluster.org > > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > > > > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 17 10:42:59 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 17 May 2019 16:12:59 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: On 17/05/19 5:59 AM, David Cunningham wrote: > Hello, > > We're adding an arbiter node to an existing volume and having an > issue. Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > Was your root directory of the replica 2 volume? in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. -Ravi > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file > for details. > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 > kernel 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W > [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: > SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: > Closing fuse connection to '/tmp/mntQAtu3f'. 
> > Processes running on new node gfs3: > > # ps -ef | grep gluster > root????? 6832???? 1? 0 20:17 ???????? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root???? 15799???? 1? 0 20:17 ???????? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Fri May 17 23:01:24 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Sat, 18 May 2019 11:01:24 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: Hi Ravi, The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. I'm not sure what "in metadata" means. Can you please explain that one? On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: > > On 17/05/19 5:59 AM, David Cunningham wrote: > > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > Was your root directory of the replica 2 volume in metadata or entry > split-brain? If yes, you need to resolve it before proceeding with the > add-brick. > > -Ravi > > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. 
> > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntQAtu3f'. > > Processes running on new node gfs3: > > # ps -ef | grep gluster > root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladkopy at gmail.com Fri May 17 23:18:36 2019 From: vladkopy at gmail.com (Vlad Kopylov) Date: Fri, 17 May 2019 19:18:36 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: straight from ./autogen.sh && ./configure && make -j install CentOS Linux release 7.6.1810 (Core) May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. 
On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: > Hello Gluster folks, > > Gluster-block team is happy to announce the v0.4 release [1]. > > This is the new stable version of gluster-block, lots of new and > exciting features and interesting bug fixes are made available as part > of this release. > Please find the big list of release highlights and notable fixes at [2]. > > Details about installation can be found in the easy install guide at > [3]. Find the details about prerequisites and setup guide at [4]. > If you are a new user, checkout the demo video attached in the README > doc [5], which will be a good source of intro to the project. > There are good examples about how to use gluster-block both in the man > pages [6] and test file [7] (also in the README). > > gluster-block is part of fedora package collection, an updated package > with release version v0.4 will be soon made available. And the > community provided packages will be soon made available at [8]. > > Please spend a minute to report any kind of issue that comes to your > notice with this handy link [9]. > We look forward to your feedback, which will help gluster-block get better! > > We would like to thank all our users, contributors for bug filing and > fixes, also the whole team who involved in the huge effort with > pre-release testing. > > > [1] https://github.com/gluster/gluster-block > [2] https://github.com/gluster/gluster-block/releases > [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > [4] https://github.com/gluster/gluster-block#usage > [5] https://github.com/gluster/gluster-block/blob/master/README.md > [6] https://github.com/gluster/gluster-block/tree/master/docs > [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > [8] https://download.gluster.org/pub/gluster/gluster-block/ > [9] https://github.com/gluster/gluster-block/issues/new > > Cheers, > Team Gluster-Block! > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sat May 18 10:34:57 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Sat, 18 May 2019 13:34:57 +0300 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> Just run 'gluster volume heal my_volume info summary'. It will report any issues - everything should be 'Connected' and show '0'. Best Regards, Strahil NikolovOn May 18, 2019 02:01, David Cunningham wrote: > > Hi Ravi, > > The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. > > I'm not sure what "in metadata" means. Can you please explain that one? > > > On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: >> >> >> On 17/05/19 5:59 AM, David Cunningham wrote: >>> >>> Hello, >>> >>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. >>> >> Was your root directory of the replica 2 volume? in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. >> >> -Ravi >> >> >>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! 
>>> >>> On existing node gfs1, trying to add new arbiter node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. Please check log file for details. >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 >>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Sun May 19 23:31:16 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 20 May 2019 11:31:16 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> References: <4p7xh1oypo8edvdnaxhb0i8y.1558175697764@email.android.com> Message-ID: Hello, It does show everything as Connected and 0 for the existing bricks, gfs1 and gfs2. The new brick gfs3 isn't listed, presumably because of the failure as per my original email. Would anyone have any further suggestions on how to prevent the "Transport endpoint is not connected" error when adding the new brick? # gluster volume heal gvol0 info summary Brick gfs1:/nodirectwritedata/gluster/gvol0 Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 Brick gfs2:/nodirectwritedata/gluster/gvol0 Status: Connected Total Number of entries: 0 Number of entries in heal pending: 0 Number of entries in split-brain: 0 Number of entries possibly healing: 0 # gluster volume status all Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7624 Self-heal Daemon on localhost N/A N/A Y 47636 Self-heal Daemon on gfs3 N/A N/A Y 18542 Self-heal Daemon on gfs2 N/A N/A Y 37192 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume task On Sat, 18 May 2019 at 22:34, Strahil wrote: > Just run 'gluster volume heal my_volume info summary'. > > It will report any issues - everything should be 'Connected' and show '0'. > > Best Regards, > Strahil Nikolov > On May 18, 2019 02:01, David Cunningham wrote: > > Hi Ravi, > > The existing two nodes aren't in split-brain, at least that I'm aware of. > Running "gluster volume status all" doesn't show any problem. > > I'm not sure what "in metadata" means. 
Can you please explain that one? > > > On Fri, 17 May 2019 at 22:43, Ravishankar N > wrote: > > > On 17/05/19 5:59 AM, David Cunningham wrote: > > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > Was your root directory of the replica 2 volume in metadata or entry > split-brain? If yes, you need to resolve it before proceeding with the > add-brick. > > -Ravi > > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Mon May 20 03:57:30 2019 From: hunter86_bg at yahoo.com (Strahil) Date: Mon, 20 May 2019 06:57:30 +0300 Subject: [Gluster-users] add-brick: failed: Commit failed Message-ID: As everything seems OK, you can check if your arbiter is ok. Run 'gluster peer status' on all nodes. If all peers report 2 peers connected ,you can run: gluster volume add-brick gvol0 replica 2 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 Bewt Regards, Strahil NikolovOn May 20, 2019 02:31, David Cunningham wrote: > > Hello, > > It does show everything as Connected and 0 for the existing bricks, gfs1 and gfs2. The new brick gfs3 isn't listed, presumably because of the failure as per my original email. Would anyone have any further suggestions on how to prevent the "Transport endpoint is not connected" error when adding the new brick? 
> > # gluster volume heal gvol0 info summary > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Total Number of entries: 0 > Number of entries in heal pending: 0 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Connected > Total Number of entries: 0 > Number of entries in heal pending: 0 > Number of entries in split-brain: 0 > Number of entries possibly healing: 0 > > > # gluster volume status all > Status of volume: gvol0 > Gluster process                             TCP Port  RDMA Port  Online  Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152     0          Y       7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152     0          Y       7624 > Self-heal Daemon on localhost               N/A       N/A        Y       47636 > Self-heal Daemon on gfs3                    N/A       N/A        Y       18542 > Self-heal Daemon on gfs2                    N/A       N/A        Y       37192 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume task > > > On Sat, 18 May 2019 at 22:34, Strahil wrote: >> >> Just run 'gluster volume heal my_volume info summary'. >> >> It will report any issues - everything should be 'Connected' and show '0'. >> >> Best Regards, >> Strahil Nikolov >> >> On May 18, 2019 02:01, David Cunningham wrote: >>> >>> Hi Ravi, >>> >>> The existing two nodes aren't in split-brain, at least that I'm aware of. Running "gluster volume status all" doesn't show any problem. >>> >>> I'm not sure what "in metadata" means. Can you please explain that one? >>> >>> >>> On Fri, 17 May 2019 at 22:43, Ravishankar N wrote: >>>> >>>> >>>> On 17/05/19 5:59 AM, David Cunningham wrote: >>>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an existing volume and having an issue. Can anyone help? The root cause error appears to be "00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected)", as below. >>>>> >>>> Was your root directory of the replica 2 volume in metadata or entry split-brain? If yes, you need to resolve it before proceeding with the add-brick. >>>> >>>> -Ravi >>>> >>>> >>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance!
>>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Mon May 20 04:39:44 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Mon, 20 May 2019 10:09:44 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: On Fri, 17 May 2019 at 06:01, David Cunningham wrote: > Hello, > > We're adding an arbiter node to an existing volume and having an issue. > Can anyone help? The root cause error appears to be > "00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected)", as below. > > We are running glusterfs 5.6.1. Thanks in advance for any assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please check log file for > details. > This looks like a glusterd issue. Please check the glusterd logs for more info. Adding the glusterd dev to this thread. Sanju, can you take a look? 
Regards, Nithya > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: > received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntQAtu3f'. > > Processes running on new node gfs3: > > # ps -ef | grep gluster > root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From snowmailer at gmail.com Mon May 20 10:07:54 2019 From: snowmailer at gmail.com (Martin) Date: Mon, 20 May 2019 12:07:54 +0200 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> Message-ID: <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> Hi Krutika, > Also, gluster version please? I am running old 3.7.6. (Yes I know I should upgrade asap) I?ve applied firstly "network.remote-dio off", behaviour did not changed, VMs got stuck after some time again. Then I?ve set "performance.strict-o-direct on" and problem completly disappeared. No more stucks at all (7 days without any problems at all). This SOLVED the issue. Can you explain what remote-dio and strict-o-direct variables changed in behaviour of my Gluster? It would be great for later archive/users to understand what and why this solved my issue. Anyway, Thanks a LOT!!! BR, Martin > On 13 May 2019, at 10:20, Krutika Dhananjay wrote: > > OK. 
In that case, can you check if the following two changes help: > > # gluster volume set $VOL network.remote-dio off > # gluster volume set $VOL performance.strict-o-direct on > > preferably one option changed at a time, its impact tested and then the next change applied and tested. > > Also, gluster version please? > > -Krutika > > On Mon, May 13, 2019 at 1:02 PM Martin Toth > wrote: > Cache in qemu is none. That should be correct. This is full command : > > /usr/bin/qemu-system-x86_64 -name one-312 -S -machine pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 > > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 > -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 > -drive file=/var/lib/one//datastores/116/312/disk.0,format=raw,if=none,id=drive-virtio-disk1,cache=none > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 > -drive file=gluster://localhost:24007/imagestore/ <>7b64d6757acc47a39503f68731f89b8e,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none > -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 > -drive file=/var/lib/one//datastores/116/312/disk.1,format=raw,if=none,id=drive-ide0-0-0,readonly=on > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 > > -netdev tap,fd=26,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 0.0.0.0:312 ,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on > > I?ve highlighted disks. First is VM context disk - Fuse used, second is SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. > > Krutika, > I will start profiling on Gluster Volumes and wait for next VM to fail. Than I will attach/send profiling info after some VM will be failed. I suppose this is correct profiling strategy. > > About this, how many vms do you need to recreate it? A single vm? Or multiple vms doing IO in parallel? > > > Thanks, > BR! > Martin > >> On 13 May 2019, at 09:21, Krutika Dhananjay > wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay > wrote: >> What version of gluster are you using? >> Also, can you capture and share volume-profile output for a run where you manage to recreate this issue? >> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >> Let me know if you have any questions. 
>> -Krutika >> On Mon, May 13, 2019 at 12:34 PM Martin Toth > wrote: >> Hi, >> >> there is no healing operation, no peer disconnects, no readonly filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, it's SSD with 10G, performance is good. >> >> > you'd have its log on qemu's standard output, >> >> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I have been looking into the problem for more than a month, tried everything. Can't find anything. Any more clues or leads? >> >> BR, >> Martin >> >> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >> > >> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >> >> Hi all, >> > >> > Hi >> > >> >> >> >> I am running replica 3 on SSDs with 10G networking, everything works OK but VMs stored in the Gluster volume occasionally freeze with "Task XY blocked for more than 120 seconds". >> >> The only solution is to power off (hard) the VM and then boot it up again. I am unable to SSH and also to log in with the console; it's probably stuck on some disk operation. No error/warning logs or messages are stored in the VM's logs. >> >> >> > >> > As far as I know this should be unrelated, I get this during heals >> > without any freezes, it just means the storage is slow I think. >> > >> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on the replica volume. Can someone advise how to debug this problem or what can cause these issues? >> >> It's really annoying, I've tried to google everything but nothing came up. I've tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but it's not related. >> >> >> > >> > Any chance your gluster goes readonly ? Have you checked your gluster >> > logs to see if maybe they lose each other some times ? >> > /var/log/glusterfs >> > >> > For libgfapi accesses you'd have its log on qemu's standard output, >> > that might contain the actual error at the time of the freeze. >> > _______________________________________________ >> > Gluster-users mailing list >> > Gluster-users at gluster.org >> > https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Mon May 20 12:10:25 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Mon, 20 May 2019 17:40:25 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: David, can you please attach glusterd.logs? As the error message says, Commit failed on the arbiter node, we might be able to find some issue on that node. On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran wrote: > > > On Fri, 17 May 2019 at 06:01, David Cunningham > wrote: > >> Hello, >> >> We're adding an arbiter node to an existing volume and having an issue. >> Can anyone help? The root cause error appears to be >> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected)", as below. >> >> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >> >> On existing node gfs1, trying to add new arbiter node gfs3: >> >> # gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0 >> volume add-brick: failed: Commit failed on gfs3. Please check log file >> for details. >> > > This looks like a glusterd issue. Please check the glusterd logs for more > info.
> Adding the glusterd dev to this thread. Sanju, can you take a look? > > Regards, > Nithya > >> >> On new node gfs3 in gvol0-add-brick-mount.log: >> >> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >> 7.22 >> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >> 0-fuse: switched to graph 0 >> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >> [2019-05-17 01:20:22.699770] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected) >> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >> 0-fuse: initating unmount of /tmp/mntQAtu3f >> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >> received signum (15), shutting down >> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntQAtu3f'. >> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >> fuse connection to '/tmp/mntQAtu3f'. >> >> Processes running on new node gfs3: >> >> # ps -ef | grep gluster >> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >> glustershd >> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Mon May 20 12:36:41 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Mon, 20 May 2019 18:06:41 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: Hey Vlad, Thanks for trying gluster-block. Appreciate your feedback. Here is the patch which should fix the issue you have noticed: https://github.com/gluster/gluster-block/pull/233 Thanks! -- Prasanna On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: > > > straight from > > ./autogen.sh && ./configure && make -j install > > > CentOS Linux release 7.6.1810 (Core) > > > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. 
> May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. > > > > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: >> >> Hello Gluster folks, >> >> Gluster-block team is happy to announce the v0.4 release [1]. >> >> This is the new stable version of gluster-block, lots of new and >> exciting features and interesting bug fixes are made available as part >> of this release. >> Please find the big list of release highlights and notable fixes at [2]. >> >> Details about installation can be found in the easy install guide at >> [3]. Find the details about prerequisites and setup guide at [4]. >> If you are a new user, checkout the demo video attached in the README >> doc [5], which will be a good source of intro to the project. >> There are good examples about how to use gluster-block both in the man >> pages [6] and test file [7] (also in the README). >> >> gluster-block is part of fedora package collection, an updated package >> with release version v0.4 will be soon made available. And the >> community provided packages will be soon made available at [8]. >> >> Please spend a minute to report any kind of issue that comes to your >> notice with this handy link [9]. >> We look forward to your feedback, which will help gluster-block get better! >> >> We would like to thank all our users, contributors for bug filing and >> fixes, also the whole team who involved in the huge effort with >> pre-release testing. >> >> >> [1] https://github.com/gluster/gluster-block >> [2] https://github.com/gluster/gluster-block/releases >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL >> [4] https://github.com/gluster/gluster-block#usage >> [5] https://github.com/gluster/gluster-block/blob/master/README.md >> [6] https://github.com/gluster/gluster-block/tree/master/docs >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t >> [8] https://download.gluster.org/pub/gluster/gluster-block/ >> [9] https://github.com/gluster/gluster-block/issues/new >> >> Cheers, >> Team Gluster-Block! >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users From vladkopy at gmail.com Mon May 20 15:35:20 2019 From: vladkopy at gmail.com (Vlad Kopylov) Date: Mon, 20 May 2019 11:35:20 -0400 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: Thank you Prasanna. Do we have architecture somewhere? Dies it bypass Fuse and go directly gfapi ? v On Mon, May 20, 2019, 8:36 AM Prasanna Kalever wrote: > Hey Vlad, > > Thanks for trying gluster-block. Appreciate your feedback. > > Here is the patch which should fix the issue you have noticed: > https://github.com/gluster/gluster-block/pull/233 > > Thanks! 
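A side note on the "No such path /backstores/user:glfs" line in the journal above: that path is normally provided by the tcmu-runner glfs handler which gluster-block drives through targetcli, so it is worth confirming the handler is actually available before restarting the service. A rough check, assuming targetcli and tcmu-runner are installed, would be:

# systemctl status tcmu-runner
# targetcli ls /backstores
  (the listing should contain a user:glfs entry; if it does not, the glfs
   handler is missing or tcmu-runner has not started)
# systemctl restart tcmu-runner gluster-blockd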
> -- > Prasanna > > On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: > > > > > > straight from > > > > ./autogen.sh && ./configure && make -j install > > > > > > CentOS Linux release 7.6.1810 (Core) > > > > > > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No > such file or directory > > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. > > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] > CRIT: trying to change logDir from /var/log/gluster-block to > /var/log/gluster-block [at utils.c+495 :] > > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path > /backstores/user:glfs > > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process > exited, code=exited, status=1/FAILURE > > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered > failed state. > > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. > > > > > > > > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever > wrote: > >> > >> Hello Gluster folks, > >> > >> Gluster-block team is happy to announce the v0.4 release [1]. > >> > >> This is the new stable version of gluster-block, lots of new and > >> exciting features and interesting bug fixes are made available as part > >> of this release. > >> Please find the big list of release highlights and notable fixes at [2]. > >> > >> Details about installation can be found in the easy install guide at > >> [3]. Find the details about prerequisites and setup guide at [4]. > >> If you are a new user, checkout the demo video attached in the README > >> doc [5], which will be a good source of intro to the project. > >> There are good examples about how to use gluster-block both in the man > >> pages [6] and test file [7] (also in the README). > >> > >> gluster-block is part of fedora package collection, an updated package > >> with release version v0.4 will be soon made available. And the > >> community provided packages will be soon made available at [8]. > >> > >> Please spend a minute to report any kind of issue that comes to your > >> notice with this handy link [9]. > >> We look forward to your feedback, which will help gluster-block get > better! > >> > >> We would like to thank all our users, contributors for bug filing and > >> fixes, also the whole team who involved in the huge effort with > >> pre-release testing. > >> > >> > >> [1] https://github.com/gluster/gluster-block > >> [2] https://github.com/gluster/gluster-block/releases > >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL > >> [4] https://github.com/gluster/gluster-block#usage > >> [5] https://github.com/gluster/gluster-block/blob/master/README.md > >> [6] https://github.com/gluster/gluster-block/tree/master/docs > >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t > >> [8] https://download.gluster.org/pub/gluster/gluster-block/ > >> [9] https://github.com/gluster/gluster-block/issues/new > >> > >> Cheers, > >> Team Gluster-Block! > >> _______________________________________________ > >> Gluster-users mailing list > >> Gluster-users at gluster.org > >> https://lists.gluster.org/mailman/listinfo/gluster-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spisla80 at gmail.com Tue May 21 09:24:18 2019 From: spisla80 at gmail.com (David Spisla) Date: Tue, 21 May 2019 11:24:18 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello everyone, we are still seeking a day and time to talk about interesting Samba / Glusterfs issues. Here is a new list of possible dates and times. May 22nd - 24th at 12:30 - 14:30 IST (9:00 - 11:00 CEST) May 27th - 29th and 31st at 12:30 - 14:30 IST (9:00 - 11:00 CEST) On May 30th there is a holiday here in Germany. @Poornima Gurusiddaiah If there is any problem finding a date please contact me. I will look for alternatives. Regards David Spisla On Thu, 16 May 2019 at 12:42, David Spisla wrote: > Hello Amar, > > thank you for the information. Of course, we should wait for Poornima > because of her knowledge. > > Regards > David Spisla > > On Thu, 16 May 2019 at 12:23, Amar Tumballi Suryanarayan < > atumball at redhat.com> wrote: > >> David, Poornima is on leave from today till 21st May. So having it after >> she comes back is better. She has more experience in SMB integration than >> many of us. >> >> -Amar >> >> On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: >> >>> Hello everyone, >>> >>> if there is any problem in finding a date and time, please contact me. >>> It would be fine to have a meeting soon. >>> >>> Regards >>> David Spisla >>> >>> On Mon, 13 May 2019 at 12:38, David Spisla < >>> david.spisla at iternity.com> wrote: >>> >>>> Hi Poornima, >>>> >>>> that's fine. I would suggest these dates and times: >>>> >>>> May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>> >>>> May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>> >>>> I have added Volker Lendecke from Sernet to this mail. He is the Samba expert. >>>> >>>> Can someone of you provide a host via bluejeans.com? If not, I will >>>> try it with GoToMeeting (https://www.gotomeeting.com). >>>> >>>> @all Please write your preferred dates and times. For me, all of the >>>> above dates and times are fine. >>>> >>>> Regards >>>> >>>> David >>>> >>>> *From:* Poornima Gurusiddaiah >>>> *Sent:* Monday, 13 May 2019 07:22 >>>> *To:* David Spisla ; Anoop C S ; >>>> Gunther Deschner >>>> *Cc:* Gluster Devel ; >>>> gluster-users at gluster.org List >>>> *Subject:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>>> Gluster (together with Samba Core Developer) >>>> >>>> Hi, >>>> >>>> We would definitely be interested in this. Thank you for contacting us. >>>> To start with, we can have an online conference. Please suggest a few >>>> possible dates and times for the week (preferably between IST 7:00 AM - >>>> 9:00 PM)? >>>> >>>> Adding Anoop and Gunther, who are also the main contributors to the >>>> Gluster-Samba integration. >>>> >>>> Thanks, >>>> >>>> Poornima >>>> >>>> On Thu, May 9, 2019 at 7:43 PM David Spisla wrote: >>>> >>>> Dear Gluster Community, >>>> >>>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>>> For this purpose we are working together with an advanced Samba core >>>> developer. He did some debugging but needs more information about Gluster >>>> Core Behaviour.
>>>> >>>> >>>> >>>> *Would any of the Gluster Developer wants to have a online conference >>>> with him and me?* >>>> >>>> >>>> >>>> I would organize everything. In my opinion this is a good chance to >>>> improve stability of Glusterfs and this is at the moment one of the major >>>> issues in the Community. >>>> >>>> >>>> >>>> Regards >>>> >>>> David Spisla >>>> >>>> _______________________________________________ >>>> >>>> Community Meeting Calendar: >>>> >>>> APAC Schedule - >>>> Every 2nd and 4th Tuesday at 11:30 AM IST >>>> Bridge: https://bluejeans.com/836554017 >>>> >>>> NA/EMEA Schedule - >>>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>>> Bridge: https://bluejeans.com/486278655 >>>> >>>> Gluster-devel mailing list >>>> Gluster-devel at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>> >>>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Amar Tumballi (amarts) >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kdhananj at redhat.com Tue May 21 10:05:07 2019 From: kdhananj at redhat.com (Krutika Dhananjay) Date: Tue, 21 May 2019 15:35:07 +0530 Subject: [Gluster-users] VMs blocked for more than 120 seconds In-Reply-To: <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> References: <20190513065548.GI25080@althea.ulrar.net> <681F0862-7C80-414D-9637-7697A8C65AFA@gmail.com> <76CB580E-0F53-468F-B7F9-FE46C2971B8C@gmail.com> Message-ID: Hi Martin, Glad it worked! And yes, 3.7.6 is really old! :) So the issue is occurring when the vm flushes outstanding data to disk. And this is taking > 120s because there's lot of buffered writes to flush, possibly followed by an fsync too which needs to sync them to disk (volume profile would have been helpful in confirming this). All these two options do is to truly honor O_DIRECT flag (which is what we want anyway given the vms are opened with 'cache=none' qemu option). This will skip write-caching on gluster client side and also bypass the page-cache on the gluster-bricks, and so data gets flushed faster, thereby eliminating these timeouts. -Krutika On Mon, May 20, 2019 at 3:38 PM Martin wrote: > Hi Krutika, > > Also, gluster version please? > > I am running old 3.7.6. (Yes I know I should upgrade asap) > > I?ve applied firstly "network.remote-dio off", behaviour did not changed, > VMs got stuck after some time again. > Then I?ve set "performance.strict-o-direct on" and problem completly > disappeared. No more stucks at all (7 days without any problems at all). > This SOLVED the issue. > > Can you explain what remote-dio and strict-o-direct variables changed in > behaviour of my Gluster? It would be great for later archive/users to > understand what and why this solved my issue. > > Anyway, Thanks a LOT!!! > > BR, > Martin > > On 13 May 2019, at 10:20, Krutika Dhananjay wrote: > > OK. In that case, can you check if the following two changes help: > > # gluster volume set $VOL network.remote-dio off > # gluster volume set $VOL performance.strict-o-direct on > > preferably one option changed at a time, its impact tested and then the > next change applied and tested. > > Also, gluster version please? > > -Krutika > > On Mon, May 13, 2019 at 1:02 PM Martin Toth wrote: > >> Cache in qemu is none. That should be correct. 
This is full command : >> >> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine >> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp >> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1 >> -no-user-config -nodefaults -chardev >> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait >> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime >> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device >> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 >> >> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 >> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 >> -drive file=/var/lib/one//datastores/116/312/*disk.0* >> ,format=raw,if=none,id=drive-virtio-disk1,cache=none >> -device >> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1 >> -drive file=gluster://localhost:24007/imagestore/ >> *7b64d6757acc47a39503f68731f89b8e* >> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none >> -device >> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 >> -drive file=/var/lib/one//datastores/116/312/*disk.1* >> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on >> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 >> >> -netdev tap,fd=26,id=hostnet0 >> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3 >> -chardev pty,id=charserial0 -device >> isa-serial,chardev=charserial0,id=serial0 >> -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait >> -device >> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 >> -vnc 0.0.0.0:312,password -device >> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device >> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on >> >> I?ve highlighted disks. First is VM context disk - Fuse used, second is >> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used. >> >> Krutika, >> I will start profiling on Gluster Volumes and wait for next VM to fail. >> Than I will attach/send profiling info after some VM will be failed. I >> suppose this is correct profiling strategy. >> > > About this, how many vms do you need to recreate it? A single vm? Or > multiple vms doing IO in parallel? > > >> Thanks, >> BR! >> Martin >> >> On 13 May 2019, at 09:21, Krutika Dhananjay wrote: >> >> Also, what's the caching policy that qemu is using on the affected vms? >> Is it cache=none? Or something else? You can get this information in the >> command line of qemu-kvm process corresponding to your vm in the ps output. >> >> -Krutika >> >> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay >> wrote: >> >>> What version of gluster are you using? >>> Also, can you capture and share volume-profile output for a run where >>> you manage to recreate this issue? >>> >>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command >>> Let me know if you have any questions. >>> >>> -Krutika >>> >>> On Mon, May 13, 2019 at 12:34 PM Martin Toth >>> wrote: >>> >>>> Hi, >>>> >>>> there is no healing operation, not peer disconnects, no readonly >>>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why, >>>> its SSD with 10G, performance is good. >>>> >>>> > you'd have it's log on qemu's standard output, >>>> >>>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. 
I am looking >>>> for problem for more than month, tried everything. Can?t find anything. Any >>>> more clues or leads? >>>> >>>> BR, >>>> Martin >>>> >>>> > On 13 May 2019, at 08:55, lemonnierk at ulrar.net wrote: >>>> > >>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote: >>>> >> Hi all, >>>> > >>>> > Hi >>>> > >>>> >> >>>> >> I am running replica 3 on SSDs with 10G networking, everything works >>>> OK but VMs stored in Gluster volume occasionally freeze with ?Task XY >>>> blocked for more than 120 seconds?. >>>> >> Only solution is to poweroff (hard) VM and than boot it up again. I >>>> am unable to SSH and also login with console, its stuck probably on some >>>> disk operation. No error/warning logs or messages are store in VMs logs. >>>> >> >>>> > >>>> > As far as I know this should be unrelated, I get this during heals >>>> > without any freezes, it just means the storage is slow I think. >>>> > >>>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks >>>> on replica volume. Can someone advice how to debug this problem or what >>>> can cause these issues? >>>> >> It?s really annoying, I?ve tried to google everything but nothing >>>> came up. I?ve tried changing virtio-scsi-pci to virtio-blk-pci disk >>>> drivers, but its not related. >>>> >> >>>> > >>>> > Any chance your gluster goes readonly ? Have you checked your gluster >>>> > logs to see if maybe they lose each other some times ? >>>> > /var/log/glusterfs >>>> > >>>> > For libgfapi accesses you'd have it's log on qemu's standard output, >>>> > that might contain the actual error at the time of the freez. >>>> > _______________________________________________ >>>> > Gluster-users mailing list >>>> > Gluster-users at gluster.org >>>> > https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkalever at redhat.com Tue May 21 14:39:22 2019 From: pkalever at redhat.com (Prasanna Kalever) Date: Tue, 21 May 2019 20:09:22 +0530 Subject: [Gluster-users] gluster-block v0.4 is alive! In-Reply-To: References: Message-ID: On Mon, May 20, 2019 at 9:05 PM Vlad Kopylov wrote: > > Thank you Prasanna. > > Do we have architecture somewhere? Vlad, Although the complete set of details might be missing at one place right now, some pointers to start are available at, https://github.com/gluster/gluster-block#gluster-block and https://pkalever.wordpress.com/2019/05/06/starting-with-gluster-block, hopefully that should give some clarity about the project. Also checkout the man pages. > Dies it bypass Fuse and go directly gfapi ? yes, we don't use Fuse access with gluster-block. The management as-well-as IO happens over gfapi. Please go through the docs pointed above, if you have any specific queries, feel free to ask them here or on github. Best Regards, -- Prasanna > > v > > On Mon, May 20, 2019, 8:36 AM Prasanna Kalever wrote: >> >> Hey Vlad, >> >> Thanks for trying gluster-block. Appreciate your feedback. >> >> Here is the patch which should fix the issue you have noticed: >> https://github.com/gluster/gluster-block/pull/233 >> >> Thanks! 
>> -- >> Prasanna >> >> On Sat, May 18, 2019 at 4:48 AM Vlad Kopylov wrote: >> > >> > >> > straight from >> > >> > ./autogen.sh && ./configure && make -j install >> > >> > >> > CentOS Linux release 7.6.1810 (Core) >> > >> > >> > May 17 19:13:18 vm2 gluster-blockd[24294]: Error opening log file: No such file or directory >> > May 17 19:13:18 vm2 gluster-blockd[24294]: Logging to stderr. >> > May 17 19:13:18 vm2 gluster-blockd[24294]: [2019-05-17 23:13:18.966992] CRIT: trying to change logDir from /var/log/gluster-block to /var/log/gluster-block [at utils.c+495 :] >> > May 17 19:13:19 vm2 gluster-blockd[24294]: No such path /backstores/user:glfs >> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service: main process exited, code=exited, status=1/FAILURE >> > May 17 19:13:19 vm2 systemd[1]: Unit gluster-blockd.service entered failed state. >> > May 17 19:13:19 vm2 systemd[1]: gluster-blockd.service failed. >> > >> > >> > >> > On Thu, May 2, 2019 at 1:35 PM Prasanna Kalever wrote: >> >> >> >> Hello Gluster folks, >> >> >> >> Gluster-block team is happy to announce the v0.4 release [1]. >> >> >> >> This is the new stable version of gluster-block, lots of new and >> >> exciting features and interesting bug fixes are made available as part >> >> of this release. >> >> Please find the big list of release highlights and notable fixes at [2]. >> >> >> >> Details about installation can be found in the easy install guide at >> >> [3]. Find the details about prerequisites and setup guide at [4]. >> >> If you are a new user, checkout the demo video attached in the README >> >> doc [5], which will be a good source of intro to the project. >> >> There are good examples about how to use gluster-block both in the man >> >> pages [6] and test file [7] (also in the README). >> >> >> >> gluster-block is part of fedora package collection, an updated package >> >> with release version v0.4 will be soon made available. And the >> >> community provided packages will be soon made available at [8]. >> >> >> >> Please spend a minute to report any kind of issue that comes to your >> >> notice with this handy link [9]. >> >> We look forward to your feedback, which will help gluster-block get better! >> >> >> >> We would like to thank all our users, contributors for bug filing and >> >> fixes, also the whole team who involved in the huge effort with >> >> pre-release testing. >> >> >> >> >> >> [1] https://github.com/gluster/gluster-block >> >> [2] https://github.com/gluster/gluster-block/releases >> >> [3] https://github.com/gluster/gluster-block/blob/master/INSTALL >> >> [4] https://github.com/gluster/gluster-block#usage >> >> [5] https://github.com/gluster/gluster-block/blob/master/README.md >> >> [6] https://github.com/gluster/gluster-block/tree/master/docs >> >> [7] https://github.com/gluster/gluster-block/blob/master/tests/basic.t >> >> [8] https://download.gluster.org/pub/gluster/gluster-block/ >> >> [9] https://github.com/gluster/gluster-block/issues/new >> >> >> >> Cheers, >> >> Team Gluster-Block! 
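To give a concrete flavour of the commands the announcement above refers to, creating and inspecting a first block device on an existing replica volume looks roughly like this; the volume name, block name, host addresses and size here are made-up placeholders, and the exact options are described in the man pages linked above:

# gluster-block create blockvol/block0 ha 3 192.168.10.11,192.168.10.12,192.168.10.13 10GiB
# gluster-block list blockvol
# gluster-block info blockvol/block0
# gluster-block delete blockvol/block0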
>> >> _______________________________________________ >> >> Gluster-users mailing list >> >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users From rabhat at redhat.com Tue May 21 15:13:18 2019 From: rabhat at redhat.com (FNU Raghavendra Manjunath) Date: Tue, 21 May 2019 11:13:18 -0400 Subject: [Gluster-users] Proposal: Changes in Gluster Community meetings In-Reply-To: References: <62104B6F-99CF-4C22-80FC-9C177F73E897@onholyground.com> <907BA003-F786-46CF-A31B-38C93CE9BB20@onholyground.com> Message-ID: Today's meeting will happen couple of hours from now. i.e. 1PM EST at ( https://bluejeans.com/486278655) I am not able to see the meeting in my calendar. I am not sure whether this is the case just for me or is it not visible to others as well. Either way, I will be waiting at the above mentioned bluejeans link. Regards, Raghavendra On Wed, May 1, 2019 at 8:37 AM Amar Tumballi Suryanarayan < atumball at redhat.com> wrote: > > > On Tue, Apr 23, 2019 at 8:47 PM Darrell Budic > wrote: > >> I was one of the folk who wanted a NA/EMEA scheduled meeting, and I?m >> going to have to miss it due to some real life issues (clogged sewer I?m >> going to have to be dealing with at the time). Apologies, I?ll work on >> making the next one. >> >> > No problem. We will continue to have these meetings every week (ie, > bi-weekly in each timezone). Feel free to join when possible. We surely > like to see more community participation for sure, but everyone would have > their day jobs, so no pressure :-) > > -Amar > > >> -Darrell >> >> On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath >> wrote: >> >> >> Hi, >> >> This is the agenda for tomorrow's community meeting for NA/EMEA timezone. >> >> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both >> ---- >> >> >> >> On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan < >> atumball at redhat.com> wrote: >> >>> Hi All, >>> >>> Below is the final details of our community meeting, and I will be >>> sending invites to mailing list following this email. You can add Gluster >>> Community Calendar so you can get notifications on the meetings. >>> >>> We are starting the meetings from next week. For the first meeting, we >>> need 1 volunteer from users to discuss the use case / what went well, and >>> what went bad, etc. preferrably in APAC region. NA/EMEA region, next week. >>> >>> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>> ---- >>> Gluster Community Meeting >>> Previous >>> Meeting minutes: >>> >>> - http://github.com/gluster/community >>> >>> >>> Date/Time: >>> Check the community calendar >>> >>> Bridge >>> >>> - APAC friendly hours >>> - Bridge: https://bluejeans.com/836554017 >>> - NA/EMEA >>> - Bridge: https://bluejeans.com/486278655 >>> >>> ------------------------------ >>> Attendance >>> >>> - Name, Company >>> >>> Host >>> >>> - Who will host next meeting? >>> - Host will need to send out the agenda 24hr - 12hrs in advance >>> to mailing list, and also make sure to send the meeting minutes. >>> - Host will need to reach out to one user at least who can talk >>> about their usecase, their experience, and their needs. >>> - Host needs to send meeting minutes as PR to >>> http://github.com/gluster/community >>> >>> User stories >>> >>> - Discuss 1 usecase from a user. >>> - How was the architecture derived, what volume type used, >>> options, etc? >>> - What were the major issues faced ? How to improve them? >>> - What worked good? 
>>> - How can we all collaborate well, so it is win-win for the >>> community and the user? How can we >>> >>> Community >>> >>> - >>> >>> Any release updates? >>> - >>> >>> Blocker issues across the project? >>> - >>> >>> Metrics >>> - Number of new bugs since previous meeting. How many are not >>> triaged? >>> - Number of emails, anything unanswered? >>> >>> Conferences >>> / Meetups >>> >>> - Any conference in next 1 month where gluster-developers are going? >>> gluster-users are going? So we can meet and discuss. >>> >>> Developer >>> focus >>> >>> - >>> >>> Any design specs to discuss? >>> - >>> >>> Metrics of the week? >>> - Coverity >>> - Clang-Scan >>> - Number of patches from new developers. >>> - Did we increase test coverage? >>> - [Atin] Also talk about most frequent test failures in the CI >>> and carve out an AI to get them fixed. >>> >>> RoundTable >>> >>> - >>> >>> ---- >>> >>> Regards, >>> Amar >>> >>> On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan < >>> atumball at redhat.com> wrote: >>> >>>> Thanks for the feedback Darrell, >>>> >>>> The new proposal is to have one in North America 'morning' time. (10AM >>>> PST), And another in ASIA day time, which is evening 7pm/6pm in Australia, >>>> 9pm Newzealand, 5pm Tokyo, 4pm Beijing. >>>> >>>> For example, if we choose Every other Tuesday for meeting, and 1st of >>>> the month is Tuesday, we would have North America time for 1st, and on 15th >>>> it would be ASIA/Pacific time. >>>> >>>> Hopefully, this way, we can cover all the timezones, and meeting >>>> minutes would be committed to github repo, so that way, it will be easier >>>> for everyone to be aware of what is happening. >>>> >>>> Regards, >>>> Amar >>>> >>>> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic >>>> wrote: >>>> >>>>> As a user, I?d like to visit more of these, but the time slot is my >>>>> 3AM. Any possibility for a rolling schedule (move meeting +6 hours each >>>>> week with rolling attendance from maintainers?) or an occasional regional >>>>> meeting 12 hours opposed to the one you?re proposing? >>>>> >>>>> -Darrell >>>>> >>>>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan < >>>>> atumball at redhat.com> wrote: >>>>> >>>>> All, >>>>> >>>>> We currently have 3 meetings which are public: >>>>> >>>>> 1. Maintainer's Meeting >>>>> >>>>> - Runs once in 2 weeks (on Mondays), and current attendance is around >>>>> 3-5 on an avg, and not much is discussed. >>>>> - Without majority attendance, we can't take any decisions too. >>>>> >>>>> 2. Community meeting >>>>> >>>>> - Supposed to happen on #gluster-meeting, every 2 weeks, and is the >>>>> only meeting which is for 'Community/Users'. Others are for >>>>> developers as of now. >>>>> Sadly attendance is getting closer to 0 in recent times. >>>>> >>>>> 3. GCS meeting >>>>> >>>>> - We started it as an effort inside Red Hat gluster team, and opened >>>>> it up for community from Jan 2019, but the attendance was always from >>>>> RHT members, and haven't seen any traction from wider group. >>>>> >>>>> So, I have a proposal to call out for cancelling all these meeting, >>>>> and keeping just 1 weekly 'Community' meeting, where even topics >>>>> related to maintainers and GCS and other projects can be discussed. >>>>> >>>>> I have a template of a draft template @ >>>>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g >>>>> >>>>> Please feel free to suggest improvements, both in agenda and in >>>>> timings. 
So, we can have more participation from members of community, >>>>> which allows more user - developer interactions, and hence quality of >>>>> project. >>>>> >>>>> Waiting for feedbacks, >>>>> >>>>> Regards, >>>>> Amar >>>>> >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>> >>>> -- >>>> Amar Tumballi (amarts) >>>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Amar Tumballi (amarts) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Tue May 21 23:27:09 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 11:27:09 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: Hi Sanju, Here's what glusterd.log says on the new arbiter server when trying to add the node: [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) [0x7fe4ca9102cd] -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7fe4d5ecc955] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh --volname=gvol0 --version=1 --volume-op=add-brick --gd-workdir=/var/lib/glusterd [2019-05-22 00:15:05.963177] I [MSGID: 106578] [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: replica-count is set 3 [2019-05-22 00:15:05.963228] I [MSGID: 106578] [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: arbiter-count is set 1 [2019-05-22 00:15:05.963257] I [MSGID: 106578] [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: type is set 0, need to change it [2019-05-22 00:15:17.015268] E [MSGID: 106053] [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: Failed to set extended attribute trusted.add-brick : Transport endpoint is not connected [Transport endpoint is not connected] [2019-05-22 00:15:17.036479] E [MSGID: 106073] [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2019-05-22 00:15:17.036595] E [MSGID: 106122] [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit failed. 
[2019-05-22 00:15:17.036710] E [MSGID: 106122] [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: commit failed on operation Add brick As before gvol0-add-brick-mount.log said: [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22 [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed (Transport endpoint is not connected) [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint is not connected) [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount of /tmp/mntYGNbj9 [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: received signum (15), shutting down [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: Unmounting '/tmp/mntYGNbj9'. [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. Here are the processes running on the new arbiter server: # ps -ef | grep gluster root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/24c12b09f93eec8e.socket --xlator-option *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name glustershd root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs Here are the files created on the new arbiter server: # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 drw------- 2 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0/.glusterfs Thank you for your help! On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: > David, > > can you please attach glusterd.logs? As the error message says, Commit > failed on the arbitar node, we might be able to find some issue on that > node. > > On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran > wrote: > >> >> >> On Fri, 17 May 2019 at 06:01, David Cunningham >> wrote: >> >>> Hello, >>> >>> We're adding an arbiter node to an existing volume and having an issue. >>> Can anyone help? The root cause error appears to be >>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>> endpoint is not connected)", as below. >>> >>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>> >>> On existing node gfs1, trying to add new arbiter node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. 
Please check log file >>> for details. >>> >> >> This looks like a glusterd issue. Please check the glusterd logs for more >> info. >> Adding the glusterd dev to this thread. Sanju, can you take a look? >> >> Regards, >> Nithya >> >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>> 7.22 >>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>> 0-fuse: switched to graph 0 >>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-17 01:20:22.699770] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>> is not connected) >>> [2019-05-17 01:20:22.699834] W [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 2: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>> received signum (15), shutting down >>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>> Unmounting '/tmp/mntQAtu3f'. >>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >>> fuse connection to '/tmp/mntQAtu3f'. >>> >>> Processes running on new node gfs3: >>> >>> # ps -ef | grep gluster >>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>> /var/run/glusterd.pid --log-level INFO >>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>> localhost --volfile-id gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>> glustershd >>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto gluster >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Thanks, > Sanju > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 22 00:43:11 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 06:13:11 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: Message-ID: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Hi David, Could you provide the `getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0` output of all bricks and the output of `gluster volume info`? 
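One way to gather exactly that from all three nodes in a single pass, assuming root ssh access to gfs1, gfs2 and gfs3 and the brick path used throughout this thread:

# for h in gfs1 gfs2 gfs3; do echo "== $h =="; ssh root@$h 'getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0'; done
# gluster volume info gvol0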
Thanks, Ravi On 22/05/19 4:57 AM, David Cunningham wrote: > Hi Sanju, > > Here's what glusterd.log says on the new arbiter server when trying to > add the node: > > [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) > [0x7fe4ca9102cd] > -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) > [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe4d5ecc955] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=gvol0 --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > [2019-05-22 00:15:05.963177] I [MSGID: 106578] > [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] > 0-management: replica-count is set 3 > [2019-05-22 00:15:05.963228] I [MSGID: 106578] > [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] > 0-management: arbiter-count is set 1 > [2019-05-22 00:15:05.963257] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] > 0-management: type is set 0, need to change it > [2019-05-22 00:15:17.015268] E [MSGID: 106053] > [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] > 0-management: Failed to set extended attribute trusted.add-brick : > Transport endpoint is not connected [Transport endpoint is not connected] > [2019-05-22 00:15:17.036479] E [MSGID: 106073] > [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable > to add bricks > [2019-05-22 00:15:17.036595] E [MSGID: 106122] > [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick > commit failed. > [2019-05-22 00:15:17.036710] E [MSGID: 106122] > [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > As before gvol0-add-brick-mount.log said: > > [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 > kernel 7.22 > [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] > 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not > connected) > [2019-05-22 00:15:17.015097] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-22 00:15:17.015158] W > [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: > SETXATTR 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntYGNbj9 > [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: > received signum (15), shutting down > [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntYGNbj9'. > [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: > Closing fuse connection to '/tmp/mntYGNbj9'. > > Here are the processes running on the new arbiter server: > # ps -ef | grep gluster > root????? 3466???? 1? 0 20:13 ???????? 
00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root????? 6832???? 1? 0 May16 ???????? 00:02:10 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root???? 17841???? 1? 0 May16 ???????? 00:00:58 /usr/sbin/glusterfs > --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 > /mnt/glusterfs > > Here are the files created on the new arbiter server: > # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald > drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 > drw------- 2 root root 4096 May 21 20:15 > /nodirectwritedata/gluster/gvol0/.glusterfs > > Thank you for your help! > > > On Tue, 21 May 2019 at 00:10, Sanju Rakonde > wrote: > > David, > > can you please attach glusterd.logs? As the error message says, > Commit failed on the arbitar node, we might be able to find some > issue on that node. > > On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran > > wrote: > > > > On Fri, 17 May 2019 at 06:01, David Cunningham > > > wrote: > > Hello, > > We're adding an arbiter node to an existing volume and > having an issue. Can anyone help? The root cause error > appears to be "00000000-0000-0000-0000-000000000001: > failed to resolve (Transport endpoint is not connected)", > as below. > > We are running glusterfs 5.6.1. Thanks in advance for any > assistance! > > On existing node gfs1, trying to add new arbiter node gfs3: > > # gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0 > volume add-brick: failed: Commit failed on gfs3. Please > check log file for details. > > > This looks like a glusterd issue. Please check the glusterd > logs for more info. > Adding the glusterd dev to this thread. Sanju, can you take a > look? > Regards, > Nithya > > > On new node gfs3 in gvol0-add-brick-mount.log: > > [2019-05-17 01:20:22.689721] I > [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE > inited with protocol versions: glusterfs 7.24 kernel 7.22 > [2019-05-17 01:20:22.689778] I > [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to > graph 0 > [2019-05-17 01:20:22.694897] E > [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first > lookup on root failed (Transport endpoint is not connected) > [2019-05-17 01:20:22.699770] W > [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: > 00000000-0000-0000-0000-000000000001: failed to resolve > (Transport endpoint is not connected) > [2019-05-17 01:20:22.699834] W > [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 2: SETXATTR > 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) > resolution failed > [2019-05-17 01:20:22.715656] I > [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating > unmount of /tmp/mntQAtu3f > [2019-05-17 01:20:22.715865] W > [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) > [0x560886581e75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) > [0x560886581ceb] ) 0-: received signum (15), shutting down > [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] > 0-fuse: Unmounting '/tmp/mntQAtu3f'. > [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] > 0-fuse: Closing fuse connection to '/tmp/mntQAtu3f'. 
> > Processes running on new node gfs3: > > # ps -ef | grep gluster > root????? 6832???? 1? 0 20:17 ???????? 00:00:00 > /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO > root???? 15799???? 1? 0 20:17 ???????? 00:00:00 > /usr/sbin/glusterfs -s localhost --volfile-id > gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 > --process-name glustershd > root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep > --color=auto gluster > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Thanks, > Sanju > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 22 01:20:56 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 13:20:56 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Message-ID: Hi Ravi, Certainly. On the existing two nodes: gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.gvol0-client-2=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000000 trusted.afr.gvol0-client-0=0x000000000000000000000000 trusted.afr.gvol0-client-2=0x000000000000000000000000 trusted.gfid=0x00000000000000000000000000000001 trusted.glusterfs.dht=0x000000010000000000000000ffffffff trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 On the new node: gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 getfattr: Removing leading '/' from absolute path names # file: nodirectwritedata/gluster/gvol0 trusted.afr.dirty=0x000000000000000000000001 trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 Output of "gluster volume info" is the same on all 3 nodes and is: # gluster volume info Volume Name: gvol0 Type: Replicate Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 Status: Started Snapshot Count: 0 Number of Bricks: 1 x (2 + 1) = 3 Transport-type: tcp Bricks: Brick1: gfs1:/nodirectwritedata/gluster/gvol0 Brick2: gfs2:/nodirectwritedata/gluster/gvol0 Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet On Wed, 22 May 2019 at 12:43, Ravishankar N wrote: > Hi David, > Could you provide the `getfattr -d -m. 
-e hex > /nodirectwritedata/gluster/gvol0` output of all bricks and the output of > `gluster volume info`? > > Thanks, > Ravi > On 22/05/19 4:57 AM, David Cunningham wrote: > > Hi Sanju, > > Here's what glusterd.log says on the new arbiter server when trying to add > the node: > > [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] > (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) > [0x7fe4ca9102cd] > -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) > [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fe4d5ecc955] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh > --volname=gvol0 --version=1 --volume-op=add-brick > --gd-workdir=/var/lib/glusterd > [2019-05-22 00:15:05.963177] I [MSGID: 106578] > [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: > replica-count is set 3 > [2019-05-22 00:15:05.963228] I [MSGID: 106578] > [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: > arbiter-count is set 1 > [2019-05-22 00:15:05.963257] I [MSGID: 106578] > [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: > type is set 0, need to change it > [2019-05-22 00:15:17.015268] E [MSGID: 106053] > [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: > Failed to set extended attribute trusted.add-brick : Transport endpoint is > not connected [Transport endpoint is not connected] > [2019-05-22 00:15:17.036479] E [MSGID: 106073] > [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > [2019-05-22 00:15:17.036595] E [MSGID: 106122] > [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit > failed. > [2019-05-22 00:15:17.036710] E [MSGID: 106122] > [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: > commit failed on operation Add brick > > As before gvol0-add-brick-mount.log said: > > [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] > 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel > 7.22 > [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] > 0-fuse: switched to graph 0 > [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] > 0-fuse: first lookup on root failed (Transport endpoint is not connected) > [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] > 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) > [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] > 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport > endpoint is not connected) > [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] > 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 > (trusted.add-brick) resolution failed > [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] > 0-fuse: initating unmount of /tmp/mntYGNbj9 > [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] > (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] > -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] > -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: > received signum (15), shutting down > [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: > Unmounting '/tmp/mntYGNbj9'. > [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing > fuse connection to '/tmp/mntYGNbj9'. 
> > Here are the processes running on the new arbiter server: > # ps -ef | grep gluster > root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s > localhost --volfile-id gluster/glustershd -p > /var/run/gluster/glustershd/glustershd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/24c12b09f93eec8e.socket --xlator-option > *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name > glustershd > root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p > /var/run/glusterd.pid --log-level INFO > root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs > --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs > > Here are the files created on the new arbiter server: > # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald > drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 > drw------- 2 root root 4096 May 21 20:15 > /nodirectwritedata/gluster/gvol0/.glusterfs > > Thank you for your help! > > > On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: > >> David, >> >> can you please attach glusterd.logs? As the error message says, Commit >> failed on the arbitar node, we might be able to find some issue on that >> node. >> >> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >> wrote: >> >>> >>> >>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>> dcunningham at voisonics.com> wrote: >>> >>>> Hello, >>>> >>>> We're adding an arbiter node to an existing volume and having an issue. >>>> Can anyone help? The root cause error appears to be >>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>> endpoint is not connected)", as below. >>>> >>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>> >>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>> >>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>> volume add-brick: failed: Commit failed on gfs3. Please check log file >>>> for details. >>>> >>> >>> This looks like a glusterd issue. Please check the glusterd logs for >>> more info. >>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>> >>> Regards, >>> Nithya >>> >>>> >>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>> >>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>> 7.22 >>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>> 0-fuse: switched to graph 0 >>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>> [2019-05-17 01:20:22.699770] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>> is not connected) >>>> [2019-05-17 01:20:22.699834] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>> received signum (15), shutting down >>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>> Unmounting '/tmp/mntQAtu3f'. >>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>> >>>> Processes running on new node gfs3: >>>> >>>> # ps -ef | grep gluster >>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>> /var/run/glusterd.pid --log-level INFO >>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>>> localhost --volfile-id gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>> glustershd >>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>> gluster >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> Thanks, >> Sanju >> > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
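A quick way to gather those xattrs from every brick in one pass, so the
three outputs can be compared side by side. This is only a sketch: it
assumes passwordless SSH between the nodes and uses the hostnames and
brick path from this thread.

# run from any one of the servers
for h in gfs1 gfs2 gfs3; do
    echo "== $h =="
    # -d dumps the values, -m. matches all attribute names, -e hex prints them in hex
    ssh $h getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0
done

On a healthy replica, trusted.gfid and trusted.glusterfs.volume-id should
be identical on every brick; a brick root with no trusted.gfid at all (as
on gfs3 above) suggests the add-brick never finished initialising it,
which is what the next message points out.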
URL: From ravishankar at redhat.com Wed May 22 01:55:58 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 07:25:58 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> Message-ID: <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Hmm, so the volume info seems to indicate that the add-brick was successful but the gfid xattr is missing on the new brick (as are the actual files, barring the .glusterfs folder, according to your previous mail). Do you want to try removing and adding it again? 1. `gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 2. Check that gluster volume info is now back to a 1x2 volume on all nodes and `gluster peer status` is? connected on all nodes. 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. 5. Check that the files are getting healed on to the new brick. Thanks, Ravi On 22/05/19 6:50 AM, David Cunningham wrote: > Hi Ravi, > > Certainly. On the existing two nodes: > > gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-0=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > On the new node: > > gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000001 > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > Output of "gluster volume info" is the same on all 3 nodes and is: > > # gluster volume info > > Volume Name: gvol0 > Type: Replicate > Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: gfs1:/nodirectwritedata/gluster/gvol0 > Brick2: gfs2:/nodirectwritedata/gluster/gvol0 > Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > > > On Wed, 22 May 2019 at 12:43, Ravishankar N > wrote: > > Hi David, > Could you provide the `getfattr -d -m. -e hex > /nodirectwritedata/gluster/gvol0` output of all bricks and the > output of `gluster volume info`? 
> > Thanks, > Ravi > On 22/05/19 4:57 AM, David Cunningham wrote: >> Hi Sanju, >> >> Here's what glusterd.log says on the new arbiter server when >> trying to add the node: >> >> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >> [0x7fe4ca9102cd] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7fe4d5ecc955] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=gvol0 --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >> 0-management: replica-count is set 3 >> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >> 0-management: arbiter-count is set 1 >> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >> 0-management: type is set 0, need to change it >> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >> 0-management: Failed to set extended attribute trusted.add-brick >> : Transport endpoint is not connected [Transport endpoint is not >> connected] >> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: >> Unable to add bricks >> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >> Add-brick commit failed. >> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >> 0-management: commit failed on operation Add brick >> >> As before gvol0-add-brick-mount.log said: >> >> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs >> 7.24 kernel 7.22 >> [2019-05-22 00:15:17.005749] I >> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >> [2019-05-22 00:15:17.010101] E >> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on >> root failed (Transport endpoint is not connected) >> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not >> connected) >> [2019-05-22 00:15:17.015097] W >> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >> 00000000-0000-0000-0000-000000000001: failed to resolve >> (Transport endpoint is not connected) >> [2019-05-22 00:15:17.015158] W >> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: >> SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-22 00:15:17.035636] I >> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating unmount >> of /tmp/mntYGNbj9 >> [2019-05-22 00:15:17.035854] W >> [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) >> 0-: received signum (15), shutting down >> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntYGNbj9'. >> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: >> Closing fuse connection to '/tmp/mntYGNbj9'. 
>> >> Here are the processes running on the new arbiter server: >> # ps -ef | grep gluster >> root????? 3466???? 1? 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >> --process-name glustershd >> root????? 6832???? 1? 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root???? 17841???? 1? 0 May16 ? 00:00:58 /usr/sbin/glusterfs >> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 >> /mnt/glusterfs >> >> Here are the files created on the new arbiter server: >> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >> drwxr-xr-x 3 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0 >> drw------- 2 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0/.glusterfs >> >> Thank you for your help! >> >> >> On Tue, 21 May 2019 at 00:10, Sanju Rakonde > > wrote: >> >> David, >> >> can you please attach glusterd.logs? As the error message >> says, Commit failed on the arbitar node, we might be able to >> find some issue on that node. >> >> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >> > wrote: >> >> >> >> On Fri, 17 May 2019 at 06:01, David Cunningham >> > > wrote: >> >> Hello, >> >> We're adding an arbiter node to an existing volume >> and having an issue. Can anyone help? The root cause >> error appears to be >> "00000000-0000-0000-0000-000000000001: failed to >> resolve (Transport endpoint is not connected)", as below. >> >> We are running glusterfs 5.6.1. Thanks in advance for >> any assistance! >> >> On existing node gfs1, trying to add new arbiter node >> gfs3: >> >> # gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0 >> volume add-brick: failed: Commit failed on gfs3. >> Please check log file for details. >> >> >> This looks like a glusterd issue. Please check the >> glusterd logs for more info. >> Adding the glusterd dev to this thread. Sanju, can you >> take a look? 
>> Regards, >> Nithya >> >> >> On new node gfs3 in gvol0-add-brick-mount.log: >> >> [2019-05-17 01:20:22.689721] I >> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >> inited with protocol versions: glusterfs 7.24 kernel 7.22 >> [2019-05-17 01:20:22.689778] I >> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched >> to graph 0 >> [2019-05-17 01:20:22.694897] E >> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first >> lookup on root failed (Transport endpoint is not >> connected) >> [2019-05-17 01:20:22.699770] W >> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >> 00000000-0000-0000-0000-000000000001: failed to >> resolve (Transport endpoint is not connected) >> [2019-05-17 01:20:22.699834] W >> [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 2: SETXATTR >> 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-17 01:20:22.715656] I >> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >> initating unmount of /tmp/mntQAtu3f >> [2019-05-17 01:20:22.715865] W >> [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >> [0x560886581e75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >> [0x560886581ceb] ) 0-: received signum (15), shutting >> down >> [2019-05-17 01:20:22.715926] I >> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >> '/tmp/mntQAtu3f'. >> [2019-05-17 01:20:22.715953] I >> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >> connection to '/tmp/mntQAtu3f'. >> >> Processes running on new node gfs3: >> >> # ps -ef | grep gluster >> root????? 6832???? 1? 0 20:17 ???????? 00:00:00 >> /usr/sbin/glusterd -p /var/run/glusterd.pid >> --log-level INFO >> root???? 15799???? 1? 0 20:17 ???????? 00:00:00 >> /usr/sbin/glusterfs -s localhost --volfile-id >> gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket >> --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >> --process-name glustershd >> root???? 16856 16735? 0 21:21 pts/0??? 00:00:00 grep >> --color=auto gluster >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> >> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Thanks, >> Sanju >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
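Put together as a single transcript, the remove/re-add cycle described
above looks roughly like this. It is a sketch of the steps rather than a
copy-paste script: the hostnames and paths are the ones from this thread,
and the output of each command should be checked before moving on.

# gluster volume remove-brick gvol0 replica 2 gfs3:/nodirectwritedata/gluster/gvol0 force
# gluster volume info gvol0       # should show "Number of Bricks: 1 x 2 = 2" again
# gluster peer status             # every peer should be "Peer in Cluster (Connected)"
# ssh gfs3 'rm -rf /nodirectwritedata/gluster/gvol0 && mkdir -p /nodirectwritedata/gluster/gvol0'
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# gluster volume heal gvol0 info  # pending entries should drain as the arbiter heals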
URL: From dcunningham at voisonics.com Wed May 22 05:53:11 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 22 May 2019 17:53:11 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Message-ID: Hi Ravi, I'd already done exactly that before, where step 3 was a simple 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the cleanup or reformat should be? Thank you. On Wed, 22 May 2019 at 13:56, Ravishankar N wrote: > Hmm, so the volume info seems to indicate that the add-brick was > successful but the gfid xattr is missing on the new brick (as are the > actual files, barring the .glusterfs folder, according to your previous > mail). > > Do you want to try removing and adding it again? > > 1. `gluster volume remove-brick gvol0 replica 2 > gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 > > 2. Check that gluster volume info is now back to a 1x2 volume on all nodes > and `gluster peer status` is connected on all nodes. > > 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. > > 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. > > 5. Check that the files are getting healed on to the new brick. > Thanks, > Ravi > On 22/05/19 6:50 AM, David Cunningham wrote: > > Hi Ravi, > > Certainly. On the existing two nodes: > > gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000000 > trusted.afr.gvol0-client-0=0x000000000000000000000000 > trusted.afr.gvol0-client-2=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > On the new node: > > gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 > getfattr: Removing leading '/' from absolute path names > # file: nodirectwritedata/gluster/gvol0 > trusted.afr.dirty=0x000000000000000000000001 > trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 > > Output of "gluster volume info" is the same on all 3 nodes and is: > > # gluster volume info > > Volume Name: gvol0 > Type: Replicate > Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x (2 + 1) = 3 > Transport-type: tcp > Bricks: > Brick1: gfs1:/nodirectwritedata/gluster/gvol0 > Brick2: gfs2:/nodirectwritedata/gluster/gvol0 > Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) > Options Reconfigured: > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > > > On Wed, 22 May 2019 at 12:43, Ravishankar N > wrote: > >> Hi David, >> Could you provide the `getfattr -d -m. 
-e hex >> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >> `gluster volume info`? >> >> Thanks, >> Ravi >> On 22/05/19 4:57 AM, David Cunningham wrote: >> >> Hi Sanju, >> >> Here's what glusterd.log says on the new arbiter server when trying to >> add the node: >> >> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >> [0x7fe4ca9102cd] >> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >> [0x7fe4d5ecc955] ) 0-management: Ran script: >> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >> --volname=gvol0 --version=1 --volume-op=add-brick >> --gd-workdir=/var/lib/glusterd >> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >> replica-count is set 3 >> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >> arbiter-count is set 1 >> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >> type is set 0, need to change it >> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >> Failed to set extended attribute trusted.add-brick : Transport endpoint is >> not connected [Transport endpoint is not connected] >> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >> bricks >> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >> failed. >> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >> commit failed on operation Add brick >> >> As before gvol0-add-brick-mount.log said: >> >> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >> 7.22 >> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >> 0-fuse: switched to graph 0 >> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >> [2019-05-22 00:15:17.015097] W [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >> 0-fuse: 00000000-0000-0000-0000-000000000001: failed to resolve (Transport >> endpoint is not connected) >> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] >> 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >> (trusted.add-brick) resolution failed >> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >> 0-fuse: initating unmount of /tmp/mntYGNbj9 >> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >> received signum (15), shutting down >> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >> Unmounting '/tmp/mntYGNbj9'. 
>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >> fuse connection to '/tmp/mntYGNbj9'. >> >> Here are the processes running on the new arbiter server: >> # ps -ef | grep gluster >> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >> localhost --volfile-id gluster/glustershd -p >> /var/run/gluster/glustershd/glustershd.pid -l >> /var/log/glusterfs/glustershd.log -S >> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >> glustershd >> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >> /var/run/glusterd.pid --log-level INFO >> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >> >> Here are the files created on the new arbiter server: >> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 >> drw------- 2 root root 4096 May 21 20:15 >> /nodirectwritedata/gluster/gvol0/.glusterfs >> >> Thank you for your help! >> >> >> On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: >> >>> David, >>> >>> can you please attach glusterd.logs? As the error message says, Commit >>> failed on the arbitar node, we might be able to find some issue on that >>> node. >>> >>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>> nbalacha at redhat.com> wrote: >>> >>>> >>>> >>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>> dcunningham at voisonics.com> wrote: >>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an existing volume and having an >>>>> issue. Can anyone help? The root cause error appears to be >>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>> endpoint is not connected)", as below. >>>>> >>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>> >>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>> >>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>> volume add-brick: failed: Commit failed on gfs3. Please check log file >>>>> for details. >>>>> >>>> >>>> This looks like a glusterd issue. Please check the glusterd logs for >>>> more info. >>>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>>> >>>> Regards, >>>> Nithya >>>> >>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>> 7.22 >>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>> 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>> is not connected) >>>>> [2019-05-17 01:20:22.699834] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>> received signum (15), shutting down >>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>> Unmounting '/tmp/mntQAtu3f'. >>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>> >>>>> Processes running on new node gfs3: >>>>> >>>>> # ps -ef | grep gluster >>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>>> /var/run/glusterd.pid --log-level INFO >>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs -s >>>>> localhost --volfile-id gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>> /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>> glustershd >>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>> gluster >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> Thanks, >>> Sanju >>> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
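One extra check that may be worth doing before re-adding the brick:
removing the directory itself with 'rm -rf' discards its extended
attributes along with it, but if the path is only emptied, or was
partially initialised by the earlier failed add-brick, stale xattrs such
as trusted.glusterfs.volume-id can survive on the directory and get in
the way of the next attempt. A hedged pre-flight check, using the brick
path from this thread (the setfattr lines are only needed if getfattr
prints anything):

# getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0   # should print nothing
# ls -la /nodirectwritedata/gluster/gvol0                   # should be empty, no .glusterfs
# setfattr -x trusted.glusterfs.volume-id /nodirectwritedata/gluster/gvol0
# setfattr -x trusted.gfid /nodirectwritedata/gluster/gvol0

That matches the requirement in the next reply that the brick directory
be empty and carry no extended attributes before the add-brick.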
URL: From ravishankar at redhat.com Wed May 22 06:02:04 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 11:32:04 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> Message-ID: <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> On 22/05/19 11:23 AM, David Cunningham wrote: > Hi Ravi, > > I'd already done exactly that before, where step 3 was a simple 'rm > -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on > what the cleanup or reformat should be? `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not have any extended attributes set on it. Why fuse_first_lookup() is failing is a bit of a mystery to me at this point. :-( Regards, Ravi > > Thank you. > > > On Wed, 22 May 2019 at 13:56, Ravishankar N > wrote: > > Hmm, so the volume info seems to indicate that the add-brick was > successful but the gfid xattr is missing on the new brick (as are > the actual files, barring the .glusterfs folder, according to your > previous mail). > > Do you want to try removing and adding it again? > > 1. `gluster volume remove-brick gvol0 replica 2 > gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 > > 2. Check that gluster volume info is now back to a 1x2 volume on > all nodes and `gluster peer status` is connected on all nodes. > > 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. > > 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 > gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. > > 5. Check that the files are getting healed on to the new brick. > > Thanks, > Ravi > On 22/05/19 6:50 AM, David Cunningham wrote: >> Hi Ravi, >> >> Certainly. On the existing two nodes: >> >> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-0=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> On the new node: >> >> gfs3 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000001 >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> Output of "gluster volume info" is the same on all 3 nodes and is: >> >> # gluster volume info >> >> Volume Name: gvol0 >> Type: Replicate >> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x (2 + 1) = 3 >> Transport-type: tcp >> Bricks: >> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> >> On Wed, 22 May 2019 at 12:43, Ravishankar N >> > wrote: >> >> Hi David, >> Could you provide the `getfattr -d -m. -e hex >> /nodirectwritedata/gluster/gvol0` output of all bricks and >> the output of `gluster volume info`? >> >> Thanks, >> Ravi >> On 22/05/19 4:57 AM, David Cunningham wrote: >>> Hi Sanju, >>> >>> Here's what glusterd.log says on the new arbiter server when >>> trying to add the node: >>> >>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>> [0x7fe4ca9102cd] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>> [0x7fe4ca9bbb85] >>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>> --volname=gvol0 --version=1 --volume-op=add-brick >>> --gd-workdir=/var/lib/glusterd >>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>> 0-management: replica-count is set 3 >>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>> 0-management: arbiter-count is set 1 >>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>> 0-management: type is set 0, need to change it >>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>> 0-management: Failed to set extended attribute >>> trusted.add-brick : Transport endpoint is not connected >>> [Transport endpoint is not connected] >>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>> 0-glusterd: Unable to add bricks >>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >>> Add-brick commit failed. 
>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>> 0-management: commit failed on operation Add brick >>> >>> As before gvol0-add-brick-mount.log said: >>> >>> [2019-05-22 00:15:17.005695] I >>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE inited >>> with protocol versions: glusterfs 7.24 kernel 7.22 >>> [2019-05-22 00:15:17.005749] I >>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to graph 0 >>> [2019-05-22 00:15:17.010101] E >>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup >>> on root failed (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.014217] W >>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>> LOOKUP() / => -1 (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015097] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve >>> (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015158] W >>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: >>> 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-22 00:15:17.035636] I >>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating >>> unmount of /tmp/mntYGNbj9 >>> [2019-05-22 00:15:17.035854] W >>> [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>> [0x55c81b63de75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] >>> 0-fuse: Unmounting '/tmp/mntYGNbj9'. >>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] >>> 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. >>> >>> Here are the processes running on the new arbiter server: >>> # ps -ef | grep gluster >>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>> /usr/sbin/glusterfs -s localhost --volfile-id >>> gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>> --process-name glustershd >>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO >>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>> /usr/sbin/glusterfs --process-name fuse >>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>> >>> Here are the files created on the new arbiter server: >>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0 >>> drw------- 2 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0/.glusterfs >>> >>> Thank you for your help! >>> >>> >>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>> > wrote: >>> >>> David, >>> >>> can you please attach glusterd.logs? As the error >>> message says, Commit failed on the arbitar node, we >>> might be able to find some issue on that node. >>> >>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >>> > wrote: >>> >>> >>> >>> On Fri, 17 May 2019 at 06:01, David Cunningham >>> >> > wrote: >>> >>> Hello, >>> >>> We're adding an arbiter node to an existing >>> volume and having an issue. Can anyone help? 
The >>> root cause error appears to be >>> "00000000-0000-0000-0000-000000000001: failed to >>> resolve (Transport endpoint is not connected)", >>> as below. >>> >>> We are running glusterfs 5.6.1. Thanks in >>> advance for any assistance! >>> >>> On existing node gfs1, trying to add new arbiter >>> node gfs3: >>> >>> # gluster volume add-brick gvol0 replica 3 >>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>> volume add-brick: failed: Commit failed on gfs3. >>> Please check log file for details. >>> >>> >>> This looks like a glusterd issue. Please check the >>> glusterd logs for more info. >>> Adding the glusterd dev to this thread. Sanju, can >>> you take a look? >>> Regards, >>> Nithya >>> >>> >>> On new node gfs3 in gvol0-add-brick-mount.log: >>> >>> [2019-05-17 01:20:22.689721] I >>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: >>> FUSE inited with protocol versions: glusterfs >>> 7.24 kernel 7.22 >>> [2019-05-17 01:20:22.689778] I >>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: >>> switched to graph 0 >>> [2019-05-17 01:20:22.694897] E >>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: >>> first lookup on root failed (Transport endpoint >>> is not connected) >>> [2019-05-17 01:20:22.699770] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>> 0-fuse: 00000000-0000-0000-0000-000000000001: >>> failed to resolve (Transport endpoint is not >>> connected) >>> [2019-05-17 01:20:22.699834] W >>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 2: SETXATTR >>> 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-17 01:20:22.715656] I >>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>> initating unmount of /tmp/mntQAtu3f >>> [2019-05-17 01:20:22.715865] W >>> [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) >>> [0x7fb223bf6dd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>> [0x560886581e75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>> [0x560886581ceb] ) 0-: received signum (15), >>> shutting down >>> [2019-05-17 01:20:22.715926] I >>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>> '/tmp/mntQAtu3f'. >>> [2019-05-17 01:20:22.715953] I >>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>> connection to '/tmp/mntQAtu3f'. >>> >>> Processes running on new node gfs3: >>> >>> # ps -ef | grep gluster >>> root????? 6832 1? 0 20:17 ???????? 00:00:00 >>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>> --log-level INFO >>> root???? 15799 1? 0 20:17 ???????? 00:00:00 >>> /usr/sbin/glusterfs -s localhost --volfile-id >>> gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket >>> --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>> --process-name glustershd >>> root???? 16856 16735? 0 21:21 pts/0??? 
00:00:00 >>> grep --color=auto gluster >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> >>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Thanks, >>> Sanju >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 22 06:06:36 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 11:36:36 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: If you are trying this again, please 'gluster volume set $volname client-log-level DEBUG`before attempting the add-brick and attach the gvol0-add-brick-mount.log here. After that, you can change the client-log-level back to INFO. -Ravi On 22/05/19 11:32 AM, Ravishankar N wrote: > > > On 22/05/19 11:23 AM, David Cunningham wrote: >> Hi Ravi, >> >> I'd already done exactly that before, where step 3 was a simple 'rm >> -rf /nodirectwritedata/gluster/gvol0'. Have you another suggestion on >> what the cleanup or reformat should be? > `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. > Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must > not have any extended attributes set on it. Why fuse_first_lookup() is > failing is a bit of a mystery to me at this point. :-( > Regards, > Ravi >> >> Thank you. >> >> >> On Wed, 22 May 2019 at 13:56, Ravishankar N > > wrote: >> >> Hmm, so the volume info seems to indicate that the add-brick was >> successful but the gfid xattr is missing on the new brick (as are >> the actual files, barring the .glusterfs folder, according to >> your previous mail). >> >> Do you want to try removing and adding it again? >> >> 1. `gluster volume remove-brick gvol0 replica 2 >> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >> >> 2. Check that gluster volume info is now back to a 1x2 volume on >> all nodes and `gluster peer status` is connected on all nodes. >> >> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >> >> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >> >> 5. Check that the files are getting healed on to the new brick. >> >> Thanks, >> Ravi >> On 22/05/19 6:50 AM, David Cunningham wrote: >>> Hi Ravi, >>> >>> Certainly. On the existing two nodes: >>> >>> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> On the new node: >>> >>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000001 >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> Output of "gluster volume info" is the same on all 3 nodes and is: >>> >>> # gluster volume info >>> >>> Volume Name: gvol0 >>> Type: Replicate >>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x (2 + 1) = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>> Options Reconfigured: >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>> > wrote: >>> >>> Hi David, >>> Could you provide the `getfattr -d -m. -e hex >>> /nodirectwritedata/gluster/gvol0` output of all bricks and >>> the output of `gluster volume info`? 
>>> >>> Thanks, >>> Ravi >>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>> Hi Sanju, >>>> >>>> Here's what glusterd.log says on the new arbiter server >>>> when trying to add the node: >>>> >>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>> [0x7fe4ca9102cd] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>> [0x7fe4ca9bbb85] >>>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>> --gd-workdir=/var/lib/glusterd >>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>>> 0-management: replica-count is set 3 >>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>>> 0-management: arbiter-count is set 1 >>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>>> 0-management: type is set 0, need to change it >>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>>> 0-management: Failed to set extended attribute >>>> trusted.add-brick : Transport endpoint is not connected >>>> [Transport endpoint is not connected] >>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>>> 0-glusterd: Unable to add bricks >>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: >>>> Add-brick commit failed. >>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>>> 0-management: commit failed on operation Add brick >>>> >>>> As before gvol0-add-brick-mount.log said: >>>> >>>> [2019-05-22 00:15:17.005695] I >>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >>>> inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>> [2019-05-22 00:15:17.005749] I >>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched to >>>> graph 0 >>>> [2019-05-22 00:15:17.010101] E >>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup >>>> on root failed (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.014217] W >>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>>> LOOKUP() / => -1 (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015097] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve >>>> (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015158] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: >>>> 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>>> (trusted.add-brick) resolution failed >>>> [2019-05-22 00:15:17.035636] I >>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: initating >>>> unmount of /tmp/mntYGNbj9 >>>> [2019-05-22 00:15:17.035854] W >>>> [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>> [0x55c81b63de75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] >>>> 0-fuse: Unmounting '/tmp/mntYGNbj9'. 
>>>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] >>>> 0-fuse: Closing fuse connection to '/tmp/mntYGNbj9'. >>>> >>>> Here are the processes running on the new arbiter server: >>>> # ps -ef | grep gluster >>>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>> gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>> --process-name glustershd >>>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>>> /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO >>>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>>> /usr/sbin/glusterfs --process-name fuse >>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>> >>>> Here are the files created on the new arbiter server: >>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0 >>>> drw------- 2 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>> >>>> Thank you for your help! >>>> >>>> >>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>> > wrote: >>>> >>>> David, >>>> >>>> can you please attach glusterd.logs? As the error >>>> message says, Commit failed on the arbitar node, we >>>> might be able to find some issue on that node. >>>> >>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran >>>> > wrote: >>>> >>>> >>>> >>>> On Fri, 17 May 2019 at 06:01, David Cunningham >>>> >>> > wrote: >>>> >>>> Hello, >>>> >>>> We're adding an arbiter node to an existing >>>> volume and having an issue. Can anyone help? >>>> The root cause error appears to be >>>> "00000000-0000-0000-0000-000000000001: failed >>>> to resolve (Transport endpoint is not >>>> connected)", as below. >>>> >>>> We are running glusterfs 5.6.1. Thanks in >>>> advance for any assistance! >>>> >>>> On existing node gfs1, trying to add new >>>> arbiter node gfs3: >>>> >>>> # gluster volume add-brick gvol0 replica 3 >>>> arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0 >>>> volume add-brick: failed: Commit failed on >>>> gfs3. Please check log file for details. >>>> >>>> >>>> This looks like a glusterd issue. Please check the >>>> glusterd logs for more info. >>>> Adding the glusterd dev to this thread. Sanju, can >>>> you take a look? 
>>>> Regards, >>>> Nithya >>>> >>>> >>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>> >>>> [2019-05-17 01:20:22.689721] I >>>> [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol >>>> versions: glusterfs 7.24 kernel 7.22 >>>> [2019-05-17 01:20:22.689778] I >>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: >>>> switched to graph 0 >>>> [2019-05-17 01:20:22.694897] E >>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: >>>> first lookup on root failed (Transport endpoint >>>> is not connected) >>>> [2019-05-17 01:20:22.699770] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>>> 0-fuse: 00000000-0000-0000-0000-000000000001: >>>> failed to resolve (Transport endpoint is not >>>> connected) >>>> [2019-05-17 01:20:22.699834] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>> 0-glusterfs-fuse: 2: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 >>>> (trusted.add-brick) resolution failed >>>> [2019-05-17 01:20:22.715656] I >>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>>> initating unmount of /tmp/mntQAtu3f >>>> [2019-05-17 01:20:22.715865] W >>>> [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) >>>> [0x7fb223bf6dd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>> [0x560886581e75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>> [0x560886581ceb] ) 0-: received signum (15), >>>> shutting down >>>> [2019-05-17 01:20:22.715926] I >>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>>> '/tmp/mntQAtu3f'. >>>> [2019-05-17 01:20:22.715953] I >>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>>> connection to '/tmp/mntQAtu3f'. >>>> >>>> Processes running on new node gfs3: >>>> >>>> # ps -ef | grep gluster >>>> root 6832???? 1? 0 20:17 ? 00:00:00 >>>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>>> --log-level INFO >>>> root 15799???? 1? 0 20:17 ? 00:00:00 >>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>> gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket >>>> --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>> --process-name glustershd >>>> root???? 16856 16735? 0 21:21 pts/0 00:00:00 >>>> grep --color=auto gluster >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> Thanks, >>>> Sanju >>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >> >> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... 
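A concrete sketch of that capture procedure, using the volume name and
brick path from this thread (the add-brick mount log is assumed to be in
its usual location under /var/log/glusterfs/ on the new node):

# on gfs1:
# gluster volume set gvol0 client-log-level DEBUG
# gluster volume add-brick gvol0 replica 3 arbiter 1 gfs3:/nodirectwritedata/gluster/gvol0
# on gfs3, collect the log to attach:
# cp /var/log/glusterfs/gvol0-add-brick-mount.log /tmp/
# back on gfs1, once the log is saved:
# gluster volume set gvol0 client-log-level INFO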
URL: From revirii at googlemail.com Wed May 22 07:09:46 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 22 May 2019 09:09:46 +0200 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected Message-ID: Hi @ll, today i updated and rebooted the 3 servers of my replicate 3 setup; after the 3rd one came up again i noticed this error: [2019-05-22 06:41:26.781165] E [MSGID: 108008] [afr-self-heal-common.c:392:afr_gfid_split_brain_source] 0-workdata-replicate-0: Gfid mismatch detected for /120710351>, 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. [2019-05-22 06:41:27.069969] W [MSGID: 108027] [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: no read subvols for /staticmap/120/710/120710351 [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 (Transport endpoint is not connected) A simple 'gluster volume heal workdata' didn't help; 'gluster volume heal workdata info' says: Brick gluster1:/gluster/md4/workdata /staticmap/120/710 /staticmap/120/710/120710351 Status: Connected Number of entries: 3 Brick gluster2:/gluster/md4/workdata /staticmap/120/710 /staticmap/120/710/120710351 Status: Connected Number of entries: 3 Brick gluster3:/gluster/md4/workdata /staticmap/120/710/120710351 Status: Connected Number of entries: 1 There's a mismatch in one directory; I tried to follow these instructions: https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in split-brain. Volume heal failed. Is there any other documentation for gfid mismatch and how to resolve this? Thx, Hubert From ravishankar at redhat.com Wed May 22 07:32:38 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 13:02:38 +0530 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: References: Message-ID: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> On 22/05/19 12:39 PM, Hu Bert wrote: > Hi @ll, > > today i updated and rebooted the 3 servers of my replicate 3 setup; > after the 3rd one came up again i noticed this error: > > [2019-05-22 06:41:26.781165] E [MSGID: 108008] > [afr-self-heal-common.c:392:afr_gfid_split_brain_source] > 0-workdata-replicate-0: Gfid mismatch detected for > /120710351>, > 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. 120710351 seems to be the entry that is in split-brain. Is /staticmap/120/710/120710351 the complete path to that entry? (check if gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). 
You can then try "gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" -Ravi > [2019-05-22 06:41:27.069969] W [MSGID: 108027] > [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: > no read subvols for /staticmap/120/710/120710351 > [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] > 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 > (Transport endpoint is not connected) > > A simple 'gluster volume heal workdata' didn't help; 'gluster volume > heal workdata info' says: > > Brick gluster1:/gluster/md4/workdata > /staticmap/120/710 > /staticmap/120/710/120710351 > > Status: Connected > Number of entries: 3 > > Brick gluster2:/gluster/md4/workdata > /staticmap/120/710 > /staticmap/120/710/120710351 > > Status: Connected > Number of entries: 3 > > Brick gluster3:/gluster/md4/workdata > /staticmap/120/710/120710351 > Status: Connected > Number of entries: 1 > > There's a mismatch in one directory; I tried to follow these instructions: > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b > Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in > split-brain. > Volume heal failed. > > Is there any other documentation for gfid mismatch and how to resolve this? > > > Thx, > Hubert > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users From revirii at googlemail.com Wed May 22 07:59:13 2019 From: revirii at googlemail.com (Hu Bert) Date: Wed, 22 May 2019 09:59:13 +0200 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> References: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> Message-ID: Hi Ravi, mount path of the volume is /shared/public, so complete paths are /shared/public/staticmap/120/710/ and /shared/public/staticmap/120/710/120710351/ . getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/ getfattr: Removing leading '/' from absolute path names # file: shared/public/staticmap/120/710/ glusterfs.gfid.string="751233b0-7789-4550-bd95-4dd9c8f57c19" getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/120710351/ getfattr: Removing leading '/' from absolute path names # file: shared/public/staticmap/120/710/120710351/ glusterfs.gfid.string="eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace" So that fits. It somehow took a couple of attempts to resolve this, and none of the commands seem to have "officially" succeeded: gluster3 (host with the "fail"): gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /shared/public/staticmap/120/710/120710351/ Lookup failed on /shared/public/staticmap/120/710:No such file or directory Volume heal failed. gluster1 ("good" host): gluster volume heal workdata split-brain source-brick gluster1:/gluster/md4/workdata /shared/public/staticmap/120/710/120710351/ Lookup failed on /shared/public/staticmap/120/710:No such file or directory Volume heal failed. 
Only in the logs i see: [2019-05-22 07:42:22.004182] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-workdata-replicate-0: performing metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace [2019-05-22 07:42:22.008502] I [MSGID: 108026] [afr-self-heal-common.c:1729:afr_log_selfheal] 0-workdata-replicate-0: Completed metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace. sources=0 [1] sinks=2 And via "gluster volume heal workdata statistics heal-count" there are 0 entries left. Files/directories are there. Happened the first time with this setup, but everything ok now. Thx for your fast help :-) Hubert Am Mi., 22. Mai 2019 um 09:32 Uhr schrieb Ravishankar N : > > > On 22/05/19 12:39 PM, Hu Bert wrote: > > Hi @ll, > > > > today i updated and rebooted the 3 servers of my replicate 3 setup; > > after the 3rd one came up again i noticed this error: > > > > [2019-05-22 06:41:26.781165] E [MSGID: 108008] > > [afr-self-heal-common.c:392:afr_gfid_split_brain_source] > > 0-workdata-replicate-0: Gfid mismatch detected for > > /120710351>, > > 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and > > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. > > 120710351 seems to be the entry that is in split-brain. Is > /staticmap/120/710/120710351 the complete path to that entry? (check if > gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). > > You can then try "gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" > > -Ravi > > > [2019-05-22 06:41:27.069969] W [MSGID: 108027] > > [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: > > no read subvols for /staticmap/120/710/120710351 > > [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] > > 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 > > (Transport endpoint is not connected) > > > > A simple 'gluster volume heal workdata' didn't help; 'gluster volume > > heal workdata info' says: > > > > Brick gluster1:/gluster/md4/workdata > > /staticmap/120/710 > > /staticmap/120/710/120710351 > > > > Status: Connected > > Number of entries: 3 > > > > Brick gluster2:/gluster/md4/workdata > > /staticmap/120/710 > > /staticmap/120/710/120710351 > > > > Status: Connected > > Number of entries: 3 > > > > Brick gluster3:/gluster/md4/workdata > > /staticmap/120/710/120710351 > > Status: Connected > > Number of entries: 1 > > > > There's a mismatch in one directory; I tried to follow these instructions: > > https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ > > > > gluster volume heal workdata split-brain source-brick > > gluster1:/gluster/md4/workdata > > gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b > > Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in > > split-brain. > > Volume heal failed. > > > > > Is there any other documentation for gfid mismatch and how to resolve this? 
> > > > > > Thx, > > Hubert > > _______________________________________________ > > Gluster-users mailing list > > Gluster-users at gluster.org > > https://lists.gluster.org/mailman/listinfo/gluster-users From ravishankar at redhat.com Wed May 22 08:53:49 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 22 May 2019 14:23:49 +0530 Subject: [Gluster-users] gluster 5.6: Gfid mismatch detected In-Reply-To: References: <46d9718a-491f-0d91-7721-30267727684f@redhat.com> Message-ID: <36b7e1b1-6fb6-88c9-b8e3-341dc67a44c9@redhat.com> On 22/05/19 1:29 PM, Hu Bert wrote: > Hi Ravi, > > mount path of the volume is /shared/public, so complete paths are > /shared/public/staticmap/120/710/ and > /shared/public/staticmap/120/710/120710351/ . > > getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/ > getfattr: Removing leading '/' from absolute path names > # file: shared/public/staticmap/120/710/ > glusterfs.gfid.string="751233b0-7789-4550-bd95-4dd9c8f57c19" > > getfattr -n glusterfs.gfid.string /shared/public/staticmap/120/710/120710351/ > getfattr: Removing leading '/' from absolute path names > # file: shared/public/staticmap/120/710/120710351/ > glusterfs.gfid.string="eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace" > > So that fits. It somehow took a couple of attempts to resolve this, > and none of the commands seem to have "officially" succeeded: > > gluster3 (host with the "fail"): > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > /shared/public/staticmap/120/710/120710351/ > Lookup failed on /shared/public/staticmap/120/710:No such file or directory The file path given to this command must be the absolute path /as seen from the root of the volume/. So the location where it is mounted (/shared/public) must be omitted. Only /staticmap/120/710/120710351/ is required. HTH, Ravi > Volume heal failed. > > gluster1 ("good" host): > gluster volume heal workdata split-brain source-brick > gluster1:/gluster/md4/workdata > /shared/public/staticmap/120/710/120710351/ > Lookup failed on /shared/public/staticmap/120/710:No such file or directory > Volume heal failed. > > Only in the logs i see: > > [2019-05-22 07:42:22.004182] I [MSGID: 108026] > [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] > 0-workdata-replicate-0: performing metadata selfheal on > eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace > [2019-05-22 07:42:22.008502] I [MSGID: 108026] > [afr-self-heal-common.c:1729:afr_log_selfheal] 0-workdata-replicate-0: > Completed metadata selfheal on eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace. > sources=0 [1] sinks=2 > > And via "gluster volume heal workdata statistics heal-count" there are > 0 entries left. Files/directories are there. Happened the first time > with this setup, but everything ok now. > > Thx for your fast help :-) > > > Hubert > > Am Mi., 22. Mai 2019 um 09:32 Uhr schrieb Ravishankar N > : >> >> On 22/05/19 12:39 PM, Hu Bert wrote: >>> Hi @ll, >>> >>> today i updated and rebooted the 3 servers of my replicate 3 setup; >>> after the 3rd one came up again i noticed this error: >>> >>> [2019-05-22 06:41:26.781165] E [MSGID: 108008] >>> [afr-self-heal-common.c:392:afr_gfid_split_brain_source] >>> 0-workdata-replicate-0: Gfid mismatch detected for >>> /120710351>, >>> 82025ab3-8034-4257-9628-d8ebde909629 on workdata-client-2 and >>> eaf2f31e-b4a7-4fa8-b710-d6ff9cd4eace on workdata-client-1. >> 120710351 seems to be the entry that is in split-brain. Is >> /staticmap/120/710/120710351 the complete path to that entry? 
(check if >> gfid:751233b0-7789-4550-bd95-4dd9c8f57c19 corresponds to the gfid of 710). >> >> You can then try "gluster volume heal workdata split-brain source-brick >> gluster1:/gluster/md4/workdata /staticmap/120/710/120710351" >> >> -Ravi >> >>> [2019-05-22 06:41:27.069969] W [MSGID: 108027] >>> [afr-common.c:2270:afr_attempt_readsubvol_set] 0-workdata-replicate-0: >>> no read subvols for /staticmap/120/710/120710351 >>> [2019-05-22 06:41:27.808532] W [fuse-bridge.c:582:fuse_entry_cbk] >>> 0-glusterfs-fuse: 1834335: LOOKUP() /staticmap/120/710/120710351 => -1 >>> (Transport endpoint is not connected) >>> >>> A simple 'gluster volume heal workdata' didn't help; 'gluster volume >>> heal workdata info' says: >>> >>> Brick gluster1:/gluster/md4/workdata >>> /staticmap/120/710 >>> /staticmap/120/710/120710351 >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick gluster2:/gluster/md4/workdata >>> /staticmap/120/710 >>> /staticmap/120/710/120710351 >>> >>> Status: Connected >>> Number of entries: 3 >>> >>> Brick gluster3:/gluster/md4/workdata >>> /staticmap/120/710/120710351 >>> Status: Connected >>> Number of entries: 1 >>> >>> There's a mismatch in one directory; I tried to follow these instructions: >>> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ >>> >>> gluster volume heal workdata split-brain source-brick >>> gluster1:/gluster/md4/workdata >>> gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b >>> Healing gfid:fe7fdbe8-9a39-4793-8d38-6dfdd3d5089b failed: File not in >>> split-brain. >>> Volume heal failed. >>> Is there any other documentation for gfid mismatch and how to resolve this? >>> >>> >>> Thx, >>> Hubert >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Wed May 22 21:10:45 2019 From: alan.orth at gmail.com (Alan Orth) Date: Thu, 23 May 2019 00:10:45 +0300 Subject: [Gluster-users] Does replace-brick migrate data? Message-ID: Dear list, I seem to have gotten into a tricky situation. Today I brought up a shiny new server with new disk arrays and attempted to replace one brick of a replica 2 distribute/replicate volume on an older server using the `replace-brick` command: # gluster volume replace-brick homes wingu0:/mnt/gluster/homes wingu06:/data/glusterfs/sdb/homes commit force The command was successful and I see the new brick in the output of `gluster volume info`. The problem is that Gluster doesn't seem to be migrating the data, and now the original brick that I replaced is no longer part of the volume (and a few terabytes of data are just sitting on the old brick): # gluster volume info homes | grep -E "Brick[0-9]:" Brick1: wingu4:/mnt/gluster/homes Brick2: wingu3:/mnt/gluster/homes Brick3: wingu06:/data/glusterfs/sdb/homes Brick4: wingu05:/data/glusterfs/sdb/homes Brick5: wingu05:/data/glusterfs/sdc/homes Brick6: wingu06:/data/glusterfs/sdc/homes I see the Gluster docs have a more complicated procedure for replacing bricks that involves getfattr/setfattr?. How can I tell Gluster about the old brick? I see that I have a backup of the old volfile thanks to yum's rpmsave function if that helps. We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can give. ? 
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 22 22:24:28 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 23 May 2019 10:24:28 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: Hi Ravi, Please see the log attached. The output of "gluster volume status" is as follows. Should there be something listening on gfs3? I'm not sure whether it having TCP Port and Pid as N/A is a symptom or cause. Thank you. # gluster volume status Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7624 Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A N N/A Self-heal Daemon on localhost N/A N/A Y 19853 Self-heal Daemon on gfs1 N/A N/A Y 28600 Self-heal Daemon on gfs2 N/A N/A Y 17614 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume tasks On Wed, 22 May 2019 at 18:06, Ravishankar N wrote: > If you are trying this again, please 'gluster volume set $volname > client-log-level DEBUG`before attempting the add-brick and attach the > gvol0-add-brick-mount.log here. After that, you can change the > client-log-level back to INFO. > > -Ravi > On 22/05/19 11:32 AM, Ravishankar N wrote: > > > On 22/05/19 11:23 AM, David Cunningham wrote: > > Hi Ravi, > > I'd already done exactly that before, where step 3 was a simple 'rm -rf > /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the > cleanup or reformat should be? > > `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. > Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not > have any extended attributes set on it. Why fuse_first_lookup() is failing > is a bit of a mystery to me at this point. :-( > Regards, > Ravi > > > Thank you. > > > On Wed, 22 May 2019 at 13:56, Ravishankar N > wrote: > >> Hmm, so the volume info seems to indicate that the add-brick was >> successful but the gfid xattr is missing on the new brick (as are the >> actual files, barring the .glusterfs folder, according to your previous >> mail). >> >> Do you want to try removing and adding it again? >> >> 1. `gluster volume remove-brick gvol0 replica 2 >> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >> >> 2. Check that gluster volume info is now back to a 1x2 volume on all >> nodes and `gluster peer status` is connected on all nodes. >> >> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >> >> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >> >> 5. Check that the files are getting healed on to the new brick. >> Thanks, >> Ravi >> On 22/05/19 6:50 AM, David Cunningham wrote: >> >> Hi Ravi, >> >> Certainly. On the existing two nodes: >> >> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000000 >> trusted.afr.gvol0-client-0=0x000000000000000000000000 >> trusted.afr.gvol0-client-2=0x000000000000000000000000 >> trusted.gfid=0x00000000000000000000000000000001 >> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> On the new node: >> >> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >> getfattr: Removing leading '/' from absolute path names >> # file: nodirectwritedata/gluster/gvol0 >> trusted.afr.dirty=0x000000000000000000000001 >> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >> >> Output of "gluster volume info" is the same on all 3 nodes and is: >> >> # gluster volume info >> >> Volume Name: gvol0 >> Type: Replicate >> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 x (2 + 1) = 3 >> Transport-type: tcp >> Bricks: >> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >> Options Reconfigured: >> performance.client-io-threads: off >> nfs.disable: on >> transport.address-family: inet >> >> >> On Wed, 22 May 2019 at 12:43, Ravishankar N >> wrote: >> >>> Hi David, >>> Could you provide the `getfattr -d -m. -e hex >>> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >>> `gluster volume info`? 
>>> >>> Thanks, >>> Ravi >>> On 22/05/19 4:57 AM, David Cunningham wrote: >>> >>> Hi Sanju, >>> >>> Here's what glusterd.log says on the new arbiter server when trying to >>> add the node: >>> >>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>> [0x7fe4ca9102cd] >>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>> --volname=gvol0 --version=1 --volume-op=add-brick >>> --gd-workdir=/var/lib/glusterd >>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >>> replica-count is set 3 >>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >>> arbiter-count is set 1 >>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >>> type is set 0, need to change it >>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >>> Failed to set extended attribute trusted.add-brick : Transport endpoint is >>> not connected [Transport endpoint is not connected] >>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >>> bricks >>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >>> failed. >>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >>> commit failed on operation Add brick >>> >>> As before gvol0-add-brick-mount.log said: >>> >>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>> 7.22 >>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >>> 0-fuse: switched to graph 0 >>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >>> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >>> [2019-05-22 00:15:17.015097] W >>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>> is not connected) >>> [2019-05-22 00:15:17.015158] W [fuse-bridge.c:3294:fuse_setxattr_resume] >>> 0-glusterfs-fuse: 3: SETXATTR 00000000-0000-0000-0000-000000000001/1 >>> (trusted.add-brick) resolution failed >>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >>> 0-fuse: initating unmount of /tmp/mntYGNbj9 >>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >>> received signum (15), shutting down >>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >>> Unmounting '/tmp/mntYGNbj9'. >>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: Closing >>> fuse connection to '/tmp/mntYGNbj9'. 
>>> >>> Here are the processes running on the new arbiter server: >>> # ps -ef | grep gluster >>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >>> localhost --volfile-id gluster/glustershd -p >>> /var/run/gluster/glustershd/glustershd.pid -l >>> /var/log/glusterfs/glustershd.log -S >>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>> glustershd >>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >>> /var/run/glusterd.pid --log-level INFO >>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >>> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>> >>> Here are the files created on the new arbiter server: >>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>> drwxr-xr-x 3 root root 4096 May 21 20:15 /nodirectwritedata/gluster/gvol0 >>> drw------- 2 root root 4096 May 21 20:15 >>> /nodirectwritedata/gluster/gvol0/.glusterfs >>> >>> Thank you for your help! >>> >>> >>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde wrote: >>> >>>> David, >>>> >>>> can you please attach glusterd.logs? As the error message says, Commit >>>> failed on the arbitar node, we might be able to find some issue on that >>>> node. >>>> >>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>>> nbalacha at redhat.com> wrote: >>>> >>>>> >>>>> >>>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>>> dcunningham at voisonics.com> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> We're adding an arbiter node to an existing volume and having an >>>>>> issue. Can anyone help? The root cause error appears to be >>>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>>> endpoint is not connected)", as below. >>>>>> >>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>>> >>>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>>> >>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log >>>>>> file for details. >>>>>> >>>>> >>>>> This looks like a glusterd issue. Please check the glusterd logs for >>>>> more info. >>>>> Adding the glusterd dev to this thread. Sanju, can you take a look? 
>>>>> >>>>> Regards, >>>>> Nithya >>>>> >>>>>> >>>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>>> >>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>>> 7.22 >>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>>> 0-fuse: switched to graph 0 >>>>>> [2019-05-17 01:20:22.694897] E [fuse-bridge.c:4336:fuse_first_lookup] >>>>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>>>> [2019-05-17 01:20:22.699770] W >>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>>> is not connected) >>>>>> [2019-05-17 01:20:22.699834] W >>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>>> received signum (15), shutting down >>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>>> Unmounting '/tmp/mntQAtu3f'. >>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>>> >>>>>> Processes running on new node gfs3: >>>>>> >>>>>> # ps -ef | grep gluster >>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd -p >>>>>> /var/run/glusterd.pid --log-level INFO >>>>>> root 15799 1 0 20:17 ? 00:00:00 /usr/sbin/glusterfs >>>>>> -s localhost --volfile-id gluster/glustershd -p >>>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>>> /var/log/glusterfs/glustershd.log -S >>>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>>> glustershd >>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>>> gluster >>>>>> >>>>>> -- >>>>>> David Cunningham, Voisonics Limited >>>>>> http://voisonics.com/ >>>>>> USA: +1 213 221 1092 >>>>>> New Zealand: +64 (0)28 2558 3782 >>>>>> _______________________________________________ >>>>>> Gluster-users mailing list >>>>>> Gluster-users at gluster.org >>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>> >>>> -- >>>> Thanks, >>>> Sanju >>>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> _______________________________________________ >>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gvol0-add-brick-mount.log Type: text/x-log Size: 30154 bytes Desc: not available URL: From spisla80 at gmail.com Thu May 23 10:10:33 2019 From: spisla80 at gmail.com (David Spisla) Date: Thu, 23 May 2019 12:10:33 +0200 Subject: [Gluster-users] Create Gluster RPMs on a SLES15 machine In-Reply-To: References: Message-ID: Hello Kaleb, there is no rpcsvc-proto rpm for SLES15 according to this: https://software.opensuse.org/package/rpcsvc-proto?locale=fa It really seems that the SLES15 from OpenSUSE has a special setup. Removing the BuildRequires: rpcgen line and using the glibc-bundled rpcgen works. I have my packages now! Regards David On Mon, 13 May 2019 at 08:10, David Spisla wrote: > Hello Kaleb, > > thank you for the info. I'll try this out. > > Regards > David > > On Fri, 10 May 2019 at 16:24, Kaleb Keithley < > kkeithle at redhat.com> wrote: > >> Seems I accidentally omitted gluster-users in my first reply. >> >> On Thu, May 9, 2019 at 3:19 PM Kaleb Keithley >> wrote: >> >>> On Thu, May 9, 2019 at 8:53 AM David Spisla wrote: >>> >>>> Hello Kaleb, >>>> >>>> I am trying to create my own Gluster v5.5 RPMs for SLES15 and I am >>>> using a SLES15 system to create them. I got the following error message: >>>> >>>> rpmbuild --define '_topdir >>>>> /home/davids/glusterfs/extras/LinuxRPM/rpmbuild' --with gnfs -bb >>>>> rpmbuild/SPECS/glusterfs.spec >>>>> warning: bogus date in %changelog: Tue Apr 17 2019 kkeithle at >>>>> redhat.com >>>>> warning: bogus date in %changelog: Fri Sep 19 2018 kkeithle at >>>>> redhat.com >>>>> error: Failed build dependencies: >>>>> rpcgen is needed by glusterfs-5.5-100.x86_64 >>>>> make: *** [Makefile:579: rpms] Error 1 >>>>> >>>>> >>>> In the corresponding glusterfs.spec file (branch sles15-glusterfs-5 in >>>> Repo glusterfs-suse) rpcgen is listed as a dependency. But >>>> unfortunately there is no rpcgen package provided on SLES15. Or in other >>>> words: >>>> I only found RPMs for other SUSE distributions, but not for SLES15. >>>> >>>> Do you know that issue? >>>> >>> >>> I'm afraid I don't. >>> >>> >>>> What is the name of the distribution which you are using to create >>>> Packages for SLES15? >>>> >>> >>> The community packages are built on the OpenSUSE OBS and they are built >>> on SLES15 (the one that OBS provides). I don't know any details beyond that. >>> It could be a real SLES15 system, or it could be a build in mock, or SUSE's >>> chroot build tool if they don't have mock. >>> >>> You can see the build logs from the community builds of glusterfs-5.5 >>> and glusterfs-5.6 for SLES15 at [1] and [2] respectively. AFAIK it's a >>> completely "vanilla" SLES15 and seems to have rpcgen-1.3-2.18 available. >>> Finding things in the OBS repos seems to be hit or miss sometimes. I can't >>> find the SLE_15 rpcgen package. >>> >>> (Back in SLES11 days I had a free eval license that let me update and >>> install add-on packages on my own system. I tried to get a similar license >>> for SLES12 and was advised to just use OBS. I haven't even bothered trying >>> to get one for SLES15. It makes it harder IMO to figure things out.) >>> >>> I recommend asking the OBS team on #opensuse-buildservice on (freenode) >>> IRC. They've always been very helpful to me. >>> >> >> Miuku on #opensuse-buildservice poked around and found that the unbundled >> rpcgen in SLE_15 comes from the rpcsvc-proto rpm. (Not the rpcgen rpm as it >> does in Fedora and RHEL8.)
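(For SLES15 itself I could not find an installable rpcsvc-proto package, which is why I dropped the build dependency from the spec instead. Roughly, assuming the spec still carries a plain "BuildRequires: rpcgen" line, something like:

sed -i '/^BuildRequires:[[:space:]]*rpcgen/d' rpmbuild/SPECS/glusterfs.spec   # untested one-liner; editing the spec by hand works just as well

so that the build picks up the glibc-bundled rpcgen instead.)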
>> >> All the gluster community packages for SLE_15 going back to glusterfs-5.0 >> in October 2018 have used the unbundled rpcgen. You can do the same, or >> remove the BuildRequires: rpcgen line and use the glibc bundled rpcgen. >> >> HTH >> >> -- >> >> Kaleb >> >> >> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu May 23 10:34:37 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 23 May 2019 16:04:37 +0530 Subject: [Gluster-users] ./tests/basic/gfapi/gfapi-ssl-test.t is failing too often in regression Message-ID: I see a lot of patches are failing regressions due to the .t mentioned in the subject line. I've filed a bug[1] for the same. https://bugzilla.redhat.com/show_bug.cgi?id=1713284 -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Thu May 23 11:04:28 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Thu, 23 May 2019 16:34:28 +0530 Subject: [Gluster-users] ./tests/basic/gfapi/gfapi-ssl-test.t is failing too often in regression In-Reply-To: References: Message-ID: I apologize for the wrong mail. This .t failed only for one patch and I don't think it is spurious. Closing this bug as not a bug. On Thu, May 23, 2019 at 4:04 PM Sanju Rakonde wrote: > I see a lot of patches are failing regressions due to the .t mentioned in > the subject line. I've filed a bug[1] for the same. > > https://bugzilla.redhat.com/show_bug.cgi?id=1713284 > -- > Thanks, > Sanju > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... URL: From brandon at thinkhuge.net Thu May 23 17:45:40 2019 From: brandon at thinkhuge.net (brandon at thinkhuge.net) Date: Thu, 23 May 2019 10:45:40 -0700 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 Message-ID: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> Does anyone know what should be done on a glusterfs v5.6 "gluster volume remove-brick" operation that fails? I'm trying to remove 1 of 8 distributed smaller nodes for replacement with larger node. The "gluster volume remove-brick ... status" command reports status failed and failures = "3" cat /var/log/glusterfs/volbackups-rebalance.log ... [2019-05-23 16:43:37.442283] I [MSGID: 109028] [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is failed. Time taken is 545.00 secs All servers are confirmed in good communications and updated and freshly rebooted and retried the remove-brick few times with fail each time -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 13:48:52 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:18:52 +0530 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> Message-ID: <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> Hi David, On 23/05/19 3:54 AM, David Cunningham wrote: > Hi Ravi, > > Please see the log attached. When I grep -E "Connected to |disconnected from" gvol0-add-brick-mount.log,? I don't see a "Connected to gvol0-client-1". It looks like this temporary mount is not able to connect to the 2nd brick, which is why the lookup is failing due to lack of quorum. > The output of "gluster volume status" is as follows. Should there be > something listening on gfs3? 
I'm not sure whether it having TCP Port > and Pid as N/A is a symptom or cause. Thank you. > > # gluster volume status > Status of volume: gvol0 > Gluster process???????????????????????????? TCP Port? RDMA Port? > Online? Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7624 > Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A??????? N?????? N/A Can you see if the following steps help? 1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both* gfs1 and gfs2. 2. 'gluster volume start gvol0 force` 3. Check if Brick-3 now comes online with a valid TCP port and PID. If it doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see why. Thanks, Ravi > Self-heal Daemon on localhost?????????????? N/A N/A??????? Y?????? 19853 > Self-heal Daemon on gfs1??????????????????? N/A N/A??????? Y?????? 28600 > Self-heal Daemon on gfs2??????????????????? N/A N/A??????? Y?????? 17614 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 22 May 2019 at 18:06, Ravishankar N > wrote: > > If you are trying this again, please 'gluster volume set $volname > client-log-level DEBUG`before attempting the add-brick and attach > the gvol0-add-brick-mount.log here. After that, you can change the > client-log-level back to INFO. > > -Ravi > > On 22/05/19 11:32 AM, Ravishankar N wrote: >> >> >> On 22/05/19 11:23 AM, David Cunningham wrote: >>> Hi Ravi, >>> >>> I'd already done exactly that before, where step 3 was a simple >>> 'rm -rf /nodirectwritedata/gluster/gvol0'. Have you another >>> suggestion on what the cleanup or reformat should be? >> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me >> David. Basically, '/nodirectwritedata/gluster/gvol0' must be >> empty and must not have any extended attributes set on it. Why >> fuse_first_lookup() is failing is a bit of a mystery to me at >> this point. :-( >> Regards, >> Ravi >>> >>> Thank you. >>> >>> >>> On Wed, 22 May 2019 at 13:56, Ravishankar N >>> > wrote: >>> >>> Hmm, so the volume info seems to indicate that the add-brick >>> was successful but the gfid xattr is missing on the new >>> brick (as are the actual files, barring the .glusterfs >>> folder, according to your previous mail). >>> >>> Do you want to try removing and adding it again? >>> >>> 1. `gluster volume remove-brick gvol0 replica 2 >>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >>> >>> 2. Check that gluster volume info is now back to a 1x2 >>> volume on all nodes and `gluster peer status` is? connected >>> on all nodes. >>> >>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on >>> gfs3. >>> >>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >>> >>> 5. Check that the files are getting healed on to the new brick. >>> >>> Thanks, >>> Ravi >>> On 22/05/19 6:50 AM, David Cunningham wrote: >>>> Hi Ravi, >>>> >>>> Certainly. On the existing two nodes: >>>> >>>> gfs1 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> On the new node: >>>> >>>> gfs3 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: nodirectwritedata/gluster/gvol0 >>>> trusted.afr.dirty=0x000000000000000000000001 >>>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>>> >>>> Output of "gluster volume info" is the same on all 3 nodes >>>> and is: >>>> >>>> # gluster volume info >>>> >>>> Volume Name: gvol0 >>>> Type: Replicate >>>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>>> Status: Started >>>> Snapshot Count: 0 >>>> Number of Bricks: 1 x (2 + 1) = 3 >>>> Transport-type: tcp >>>> Bricks: >>>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>>> Options Reconfigured: >>>> performance.client-io-threads: off >>>> nfs.disable: on >>>> transport.address-family: inet >>>> >>>> >>>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>>> > wrote: >>>> >>>> Hi David, >>>> Could you provide the `getfattr -d -m. -e hex >>>> /nodirectwritedata/gluster/gvol0` output of all bricks >>>> and the output of `gluster volume info`? 
>>>> >>>> Thanks, >>>> Ravi >>>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>>> Hi Sanju, >>>>> >>>>> Here's what glusterd.log says on the new arbiter >>>>> server when trying to add the node: >>>>> >>>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>>> [0x7fe4ca9102cd] >>>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>>> [0x7fe4ca9bbb85] >>>>> -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>>> --gd-workdir=/var/lib/glusterd >>>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] >>>>> 0-management: replica-count is set 3 >>>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] >>>>> 0-management: arbiter-count is set 1 >>>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] >>>>> 0-management: type is set 0, need to change it >>>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] >>>>> 0-management: Failed to set extended attribute >>>>> trusted.add-brick : Transport endpoint is not >>>>> connected [Transport endpoint is not connected] >>>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] >>>>> 0-glusterd: Unable to add bricks >>>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] >>>>> 0-management: Add-brick commit failed. 
>>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] >>>>> 0-management: commit failed on operation Add brick >>>>> >>>>> As before gvol0-add-brick-mount.log said: >>>>> >>>>> [2019-05-22 00:15:17.005695] I >>>>> [fuse-bridge.c:4267:fuse_init] 0-glusterfs-fuse: FUSE >>>>> inited with protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-22 00:15:17.005749] I >>>>> [fuse-bridge.c:4878:fuse_graph_sync] 0-fuse: switched >>>>> to graph 0 >>>>> [2019-05-22 00:15:17.010101] E >>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first >>>>> lookup on root failed (Transport endpoint is not >>>>> connected) >>>>> [2019-05-22 00:15:17.014217] W >>>>> [fuse-bridge.c:897:fuse_attr_cbk] 0-glusterfs-fuse: 2: >>>>> LOOKUP() / => -1 (Transport endpoint is not connected) >>>>> [2019-05-22 00:15:17.015097] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: failed to >>>>> resolve (Transport endpoint is not connected) >>>>> [2019-05-22 00:15:17.015158] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>>> 0-glusterfs-fuse: 3: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 >>>>> (trusted.add-brick) resolution failed >>>>> [2019-05-22 00:15:17.035636] I >>>>> [fuse-bridge.c:5144:fuse_thread_proc] 0-fuse: >>>>> initating unmount of /tmp/mntYGNbj9 >>>>> [2019-05-22 00:15:17.035854] W >>>>> [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>>> [0x55c81b63de75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>>> [0x55c81b63dceb] ) 0-: received signum (15), shutting down >>>>> [2019-05-22 00:15:17.035942] I >>>>> [fuse-bridge.c:5914:fini] 0-fuse: Unmounting >>>>> '/tmp/mntYGNbj9'. >>>>> [2019-05-22 00:15:17.035966] I >>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing fuse >>>>> connection to '/tmp/mntYGNbj9'. >>>>> >>>>> Here are the processes running on the new arbiter server: >>>>> # ps -ef | grep gluster >>>>> root????? 3466???? 1? 0 20:13 ???????? 00:00:00 >>>>> /usr/sbin/glusterfs -s localhost --volfile-id >>>>> gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>> /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket >>>>> --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>>> --process-name glustershd >>>>> root????? 6832???? 1? 0 May16 ???????? 00:02:10 >>>>> /usr/sbin/glusterd -p /var/run/glusterd.pid >>>>> --log-level INFO >>>>> root???? 17841???? 1? 0 May16 ???????? 00:00:58 >>>>> /usr/sbin/glusterfs --process-name fuse >>>>> --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>>> >>>>> Here are the files created on the new arbiter server: >>>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>>> /nodirectwritedata/gluster/gvol0 >>>>> drw------- 2 root root 4096 May 21 20:15 >>>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>>> >>>>> Thank you for your help! >>>>> >>>>> >>>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>>> > wrote: >>>>> >>>>> David, >>>>> >>>>> can you please attach glusterd.logs? As the error >>>>> message says, Commit failed on the arbitar node, >>>>> we might be able to find some issue on that node. 
>>>>> >>>>> On Mon, May 20, 2019 at 10:10 AM Nithya >>>>> Balachandran >>>> > wrote: >>>>> >>>>> >>>>> >>>>> On Fri, 17 May 2019 at 06:01, David Cunningham >>>>> >>>> > wrote: >>>>> >>>>> Hello, >>>>> >>>>> We're adding an arbiter node to an >>>>> existing volume and having an issue. Can >>>>> anyone help? The root cause error appears >>>>> to be >>>>> "00000000-0000-0000-0000-000000000001: >>>>> failed to resolve (Transport endpoint is >>>>> not connected)", as below. >>>>> >>>>> We are running glusterfs 5.6.1. Thanks in >>>>> advance for any assistance! >>>>> >>>>> On existing node gfs1, trying to add new >>>>> arbiter node gfs3: >>>>> >>>>> # gluster volume add-brick gvol0 replica 3 >>>>> arbiter 1 >>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>> volume add-brick: failed: Commit failed on >>>>> gfs3. Please check log file for details. >>>>> >>>>> >>>>> This looks like a glusterd issue. Please check >>>>> the glusterd logs for more info. >>>>> Adding the glusterd dev to this thread. Sanju, >>>>> can you take a look? >>>>> Regards, >>>>> Nithya >>>>> >>>>> >>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>> >>>>> [2019-05-17 01:20:22.689721] I >>>>> [fuse-bridge.c:4267:fuse_init] >>>>> 0-glusterfs-fuse: FUSE inited with >>>>> protocol versions: glusterfs 7.24 kernel 7.22 >>>>> [2019-05-17 01:20:22.689778] I >>>>> [fuse-bridge.c:4878:fuse_graph_sync] >>>>> 0-fuse: switched to graph 0 >>>>> [2019-05-17 01:20:22.694897] E >>>>> [fuse-bridge.c:4336:fuse_first_lookup] >>>>> 0-fuse: first lookup on root failed >>>>> (Transport endpoint is not connected) >>>>> [2019-05-17 01:20:22.699770] W >>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] >>>>> 0-fuse: >>>>> 00000000-0000-0000-0000-000000000001: >>>>> failed to resolve (Transport endpoint is >>>>> not connected) >>>>> [2019-05-17 01:20:22.699834] W >>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] >>>>> 0-glusterfs-fuse: 2: SETXATTR >>>>> 00000000-0000-0000-0000-000000000001/1 >>>>> (trusted.add-brick) resolution failed >>>>> [2019-05-17 01:20:22.715656] I >>>>> [fuse-bridge.c:5144:fuse_thread_proc] >>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>> [2019-05-17 01:20:22.715865] W >>>>> [glusterfsd.c:1500:cleanup_and_exit] >>>>> (-->/lib64/libpthread.so.0(+0x7dd5) >>>>> [0x7fb223bf6dd5] >>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) >>>>> [0x560886581e75] >>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) >>>>> [0x560886581ceb] ) 0-: received signum >>>>> (15), shutting down >>>>> [2019-05-17 01:20:22.715926] I >>>>> [fuse-bridge.c:5914:fini] 0-fuse: >>>>> Unmounting '/tmp/mntQAtu3f'. >>>>> [2019-05-17 01:20:22.715953] I >>>>> [fuse-bridge.c:5919:fini] 0-fuse: Closing >>>>> fuse connection to '/tmp/mntQAtu3f'. >>>>> >>>>> Processes running on new node gfs3: >>>>> >>>>> # ps -ef | grep gluster >>>>> root 6832???? 1? 0 20:17 ? 00:00:00 >>>>> /usr/sbin/glusterd -p >>>>> /var/run/glusterd.pid --log-level INFO >>>>> root 15799???? 1? 0 20:17 ? 00:00:00 >>>>> /usr/sbin/glusterfs -s localhost >>>>> --volfile-id gluster/glustershd -p >>>>> /var/run/gluster/glustershd/glustershd.pid >>>>> -l /var/log/glusterfs/glustershd.log -S >>>>> /var/run/gluster/24c12b09f93eec8e.socket >>>>> --xlator-option >>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 >>>>> --process-name glustershd >>>>> root???? 16856 16735? 
0 21:21 pts/0 >>>>> 00:00:00 grep --color=auto gluster >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>> >>>>> >>>>> >>>>> -- >>>>> Thanks, >>>>> Sanju >>>>> >>>>> >>>>> >>>>> -- >>>>> David Cunningham, Voisonics Limited >>>>> http://voisonics.com/ >>>>> USA: +1 213 221 1092 >>>>> New Zealand: +64 (0)28 2558 3782 >>>>> >>>>> _______________________________________________ >>>>> Gluster-users mailing list >>>>> Gluster-users at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>> >>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 13:59:55 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:29:55 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: Message-ID: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> On 23/05/19 2:40 AM, Alan Orth wrote: > Dear list, > > I seem to have gotten into a tricky situation. Today I brought up a > shiny new server with new disk arrays and attempted to replace one > brick of a replica 2 distribute/replicate volume on an older server > using the `replace-brick` command: > > # gluster volume replace-brick homes wingu0:/mnt/gluster/homes > wingu06:/data/glusterfs/sdb/homes commit force > > The command was successful and I see the new brick in the output of > `gluster volume info`. The problem is that Gluster doesn't seem to be > migrating the data, `replace-brick` definitely must heal (not migrate) the data. In your case, data must have been healed from Brick-4 to the replaced Brick-3. Are there any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of date. replace-brick command internally does all the setfattr steps that are mentioned in the doc. -Ravi > and now the original brick that I replaced is no longer part of the > volume (and a few terabytes of data are just sitting on the old brick): > > # gluster volume info homes | grep -E "Brick[0-9]:" > Brick1: wingu4:/mnt/gluster/homes > Brick2: wingu3:/mnt/gluster/homes > Brick3: wingu06:/data/glusterfs/sdb/homes > Brick4: wingu05:/data/glusterfs/sdb/homes > Brick5: wingu05:/data/glusterfs/sdc/homes > Brick6: wingu06:/data/glusterfs/sdc/homes > > I see the Gluster docs have a more complicated procedure for replacing > bricks that involves getfattr/setfattr?. How can I tell Gluster about > the old brick? I see that I have a backup of the old volfile thanks to > yum's rpmsave function if that helps. > > We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can > give. > > ? 
> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 24 14:03:03 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 24 May 2019 19:33:03 +0530 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> Message-ID: <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Adding a few DHT folks for some possible suggestions. -Ravi On 23/05/19 11:15 PM, brandon at thinkhuge.net wrote: > > Does anyone know what should be done on a glusterfs v5.6 "gluster > volume remove-brick" operation that fails?? I'm trying to remove 1 of > 8 distributed smaller nodes for replacement with larger node. > > The "gluster volume remove-brick ... status" command reports status > failed and failures = "3" > > cat /var/log/glusterfs/volbackups-rebalance.log > > ... > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] > [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: > Rebalance is failed. Time taken is 545.00 secs > > All servers are confirmed in good communications and updated and > freshly rebooted and retried the remove-brick few times with fail each > time > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbalacha at redhat.com Sat May 25 05:00:13 2019 From: nbalacha at redhat.com (Nithya Balachandran) Date: Sat, 25 May 2019 10:30:13 +0530 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Message-ID: Hi Brandon, Please send the following: 1. the gluster volume info 2. Information about which brick was removed 3. The rebalance log file for all nodes hosting removed bricks. Regards, Nithya On Fri, 24 May 2019 at 19:33, Ravishankar N wrote: > Adding a few DHT folks for some possible suggestions. > > -Ravi > On 23/05/19 11:15 PM, brandon at thinkhuge.net wrote: > > Does anyone know what should be done on a glusterfs v5.6 "gluster volume > remove-brick" operation that fails? I'm trying to remove 1 of 8 > distributed smaller nodes for replacement with larger node. > > > > The "gluster volume remove-brick ... status" command reports status failed > and failures = "3" > > > > cat /var/log/glusterfs/volbackups-rebalance.log > > ... > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] > [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is > failed. 
Time taken is 545.00 secs > > > > All servers are confirmed in good communications and updated and freshly > rebooted and retried the remove-brick few times with fail each time > > > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunter86_bg at yahoo.com Sun May 26 13:38:13 2019 From: hunter86_bg at yahoo.com (Strahil Nikolov) Date: Sun, 26 May 2019 13:38:13 +0000 (UTC) Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. In-Reply-To: References: Message-ID: <626088321.4969320.1558877893362@mail.yahoo.com> Yeah,it seems different from the docs.I'm adding the gluster users list ,as they are more experienced into that. @Gluster-users, can you provide some hint how to add aditional replicas to the below volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? Best Regards,Strahil Nikolov ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David ??????: Thank you Strahil,The engine and ssd-samsung are distributed...So these are the ones that I need to have replicated accross new nodes.I am not very sure about the procedure to accomplish this.Thanks, Leo On Sun, May 26, 2019, 13:04 Strahil wrote: Hi Leo, As you do not have a distributed volume , you can easily switch to replica 2 arbiter 1 or replica 3 volumes. You can use the following for adding the bricks: https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html Best Regards, Strahil Nikoliv On May 26, 2019 10:54, Leo David wrote: Hi Stahil,Thank you so much for yout input ! ?gluster volume info Volume Name: engine Type: Distribute Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 192.168.80.191:/gluster_bricks/engine/engine Options Reconfigured: nfs.disable: on transport.address-family: inet storage.owner-uid: 36 storage.owner-gid: 36 features.shard: on performance.low-prio-threads: 32 performance.strict-o-direct: off network.remote-dio: off network.ping-timeout: 30 user.cifs: off performance.quick-read: off performance.read-ahead: off performance.io-cache: off cluster.eager-lock: enableVolume Name: ssd-samsung Type: Distribute Volume ID: 76576cc6-220b-4651-952d-99846178a19e Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 192.168.80.191:/gluster_bricks/sdc/data Options Reconfigured: cluster.eager-lock: enable performance.io-cache: off performance.read-ahead: off performance.quick-read: off user.cifs: off network.ping-timeout: 30 network.remote-dio: off performance.strict-o-direct: on performance.low-prio-threads: 32 features.shard: on storage.owner-gid: 36 storage.owner-uid: 36 transport.address-family: inet nfs.disable: on The other two hosts will be 192.168.80.192/193??- this is gluster dedicated network over 10GB sfp+ switch.- host 2?wil have identical harware configuration with host 1 ( each disk is actually a raid0 array )- host 3 has:?? -? 1 ssd for OS?? -??1 ssd - for adding to engine volume in a full replica 3?? -? 2 ssd's in a raid 1 array?to be added?as arbiter for the data volume ( ssd-samsung )So the plan is to have "engine"? scaled in a full replica 3,? and "ssd-samsung" scalled in a replica 3 arbitrated. 
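If I understand the add-brick syntax correctly, the expansion would look roughly like this (the brick paths on 192.168.80.192/193 are only placeholders for wherever the new bricks get mounted, so please correct me if this is the wrong approach):

# full replica 3 for the engine volume
gluster volume add-brick engine replica 3 192.168.80.192:/gluster_bricks/engine/engine 192.168.80.193:/gluster_bricks/engine/engine
# replica 3 arbiter 1 for the data volume; the last brick listed becomes the arbiter
gluster volume add-brick ssd-samsung replica 3 arbiter 1 192.168.80.192:/gluster_bricks/sdc/data 192.168.80.193:/gluster_bricks/arbiter/data

and then watching "gluster volume heal engine info" and "gluster volume heal ssd-samsung info" until the new bricks are fully populated.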
On Sun, May 26, 2019 at 10:34 AM Strahil wrote: Hi Leo, Gluster is quite smart, but in order to provide any hints, can you provide the output of 'gluster volume info'. If you have 2 more systems, keep in mind that it is best to mirror the storage on the second replica (2 disks on 1 machine -> 2 disks on the new machine), while for the arbiter this is not necessary. What is your network and NICs? Based on my experience, I can recommend at least 10 gbit/s interface(s). Best Regards, Strahil Nikolov On May 26, 2019 07:52, Leo David wrote: Hello Everyone, Can someone help me to clarify this? I have a single-node 4.2.8 installation ( only two gluster storage domains - distributed single drive volumes ). Now I just got two identical servers and I would like to go for a 3 node bundle. Is it possible ( after joining the new nodes to the cluster ) to expand the existing volumes across the new nodes and change them to replica 3 arbitrated? If so, could you share with me what the procedure would be? Thank you very much! Leo -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Mon May 27 00:23:59 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Mon, 27 May 2019 12:23:59 +1200 Subject: [Gluster-users] add-brick: failed: Commit failed In-Reply-To: <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> References: <924b8cb6-5a61-3a7f-1591-07ffe0d80a24@redhat.com> <47a6c5fa-4304-4680-d63f-99ecd1e43c4c@redhat.com> <764773c5-38d4-e427-d699-3192bf9a1005@redhat.com> <2132c601-86a4-2b85-5d72-7b9926f890a2@redhat.com> Message-ID: Hi Ravi, Thank you, that seems to have resolved the issue. After doing this, "gluster volume status all" showed gfs3 as online with a port and pid, however "gluster volume status all" didn't show any sync activity happening. At this point we loaded gfs3 with new firewall rules which explicitly allowed access from gfs1 and gfs2, and then "gluster volume status all" showed the file syncing. The gfs3 server should have allowed access from gfs1 and gfs2 anyway by default, however I now believe that perhaps this wasn't the case, and maybe it was a firewall issue all along. Thanks for all your help. On Sat, 25 May 2019 at 01:49, Ravishankar N wrote: > Hi David, > On 23/05/19 3:54 AM, David Cunningham wrote: > > Hi Ravi, > > Please see the log attached. > > When I grep -E "Connected to |disconnected from" > gvol0-add-brick-mount.log, I don't see a "Connected to gvol0-client-1". > It looks like this temporary mount is not able to connect to the 2nd brick, > which is why the lookup is failing due to lack of quorum. > > The output of "gluster volume status" is as follows. Should there be > something listening on gfs3? I'm not sure whether it having TCP Port and > Pid as N/A is a symptom or cause. Thank you. > > # gluster volume status > Status of volume: gvol0 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7624 > Brick gfs3:/nodirectwritedata/gluster/gvol0 N/A N/A N > N/A > > Can you see if the following steps help? > > 1. Do a `setfattr -n trusted.afr.gvol0-client-2 -v > 0x000000000000000100000001 /nodirectwritedata/gluster/gvol0` on *both* > gfs1 and gfs2. > > 2. `gluster volume start gvol0 force` > > 3. Check if Brick-3 now comes online with a valid TCP port and PID.
If it > doesn't, check the brick log under /var/log/glusterfs/bricks on gfs3 to see > why. > > Thanks, > > Ravi > > > Self-heal Daemon on localhost N/A N/A Y > 19853 > Self-heal Daemon on gfs1 N/A N/A Y > 28600 > Self-heal Daemon on gfs2 N/A N/A Y > 17614 > > Task Status of Volume gvol0 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 22 May 2019 at 18:06, Ravishankar N > wrote: > >> If you are trying this again, please 'gluster volume set $volname >> client-log-level DEBUG`before attempting the add-brick and attach the >> gvol0-add-brick-mount.log here. After that, you can change the >> client-log-level back to INFO. >> >> -Ravi >> On 22/05/19 11:32 AM, Ravishankar N wrote: >> >> >> On 22/05/19 11:23 AM, David Cunningham wrote: >> >> Hi Ravi, >> >> I'd already done exactly that before, where step 3 was a simple 'rm -rf >> /nodirectwritedata/gluster/gvol0'. Have you another suggestion on what the >> cleanup or reformat should be? >> >> `rm -rf /nodirectwritedata/gluster/gvol0` does look okay to me David. >> Basically, '/nodirectwritedata/gluster/gvol0' must be empty and must not >> have any extended attributes set on it. Why fuse_first_lookup() is failing >> is a bit of a mystery to me at this point. :-( >> Regards, >> Ravi >> >> >> Thank you. >> >> >> On Wed, 22 May 2019 at 13:56, Ravishankar N >> wrote: >> >>> Hmm, so the volume info seems to indicate that the add-brick was >>> successful but the gfid xattr is missing on the new brick (as are the >>> actual files, barring the .glusterfs folder, according to your previous >>> mail). >>> >>> Do you want to try removing and adding it again? >>> >>> 1. `gluster volume remove-brick gvol0 replica 2 >>> gfs3:/nodirectwritedata/gluster/gvol0 force` from gfs1 >>> >>> 2. Check that gluster volume info is now back to a 1x2 volume on all >>> nodes and `gluster peer status` is connected on all nodes. >>> >>> 3. Cleanup or reformat '/nodirectwritedata/gluster/gvol0' on gfs3. >>> >>> 4. `gluster volume add-brick gvol0 replica 3 arbiter 1 >>> gfs3:/nodirectwritedata/gluster/gvol0` from gfs1. >>> >>> 5. Check that the files are getting healed on to the new brick. >>> Thanks, >>> Ravi >>> On 22/05/19 6:50 AM, David Cunningham wrote: >>> >>> Hi Ravi, >>> >>> Certainly. On the existing two nodes: >>> >>> gfs1 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> gfs2 # getfattr -d -m. -e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000000 >>> trusted.afr.gvol0-client-0=0x000000000000000000000000 >>> trusted.afr.gvol0-client-2=0x000000000000000000000000 >>> trusted.gfid=0x00000000000000000000000000000001 >>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> On the new node: >>> >>> gfs3 # getfattr -d -m. 
-e hex /nodirectwritedata/gluster/gvol0 >>> getfattr: Removing leading '/' from absolute path names >>> # file: nodirectwritedata/gluster/gvol0 >>> trusted.afr.dirty=0x000000000000000000000001 >>> trusted.glusterfs.volume-id=0xfb5af69e1c3e41648b23c1d7bec9b1b6 >>> >>> Output of "gluster volume info" is the same on all 3 nodes and is: >>> >>> # gluster volume info >>> >>> Volume Name: gvol0 >>> Type: Replicate >>> Volume ID: fb5af69e-1c3e-4164-8b23-c1d7bec9b1b6 >>> Status: Started >>> Snapshot Count: 0 >>> Number of Bricks: 1 x (2 + 1) = 3 >>> Transport-type: tcp >>> Bricks: >>> Brick1: gfs1:/nodirectwritedata/gluster/gvol0 >>> Brick2: gfs2:/nodirectwritedata/gluster/gvol0 >>> Brick3: gfs3:/nodirectwritedata/gluster/gvol0 (arbiter) >>> Options Reconfigured: >>> performance.client-io-threads: off >>> nfs.disable: on >>> transport.address-family: inet >>> >>> >>> On Wed, 22 May 2019 at 12:43, Ravishankar N >>> wrote: >>> >>>> Hi David, >>>> Could you provide the `getfattr -d -m. -e hex >>>> /nodirectwritedata/gluster/gvol0` output of all bricks and the output of >>>> `gluster volume info`? >>>> >>>> Thanks, >>>> Ravi >>>> On 22/05/19 4:57 AM, David Cunningham wrote: >>>> >>>> Hi Sanju, >>>> >>>> Here's what glusterd.log says on the new arbiter server when trying to >>>> add the node: >>>> >>>> [2019-05-22 00:15:05.963059] I [run.c:242:runner_log] >>>> (-->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0x3b2cd) >>>> [0x7fe4ca9102cd] >>>> -->/usr/lib64/glusterfs/5.6/xlator/mgmt/glusterd.so(+0xe6b85) >>>> [0x7fe4ca9bbb85] -->/lib64/libglusterfs.so.0(runner_log+0x115) >>>> [0x7fe4d5ecc955] ) 0-management: Ran script: >>>> /var/lib/glusterd/hooks/1/add-brick/pre/S28Quota-enable-root-xattr-heal.sh >>>> --volname=gvol0 --version=1 --volume-op=add-brick >>>> --gd-workdir=/var/lib/glusterd >>>> [2019-05-22 00:15:05.963177] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1355:glusterd_op_perform_add_bricks] 0-management: >>>> replica-count is set 3 >>>> [2019-05-22 00:15:05.963228] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1360:glusterd_op_perform_add_bricks] 0-management: >>>> arbiter-count is set 1 >>>> [2019-05-22 00:15:05.963257] I [MSGID: 106578] >>>> [glusterd-brick-ops.c:1364:glusterd_op_perform_add_bricks] 0-management: >>>> type is set 0, need to change it >>>> [2019-05-22 00:15:17.015268] E [MSGID: 106053] >>>> [glusterd-utils.c:13942:glusterd_handle_replicate_brick_ops] 0-management: >>>> Failed to set extended attribute trusted.add-brick : Transport endpoint is >>>> not connected [Transport endpoint is not connected] >>>> [2019-05-22 00:15:17.036479] E [MSGID: 106073] >>>> [glusterd-brick-ops.c:2595:glusterd_op_add_brick] 0-glusterd: Unable to add >>>> bricks >>>> [2019-05-22 00:15:17.036595] E [MSGID: 106122] >>>> [glusterd-mgmt.c:299:gd_mgmt_v3_commit_fn] 0-management: Add-brick commit >>>> failed. 
>>>> [2019-05-22 00:15:17.036710] E [MSGID: 106122] >>>> [glusterd-mgmt-handler.c:594:glusterd_handle_commit_fn] 0-management: >>>> commit failed on operation Add brick >>>> >>>> As before gvol0-add-brick-mount.log said: >>>> >>>> [2019-05-22 00:15:17.005695] I [fuse-bridge.c:4267:fuse_init] >>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>> 7.22 >>>> [2019-05-22 00:15:17.005749] I [fuse-bridge.c:4878:fuse_graph_sync] >>>> 0-fuse: switched to graph 0 >>>> [2019-05-22 00:15:17.010101] E [fuse-bridge.c:4336:fuse_first_lookup] >>>> 0-fuse: first lookup on root failed (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.014217] W [fuse-bridge.c:897:fuse_attr_cbk] >>>> 0-glusterfs-fuse: 2: LOOKUP() / => -1 (Transport endpoint is not connected) >>>> [2019-05-22 00:15:17.015097] W >>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>> is not connected) >>>> [2019-05-22 00:15:17.015158] W >>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 3: SETXATTR >>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>> [2019-05-22 00:15:17.035636] I [fuse-bridge.c:5144:fuse_thread_proc] >>>> 0-fuse: initating unmount of /tmp/mntYGNbj9 >>>> [2019-05-22 00:15:17.035854] W [glusterfsd.c:1500:cleanup_and_exit] >>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7f7745ccedd5] >>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55c81b63de75] >>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55c81b63dceb] ) 0-: >>>> received signum (15), shutting down >>>> [2019-05-22 00:15:17.035942] I [fuse-bridge.c:5914:fini] 0-fuse: >>>> Unmounting '/tmp/mntYGNbj9'. >>>> [2019-05-22 00:15:17.035966] I [fuse-bridge.c:5919:fini] 0-fuse: >>>> Closing fuse connection to '/tmp/mntYGNbj9'. >>>> >>>> Here are the processes running on the new arbiter server: >>>> # ps -ef | grep gluster >>>> root 3466 1 0 20:13 ? 00:00:00 /usr/sbin/glusterfs -s >>>> localhost --volfile-id gluster/glustershd -p >>>> /var/run/gluster/glustershd/glustershd.pid -l >>>> /var/log/glusterfs/glustershd.log -S >>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>> glustershd >>>> root 6832 1 0 May16 ? 00:02:10 /usr/sbin/glusterd -p >>>> /var/run/glusterd.pid --log-level INFO >>>> root 17841 1 0 May16 ? 00:00:58 /usr/sbin/glusterfs >>>> --process-name fuse --volfile-server=gfs1 --volfile-id=/gvol0 /mnt/glusterfs >>>> >>>> Here are the files created on the new arbiter server: >>>> # find /nodirectwritedata/gluster/gvol0 | xargs ls -ald >>>> drwxr-xr-x 3 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0 >>>> drw------- 2 root root 4096 May 21 20:15 >>>> /nodirectwritedata/gluster/gvol0/.glusterfs >>>> >>>> Thank you for your help! >>>> >>>> >>>> On Tue, 21 May 2019 at 00:10, Sanju Rakonde >>>> wrote: >>>> >>>>> David, >>>>> >>>>> can you please attach glusterd.logs? As the error message says, Commit >>>>> failed on the arbitar node, we might be able to find some issue on that >>>>> node. >>>>> >>>>> On Mon, May 20, 2019 at 10:10 AM Nithya Balachandran < >>>>> nbalacha at redhat.com> wrote: >>>>> >>>>>> >>>>>> >>>>>> On Fri, 17 May 2019 at 06:01, David Cunningham < >>>>>> dcunningham at voisonics.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> We're adding an arbiter node to an existing volume and having an >>>>>>> issue. Can anyone help? 
The root cause error appears to be >>>>>>> "00000000-0000-0000-0000-000000000001: failed to resolve (Transport >>>>>>> endpoint is not connected)", as below. >>>>>>> >>>>>>> We are running glusterfs 5.6.1. Thanks in advance for any assistance! >>>>>>> >>>>>>> On existing node gfs1, trying to add new arbiter node gfs3: >>>>>>> >>>>>>> # gluster volume add-brick gvol0 replica 3 arbiter 1 >>>>>>> gfs3:/nodirectwritedata/gluster/gvol0 >>>>>>> volume add-brick: failed: Commit failed on gfs3. Please check log >>>>>>> file for details. >>>>>>> >>>>>> >>>>>> This looks like a glusterd issue. Please check the glusterd logs for >>>>>> more info. >>>>>> Adding the glusterd dev to this thread. Sanju, can you take a look? >>>>>> >>>>>> Regards, >>>>>> Nithya >>>>>> >>>>>>> >>>>>>> On new node gfs3 in gvol0-add-brick-mount.log: >>>>>>> >>>>>>> [2019-05-17 01:20:22.689721] I [fuse-bridge.c:4267:fuse_init] >>>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel >>>>>>> 7.22 >>>>>>> [2019-05-17 01:20:22.689778] I [fuse-bridge.c:4878:fuse_graph_sync] >>>>>>> 0-fuse: switched to graph 0 >>>>>>> [2019-05-17 01:20:22.694897] E >>>>>>> [fuse-bridge.c:4336:fuse_first_lookup] 0-fuse: first lookup on root failed >>>>>>> (Transport endpoint is not connected) >>>>>>> [2019-05-17 01:20:22.699770] W >>>>>>> [fuse-resolve.c:127:fuse_resolve_gfid_cbk] 0-fuse: >>>>>>> 00000000-0000-0000-0000-000000000001: failed to resolve (Transport endpoint >>>>>>> is not connected) >>>>>>> [2019-05-17 01:20:22.699834] W >>>>>>> [fuse-bridge.c:3294:fuse_setxattr_resume] 0-glusterfs-fuse: 2: SETXATTR >>>>>>> 00000000-0000-0000-0000-000000000001/1 (trusted.add-brick) resolution failed >>>>>>> [2019-05-17 01:20:22.715656] I [fuse-bridge.c:5144:fuse_thread_proc] >>>>>>> 0-fuse: initating unmount of /tmp/mntQAtu3f >>>>>>> [2019-05-17 01:20:22.715865] W [glusterfsd.c:1500:cleanup_and_exit] >>>>>>> (-->/lib64/libpthread.so.0(+0x7dd5) [0x7fb223bf6dd5] >>>>>>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x560886581e75] >>>>>>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x560886581ceb] ) 0-: >>>>>>> received signum (15), shutting down >>>>>>> [2019-05-17 01:20:22.715926] I [fuse-bridge.c:5914:fini] 0-fuse: >>>>>>> Unmounting '/tmp/mntQAtu3f'. >>>>>>> [2019-05-17 01:20:22.715953] I [fuse-bridge.c:5919:fini] 0-fuse: >>>>>>> Closing fuse connection to '/tmp/mntQAtu3f'. >>>>>>> >>>>>>> Processes running on new node gfs3: >>>>>>> >>>>>>> # ps -ef | grep gluster >>>>>>> root 6832 1 0 20:17 ? 00:00:00 /usr/sbin/glusterd >>>>>>> -p /var/run/glusterd.pid --log-level INFO >>>>>>> root 15799 1 0 20:17 ? 
00:00:00 /usr/sbin/glusterfs >>>>>>> -s localhost --volfile-id gluster/glustershd -p >>>>>>> /var/run/gluster/glustershd/glustershd.pid -l >>>>>>> /var/log/glusterfs/glustershd.log -S >>>>>>> /var/run/gluster/24c12b09f93eec8e.socket --xlator-option >>>>>>> *replicate*.node-uuid=2069cfb3-c798-47e3-8cf8-3c584cf7c412 --process-name >>>>>>> glustershd >>>>>>> root 16856 16735 0 21:21 pts/0 00:00:00 grep --color=auto >>>>>>> gluster >>>>>>> >>>>>>> -- >>>>>>> David Cunningham, Voisonics Limited >>>>>>> http://voisonics.com/ >>>>>>> USA: +1 213 221 1092 >>>>>>> New Zealand: +64 (0)28 2558 3782 >>>>>>> _______________________________________________ >>>>>>> Gluster-users mailing list >>>>>>> Gluster-users at gluster.org >>>>>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>>>>> >>>>>> >>>>> >>>>> -- >>>>> Thanks, >>>>> Sanju >>>>> >>>> >>>> >>>> -- >>>> David Cunningham, Voisonics Limited >>>> http://voisonics.com/ >>>> USA: +1 213 221 1092 >>>> New Zealand: +64 (0)28 2558 3782 >>>> >>>> _______________________________________________ >>>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>>> >>>> >>> >>> -- >>> David Cunningham, Voisonics Limited >>> http://voisonics.com/ >>> USA: +1 213 221 1092 >>> New Zealand: +64 (0)28 2558 3782 >>> >>> >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mauro.tridici at cmcc.it Mon May 27 08:53:22 2019 From: mauro.tridici at cmcc.it (Mauro Tridici) Date: Mon, 27 May 2019 10:53:22 +0200 Subject: [Gluster-users] read-only glusterfs volume mount on a virtual machine Message-ID: <7A414EC3-54AD-4FAF-A680-5896A659AB5A@cmcc.it> Dear Users, anyone of us could help me to identify the right way to export a gluster volume (or some directories of it) in read-only access to a specific IP address (assigned to a virtual machine)? At this moment, the volume is already mounted on 3 other client servers (in RW mode) using glusterfs native client. Now, I would like to add a new read-only client but without specifying RO mode in /etc/fstab file on the client virutal machine: i would like to set RO access mode from gluster server side. Is it possible? Thank you in advance, Mauro -------------- next part -------------- An HTML attachment was scrubbed... URL: From kontakt at taste-of-it.de Mon May 27 21:43:08 2019 From: kontakt at taste-of-it.de (Taste-Of-IT) Date: Mon, 27 May 2019 21:43:08 +0000 Subject: [Gluster-users] remove-brick failure on distributed with 5.6 In-Reply-To: References: <080701d5118f$574a72b0$05df5810$@thinkhuge.net> <281bd623-f8e9-400b-bdcf-00aee1cdcf95@redhat.com> Message-ID: <42bb895028f4def62830fd1be0054b52e1b83f32@taste-of-it.de> Hi, i had similar problem. In my case the rebalance didnt finish because of not enough free space to migrate the space to other nodes. Reason was 1% Reservation Options which is setup by default in distributed, but i set it to 0%, which was ignored by gluster Greatings? Taste Am 25.05.2019 07:00:13, schrieb Nithya Balachandran: > Hi Brandon, > Please send the following: > > 1. the gluster volume info > > 2. 
Information about which brick was removed > > 3. The rebalance log file for all nodes hosting removed bricks. > > Regards, > > Nithya > > > On Fri, 24 May 2019 at 19:33, Ravishankar N <> ravishankar at redhat.com> > wrote: > > > > > > Adding a few DHT folks for some possible suggestions. > > > > -Ravi > > > > On 23/05/19 11:15 PM, > > brandon at thinkhuge.net> > wrote: > > > > > > > > > > > Does anyone know what should be done on a glusterfs v5.6 "gluster volume remove-brick" operation that fails?? I'm trying to remove 1 of 8 distributed smaller nodes for replacement with larger node. > > > > > > ?> > > > > > The "gluster volume remove-brick ... status" command reports status failed and failures = "3" > > > > > > ?> > > > > > cat /var/log/glusterfs/volbackups-rebalance.log > > > > > > ... > > > > > > [2019-05-23 16:43:37.442283] I [MSGID: 109028] [dht-rebalance.c:5070:gf_defrag_status_get] 0-volbackups-dht: Rebalance is failed. Time taken is 545.00 secs > > > > > > ?> > > > > > All servers are confirmed in good communications and updated and freshly rebooted and retried the remove-brick few times with fail each time> > > > > > ?> > > > > > > > > > > > _______________________________________________ Gluster-users mailing list > > > Gluster-users at gluster.org> > > > > > https://lists.gluster.org/mailman/listinfo/gluster-users> > > > > > > > > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.orth at gmail.com Tue May 28 22:29:59 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 29 May 2019 01:29:59 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> Message-ID: Dear Ravishankar, I'm not sure if Brick4 had pending AFRs because I don't know what that means and it's been a few days so I am not sure I would be able to find that information. Anyways, after wasting a few days rsyncing the old brick to a new host I decided to just try to add the old brick back into the volume instead of bringing it up on the new host. I created a new brick directory on the old host, moved the old brick's contents into that new directory (minus the .glusterfs directory), added the new brick to the volume, and then did Vlad's find/stat trick? from the brick to the FUSE mount point. The interesting problem I have now is that some files don't appear in the FUSE mount's directory listings, but I can actually list them directly and even read them. What could cause that? Thanks, ? https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html On Fri, May 24, 2019 at 4:59 PM Ravishankar N wrote: > > On 23/05/19 2:40 AM, Alan Orth wrote: > > Dear list, > > I seem to have gotten into a tricky situation. Today I brought up a shiny > new server with new disk arrays and attempted to replace one brick of a > replica 2 distribute/replicate volume on an older server using the > `replace-brick` command: > > # gluster volume replace-brick homes wingu0:/mnt/gluster/homes > wingu06:/data/glusterfs/sdb/homes commit force > > The command was successful and I see the new brick in the output of > `gluster volume info`. The problem is that Gluster doesn't seem to be > migrating the data, > > `replace-brick` definitely must heal (not migrate) the data. 
In your case, > data must have been healed from Brick-4 to the replaced Brick-3. Are there > any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4 > have pending AFR xattrs blaming Brick-3? The doc is a bit out of date. > replace-brick command internally does all the setfattr steps that are > mentioned in the doc. > > -Ravi > > > and now the original brick that I replaced is no longer part of the volume > (and a few terabytes of data are just sitting on the old brick): > > # gluster volume info homes | grep -E "Brick[0-9]:" > Brick1: wingu4:/mnt/gluster/homes > Brick2: wingu3:/mnt/gluster/homes > Brick3: wingu06:/data/glusterfs/sdb/homes > Brick4: wingu05:/data/glusterfs/sdb/homes > Brick5: wingu05:/data/glusterfs/sdc/homes > Brick6: wingu06:/data/glusterfs/sdc/homes > > I see the Gluster docs have a more complicated procedure for replacing > bricks that involves getfattr/setfattr?. How can I tell Gluster about the > old brick? I see that I have a backup of the old volfile thanks to yum's > rpmsave function if that helps. > > We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can > give. > > ? > https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 00:51:48 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 29 May 2019 12:51:48 +1200 Subject: [Gluster-users] Transport endpoint is not connected Message-ID: Hello all, We are seeing a strange issue where a new node gfs3 shows another node gfs2 as not connected on the "gluster volume heal" info: [root at gfs3 bricks]# gluster volume heal gvol0 info Brick gfs1:/nodirectwritedata/gluster/gvol0 Status: Connected Number of entries: 0 Brick gfs2:/nodirectwritedata/gluster/gvol0 Status: Transport endpoint is not connected Number of entries: - Brick gfs3:/nodirectwritedata/gluster/gvol0 Status: Connected Number of entries: 0 However it does show the same node connected on "gluster peer status". Does anyone know why this would be? 
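As a side note on why the two commands can disagree: "gluster peer status" only reflects the glusterd management connection on TCP port 24007, while "gluster volume heal ... info" starts a small client (glfsheal) that must also reach every brick's own listening port. A quick way to test this from gfs3, assuming the brick port that "gluster volume status gvol0" reports for gfs2 (49152 in this setup) and whichever of telnet/nc is installed:

telnet gfs2 24007
telnet gfs2 49152

If the management port answers but the brick port does not, heal info reports "Transport endpoint is not connected" for that brick while peer status stays Connected.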
[root at gfs3 bricks]# gluster peer status Number of Peers: 2 Hostname: gfs2 Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 State: Peer in Cluster (Connected) Hostname: gfs1 Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b State: Peer in Cluster (Connected) In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with regards to gfs2: [2019-05-29 00:17:50.646360] I [MSGID: 115029] [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 (version: 5.6) [2019-05-29 00:17:50.761120] I [MSGID: 115036] [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 [2019-05-29 00:17:50.761352] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 Thanks in advance for any assistance. -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:20:51 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:50:51 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> Message-ID: <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> On 29/05/19 3:59 AM, Alan Orth wrote: > Dear Ravishankar, > > I'm not sure if Brick4 had pending AFRs because I don't know what that > means and it's been a few days so I am not sure I would be able to > find that information. When you find some time, have a look at a blog series I wrote about AFR- I've tried to explain what one needs to know to debug replication related issues in it. > > Anyways, after wasting a few days rsyncing the old brick to a new host > I decided to just try to add the old brick back into the volume > instead of bringing it up on the new host. I created a new brick > directory on the old host, moved the old brick's contents into that > new directory (minus the .glusterfs directory), added the new brick to > the volume, and then did Vlad's find/stat trick? from the brick to the > FUSE mount point. > > The interesting problem I have now is that some files don't appear in > the FUSE mount's directory listings, but I can actually list them > directly and even read them. What could cause that? Not sure, too many variables in the hacks that you did to take a guess. You can check if the contents of the .glusterfs folder are in order on the new brick (example hardlink for files and symlinks for directories are present etc.) . Regards, Ravi > > Thanks, > > ? > https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html > > On Fri, May 24, 2019 at 4:59 PM Ravishankar N > wrote: > > > On 23/05/19 2:40 AM, Alan Orth wrote: >> Dear list, >> >> I seem to have gotten into a tricky situation. 
Today I brought up >> a shiny new server with new disk arrays and attempted to replace >> one brick of a replica 2 distribute/replicate volume on an older >> server using the `replace-brick` command: >> >> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >> wingu06:/data/glusterfs/sdb/homes commit force >> >> The command was successful and I see the new brick in the output >> of `gluster volume info`. The problem is that Gluster doesn't >> seem to be migrating the data, > > `replace-brick` definitely must heal (not migrate) the data. In > your case, data must have been healed from Brick-4 to the replaced > Brick-3. Are there any errors in the self-heal daemon logs of > Brick-4's node? Does Brick-4 have pending AFR xattrs blaming > Brick-3? The doc is a bit out of date. replace-brick command > internally does all the setfattr steps that are mentioned in the doc. > > -Ravi > > >> and now the original brick that I replaced is no longer part of >> the volume (and a few terabytes of data are just sitting on the >> old brick): >> >> # gluster volume info homes | grep -E "Brick[0-9]:" >> Brick1: wingu4:/mnt/gluster/homes >> Brick2: wingu3:/mnt/gluster/homes >> Brick3: wingu06:/data/glusterfs/sdb/homes >> Brick4: wingu05:/data/glusterfs/sdb/homes >> Brick5: wingu05:/data/glusterfs/sdc/homes >> Brick6: wingu06:/data/glusterfs/sdc/homes >> >> I see the Gluster docs have a more complicated procedure for >> replacing bricks that involves getfattr/setfattr?. How can I tell >> Gluster about the old brick? I see that I have a backup of the >> old volfile thanks to yum's rpmsave function if that helps. >> >> We are using Gluster 5.6 on CentOS 7. Thank you for any advice >> you can give. >> >> ? >> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich >> Nietzsche >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:24:33 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:54:33 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> Message-ID: <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> On 29/05/19 9:50 AM, Ravishankar N wrote: > > > On 29/05/19 3:59 AM, Alan Orth wrote: >> Dear Ravishankar, >> >> I'm not sure if Brick4 had pending AFRs because I don't know what >> that means and it's been a few days so I am not sure I would be able >> to find that information. > When you find some time, have a look at a blog series > I wrote about AFR- I've tried to explain what one needs to know to > debug replication related issues in it. Made a typo error. 
The URL for the blog is https://wp.me/peiBB-6b -Ravi >> >> Anyways, after wasting a few days rsyncing the old brick to a new >> host I decided to just try to add the old brick back into the volume >> instead of bringing it up on the new host. I created a new brick >> directory on the old host, moved the old brick's contents into that >> new directory (minus the .glusterfs directory), added the new brick >> to the volume, and then did Vlad's find/stat trick? from the brick to >> the FUSE mount point. >> >> The interesting problem I have now is that some files don't appear in >> the FUSE mount's directory listings, but I can actually list them >> directly and even read them. What could cause that? > Not sure, too many variables in the hacks that you did to take a > guess. You can check if the contents of the .glusterfs folder are in > order on the new brick (example hardlink for files and symlinks for > directories are present etc.) . > Regards, > Ravi >> >> Thanks, >> >> ? >> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >> >> On Fri, May 24, 2019 at 4:59 PM Ravishankar N > > wrote: >> >> >> On 23/05/19 2:40 AM, Alan Orth wrote: >>> Dear list, >>> >>> I seem to have gotten into a tricky situation. Today I brought >>> up a shiny new server with new disk arrays and attempted to >>> replace one brick of a replica 2 distribute/replicate volume on >>> an older server using the `replace-brick` command: >>> >>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>> wingu06:/data/glusterfs/sdb/homes commit force >>> >>> The command was successful and I see the new brick in the output >>> of `gluster volume info`. The problem is that Gluster doesn't >>> seem to be migrating the data, >> >> `replace-brick` definitely must heal (not migrate) the data. In >> your case, data must have been healed from Brick-4 to the >> replaced Brick-3. Are there any errors in the self-heal daemon >> logs of Brick-4's node? Does Brick-4 have pending AFR xattrs >> blaming Brick-3? The doc is a bit out of date. replace-brick >> command internally does all the setfattr steps that are mentioned >> in the doc. >> >> -Ravi >> >> >>> and now the original brick that I replaced is no longer part of >>> the volume (and a few terabytes of data are just sitting on the >>> old brick): >>> >>> # gluster volume info homes | grep -E "Brick[0-9]:" >>> Brick1: wingu4:/mnt/gluster/homes >>> Brick2: wingu3:/mnt/gluster/homes >>> Brick3: wingu06:/data/glusterfs/sdb/homes >>> Brick4: wingu05:/data/glusterfs/sdb/homes >>> Brick5: wingu05:/data/glusterfs/sdc/homes >>> Brick6: wingu06:/data/glusterfs/sdc/homes >>> >>> I see the Gluster docs have a more complicated procedure for >>> replacing bricks that involves getfattr/setfattr?. How can I >>> tell Gluster about the old brick? I see that I have a backup of >>> the old volfile thanks to yum's rpmsave function if that helps. >>> >>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice >>> you can give. >>> >>> ? >>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." 
?Friedrich >>> Nietzsche >>> >>> _______________________________________________ >>> Gluster-users mailing list >>> Gluster-users at gluster.org >>> https://lists.gluster.org/mailman/listinfo/gluster-users >> >> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Wed May 29 04:26:31 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 09:56:31 +0530 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: On 29/05/19 6:21 AM, David Cunningham wrote: > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: You need to check glfsheal-$volname.log on the node where you ran the command and check for any connection related errors. -Ravi > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted > client from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting > connection from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down > connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joe at julianfamily.org Wed May 29 05:17:51 2019 From: joe at julianfamily.org (Joe Julian) Date: Tue, 28 May 2019 22:17:51 -0700 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: Check gluster volume status gvol0 and make sure your bricks are all running. On 5/29/19 2:51 AM, David Cunningham wrote: > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted > client from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting > connection from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down > connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 08:56:23 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Wed, 29 May 2019 20:56:23 +1200 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: Hi Ravi and Joe, The command "gluster volume status gvol0" shows all 3 nodes as being online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in which I can't see anything like a connection error. Would you have any further suggestions? Thank you. 
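One quick check on that attached log which may save a round trip: the same grep used earlier on the add-brick mount log applies here too. The path below assumes the default log directory and is only a sketch:

grep -E "Connected to |disconnected from" /var/log/glusterfs/glfsheal-gvol0.log

A healthy run prints a "Connected to gvol0-client-N" line for each of the three bricks; if the line for gvol0-client-1 (the gfs2 brick) never appears, the heal-info client on this node cannot reach that brick even though glusterd-level status looks fine.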
[root at gfs3 glusterfs]# gluster volume status gvol0 Status of volume: gvol0 Gluster process TCP Port RDMA Port Online Pid ------------------------------------------------------------------------------ Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y 7706 Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y 7625 Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0 Y 7307 Self-heal Daemon on localhost N/A N/A Y 7316 Self-heal Daemon on gfs1 N/A N/A Y 40591 Self-heal Daemon on gfs2 N/A N/A Y 7634 Task Status of Volume gvol0 ------------------------------------------------------------------------------ There are no active volume tasks On Wed, 29 May 2019 at 16:26, Ravishankar N wrote: > > On 29/05/19 6:21 AM, David Cunningham wrote: > > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node > gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". > Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with > regards to gfs2: > > You need to check glfsheal-$volname.log on the node where you ran the > command and check for any connection related errors. > > -Ravi > > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] > [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client > from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] > [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection > from > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] > [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection > CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- David Cunningham, Voisonics Limited http://voisonics.com/ USA: +1 213 221 1092 New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: glfsheal-gvol0.log Type: text/x-log Size: 6160 bytes Desc: not available URL: From ravishankar at redhat.com Wed May 29 11:10:49 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Wed, 29 May 2019 16:40:49 +0530 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: References: Message-ID: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> I don't see a "Connected to gvol0-client-1" in the log.? Perhaps a firewall issue like the last time? Even in the earlier add-brick log from the other email thread, connection to the 2nd brick was not established. -Ravi On 29/05/19 2:26 PM, David Cunningham wrote: > Hi Ravi and Joe, > > The command "gluster volume status gvol0" shows all 3 nodes as being > online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, > in which I can't see anything like a connection error. Would you have > any further suggestions? Thank you. > > [root at gfs3 glusterfs]# gluster volume status gvol0 > Status of volume: gvol0 > Gluster process???????????????????????????? TCP Port RDMA Port? > Online? Pid > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7625 > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0????????? Y?????? 7307 > Self-heal Daemon on localhost?????????????? N/A N/A??????? Y?????? 7316 > Self-heal Daemon on gfs1??????????????????? N/A N/A??????? Y?????? 40591 > Self-heal Daemon on gfs2??????????????????? N/A N/A??????? Y?????? 7634 > > Task Status of Volume gvol0 > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 29 May 2019 at 16:26, Ravishankar N > wrote: > > > On 29/05/19 6:21 AM, David Cunningham wrote: >> Hello all, >> >> We are seeing a strange issue where a new node gfs3 shows another >> node gfs2 as not connected on the "gluster volume heal" info: >> >> [root at gfs3 bricks]# gluster volume heal gvol0 info >> Brick gfs1:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> Brick gfs2:/nodirectwritedata/gluster/gvol0 >> Status: Transport endpoint is not connected >> Number of entries: - >> >> Brick gfs3:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> >> However it does show the same node connected on "gluster peer >> status". Does anyone know why this would be? >> >> [root at gfs3 bricks]# gluster peer status >> Number of Peers: 2 >> >> Hostname: gfs2 >> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 >> State: Peer in Cluster (Connected) >> >> Hostname: gfs1 >> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b >> State: Peer in Cluster (Connected) >> >> >> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged >> with regards to gfs2: > > You need to check glfsheal-$volname.log on the node where you ran > the command and check for any connection related errors. 
> > -Ravi > >> >> [2019-05-29 00:17:50.646360] I [MSGID: 115029] >> [server-handshake.c:537:server_setvolume] 0-gvol0-server: >> accepted client from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> (version: 5.6) >> [2019-05-29 00:17:50.761120] I [MSGID: 115036] >> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting >> connection from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> [2019-05-29 00:17:50.761352] I [MSGID: 101055] >> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down >> connection >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> >> Thanks in advance for any assistance. >> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 -------------- next part -------------- An HTML attachment was scrubbed... URL: From srakonde at redhat.com Wed May 29 11:13:05 2019 From: srakonde at redhat.com (Sanju Rakonde) Date: Wed, 29 May 2019 16:43:05 +0530 Subject: [Gluster-users] Memory leak in gluster 5.4 In-Reply-To: References: Message-ID: Hi Christian, I see below errors when I try to unzip the file. [root at localhost Downloads]# unzip gluster_coredump.zip Archive: gluster_coredump.zip checkdir error: coredump exists but is not directory unable to process coredump/. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterd.0.ed02597e2d374210985795ab82dd48e7.2209.1557381154000000.lz4. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterfsd.0.ed02597e2d374210985795ab82dd48e7.2634.1557381672000000.lz4. checkdir error: coredump exists but is not directory unable to process coredump/core.glusterfsd.0.ed02597e2d374210985795ab82dd48e7.2653.1557381626000000.lz4. [root at localhost Downloads]# Periodic statedumps will be much helpful in debugging memory leaks than coredumps. Thanks, Sanju On Thu, May 16, 2019 at 2:57 PM Christian Meyer wrote: > Hi everyone! > > I'm using a Gluster 5.4 Setup with three Nodes and three volumes > (one is the gluster shared storage). The other are replicated volumes. > Each node has 64GB of RAM. > Over the time of ~2 month the memory consumption of glusterd grow > linear. An the end glusterd used ~45% of RAM the brick processes > together ~43% of RAM. > I think this is a memory leak. > > I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), > hope this will help to find the problem. > > Could someone please have a look on it? > > Download Coredumps: > https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip > > Kind regards > > Christian > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > -- Thanks, Sanju -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.orth at gmail.com Wed May 29 14:20:20 2019 From: alan.orth at gmail.com (Alan Orth) Date: Wed, 29 May 2019 17:20:20 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: Dear Ravi, Thank you for the link to the blog post series?it is very informative and current! If I understand your blog post correctly then I think the answer to your previous question about pending AFRs is: no, there are no pending AFRs. I have identified one file that is a good test case to try to understand what happened after I issued the `gluster volume replace-brick ... commit force` a few days ago and then added the same original brick back to the volume later. This is the current state of the replica 2 distribute/replicate volume: [root at wingu0 ~]# gluster volume info apps Volume Name: apps Type: Distributed-Replicate Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda Status: Started Snapshot Count: 0 Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: wingu3:/mnt/gluster/apps Brick2: wingu4:/mnt/gluster/apps Brick3: wingu05:/data/glusterfs/sdb/apps Brick4: wingu06:/data/glusterfs/sdb/apps Brick5: wingu0:/mnt/gluster/apps Brick6: wingu05:/data/glusterfs/sdc/apps Options Reconfigured: diagnostics.client-log-level: DEBUG storage.health-check-interval: 10 nfs.disable: on I checked the xattrs of one file that is missing from the volume's FUSE mount (though I can read it if I access its full path explicitly), but is present in several of the volume's bricks (some with full size, others empty): [root at wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.afr.apps-client-3=0x000000000000000000000000 trusted.afr.apps-client-5=0x000000000000000000000000 trusted.afr.dirty=0x000000000000000000000000 trusted.bit-rot.version=0x0200000000000000585a396f00046e15 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 [root at wingu06 ~]# getfattr -d -m. 
-e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg getfattr: Removing leading '/' from absolute path names # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 According to the trusted.afr.apps-client-xx xattrs this particular file should be on bricks with id "apps-client-3" and "apps-client-5". It took me a few hours to realize that the brick-id values are recorded in the volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id values with a volfile backup from before the replace-brick, I realized that the files are simply on the wrong brick now as far as Gluster is concerned. This particular file is now on the brick for "apps-client-4". As an experiment I copied this one file to the two bricks listed in the xattrs and I was then able to see the file from the FUSE mount (yay!). Other than replacing the brick, removing it, and then adding the old brick on the original server back, there has been no change in the data this entire time. Can I change the brick IDs in the volfiles so they reflect where the data actually is? Or perhaps script something to reset all the xattrs on the files/directories to point to the correct bricks? Thank you for any help or pointers, On Wed, May 29, 2019 at 7:24 AM Ravishankar N wrote: > > On 29/05/19 9:50 AM, Ravishankar N wrote: > > > On 29/05/19 3:59 AM, Alan Orth wrote: > > Dear Ravishankar, > > I'm not sure if Brick4 had pending AFRs because I don't know what that > means and it's been a few days so I am not sure I would be able to find > that information. > > When you find some time, have a look at a blog > series I wrote about AFR- I've tried to explain what one needs to know to > debug replication related issues in it. > > Made a typo error. The URL for the blog is https://wp.me/peiBB-6b > > -Ravi > > > Anyways, after wasting a few days rsyncing the old brick to a new host I > decided to just try to add the old brick back into the volume instead of > bringing it up on the new host. I created a new brick directory on the old > host, moved the old brick's contents into that new directory (minus the > .glusterfs directory), added the new brick to the volume, and then did > Vlad's find/stat trick? from the brick to the FUSE mount point. > > The interesting problem I have now is that some files don't appear in the > FUSE mount's directory listings, but I can actually list them directly and > even read them. What could cause that? > > Not sure, too many variables in the hacks that you did to take a guess. > You can check if the contents of the .glusterfs folder are in order on the > new brick (example hardlink for files and symlinks for directories are > present etc.) . > Regards, > Ravi > > > Thanks, > > ? > https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html > > On Fri, May 24, 2019 at 4:59 PM Ravishankar N > wrote: > >> >> On 23/05/19 2:40 AM, Alan Orth wrote: >> >> Dear list, >> >> I seem to have gotten into a tricky situation. 
Today I brought up a shiny >> new server with new disk arrays and attempted to replace one brick of a >> replica 2 distribute/replicate volume on an older server using the >> `replace-brick` command: >> >> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >> wingu06:/data/glusterfs/sdb/homes commit force >> >> The command was successful and I see the new brick in the output of >> `gluster volume info`. The problem is that Gluster doesn't seem to be >> migrating the data, >> >> `replace-brick` definitely must heal (not migrate) the data. In your >> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >> there any errors in the self-heal daemon logs of Brick-4's node? Does >> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >> date. replace-brick command internally does all the setfattr steps that are >> mentioned in the doc. >> >> -Ravi >> >> >> and now the original brick that I replaced is no longer part of the >> volume (and a few terabytes of data are just sitting on the old brick): >> >> # gluster volume info homes | grep -E "Brick[0-9]:" >> Brick1: wingu4:/mnt/gluster/homes >> Brick2: wingu3:/mnt/gluster/homes >> Brick3: wingu06:/data/glusterfs/sdb/homes >> Brick4: wingu05:/data/glusterfs/sdb/homes >> Brick5: wingu05:/data/glusterfs/sdc/homes >> Brick6: wingu06:/data/glusterfs/sdc/homes >> >> I see the Gluster docs have a more complicated procedure for replacing >> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >> old brick? I see that I have a backup of the old volfile thanks to yum's >> rpmsave function if that helps. >> >> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >> give. >> >> ? >> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > > > _______________________________________________ > Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users > > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From dcunningham at voisonics.com Wed May 29 22:33:01 2019 From: dcunningham at voisonics.com (David Cunningham) Date: Thu, 30 May 2019 10:33:01 +1200 Subject: [Gluster-users] Transport endpoint is not connected In-Reply-To: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> References: <64ca2efd-7ce2-e88c-db75-1bbb20db44ad@redhat.com> Message-ID: Hi Ravi, I think it probably is a firewall issue with the network provider. I was hoping to see a specific connection failure message we could send to them, but will take it up with them anyway. Thanks for your help. On Wed, 29 May 2019 at 23:10, Ravishankar N wrote: > I don't see a "Connected to gvol0-client-1" in the log. 
Perhaps a > firewall issue like the last time? Even in the earlier add-brick log from > the other email thread, connection to the 2nd brick was not established. > > -Ravi > On 29/05/19 2:26 PM, David Cunningham wrote: > > Hi Ravi and Joe, > > The command "gluster volume status gvol0" shows all 3 nodes as being > online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in > which I can't see anything like a connection error. Would you have any > further suggestions? Thank you. > > [root at gfs3 glusterfs]# gluster volume status gvol0 > Status of volume: gvol0 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick gfs1:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7706 > Brick gfs2:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7625 > Brick gfs3:/nodirectwritedata/gluster/gvol0 49152 0 Y > 7307 > Self-heal Daemon on localhost N/A N/A Y > 7316 > Self-heal Daemon on gfs1 N/A N/A Y > 40591 > Self-heal Daemon on gfs2 N/A N/A Y > 7634 > > Task Status of Volume gvol0 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > On Wed, 29 May 2019 at 16:26, Ravishankar N > wrote: > >> >> On 29/05/19 6:21 AM, David Cunningham wrote: >> >> Hello all, >> >> We are seeing a strange issue where a new node gfs3 shows another node >> gfs2 as not connected on the "gluster volume heal" info: >> >> [root at gfs3 bricks]# gluster volume heal gvol0 info >> Brick gfs1:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> Brick gfs2:/nodirectwritedata/gluster/gvol0 >> Status: Transport endpoint is not connected >> Number of entries: - >> >> Brick gfs3:/nodirectwritedata/gluster/gvol0 >> Status: Connected >> Number of entries: 0 >> >> >> However it does show the same node connected on "gluster peer status". >> Does anyone know why this would be? >> >> [root at gfs3 bricks]# gluster peer status >> Number of Peers: 2 >> >> Hostname: gfs2 >> Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 >> State: Peer in Cluster (Connected) >> >> Hostname: gfs1 >> Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b >> State: Peer in Cluster (Connected) >> >> >> In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with >> regards to gfs2: >> >> You need to check glfsheal-$volname.log on the node where you ran the >> command and check for any connection related errors. >> >> -Ravi >> >> >> [2019-05-29 00:17:50.646360] I [MSGID: 115029] >> [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client >> from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> (version: 5.6) >> [2019-05-29 00:17:50.761120] I [MSGID: 115036] >> [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection >> from >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> [2019-05-29 00:17:50.761352] I [MSGID: 101055] >> [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection >> CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 >> >> Thanks in advance for any assistance. 
>> >> -- >> David Cunningham, Voisonics Limited >> http://voisonics.com/ >> USA: +1 213 221 1092 >> New Zealand: +64 (0)28 2558 3782 >> >> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> >> > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782 > >
--
David Cunningham, Voisonics Limited
http://voisonics.com/
USA: +1 213 221 1092
New Zealand: +64 (0)28 2558 3782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hunter86_bg at yahoo.com Wed May 29 10:27:47 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Wed, 29 May 2019 13:27:47 +0300
Subject: [Gluster-users] Transport endpoint is not connected
Message-ID:

Check the brick TCP ports with "gluster volume status" and then try with telnet/ncat to connect from gfs3 to gfs2 on that TCP port.

Best Regards,
Strahil Nikolov

On May 29, 2019 03:51, David Cunningham wrote:
> > Hello all, > > We are seeing a strange issue where a new node gfs3 shows another node gfs2 as not connected on the "gluster volume heal" info: > > [root at gfs3 bricks]# gluster volume heal gvol0 info > Brick gfs1:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > Brick gfs2:/nodirectwritedata/gluster/gvol0 > Status: Transport endpoint is not connected > Number of entries: - > > Brick gfs3:/nodirectwritedata/gluster/gvol0 > Status: Connected > Number of entries: 0 > > > However it does show the same node connected on "gluster peer status". Does anyone know why this would be? > > [root at gfs3 bricks]# gluster peer status > Number of Peers: 2 > > Hostname: gfs2 > Uuid: 91863102-23a8-43e1-b3d3-f0a1bd57f350 > State: Peer in Cluster (Connected) > > Hostname: gfs1 > Uuid: 32c99e7d-71f2-421c-86fc-b87c0f68ad1b > State: Peer in Cluster (Connected) > > > In nodirectwritedata-gluster-gvol0.log on gfs3 we see this logged with regards to gfs2: > > [2019-05-29 00:17:50.646360] I [MSGID: 115029] [server-handshake.c:537:server_setvolume] 0-gvol0-server: accepted client from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 (version: 5.6) > [2019-05-29 00:17:50.761120] I [MSGID: 115036] [server.c:469:server_rpc_notify] 0-gvol0-server: disconnecting connection from CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > [2019-05-29 00:17:50.761352] I [MSGID: 101055] [client_t.c:435:gf_client_unref] 0-gvol0-server: Shutting down connection CTX_ID:30d74196-fece-4380-adc0-338760188b81-GRAPH_ID:0-PID:7718-HOST:gfs2.xxx.com-PC_NAME:gvol0-client-2-RECON_NO:-0 > > Thanks in advance for any assistance. > > -- > David Cunningham, Voisonics Limited > http://voisonics.com/ > USA: +1 213 221 1092 > New Zealand: +64 (0)28 2558 3782
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hunter86_bg at yahoo.com Thu May 30 04:11:51 2019
From: hunter86_bg at yahoo.com (Strahil)
Date: Thu, 30 May 2019 07:11:51 +0300
Subject: [Gluster-users] Transport endpoint is not connected
Message-ID: <20r8rlguxb86gpnxjwe3wpqw.1559189511842@email.android.com>

You can try to run ncat from gfs3:

ncat -z -v gfs1 49152
ncat -z -v gfs2 49152

If ncat fails to connect -> it's definitely a firewall.
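If the ncat test does fail, the usual suspect on CentOS 7 is firewalld on the brick hosts. A rough sketch of what to check and open (the port range below is only an example; take the real brick ports from "gluster volume status", and 24007/tcp is needed for glusterd itself):

firewall-cmd --list-ports
firewall-cmd --permanent --add-port=24007/tcp
firewall-cmd --permanent --add-port=49152-49251/tcp
firewall-cmd --reload

Then repeat the ncat test above.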
Best Regards, Strahil NikolovOn May 30, 2019 01:33, David Cunningham wrote: > > Hi Ravi, > > I think it probably is a firewall issue with the network provider. I was hoping to see a specific connection failure message we could send to them, but will take it up with them anyway. > > Thanks for your help. > > > On Wed, 29 May 2019 at 23:10, Ravishankar N wrote: >> >> I don't see a "Connected to gvol0-client-1" in the log.? Perhaps a firewall issue like the last time? Even in the earlier add-brick log from the other email thread, connection to the 2nd brick was not established. >> >> -Ravi >> >> On 29/05/19 2:26 PM, David Cunningham wrote: >>> >>> Hi Ravi and Joe, >>> >>> The command "gluster volume status gvol0" shows all 3 nodes as being online, even on gfs3 as below. I've attached the glfsheal-gvol0.log, in which I can't see anything like a connection error. Would you have any further suggestions? Thank you. >>> >>> [root at gfs3 glusterfs]# gluster volume status gvol0 >>> Status of volume: gvol0 >>> Gluster process???????????????????????????? TCP Port? RDMA Port? Online? Pid >>> ------------------------------------------------------------------------------ >>> Brick gfs1:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7706 >>> Brick gfs2:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7625 >>> Brick gfs3:/nodirectwritedata/gluster/gvol0 49152???? 0????????? Y?????? 7307 >>> Self-heal Daemon on localhost?????????????? N/A?????? N/A??????? Y?????? 7316 >>> Self-heal Daemon on gfs1??????????????????? N/A?????? N/A??????? Y?????? 40591 >>> Self-heal Daemon on gfs2??????????????????? N/A?????? N/A??????? Y?????? 7634 >>> ? >>> Task Status of Volume gvol0 >>> ------------------------------------------------------------------------------ >>> There are no active volume tasks >>> >>> >>> On Wed, 29 May 2019 at 16:26, Ravishankar N wrote: >>>> >>>> >>>> On 29/05/19 6:21 AM, David Cunningham wrote: -------------- next part -------------- An HTML attachment was scrubbed... URL: From hgowtham at redhat.com Thu May 30 09:02:41 2019 From: hgowtham at redhat.com (Hari Gowtham) Date: Thu, 30 May 2019 14:32:41 +0530 Subject: [Gluster-users] Announcing Gluster release 6.2 Message-ID: The Gluster community is pleased to announce the release of Gluster 6.2 (packages available at [1]). Release notes for the release can be found at [2]. Major changes, features and limitations addressed in this release: None Thanks, Gluster community [1] Packages for 6.2: https://download.gluster.org/pub/gluster/glusterfs/6/6.2/ [2] Release notes for 6.2: https://docs.gluster.org/en/latest/release-notes/6.2/ -- Regards, Hari Gowtham. From alan.orth at gmail.com Thu May 30 21:50:51 2019 From: alan.orth at gmail.com (Alan Orth) Date: Fri, 31 May 2019 00:50:51 +0300 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: Dear Ravi, I spent a bit of time inspecting the xattrs on some files and directories on a few bricks for this volume and it looks a bit messy. Even if I could make sense of it for a few and potentially heal them manually, there are millions of files and directories in total so that's definitely not a scalable solution. After a few missteps with `replace-brick ... 
commit force` in the last week (one of which was on a brick that was dead/offline), as well as some premature `remove-brick` commands, I'm unsure how to proceed and I'm getting demotivated. It's scary how quickly things get out of hand in distributed systems...

I had hoped that bringing the old brick back up would help, but by the time I added it again a few days had passed and all the brick-id's had changed due to the replace/remove brick commands, not to mention that the trusted.afr.$volume-client-xx values were now probably pointing to the wrong bricks (?).

Anyways, a few hours ago I started a full heal on the volume and I see that there is a sustained 100MiB/sec of network traffic going from the old brick's host to the new one. The completed heals reported in the logs look promising too:

Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 281614 Completed data selfheal
     84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal

So that's good I guess, though I have no idea how long it will take or if it will fix the "missing files" issue on the FUSE mount. I've increased cluster.shd-max-threads to 8 to hopefully speed up the heal process.

I'd be happy for any advice or pointers,

On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote: > Dear Ravi, > > Thank you for the link to the blog post series?it is very informative and > current! If I understand your blog post correctly then I think the answer > to your previous question about pending AFRs is: no, there are no pending > AFRs. I have identified one file that is a good test case to try to > understand what happened after I issued the `gluster volume replace-brick > ... commit force` a few days ago and then added the same original brick > back to the volume later. This is the current state of the replica 2 > distribute/replicate volume: > > [root at wingu0 ~]# gluster volume info apps > > Volume Name: apps > Type: Distributed-Replicate > Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 x 2 = 6 > Transport-type: tcp > Bricks: > Brick1: wingu3:/mnt/gluster/apps > Brick2: wingu4:/mnt/gluster/apps > Brick3: wingu05:/data/glusterfs/sdb/apps > Brick4: wingu06:/data/glusterfs/sdb/apps > Brick5: wingu0:/mnt/gluster/apps > Brick6: wingu05:/data/glusterfs/sdc/apps > Options Reconfigured: > diagnostics.client-log-level: DEBUG > storage.health-check-interval: 10 > nfs.disable: on > > I checked the xattrs of one file that is missing from the volume's FUSE > mount (though I can read it if I access its full path explicitly), but is > present in several of the volume's bricks (some with full size, others > empty): > > [root at wingu0 ~]# getfattr -d -m.
-e hex > /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > > getfattr: Removing leading '/' from absolute path names > # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.apps-client-3=0x000000000000000000000000 > trusted.afr.apps-client-5=0x000000000000000000000000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.bit-rot.version=0x0200000000000000585a396f00046e15 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > > [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > [root at wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > > [root at wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names > # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > According to the trusted.afr.apps-client-xx xattrs this particular file > should be on bricks with id "apps-client-3" and "apps-client-5". It took me > a few hours to realize that the brick-id values are recorded in the > volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing > those brick-id values with a volfile backup from before the replace-brick, > I realized that the files are simply on the wrong brick now as far as > Gluster is concerned. This particular file is now on the brick for > "apps-client-4". As an experiment I copied this one file to the two > bricks listed in the xattrs and I was then able to see the file from the > FUSE mount (yay!). > > Other than replacing the brick, removing it, and then adding the old brick > on the original server back, there has been no change in the data this > entire time. Can I change the brick IDs in the volfiles so they reflect > where the data actually is? Or perhaps script something to reset all the > xattrs on the files/directories to point to the correct bricks? 
> > Thank you for any help or pointers, > > On Wed, May 29, 2019 at 7:24 AM Ravishankar N > wrote: > >> >> On 29/05/19 9:50 AM, Ravishankar N wrote: >> >> >> On 29/05/19 3:59 AM, Alan Orth wrote: >> >> Dear Ravishankar, >> >> I'm not sure if Brick4 had pending AFRs because I don't know what that >> means and it's been a few days so I am not sure I would be able to find >> that information. >> >> When you find some time, have a look at a blog >> series I wrote about AFR- I've tried to explain what one needs to know to >> debug replication related issues in it. >> >> Made a typo error. The URL for the blog is https://wp.me/peiBB-6b >> >> -Ravi >> >> >> Anyways, after wasting a few days rsyncing the old brick to a new host I >> decided to just try to add the old brick back into the volume instead of >> bringing it up on the new host. I created a new brick directory on the old >> host, moved the old brick's contents into that new directory (minus the >> .glusterfs directory), added the new brick to the volume, and then did >> Vlad's find/stat trick? from the brick to the FUSE mount point. >> >> The interesting problem I have now is that some files don't appear in the >> FUSE mount's directory listings, but I can actually list them directly and >> even read them. What could cause that? >> >> Not sure, too many variables in the hacks that you did to take a guess. >> You can check if the contents of the .glusterfs folder are in order on the >> new brick (example hardlink for files and symlinks for directories are >> present etc.) . >> Regards, >> Ravi >> >> >> Thanks, >> >> ? >> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >> >> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >> wrote: >> >>> >>> On 23/05/19 2:40 AM, Alan Orth wrote: >>> >>> Dear list, >>> >>> I seem to have gotten into a tricky situation. Today I brought up a >>> shiny new server with new disk arrays and attempted to replace one brick of >>> a replica 2 distribute/replicate volume on an older server using the >>> `replace-brick` command: >>> >>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes >>> wingu06:/data/glusterfs/sdb/homes commit force >>> >>> The command was successful and I see the new brick in the output of >>> `gluster volume info`. The problem is that Gluster doesn't seem to be >>> migrating the data, >>> >>> `replace-brick` definitely must heal (not migrate) the data. In your >>> case, data must have been healed from Brick-4 to the replaced Brick-3. Are >>> there any errors in the self-heal daemon logs of Brick-4's node? Does >>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of >>> date. replace-brick command internally does all the setfattr steps that are >>> mentioned in the doc. >>> >>> -Ravi >>> >>> >>> and now the original brick that I replaced is no longer part of the >>> volume (and a few terabytes of data are just sitting on the old brick): >>> >>> # gluster volume info homes | grep -E "Brick[0-9]:" >>> Brick1: wingu4:/mnt/gluster/homes >>> Brick2: wingu3:/mnt/gluster/homes >>> Brick3: wingu06:/data/glusterfs/sdb/homes >>> Brick4: wingu05:/data/glusterfs/sdb/homes >>> Brick5: wingu05:/data/glusterfs/sdc/homes >>> Brick6: wingu06:/data/glusterfs/sdc/homes >>> >>> I see the Gluster docs have a more complicated procedure for replacing >>> bricks that involves getfattr/setfattr?. How can I tell Gluster about the >>> old brick? I see that I have a backup of the old volfile thanks to yum's >>> rpmsave function if that helps. 
>>> >>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can >>> give. >>> >>> ? >>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >>> >>> _______________________________________________ >>> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >> >> -- >> Alan Orth >> alan.orth at gmail.com >> https://picturingjordan.com >> https://englishbulgaria.net >> https://mjanja.ch >> "In heaven all the interesting people are missing." ?Friedrich Nietzsche >> >> >> _______________________________________________ >> Gluster-users mailing listGluster-users at gluster.orghttps://lists.gluster.org/mailman/listinfo/gluster-users >> >> > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche > -- Alan Orth alan.orth at gmail.com https://picturingjordan.com https://englishbulgaria.net https://mjanja.ch "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From ravishankar at redhat.com Fri May 31 04:57:18 2019 From: ravishankar at redhat.com (Ravishankar N) Date: Fri, 31 May 2019 10:27:18 +0530 Subject: [Gluster-users] Does replace-brick migrate data? In-Reply-To: References: <32e26faf-e5c0-b944-2a32-c9eae408b146@redhat.com> <0ab0c28a-48a1-92c0-a106-f4fa94cb620f@redhat.com> <39dcc6a5-1610-93e1-aaff-7fef9b6c1faa@redhat.com> Message-ID: On 31/05/19 3:20 AM, Alan Orth wrote: > Dear Ravi, > > I spent a bit of time inspecting the xattrs on some files and > directories on a few bricks for this volume and it looks a bit messy. > Even if I could make sense of it for a few and potentially heal them > manually, there are millions of files and directories in total so > that's definitely not a scalable solution. After a few missteps with > `replace-brick ... commit force` in the last week?one of which on a > brick that was dead/offline?as well as some premature `remove-brick` > commands, I'm unsure how how to proceed and I'm getting demotivated. > It's scary how quickly things get out of hand in distributed systems... Hi Alan, The one good thing about gluster is it that the data is always available directly on the backed bricks even if your volume has inconsistencies at the gluster level. So theoretically, if your cluster is FUBAR, you could just create a new volume and copy all data onto it via its mount from the old volume's bricks. > > I had hoped that bringing the old brick back up would help, but by the > time I added it again a few days had passed and all the brick-id's had > changed due to the replace/remove brick commands, not to mention that > the trusted.afr.$volume-client-xx values were now probably pointing to > the wrong bricks (?). > > Anyways, a few hours ago I started a full heal on the volume and I see > that there is a sustained 100MiB/sec of network traffic going from the > old brick's host to the new one. 
The completed heals reported in the > logs look promising too: > > Old brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > ?281614 Completed data selfheal > ? ? ?84 Completed entry selfheal > ?299648 Completed metadata selfheal > > New brick host: > > # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E > 'Completed (data|metadata|entry) selfheal' | sort | uniq -c > ?198256 Completed data selfheal > ? 16829 Completed entry selfheal > ?229664 Completed metadata selfheal > > So that's good I guess, though I have no idea how long it will take or > if it will fix the "missing files" issue on the FUSE mount. I've > increased cluster.shd-max-threads to 8 to hopefully speed up the heal > process. The afr xattrs should not cause files to disappear from mount. If the xattr names do not match what each AFR subvol expects (for eg. in a replica 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd subvol and so on - ) for its children then it won't heal the data, that is all. But in your case I see some inconsistencies like one brick having the actual file (licenseserver.cfg) and the other having a linkto file (the one with thedht.linkto xattr) /in the same replica pair/. > > I'd be happy for any advice or pointers, Did you check if the .glusterfs hardlinks/symlinks exist and are in order for all bricks? -Ravi > > On Wed, May 29, 2019 at 5:20 PM Alan Orth > wrote: > > Dear Ravi, > > Thank you for the link to the blog post series?it is very > informative and current! If I understand your blog post correctly > then I think the answer to your previous question about pending > AFRs is: no, there are no pending AFRs. I have identified one file > that is a good test case to try to understand what happened after > I issued the `gluster volume replace-brick ... commit force` a few > days ago and then added the same original brick back to the volume > later. This is the current state of the replica 2 > distribute/replicate volume: > > [root at wingu0 ~]# gluster volume info apps > > Volume Name: apps > Type: Distributed-Replicate > Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda > Status: Started > Snapshot Count: 0 > Number of Bricks: 3 x 2 = 6 > Transport-type: tcp > Bricks: > Brick1: wingu3:/mnt/gluster/apps > Brick2: wingu4:/mnt/gluster/apps > Brick3: wingu05:/data/glusterfs/sdb/apps > Brick4: wingu06:/data/glusterfs/sdb/apps > Brick5: wingu0:/mnt/gluster/apps > Brick6: wingu05:/data/glusterfs/sdc/apps > Options Reconfigured: > diagnostics.client-log-level: DEBUG > storage.health-check-interval: 10 > nfs.disable: on > > I checked the xattrs of one file that is missing from the volume's > FUSE mount (though I can read it if I access its full path > explicitly), but is present in several of the volume's bricks > (some with full size, others empty): > > [root at wingu0 ~]# getfattr -d -m. -e hex > /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > > getfattr: Removing leading '/' from absolute path names # file: > mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.apps-client-3=0x000000000000000000000000 > trusted.afr.apps-client-5=0x000000000000000000000000 > trusted.afr.dirty=0x000000000000000000000000 > trusted.bit-rot.version=0x0200000000000000585a396f00046e15 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd [root at wingu05 ~]# > getfattr -d -m. 
-e hex > /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > [root at wingu05 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > [root at wingu06 ~]# getfattr -d -m. -e hex > /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > getfattr: Removing leading '/' from absolute path names # file: > data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd > trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 > trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200 > > According to the trusted.afr.apps-client-xxxattrs this particular > file should be on bricks with id "apps-client-3" and > "apps-client-5". It took me a few hours to realize that the > brick-id values are recorded in the volume's volfiles in > /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id > values with a volfile backup from before the replace-brick, I > realized that the files are simply on the wrong brick now as far > as Gluster is concerned. This particular file is now on the brick > for "apps-client-4". As an experiment I copied this one file to > the two bricks listed in the xattrs and I was then able to see the > file from the FUSE mount (yay!). > > Other than replacing the brick, removing it, and then adding the > old brick on the original server back, there has been no change in > the data this entire time. Can I change the brick IDs in the > volfiles so they reflect where the data actually is? Or perhaps > script something to reset all the xattrs on the files/directories > to point to the correct bricks? > > Thank you for any help or pointers, > > On Wed, May 29, 2019 at 7:24 AM Ravishankar N > > wrote: > > > On 29/05/19 9:50 AM, Ravishankar N wrote: >> >> >> On 29/05/19 3:59 AM, Alan Orth wrote: >>> Dear Ravishankar, >>> >>> I'm not sure if Brick4 had pending AFRs because I don't know >>> what that means and it's been a few days so I am not sure I >>> would be able to find that information. >> When you find some time, have a look at a blog >> series I wrote about AFR- I've tried >> to explain what one needs to know to debug replication >> related issues in it. > > Made a typo error. 
The URL for the blog is https://wp.me/peiBB-6b > > -Ravi > >>> >>> Anyways, after wasting a few days rsyncing the old brick to >>> a new host I decided to just try to add the old brick back >>> into the volume instead of bringing it up on the new host. I >>> created a new brick directory on the old host, moved the old >>> brick's contents into that new directory (minus the >>> .glusterfs directory), added the new brick to the volume, >>> and then did Vlad's find/stat trick? from the brick to the >>> FUSE mount point. >>> >>> The interesting problem I have now is that some files don't >>> appear in the FUSE mount's directory listings, but I can >>> actually list them directly and even read them. What could >>> cause that? >> Not sure, too many variables in the hacks that you did to >> take a guess. You can check if the contents of the .glusterfs >> folder are in order on the new brick (example hardlink for >> files and symlinks for directories are present etc.) . >> Regards, >> Ravi >>> >>> Thanks, >>> >>> ? >>> https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html >>> >>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N >>> > wrote: >>> >>> >>> On 23/05/19 2:40 AM, Alan Orth wrote: >>>> Dear list, >>>> >>>> I seem to have gotten into a tricky situation. Today I >>>> brought up a shiny new server with new disk arrays and >>>> attempted to replace one brick of a replica 2 >>>> distribute/replicate volume on an older server using >>>> the `replace-brick` command: >>>> >>>> # gluster volume replace-brick homes >>>> wingu0:/mnt/gluster/homes >>>> wingu06:/data/glusterfs/sdb/homes commit force >>>> >>>> The command was successful and I see the new brick in >>>> the output of `gluster volume info`. The problem is >>>> that Gluster doesn't seem to be migrating the data, >>> >>> `replace-brick` definitely must heal (not migrate) the >>> data. In your case, data must have been healed from >>> Brick-4 to the replaced Brick-3. Are there any errors in >>> the self-heal daemon logs of Brick-4's node? Does >>> Brick-4 have pending AFR xattrs blaming Brick-3? The doc >>> is a bit out of date. replace-brick command internally >>> does all the setfattr steps that are mentioned in the doc. >>> >>> -Ravi >>> >>> >>>> and now the original brick that I replaced is no longer >>>> part of the volume (and a few terabytes of data are >>>> just sitting on the old brick): >>>> >>>> # gluster volume info homes | grep -E "Brick[0-9]:" >>>> Brick1: wingu4:/mnt/gluster/homes >>>> Brick2: wingu3:/mnt/gluster/homes >>>> Brick3: wingu06:/data/glusterfs/sdb/homes >>>> Brick4: wingu05:/data/glusterfs/sdb/homes >>>> Brick5: wingu05:/data/glusterfs/sdc/homes >>>> Brick6: wingu06:/data/glusterfs/sdc/homes >>>> >>>> I see the Gluster docs have a more complicated >>>> procedure for replacing bricks that involves >>>> getfattr/setfattr?. How can I tell Gluster about the >>>> old brick? I see that I have a backup of the old >>>> volfile thanks to yum's rpmsave function if that helps. >>>> >>>> We are using Gluster 5.6 on CentOS 7. Thank you for any >>>> advice you can give. >>>> >>>> ? >>>> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick >>>> >>>> -- >>>> Alan Orth >>>> alan.orth at gmail.com >>>> https://picturingjordan.com >>>> https://englishbulgaria.net >>>> https://mjanja.ch >>>> "In heaven all the interesting people are missing." 
>>>> ?Friedrich Nietzsche >>>> >>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Alan Orth >>> alan.orth at gmail.com >>> https://picturingjordan.com >>> https://englishbulgaria.net >>> https://mjanja.ch >>> "In heaven all the interesting people are missing." >>> ?Friedrich Nietzsche >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich > Nietzsche > > > > -- > Alan Orth > alan.orth at gmail.com > https://picturingjordan.com > https://englishbulgaria.net > https://mjanja.ch > "In heaven all the interesting people are missing." ?Friedrich Nietzsche -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 05:34:39 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 11:04:39 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi, This looks like the hang because stderr buffer filled up with errors messages and no one reading it. I think this issue is fixed in latest releases. As a workaround, you can do following and check if it works. Prerequisite: rsync version should be > 3.1.0 Workaround: gluster volume geo-replication :: config rsync-options "--ignore-missing-args" Thanks, Kotresh HR On Thu, May 30, 2019 at 5:39 PM deepu srinivasan wrote: > Hi > We were evaluating Gluster geo Replication between two DCs one is in US > west and one is in US east. We took multiple trials for different file > size. > The Geo Replication tends to stop replicating but while checking the > status it appears to be in Active state. But the slave volume did not > increase in size. > So we have restarted the geo-replication session and checked the status. > The status was in an active state and it was in History Crawl for a long > time. We have enabled the DEBUG mode in logging and checked for any error. > There was around 2000 file appeared for syncing candidate. The Rsync > process starts but the rsync did not happen in the slave volume. Every time > the rsync process appears in the "ps auxxx" list but the replication did > not happen in the slave end. What would be the cause of this problem? Is > there anyway to debug it? > > We have also checked the strace of the rync program. > it displays something like this > > "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" > > > We are using the below specs > > Gluster version - 4.1.7 > Sync mode - rsync > Volume - 1x3 in each end (master and slave) > Intranet Bandwidth - 10 Gig > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 09:55:37 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 15:25:37 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi, Could you take the strace with with more string size? The argument strings are truncated. 
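For example (just a sketch; the grep pattern is illustrative), first find the PID of the rsync process that gsyncd spawned:

ps -ef | grep rsync

and then attach to that PID with a larger string limit: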
strace -s 500 -ttt -T -p On Fri, May 31, 2019 at 3:17 PM deepu srinivasan wrote: > Hi Kotresh > The above-mentioned work around did not work properly. > > On Fri, May 31, 2019 at 3:16 PM deepu srinivasan > wrote: > >> Hi Kotresh >> We have tried the above-mentioned rsync option and we are planning to >> have the version upgrade to 6.0. >> >> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >> khiremat at redhat.com> wrote: >> >>> Hi, >>> >>> This looks like the hang because stderr buffer filled up with errors >>> messages and no one reading it. >>> I think this issue is fixed in latest releases. As a workaround, you can >>> do following and check if it works. >>> >>> Prerequisite: >>> rsync version should be > 3.1.0 >>> >>> Workaround: >>> gluster volume geo-replication :: >>> config rsync-options "--ignore-missing-args" >>> >>> Thanks, >>> Kotresh HR >>> >>> >>> >>> >>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>> wrote: >>> >>>> Hi >>>> We were evaluating Gluster geo Replication between two DCs one is in US >>>> west and one is in US east. We took multiple trials for different file >>>> size. >>>> The Geo Replication tends to stop replicating but while checking the >>>> status it appears to be in Active state. But the slave volume did not >>>> increase in size. >>>> So we have restarted the geo-replication session and checked the >>>> status. The status was in an active state and it was in History Crawl for a >>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>> error. >>>> There was around 2000 file appeared for syncing candidate. The Rsync >>>> process starts but the rsync did not happen in the slave volume. Every time >>>> the rsync process appears in the "ps auxxx" list but the replication did >>>> not happen in the slave end. What would be the cause of this problem? Is >>>> there anyway to debug it? >>>> >>>> We have also checked the strace of the rync program. >>>> it displays something like this >>>> >>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>> >>>> >>>> We are using the below specs >>>> >>>> Gluster version - 4.1.7 >>>> Sync mode - rsync >>>> Volume - 1x3 in each end (master and slave) >>>> Intranet Bandwidth - 10 Gig >>>> >>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 10:52:52 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 16:22:52 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Yes, rsync config option should have fixed this issue. Could you share the output of the following? 1. gluster volume geo-replication :: config rsync-options 2. ps -ef | grep rsync On Fri, May 31, 2019 at 4:11 PM deepu srinivasan wrote: > Done. > We got the following result . > >> 1559298781.338234 write(2, "rsync: link_stat >> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >> failed: No such file or directory (2)", 128 > > seems like a file is missing ? > > On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> Hi, >> >> Could you take the strace with with more string size? The argument >> strings are truncated. >> >> strace -s 500 -ttt -T -p >> >> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >> wrote: >> >>> Hi Kotresh >>> The above-mentioned work around did not work properly. 
>>> >>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>> wrote: >>> >>>> Hi Kotresh >>>> We have tried the above-mentioned rsync option and we are planning to >>>> have the version upgrade to 6.0. >>>> >>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>> khiremat at redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> This looks like the hang because stderr buffer filled up with errors >>>>> messages and no one reading it. >>>>> I think this issue is fixed in latest releases. As a workaround, you >>>>> can do following and check if it works. >>>>> >>>>> Prerequisite: >>>>> rsync version should be > 3.1.0 >>>>> >>>>> Workaround: >>>>> gluster volume geo-replication :: >>>>> config rsync-options "--ignore-missing-args" >>>>> >>>>> Thanks, >>>>> Kotresh HR >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi >>>>>> We were evaluating Gluster geo Replication between two DCs one is in >>>>>> US west and one is in US east. We took multiple trials for different file >>>>>> size. >>>>>> The Geo Replication tends to stop replicating but while checking the >>>>>> status it appears to be in Active state. But the slave volume did not >>>>>> increase in size. >>>>>> So we have restarted the geo-replication session and checked the >>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>> error. >>>>>> There was around 2000 file appeared for syncing candidate. The Rsync >>>>>> process starts but the rsync did not happen in the slave volume. Every time >>>>>> the rsync process appears in the "ps auxxx" list but the replication did >>>>>> not happen in the slave end. What would be the cause of this problem? Is >>>>>> there anyway to debug it? >>>>>> >>>>>> We have also checked the strace of the rync program. >>>>>> it displays something like this >>>>>> >>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>> >>>>>> >>>>>> We are using the below specs >>>>>> >>>>>> Gluster version - 4.1.7 >>>>>> Sync mode - rsync >>>>>> Volume - 1x3 in each end (master and slave) >>>>>> Intranet Bandwidth - 10 Gig >>>>>> >>>>> >>>>> >>>>> -- >>>>> Thanks and Regards, >>>>> Kotresh H R >>>>> >>>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... URL: From khiremat at redhat.com Fri May 31 11:05:50 2019 From: khiremat at redhat.com (Kotresh Hiremath Ravishankar) Date: Fri, 31 May 2019 16:35:50 +0530 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: That means it could be working and the defunct process might be some old zombie one. Could you check, that data progress ? On Fri, May 31, 2019 at 4:29 PM deepu srinivasan wrote: > Hi > When i change the rsync option the rsync process doesnt seem to start . > Only a defunt process is listed in ps aux. Only when i set rsync option to > " " and restart all the process the rsync process is listed in ps aux. > > > On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < > khiremat at redhat.com> wrote: > >> Yes, rsync config option should have fixed this issue. >> >> Could you share the output of the following? >> >> 1. gluster volume geo-replication :: >> config rsync-options >> 2. ps -ef | grep rsync >> >> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >> wrote: >> >>> Done. >>> We got the following result . 
>>> >>>> 1559298781.338234 write(2, "rsync: link_stat >>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>> failed: No such file or directory (2)", 128 >>> >>> seems like a file is missing ? >>> >>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>> khiremat at redhat.com> wrote: >>> >>>> Hi, >>>> >>>> Could you take the strace with with more string size? The argument >>>> strings are truncated. >>>> >>>> strace -s 500 -ttt -T -p >>>> >>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >>>> wrote: >>>> >>>>> Hi Kotresh >>>>> The above-mentioned work around did not work properly. >>>>> >>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi Kotresh >>>>>> We have tried the above-mentioned rsync option and we are planning to >>>>>> have the version upgrade to 6.0. >>>>>> >>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>> khiremat at redhat.com> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> This looks like the hang because stderr buffer filled up with errors >>>>>>> messages and no one reading it. >>>>>>> I think this issue is fixed in latest releases. As a workaround, you >>>>>>> can do following and check if it works. >>>>>>> >>>>>>> Prerequisite: >>>>>>> rsync version should be > 3.1.0 >>>>>>> >>>>>>> Workaround: >>>>>>> gluster volume geo-replication :: >>>>>>> config rsync-options "--ignore-missing-args" >>>>>>> >>>>>>> Thanks, >>>>>>> Kotresh HR >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan >>>>>>> wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> We were evaluating Gluster geo Replication between two DCs one is >>>>>>>> in US west and one is in US east. We took multiple trials for different >>>>>>>> file size. >>>>>>>> The Geo Replication tends to stop replicating but while checking >>>>>>>> the status it appears to be in Active state. But the slave volume did not >>>>>>>> increase in size. >>>>>>>> So we have restarted the geo-replication session and checked the >>>>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>>>> error. >>>>>>>> There was around 2000 file appeared for syncing candidate. The >>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>> this problem? Is there anyway to debug it? >>>>>>>> >>>>>>>> We have also checked the strace of the rync program. >>>>>>>> it displays something like this >>>>>>>> >>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>> >>>>>>>> >>>>>>>> We are using the below specs >>>>>>>> >>>>>>>> Gluster version - 4.1.7 >>>>>>>> Sync mode - rsync >>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Thanks and Regards, >>>>>>> Kotresh H R >>>>>>> >>>>>> >>>> >>>> -- >>>> Thanks and Regards, >>>> Kotresh H R >>>> >>> >> >> -- >> Thanks and Regards, >> Kotresh H R >> > -- Thanks and Regards, Kotresh H R -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From spisla80 at gmail.com Fri May 31 12:32:22 2019 From: spisla80 at gmail.com (David Spisla) Date: Fri, 31 May 2019 14:32:22 +0200 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID: Hello together, in order not to lose the focus for the topic, I make new date suggestions for next week June 03th ? 07th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) June 03th ? 06th at 16:30 - 18:30 IST or (13:00 - 15:00 CEST) Regards David Spisla Am Di., 21. Mai 2019 um 11:24 Uhr schrieb David Spisla : > Hello together, > > we are still seeking a day and time to talk about interesting Samba / > Glusterfs issues. Here is a new list of possible dates and time. > > May 22th ? 24th at 12:30 - 14:30 IST or (9:00 - 11:00 CEST) > > May 27th ? 29th and 31th at 12:30 - 14:30 IST (9:00 - 11:00 CEST) > > > On May 30th there is a holiday here in germany. > > @Poornima Gurusiddaiah If there is any problem > finding a date please contanct me. I will look for alternatives > > > Regards > > David Spisla > > > > Am Do., 16. Mai 2019 um 12:42 Uhr schrieb David Spisla >: > >> Hello Amar, >> >> thank you for the information. Of course, we should wait for Poornima >> because of her knowledge. >> >> Regards >> David Spisla >> >> Am Do., 16. Mai 2019 um 12:23 Uhr schrieb Amar Tumballi Suryanarayan < >> atumball at redhat.com>: >> >>> David, Poornima is on leave from today till 21st May. So having it after >>> she comes back is better. She has more experience in SMB integration than >>> many of us. >>> >>> -Amar >>> >>> On Thu, May 16, 2019 at 1:09 PM David Spisla wrote: >>> >>>> Hello everyone, >>>> >>>> if there is any problem in finding a date and time, please contact me. >>>> It would be fine to have a meeting soon. >>>> >>>> Regards >>>> David Spisla >>>> >>>> Am Mo., 13. Mai 2019 um 12:38 Uhr schrieb David Spisla < >>>> david.spisla at iternity.com>: >>>> >>>>> Hi Poornima, >>>>> >>>>> >>>>> >>>>> thats fine. I would suggest this dates and times: >>>>> >>>>> >>>>> >>>>> May 15th ? 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>>> >>>>> May 20th ? 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST) >>>>> >>>>> >>>>> >>>>> I add Volker Lendecke from Sernet to the mail. He is the Samba Expert. >>>>> >>>>> Can someone of you provide a host via bluejeans.com? If not, I will >>>>> try it with GoToMeeting (https://www.gotomeeting.com). >>>>> >>>>> >>>>> >>>>> @all Please write your prefered dates and times. For me, all oft the >>>>> above dates and times are fine >>>>> >>>>> >>>>> >>>>> Regards >>>>> >>>>> David >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> *Von:* Poornima Gurusiddaiah >>>>> *Gesendet:* Montag, 13. Mai 2019 07:22 >>>>> *An:* David Spisla ; Anoop C S ; >>>>> Gunther Deschner >>>>> *Cc:* Gluster Devel ; >>>>> gluster-users at gluster.org List >>>>> *Betreff:* Re: [Gluster-devel] Improve stability between SMB/CTDB and >>>>> Gluster (together with Samba Core Developer) >>>>> >>>>> >>>>> >>>>> Hi, >>>>> >>>>> >>>>> >>>>> We would be definitely interested in this. Thank you for contacting >>>>> us. For the starter we can have an online conference. Please suggest few >>>>> possible date and times for the week(preferably between IST 7.00AM - >>>>> 9.PM)? >>>>> >>>>> Adding Anoop and Gunther who are also the main contributors to the >>>>> Gluster-Samba integration. 
>>>>> >>>>> >>>>> >>>>> Thanks, >>>>> >>>>> Poornima >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, May 9, 2019 at 7:43 PM David Spisla >>>>> wrote: >>>>> >>>>> Dear Gluster Community, >>>>> >>>>> at the moment we are improving the stability of SMB/CTDB and Gluster. >>>>> For this purpose we are working together with an advanced SAMBA Core >>>>> Developer. He did some debugging but needs more information about Gluster >>>>> Core Behaviour. >>>>> >>>>> >>>>> >>>>> *Would any of the Gluster Developer wants to have a online conference >>>>> with him and me?* >>>>> >>>>> >>>>> >>>>> I would organize everything. In my opinion this is a good chance to >>>>> improve stability of Glusterfs and this is at the moment one of the major >>>>> issues in the Community. >>>>> >>>>> >>>>> >>>>> Regards >>>>> >>>>> David Spisla >>>>> >>>>> _______________________________________________ >>>>> >>>>> Community Meeting Calendar: >>>>> >>>>> APAC Schedule - >>>>> Every 2nd and 4th Tuesday at 11:30 AM IST >>>>> Bridge: https://bluejeans.com/836554017 >>>>> >>>>> NA/EMEA Schedule - >>>>> Every 1st and 3rd Tuesday at 01:00 PM EDT >>>>> Bridge: https://bluejeans.com/486278655 >>>>> >>>>> Gluster-devel mailing list >>>>> Gluster-devel at gluster.org >>>>> https://lists.gluster.org/mailman/listinfo/gluster-devel >>>>> >>>>> _______________________________________________ >>>> Gluster-users mailing list >>>> Gluster-users at gluster.org >>>> https://lists.gluster.org/mailman/listinfo/gluster-users >>> >>> >>> >>> -- >>> Amar Tumballi (amarts) >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From cody at platform9.com Wed May 1 17:42:40 2019 From: cody at platform9.com (Cody Hill) Date: Wed, 01 May 2019 17:42:40 -0000 Subject: [Gluster-users] GlusterFS on ZFS In-Reply-To: References: <085deed5-f048-4baa-84f8-1f6ef1436a5b@email.android.com> Message-ID: <60CDF008-E11D-40C7-9D10-A30665FF8847@platform9.com> Thanks Amar. I?m going to see what kind of performance I get with just ZFS cache using Intel Optane and RaidZ10 with 12x drives. If this performs better than AWS GP2, I?m good. If not I?ll look into dmcache. Has anyone used bcache? Have any experience there? Thank you, Cody Hill | Director of Technology | Platform9 Direct: (650) 567-3107 cody at platform9.com | Platform9.com | Public Calendar > On May 1, 2019, at 7:34 AM, Amar Tumballi Suryanarayan wrote: > > > > On Tue, Apr 23, 2019 at 11:38 PM Cody Hill > wrote: > > Thanks for the info Karli, > > I wasn?t aware ZFS Dedup was such a dog. I guess I?ll leave that off. My data get?s 3.5:1 savings on compression alone. I was aware of stripped sets. I will be doing 6x Striped sets across 12x disks. > > On top of this design I?m going to try and test Intel Optane DIMM (512GB) as a ?Tier? for GlusterFS to try and get further write acceleration. And issues with GlusterFS ?Tier? functionality that anyone is aware of? > > > Hi Cody, I wanted to be honest about GlusterFS 'Tier' functionality. While it is functional and works, we had not seen the actual benefit we expected with the feature, and noticed it is better to use the tiering on each host machine (ie, on bricks) and use those bricks as glusterfs bricks. (like dmcache). > > Also note that from glusterfs-6.x releases, Tier feature is deprecated. > > -Amar > > Thank you, > Cody Hill > >> On Apr 18, 2019, at 2:32 AM, Karli Sj?berg > wrote: >> >> >> >> Den 17 apr. 2019 16:30 skrev Cody Hill >: >> Hey folks. >> >> I?m looking to deploy GlusterFS to host some VMs. 
I?ve done a lot of reading and would like to implement Deduplication and Compression in this setup. My thought would be to run ZFS to handle the Compression and Deduplication. >> >> You _really_ don't want ZFS doing dedup for any reason. >> >> >> ZFS would give me the following benefits: >> 1. If a single disk fails rebuilds happen locally instead of over the network >> 2. Zil & L2Arc should add a slight performance increase >> >> Adding two really good NVME SSD's as a mirrored SLOG vdev does a huge deal for synchronous write performance, turning every random write into large streams that the spinning drives handle better. >> >> Don't know how picky Gluster is about synchronicity though, most "performance" tweaking suggests setting stuff to async, which I wouldn't recommend, but it's a huge boost for throughput obviously; not having to wait for stuff to actually get written, but it's dangerous. >> >> With mirrored NVME SLOG's, you could probably get that throughput without going asynchronous, which saves you from potential data corruption in a sudden power loss. >> >> L2ARC on the other hand does a bit for read latency, but for a general purpose file server- in practice- not a huge difference, the working set is just too large. Also keep in mind that L2ARC isn't "free". You need more RAM to know where you've cached stuff... >> >> 3. Deduplication and Compression are inline and have pretty good performance with modern hardware (Intel Skylake) >> >> ZFS deduplication has terrible performance. Watch your throughput automatically drop from hundreds or thousands of MB/s down to, like 5. It's a feature;) >> >> 4. Automated Snapshotting >> >> I can then layer GlusterFS on top to handle distribution to allow 3x Replicas of my storage. >> My question is? Why aren?t more people doing this? Is this a horrible idea for some reason that I?m missing? >> >> While it could save a lot of space in some hypothetical instance, the drawbacks can never motivate it. E.g. if you want one node to suddenly die and never recover because of RAM exhaustion, go with ZFS dedup ;) >> >> I?d be very interested to hear your thoughts. >> >> Avoid ZFS dedup at all costs. LZ4 compression on the hand is awesome, definitely use that! It's basically a free performance enhancer the also saves space :) >> >> As another person has said, the best performance layout is RAID10- striped mirrors. I understand you'd want to get as much volume as possible with RAID-Z/RAID(5|6) since gluster also replicates/distributes, but it has a huge impact on IOPS. If performance is the main concern, do striped mirrors with replica 3 in Gluster. My advice is to test thoroughly with different pool layouts to see what gives acceptable performance against your volume requirements. >> >> /K >> >> >> Additional thoughts: >> I?d like to use Ganesha pNFS to connect to this storage. (Any issues here?) >> I think I?d need KeepAliveD across these 3x nodes to store in the FSTAB (Is this correct?) >> I?m also thinking about creating a ?Gluster Tier? of 512GB of Intel Optane DIMM to really smooth out write latencies? Any issues here? 
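Regarding the pool layout being weighed in this thread (12 disks as striped mirrors, Optane for write acceleration, lz4 instead of dedup), a hedged sketch of what one brick host could look like follows. Device names, pool/dataset names, the volume name and the three hostnames are placeholders, not anything confirmed in the thread:

# 6 striped mirrors over 12 data disks, a mirrored SLOG and an optional L2ARC device
zpool create -o ashift=12 tank \
  mirror sda sdb  mirror sdc sdd  mirror sde sdf \
  mirror sdg sdh  mirror sdi sdj  mirror sdk sdl \
  log mirror nvme0n1 nvme1n1 \
  cache nvme2n1

zfs create tank/brick1
zfs set compression=lz4 tank/brick1
zfs set atime=off tank/brick1
zfs set xattr=sa tank/brick1          # keeps Gluster's extended attributes fast on ZFS on Linux
zfs set acltype=posixacl tank/brick1

# use a subdirectory of the dataset as the brick, then build the volume across the hosts
mkdir -p /tank/brick1/brick
gluster volume create vmstore replica 3 \
  host1:/tank/brick1/brick host2:/tank/brick1/brick host3:/tank/brick1/brick

Whether the Optane devices pay off more as SLOG/L2ARC under ZFS, or as a dm-cache/bcache layer under plain XFS bricks, is exactly the comparison worth benchmarking before committing to either design.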
>> >> Thank you, >> Cody Hill >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ >> Gluster-users mailing list >> Gluster-users at gluster.org >> https://lists.gluster.org/mailman/listinfo/gluster-users > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users > > -- > Amar Tumballi (amarts) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From Tim.Stalker at ucdenver.edu Thu May 2 19:09:07 2019 From: Tim.Stalker at ucdenver.edu (Stalker, Tim) Date: Thu, 02 May 2019 19:09:07 -0000 Subject: [Gluster-users] geo-replication create push-pem command failure Message-ID:
I'm running gluster 4.1.8 on a two-node replicated cluster. I'm trying to set up geo-replication from it to a single slave node that has a running volume on it. Everything else is in place: I have a passwordless connection between a geogrp user on the slave and the root user on master 1. When I run

gluster volume geo-replication mastervol slavevolumemngr at slave1::slavevol create push-pem

I get this error:

gluster command on slavevolumemngr at slave1 failed. Error: bash: gluster: command not found

I have passwordless login set up using the root user on master1's pub key and the key here: /var/lib/glusterd/geo-replication/secret.pem.pub. Is there a logfile for the attempt to run create push-pem? ssh is set up correctly. On master1 I've tried both of these commands and then the create push-pem command:

gluster-georep-sshkey generate --no-prefix
gluster-georep-sshkey generate

Any help anyone can provide would be great. Thanks, -------------- next part -------------- An HTML attachment was scrubbed... URL:
From sdeepugd at gmail.com Wed May 8 19:16:38 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Wed, 08 May 2019 19:16:38 -0000 Subject: [Gluster-users] No Variation in Gluster geo Replication when sync jobs value is increased. Message-ID:
Hi. We were evaluating Gluster geo-replication. We ran multiple trials with different file sizes and different sync-jobs values. Increasing or decreasing the sync-jobs value did not change the time geo-replication took to sync. Is this normal, or should we change some configuration other than the sync-jobs value? Here are the stats:

File Count                      8192         8192         8192
Folders                         174          174          174
Trials                          3            3            3
Trial 1 time taken              4min         4min20sec    4min56sec
Trial 2 time taken              4min16sec    4min9sec     4min
Trial 3 time taken              4min11sec    4min30sec    4min50sec
Sync-jobs (gluster geo config)  3            10           30
Tool                            Gluster Geo  Gluster Geo  Gluster Geo

-------------- next part -------------- An HTML attachment was scrubbed... URL:
From david.spisla at iternity.com Tue May 14 14:43:11 2019 From: david.spisla at iternity.com (David Spisla) Date: Tue, 14 May 2019 14:43:11 -0000 Subject: [Gluster-users] [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) In-Reply-To: References: Message-ID:
Hi Poornima, that's fine. I would suggest these dates and times:

May 15th - 17th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)
May 20th - 24th at 12:30, 13:30, 14:30 IST (9:00, 10:00, 11:00 CEST)

I have added Volker Lendecke from Sernet to the mail. He is the Samba expert. Could one of you provide a host via bluejeans.com? If not, I will try it with GoToMeeting (https://www.gotomeeting.com). @all Please write your preferred dates and times.
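Coming back to the geo-replication "create push-pem" failure reported earlier in this digest ("bash: gluster: command not found"): in most such reports the non-interactive shell spawned on the slave simply cannot find the gluster binary (typically because /usr/sbin is not in its PATH), or, for a non-root session like this one, the mountbroker has not been set up on the slave. A rough checklist, using the user/host/volume names from that report; the mountbroker root and group are assumptions:

# On master1: see exactly what the geo-rep ssh session sees on the slave
ssh -i /var/lib/glusterd/geo-replication/secret.pem \
    slavevolumemngr@slave1 'echo $PATH; command -v gluster'

# Master-side geo-replication logs are usually under:
ls /var/log/glusterfs/geo-replication/

# For a non-root session, the slave additionally needs a mountbroker setup, roughly:
gluster-mountbroker setup /var/mountbroker-root geogrp
gluster-mountbroker add slavevol slavevolumemngr
systemctl restart glusterd     # mountbroker changes need a glusterd restart on the slave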
For me, all oft the above dates and times are fine Regards David David Spisla Software Engineer david.spisla at iternity.com +49 761 59034852 iTernity GmbH Heinrich-von-Stephan-Str. 21 79100 Freiburg Deutschland Website Newsletter Support Portal iTernity GmbH. Gesch?ftsf?hrer: Ralf Steinemann. ?Eingetragen beim Amtsgericht Freiburg: HRB-Nr. 701332. ?USt.Id DE242664311. [v01.023] Von: Poornima Gurusiddaiah Gesendet: Montag, 13. Mai 2019 07:22 An: David Spisla ; Anoop C S ; Gunther Deschner Cc: Gluster Devel ; gluster-users at gluster.org List Betreff: Re: [Gluster-devel] Improve stability between SMB/CTDB and Gluster (together with Samba Core Developer) Hi, We would be definitely interested in this. Thank you for contacting us. For the starter we can have an online conference. Please suggest few possible date and times for the week(preferably between IST 7.00AM - 9.PM)? Adding Anoop and Gunther who are also the main contributors to the Gluster-Samba integration. Thanks, Poornima On Thu, May 9, 2019 at 7:43 PM David Spisla > wrote: Dear Gluster Community, at the moment we are improving the stability of SMB/CTDB and Gluster. For this purpose we are working together with an advanced SAMBA Core Developer. He did some debugging but needs more information about Gluster Core Behaviour. Would any of the Gluster Developer wants to have a online conference with him and me? I would organize everything. In my opinion this is a good chance to improve stability of Glusterfs and this is at the moment one of the major issues in the Community. Regards David Spisla _______________________________________________ Community Meeting Calendar: APAC Schedule - Every 2nd and 4th Tuesday at 11:30 AM IST Bridge: https://bluejeans.com/836554017 NA/EMEA Schedule - Every 1st and 3rd Tuesday at 01:00 PM EDT Bridge: https://bluejeans.com/486278655 Gluster-devel mailing list Gluster-devel at gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image860747.png Type: image/png Size: 382 bytes Desc: image860747.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image735814.png Type: image/png Size: 412 bytes Desc: image735814.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image116096.png Type: image/png Size: 6545 bytes Desc: image116096.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image142576.png Type: image/png Size: 37146 bytes Desc: image142576.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image714843.png Type: image/png Size: 522 bytes Desc: image714843.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image293410.png Type: image/png Size: 591 bytes Desc: image293410.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image570372.png Type: image/png Size: 775 bytes Desc: image570372.png URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image031225.png Type: image/png Size: 508 bytes Desc: image031225.png URL: From jeff.bischoff at turbonomic.com Wed May 15 01:45:57 2019 From: jeff.bischoff at turbonomic.com (Jeff Bischoff) Date: Wed, 15 May 2019 01:45:57 -0000 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts report a stale mount and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. Diagnostics Looking at the glusterd.log file, the error message at the time the problem starts looks something like this: got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick This message occurs once for each brick that stops responding. The brick does not recover on its own. Here is that same message again, with surrounding context included. [2019-05-07 11:53:38.663362] I [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe2765) [0x7f795f17f765] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Ran script: /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no [2019-05-07 11:53:38.905338] E [run.c:241:runner_log] (-->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0x3a7a5) [0x7f795f0d77a5] -->/usr/lib64/glusterfs/4.1.7/xlator/mgmt/glusterd.so(+0xe26c3) [0x7f795f17f6c3] -->/lib64/libglusterfs.so.0(runner_log+0x115) [0x7f79643180f5] ) 0-management: Failed to execute script: /var/lib/glusterd/hooks/1/stop/pre/S30samba-stop.sh --volname=vol_d0a0dcf9903e236f68a3933c3060ec5a --last=no [2019-05-07 11:53:38.982785] I [MSGID: 106542] [glusterd-utils.c:8253:glusterd_brick_signal] 0-glusterd: sending signal 15 to brick with pid 8951 [2019-05-07 11:53:39.983244] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick on port 49169 [2019-05-07 11:53:39.984656] W [glusterd-handler.c:6124:__glusterd_brick_rpc_notify] 0-management: got disconnect from stale rpc on /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_d0456279568a623a16a5508daa89b4d5/brick [2019-05-07 11:53:40.316466] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped [2019-05-07 11:53:40.316601] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is stopped [2019-05-07 11:53:40.316644] I [MSGID: 106599] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed [2019-05-07 11:53:40.319650] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped 
[2019-05-07 11:53:40.319708] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is stopped [2019-05-07 11:53:40.321091] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped [2019-05-07 11:53:40.321132] I [MSGID: 106568] [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is stopped The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of gluster running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Full Gluster logs available if needed, just let me know how to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic This message and its attachments are intended only for the designated recipient(s). It may contain confidential or proprietary information and may be subject to legal or other confidentiality protections. If you are not a designated recipient, you may not review, copy or distribute this message. If you receive this in error, please notify the sender by reply e-mail and delete this message. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff.bischoff at turbonomic.com Thu May 16 15:22:04 2019 From: jeff.bischoff at turbonomic.com (Jeff Bischoff) Date: Thu, 16 May 2019 15:22:04 -0000 Subject: [Gluster-users] Gluster mounts becoming stale and never recovering Message-ID: <2110B5B6-B284-4DF4-A0AA-87863F8FC7BF@vmturbo.com> Hi all, We are having a sporadic issue with our Gluster mounts that is affecting several of our Kubernetes environments. We are having trouble understanding what is causing it, and we could use some guidance from the pros! Scenario We have an environment running a single-node Kubernetes with Heketi and several pods using Gluster mounts. The environment runs fine and the mounts appear to be healthy for up to several days. Suddenly, one or more (sometimes all) Gluster mounts have a problem and shut down the brick. The affected containers enter a crash loop that continues indefinitely, until someone intervenes. To work-around the crash loop, a user needs to trigger the bricks to be started again--either through manually starting them, restarting the Gluster pod or restarting the entire node. 
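As a stop-gap for the crash loop described above, the brick processes can usually be brought back without restarting the whole pod or node, since "start ... force" only respawns processes that are down. A sketch, assuming the commands are run inside the gluster server pod and using heketidbstorage as the example volume:

gluster volume status                          # bricks that died show "N" in the Online column
gluster volume start heketidbstorage force     # respawns only the missing brick processes

# or sweep every volume on the node
for v in $(gluster volume list); do
    gluster volume start "$v" force
done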
Diagnostics The tell-tale error message is seeing the following when describing a pod that is in a crash loop: Message: error while creating mount source path '/var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db': mkdir /var/lib/kubelet/pods/4a2574bb-6fa4-11e9-a315-005056b83c80/volumes/kubernetes.io~glusterfs/db: file exists We always see that "file exists" message when this error occurs. Looking at the glusterd.log file, there had been nothing in the log for over a day and then suddenly, at the time the crash loop started, this: [2019-05-08 13:49:04.733147] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a3cef78a5914a2808da0b5736e3daec7/brick on port 49168 [2019-05-08 13:49:04.733374] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_7614e5014a0e402630a0e1fd776acf0a/brick on port 49167 [2019-05-08 13:49:05.003848] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/fe4ac75011a4de0e.socket failed (No data available) [2019-05-08 13:49:05.065420] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/85e9fb223aa121f2.socket failed (No data available) [2019-05-08 13:49:05.066479] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/e2a66e8cd8f5f606.socket failed (No data available) [2019-05-08 13:49:05.067444] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a0625e5b78d69bb8.socket failed (No data available) [2019-05-08 13:49:05.068471] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/770bc294526d0360.socket failed (No data available) [2019-05-08 13:49:05.074278] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/adbd37fe3e1eed36.socket failed (No data available) [2019-05-08 13:49:05.075497] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/17712138f3370e53.socket failed (No data available) [2019-05-08 13:49:05.076545] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/a6cf1aca8b23f394.socket failed (No data available) [2019-05-08 13:49:05.077511] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d0f83b191213e877.socket failed (No data available) [2019-05-08 13:49:05.078447] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/d5dd08945d4f7f6d.socket failed (No data available) [2019-05-08 13:49:05.079424] W [socket.c:599:__socket_rwv] 0-management: readv on /var/run/gluster/c8d7b10108758e2f.socket failed (No data available) [2019-05-08 13:49:14.778619] I [MSGID: 106143] [glusterd-pmap.c:397:pmap_registry_remove] 0-pmap: removing brick /var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_0ed4f7f941de388cda678fe273e9ceb4/brick on port 49166 ... (and more of the same) Nothing further has been printed to the gluster log since. The bricks do not come back on their own. 
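To narrow down why the bricks went away in the first place, the brick-side logs around the same timestamp are usually more telling than glusterd.log alone. A hedged list of things to collect; the log file glob follows the heketi brick paths visible in the logs above:

# glusterd's view: was the brick asked to stop, or did it just vanish?
grep -E 'signal 15|stale rpc|pmap_registry_remove' /var/log/glusterfs/glusterd.log | tail -n 50

# per-brick logs are named after the brick path
ls /var/log/glusterfs/bricks/
tail -n 100 /var/log/glusterfs/bricks/var-lib-heketi-mounts-*.log

# capture state while the problem is live
gluster volume statedump heketidbstorage      # dumps land under /var/run/gluster/ by default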
The version of gluster we are using (running in a container, using the gluster/gluster-centos image from dockerhub): # rpm -qa | grep gluster glusterfs-rdma-4.1.7-1.el7.x86_64 gluster-block-0.3-2.el7.x86_64 python2-gluster-4.1.7-1.el7.x86_64 centos-release-gluster41-1.0-3.el7.centos.noarch glusterfs-4.1.7-1.el7.x86_64 glusterfs-api-4.1.7-1.el7.x86_64 glusterfs-cli-4.1.7-1.el7.x86_64 glusterfs-geo-replication-4.1.7-1.el7.x86_64 glusterfs-libs-4.1.7-1.el7.x86_64 glusterfs-client-xlators-4.1.7-1.el7.x86_64 glusterfs-fuse-4.1.7-1.el7.x86_64 glusterfs-server-4.1.7-1.el7.x86_64 The version of glusterfs running on our Kubernetes node (a CentOS system): ]$ rpm -qa | grep gluster glusterfs-libs-3.12.2-18.el7.x86_64 glusterfs-3.12.2-18.el7.x86_64 glusterfs-fuse-3.12.2-18.el7.x86_64 glusterfs-client-xlators-3.12.2-18.el7.x86_64 The Kubernetes version: $ kubectl version Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"} Our gluster settings/volume options: apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: gluster-heketi selfLink: /apis/storage.k8s.io/v1/storageclasses/gluster-heketi parameters: gidMax: "50000" gidMin: "2000" resturl: http://10.233.35.158:8080 restuser: "null" restuserkey: "null" volumetype: "none" volumeoptions: cluster.post-op-delay-secs 0, performance.client-io-threads off, performance.open-behind off, performance.readdir-ahead off, performance.read-ahead off, performance.stat-prefetch off, performance.write-behind off, performance.io-cache off, cluster.consistent-metadata on, performance.quick-read off, performance.strict-o-direct on provisioner: kubernetes.io/glusterfs reclaimPolicy: Delete Volume info for the heketi volume: gluster> volume info heketidbstorage Volume Name: heketidbstorage Type: Distribute Volume ID: 34b897d0-0953-4f8f-9c5c-54e043e55d92 Status: Started Snapshot Count: 0 Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: 10.10.168.25:/var/lib/heketi/mounts/vg_c197878af606e71a874ad28e3bd7e4e1/brick_a16f9f0374fe5db948a60a017a3f5e60/brick Options Reconfigured: user.heketi.id: 1d2400626dac780fce12e45a07494853 transport.address-family: inet nfs.disable: on Full Gluster logs available if needed, just let me know how best to provide them. Thanks in advance for any help or suggestions on this! Best, Jeff Bischoff Turbonomic This message and its attachments are intended only for the designated recipient(s). It may contain confidential or proprietary information and may be subject to legal or other confidentiality protections. If you are not a designated recipient, you may not review, copy or distribute this message. If you receive this in error, please notify the sender by reply e-mail and delete this message. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoalex at gmail.com Mon May 27 10:54:01 2019 From: leoalex at gmail.com (Leo David) Date: Mon, 27 May 2019 10:54:01 -0000 Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. 
In-Reply-To: <626088321.4969320.1558877893362@mail.yahoo.com> References: <626088321.4969320.1558877893362@mail.yahoo.com> Message-ID: Hi, Any suggestions ? Thank you very much ! Leo On Sun, May 26, 2019 at 4:38 PM Strahil Nikolov wrote: > Yeah, > it seems different from the docs. > I'm adding the gluster users list ,as they are more experienced into that. > > @Gluster-users, > > can you provide some hint how to add aditional replicas to the below > volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? > > > Best Regards, > Strahil Nikolov > > ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David < > leoalex at gmail.com> ??????: > > > Thank you Strahil, > The engine and ssd-samsung are distributed... > So these are the ones that I need to have replicated accross new nodes. > I am not very sure about the procedure to accomplish this. > Thanks, > > Leo > > On Sun, May 26, 2019, 13:04 Strahil wrote: > > Hi Leo, > As you do not have a distributed volume , you can easily switch to replica > 2 arbiter 1 or replica 3 volumes. > > You can use the following for adding the bricks: > > > https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html > > Best Regards, > Strahil Nikoliv > On May 26, 2019 10:54, Leo David wrote: > > Hi Stahil, > Thank you so much for yout input ! > > gluster volume info > > > Volume Name: engine > Type: Distribute > Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: 192.168.80.191:/gluster_bricks/engine/engine > Options Reconfigured: > nfs.disable: on > transport.address-family: inet > storage.owner-uid: 36 > storage.owner-gid: 36 > features.shard: on > performance.low-prio-threads: 32 > performance.strict-o-direct: off > network.remote-dio: off > network.ping-timeout: 30 > user.cifs: off > performance.quick-read: off > performance.read-ahead: off > performance.io-cache: off > cluster.eager-lock: enable > Volume Name: ssd-samsung > Type: Distribute > Volume ID: 76576cc6-220b-4651-952d-99846178a19e > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 > Transport-type: tcp > Bricks: > Brick1: 192.168.80.191:/gluster_bricks/sdc/data > Options Reconfigured: > cluster.eager-lock: enable > performance.io-cache: off > performance.read-ahead: off > performance.quick-read: off > user.cifs: off > network.ping-timeout: 30 > network.remote-dio: off > performance.strict-o-direct: on > performance.low-prio-threads: 32 > features.shard: on > storage.owner-gid: 36 > storage.owner-uid: 36 > transport.address-family: inet > nfs.disable: on > > The other two hosts will be 192.168.80.192/193 - this is gluster > dedicated network over 10GB sfp+ switch. > - host 2 wil have identical harware configuration with host 1 ( each disk > is actually a raid0 array ) > - host 3 has: > - 1 ssd for OS > - 1 ssd - for adding to engine volume in a full replica 3 > - 2 ssd's in a raid 1 array to be added as arbiter for the data volume > ( ssd-samsung ) > So the plan is to have "engine" scaled in a full replica 3, and > "ssd-samsung" scalled in a replica 3 arbitrated. > > > > > On Sun, May 26, 2019 at 10:34 AM Strahil wrote: > > Hi Leo, > > Gluster is quite smart, but in order to provide any hints , can you > provide output of 'gluster volume info '. 
> If you have 2 more systems , keep in mind that it is best to mirror the > storage on the second replica (2 disks on 1 machine -> 2 disks on the new > machine), while for the arbiter this is not neccessary. > > What is your network and NICs ? Based on my experience , I can recommend > at least 10 gbit/s interfase(s). > > Best Regards, > Strahil Nikolov > On May 26, 2019 07:52, Leo David wrote: > > Hello Everyone, > Can someone help me to clarify this ? > I have a single-node 4.2.8 installation ( only two gluster storage domains > - distributed single drive volumes ). Now I just got two identintical > servers and I would like to go for a 3 nodes bundle. > Is it possible ( after joining the new nodes to the cluster ) to expand > the existing volumes across the new nodes and change them to replica 3 > arbitrated ? > If so, could you share with me what would it be the procedure ? > Thank you very much ! > > Leo > > > > -- > Best regards, Leo David > > -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunkumar at redhat.com Mon May 27 12:53:16 2019 From: sunkumar at redhat.com (sunkumar at redhat.com) Date: Mon, 27 May 2019 12:53:16 -0000 Subject: [Gluster-users] Gluster Community Meeting (APAC friendly hours) Message-ID: <0000000000001f378e0589de08c2@google.com> Bridge: https://bluejeans.com/836554017 Meeting minutes: https://hackmd.io/B4vOpJumRgexzqeQiNPVOw Flash Talk : What is Thin Arbiter? (By Ashish Pandey) Previous Meeting notes: http://github.com/gluster/community Title: Gluster Community Meeting (APAC friendly hours) Bridge: https://bluejeans.com/836554017Meeting minutes: https://hackmd.io/B4vOpJumRgexzqeQiNPVOwFlash Talk : What is Thin Arbiter? (By Ashish Pandey)Previous Meeting notes: http://github.com/gluster/community When: Tue May 28, 2019 11:30am ? 12:30pm India Standard Time - Kolkata Where: https://bluejeans.com/836554017 Who: * pgurusid at redhat.com - organizer * javico at paradigmadigital.com * spentaparthi at idirect.net * sstephen at redhat.com * brian.riddle at storagecraft.com * sthomas at rpstechnologysolutions.co.uk * kdhananj at redhat.com * rwareing at fb.com * david.spisla at iternity.com * khiremat at redhat.com * pkarampu at redhat.com * gluster-users at gluster.org * dcunningham at voisonics.com * m.vrgotic at activevideo.com * barchu02 at unm.edu * gluster-devel at gluster.org * sunkumar at redhat.com * jpark at dexyp.com * rouge2507 at gmail.com * dan at clough.xyz * Max de Graaf * mark.boulton at uwa.edu.au * hgowtham at redhat.com * gabriel.lindeborg at svenskaspel.se * maintainers at gluster.org * ranaraya at redhat.com * philip.ruenagel at gmail.com * spalai at redhat.com * m.ragusa at eurodata.de * pauyeung at connexity.com * duprel at email.sc.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: From leoalex at gmail.com Tue May 28 14:08:49 2019 From: leoalex at gmail.com (Leo David) Date: Tue, 28 May 2019 14:08:49 -0000 Subject: [Gluster-users] [ovirt-users] Re: Single instance scaleup. In-Reply-To: References: <626088321.4969320.1558877893362@mail.yahoo.com> Message-ID: Hi, Looks like the only way arround would be to create a brand-new volume as replicated on other disks, and start moving the vms all around the place between volumes ? Cheers, Leo On Mon, May 27, 2019 at 1:53 PM Leo David wrote: > Hi, > Any suggestions ? > Thank you very much ! > > Leo > > On Sun, May 26, 2019 at 4:38 PM Strahil Nikolov > wrote: > >> Yeah, >> it seems different from the docs. 
>> I'm adding the gluster users list ,as they are more experienced into that. >> >> @Gluster-users, >> >> can you provide some hint how to add aditional replicas to the below >> volumes , so they become 'replica 2 arbiter 1' or 'replica 3' type volumes ? >> >> >> Best Regards, >> Strahil Nikolov >> >> ? ??????, 26 ??? 2019 ?., 15:16:18 ?. ???????+3, Leo David < >> leoalex at gmail.com> ??????: >> >> >> Thank you Strahil, >> The engine and ssd-samsung are distributed... >> So these are the ones that I need to have replicated accross new nodes. >> I am not very sure about the procedure to accomplish this. >> Thanks, >> >> Leo >> >> On Sun, May 26, 2019, 13:04 Strahil wrote: >> >> Hi Leo, >> As you do not have a distributed volume , you can easily switch to >> replica 2 arbiter 1 or replica 3 volumes. >> >> You can use the following for adding the bricks: >> >> >> https://access.redhat.com/documentation/en-US/Red_Hat_Storage/2.1/html/Administration_Guide/Expanding_Volumes.html >> >> Best Regards, >> Strahil Nikoliv >> On May 26, 2019 10:54, Leo David wrote: >> >> Hi Stahil, >> Thank you so much for yout input ! >> >> gluster volume info >> >> >> Volume Name: engine >> Type: Distribute >> Volume ID: d7449fc2-cc35-4f80-a776-68e4a3dbd7e1 >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 >> Transport-type: tcp >> Bricks: >> Brick1: 192.168.80.191:/gluster_bricks/engine/engine >> Options Reconfigured: >> nfs.disable: on >> transport.address-family: inet >> storage.owner-uid: 36 >> storage.owner-gid: 36 >> features.shard: on >> performance.low-prio-threads: 32 >> performance.strict-o-direct: off >> network.remote-dio: off >> network.ping-timeout: 30 >> user.cifs: off >> performance.quick-read: off >> performance.read-ahead: off >> performance.io-cache: off >> cluster.eager-lock: enable >> Volume Name: ssd-samsung >> Type: Distribute >> Volume ID: 76576cc6-220b-4651-952d-99846178a19e >> Status: Started >> Snapshot Count: 0 >> Number of Bricks: 1 >> Transport-type: tcp >> Bricks: >> Brick1: 192.168.80.191:/gluster_bricks/sdc/data >> Options Reconfigured: >> cluster.eager-lock: enable >> performance.io-cache: off >> performance.read-ahead: off >> performance.quick-read: off >> user.cifs: off >> network.ping-timeout: 30 >> network.remote-dio: off >> performance.strict-o-direct: on >> performance.low-prio-threads: 32 >> features.shard: on >> storage.owner-gid: 36 >> storage.owner-uid: 36 >> transport.address-family: inet >> nfs.disable: on >> >> The other two hosts will be 192.168.80.192/193 - this is gluster >> dedicated network over 10GB sfp+ switch. >> - host 2 wil have identical harware configuration with host 1 ( each disk >> is actually a raid0 array ) >> - host 3 has: >> - 1 ssd for OS >> - 1 ssd - for adding to engine volume in a full replica 3 >> - 2 ssd's in a raid 1 array to be added as arbiter for the data >> volume ( ssd-samsung ) >> So the plan is to have "engine" scaled in a full replica 3, and >> "ssd-samsung" scalled in a replica 3 arbitrated. >> >> >> >> >> On Sun, May 26, 2019 at 10:34 AM Strahil wrote: >> >> Hi Leo, >> >> Gluster is quite smart, but in order to provide any hints , can you >> provide output of 'gluster volume info '. >> If you have 2 more systems , keep in mind that it is best to mirror the >> storage on the second replica (2 disks on 1 machine -> 2 disks on the new >> machine), while for the arbiter this is not neccessary. >> >> What is your network and NICs ? Based on my experience , I can recommend >> at least 10 gbit/s interfase(s). 
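To make the pointer above concrete: both volumes are single-brick distribute volumes, so the conversion is an add-brick that raises the replica count. A sketch using the hosts and brick paths mentioned in this thread; the arbiter brick path is an assumption, and on older releases the arbiter may have to be added in a second step after first going to replica 2:

gluster peer probe 192.168.80.192
gluster peer probe 192.168.80.193

# engine: 1-brick distribute -> full replica 3
gluster volume add-brick engine replica 3 \
  192.168.80.192:/gluster_bricks/engine/engine \
  192.168.80.193:/gluster_bricks/engine/engine

# ssd-samsung: 1-brick distribute -> replica 3 with an arbiter brick on host 3
gluster volume add-brick ssd-samsung replica 3 arbiter 1 \
  192.168.80.192:/gluster_bricks/sdc/data \
  192.168.80.193:/gluster_bricks/arbiter/data

# let self-heal copy the existing data onto the new bricks and watch it finish
gluster volume heal engine full
gluster volume heal engine info
gluster volume heal ssd-samsung full
gluster volume heal ssd-samsung info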
>> >> Best Regards, >> Strahil Nikolov >> On May 26, 2019 07:52, Leo David wrote: >> >> Hello Everyone, >> Can someone help me to clarify this ? >> I have a single-node 4.2.8 installation ( only two gluster storage >> domains - distributed single drive volumes ). Now I just got two >> identintical servers and I would like to go for a 3 nodes bundle. >> Is it possible ( after joining the new nodes to the cluster ) to expand >> the existing volumes across the new nodes and change them to replica 3 >> arbitrated ? >> If so, could you share with me what would it be the procedure ? >> Thank you very much ! >> >> Leo >> >> >> >> -- >> Best regards, Leo David >> >> > > -- > Best regards, Leo David > -- Best regards, Leo David -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Thu May 30 12:09:53 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Thu, 30 May 2019 12:09:53 -0000 Subject: [Gluster-users] Geo Replication Stops Message-ID: Hi We were evaluating Gluster geo Replication between two DCs one is in US west and one is in US east. We took multiple trials for different file size. The Geo Replication tends to stop replicating but while checking the status it appears to be in Active state. But the slave volume did not increase in size. So we have restarted the geo-replication session and checked the status. The status was in an active state and it was in History Crawl for a long time. We have enabled the DEBUG mode in logging and checked for any error. There was around 2000 file appeared for syncing candidate. The Rsync process starts but the rsync did not happen in the slave volume. Every time the rsync process appears in the "ps auxxx" list but the replication did not happen in the slave end. What would be the cause of this problem? Is there anyway to debug it? We have also checked the strace of the rync program. it displays something like this "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" We are using the below specs Gluster version - 4.1.7 Sync mode - rsync Volume - 1x3 in each end (master and slave) Intranet Bandwidth - 10 Gig -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri May 31 09:46:36 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 31 May 2019 09:46:36 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Hi Kotresh We have tried the above-mentioned rsync option and we are planning to have the version upgrade to 6.0. On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > Hi, > > This looks like the hang because stderr buffer filled up with errors > messages and no one reading it. > I think this issue is fixed in latest releases. As a workaround, you can > do following and check if it works. > > Prerequisite: > rsync version should be > 3.1.0 > > Workaround: > gluster volume geo-replication :: config > rsync-options "--ignore-missing-args" > > Thanks, > Kotresh HR > > > > > On Thu, May 30, 2019 at 5:39 PM deepu srinivasan > wrote: > >> Hi >> We were evaluating Gluster geo Replication between two DCs one is in US >> west and one is in US east. We took multiple trials for different file >> size. >> The Geo Replication tends to stop replicating but while checking the >> status it appears to be in Active state. But the slave volume did not >> increase in size. >> So we have restarted the geo-replication session and checked the status. 
>> The status was in an active state and it was in History Crawl for a long >> time. We have enabled the DEBUG mode in logging and checked for any error. >> There was around 2000 file appeared for syncing candidate. The Rsync >> process starts but the rsync did not happen in the slave volume. Every time >> the rsync process appears in the "ps auxxx" list but the replication did >> not happen in the slave end. What would be the cause of this problem? Is >> there anyway to debug it? >> >> We have also checked the strace of the rync program. >> it displays something like this >> >> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >> >> >> We are using the below specs >> >> Gluster version - 4.1.7 >> Sync mode - rsync >> Volume - 1x3 in each end (master and slave) >> Intranet Bandwidth - 10 Gig >> > > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From maybeonly at gmail.com Mon May 13 13:57:46 2019 From: maybeonly at gmail.com (Only Maybe) Date: Mon, 13 May 2019 13:57:46 -0000 Subject: [Gluster-users] File not updated on gluster fuse volume when written by another client Message-ID: I have a cluster with many replica volumes One of the node was rebooted for some reason Then I mounted all volumes from localhost, on the rebooted server I found some contents of file which was updated by other clients does not change. If I remount the volume, the file was updated, but stopped updating again from then on. Gluserfs version: 6.1 & 6.0 & 3.8 opversion: 3.8 Everything seems well in gluster volume status * is it related to dentry cache? what can I do? Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From chrmeyer at chrmeyer.de Thu May 16 07:54:02 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 07:54:02 -0000 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with 3 Nodes with three volumes (replicated) (one is the gluster shared storage). Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help you to find the problem. Could you have a look on it? Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From chrmeyer at chrmeyer.de Thu May 16 09:19:48 2019 From: chrmeyer at chrmeyer.de (Christian Meyer) Date: Thu, 16 May 2019 09:19:48 -0000 Subject: [Gluster-users] Memory leak in gluster 5.4 Message-ID: Hi everyone! I'm using a Gluster 5.4 Setup with 3 Nodes with three volumes (replicated) (one is the gluster shared storage). Each node has 64GB of RAM. Over the time of ~2 month the memory consumption of glusterd grow linear. An the end glusterd used ~45% of RAM the brick processes together ~43% of RAM. I think this is a memory leak. I made a coredump of the processes (glusterd, bricks) (zipped ~500MB), hope this will help you to find the problem. Could you please have a look on it? 
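A side note on the memory-leak report above: besides raw core dumps, Gluster's own statedumps are usually what developers ask for, since they include per-translator memory accounting. A short sketch; VOLNAME is a placeholder for the affected volumes:

gluster volume status VOLNAME mem        # quick per-brick memory overview
gluster volume statedump VOLNAME         # brick statedumps, by default under /var/run/gluster/

# statedump of the glusterd process itself
kill -USR1 $(pidof glusterd)
ls /var/run/gluster/*dump*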
Download Coredumps: https://s3.eu-central-1.amazonaws.com/glusterlogs/gluster_coredump.zip Kind regards Christian From saravana20july at gmail.com Wed May 22 13:15:33 2019 From: saravana20july at gmail.com (Saravana Kumar) Date: Wed, 22 May 2019 13:15:33 -0000 Subject: [Gluster-users] Geo Replication Hangs Message-ID: Hi We were evaluating Gluster geo Replication between two DataCenters, one is in US west and other in US east. We took multiple trials for different file size . The Geo Replication tend to stop replicating, While checking the status it appears to be in Active state, but the slave volume did not increase in size. So we have restarted the geo replication session and checked the status . The status was in active state and the it was in History Crawl for a long time. We have enabled the DEBUG mode in logging and checked for any error. There were around 2000 file appeared for syncing candidate. The Rsync process starts but the rsync did not happen in the slave volume. Every time the rsync process appears in the "ps auxxx" list but the replication did not happen in the slave end. What would be the cause for this problem? Is there anyways to debug it? We are using the below specs Gluster version - 4.1.7 Sync mode - rsync Volume - 1x3 in each end (master and slave) Intranet Bandwidth - 10 Gig -- Regards, Saravana Kumar.N [image: Picture] -------------- next part -------------- An HTML attachment was scrubbed... URL: From Rene.Kucera at ontec.at Fri May 24 12:10:40 2019 From: Rene.Kucera at ontec.at (Rene Kucera) Date: Fri, 24 May 2019 12:10:40 -0000 Subject: [Gluster-users] Glusterfs Split-Brain problem Message-ID: <4E7878669989AC42976A9FD44C0FBF546CF0CB08@mail02.olymp.ontec.at> Hi Gluster Community We have a PVE Proxmox cluster with two nodes. These two nodes each have 4 HDDs over which we have a glusterfs to migrate VMs live. A few days ago we had the problem that some disk files in the glusterfs got into a split-brain condition. We were able to secure the corresponding logfiles and resolve the split brain condition, but don't know how it happened. In the appendix you can find the Glusterfs log files. Maybe one of you can tell us what caused the problem: Here is the network setup of the PVE Cluster 192.168.231.0/24 --> Serverlan (reach PVE Gui port 8006) 10.10.11.0 /24 --> Cluster Ha Lan 10.10.12.0 /24 --> Glusterfs Storage lan Glusterfs Lan .) PVEServer1 - 10.10.12.31 .) PVEServer2 - 10.10.12.32 What we've seen in the mnt-pve-GlusterVol01.log log file: Server1: [2019-05-13 04:25:01.509716] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server... [2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available) [2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available) [2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers [2019-05-13 09:47:50.926948] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7fe58a1eb494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55a8728115e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55a872811444] ) 0-: received signum (15), shutting down [2019-05-13 09:47:50.926977] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'. 
[2019-05-13 09:47:50.950381] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01 [2019-05-13 09:49:43.823117] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01) [2019-05-13 09:49:43.828117] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-05-13 09:49:43.869885] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1 [2019-05-13 09:49:43.871644] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2019-05-13 09:49:43.880208] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport [2019-05-13 09:49:43.880609] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport [2019-05-13 09:49:43.880816] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) Final graph: +------------------------------------------------------------------------------+ 1: volume vol0-client-0 2: type protocol/client 3: option ping-timeout 5 4: option remote-host pvetau01-storage 5: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0 6: option transport-type socket 7: option transport.address-family inet 8: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401 9: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252 10: option filter-O_DIRECT enable 11: option send-gids true 12: end-volume 13: 14: volume vol0-client-1 15: type protocol/client 16: option ping-timeout 5 17: option remote-host pvetau02-storage 18: option remote-subvolume /var/lib/glusterfs/data01/brick1/vol0 19: option transport-type socket 20: option transport.address-family inet 21: option username 4ccc2234-fba7-40f9-b97b-26d3fa8ab401 22: option password cef1b5f5-b16c-4a3c-b49f-f814901a3252 23: option filter-O_DIRECT enable 24: option send-gids true 25: end-volume 26: 27: volume vol0-replicate-0 28: type cluster/replicate 29: option eager-lock enable 30: option quorum-count 1 31: subvolumes vol0-client-0 vol0-client-1 32: end-volume 33: 34: volume vol0-dht 35: type cluster/distribute 36: option lock-migration off 37: subvolumes vol0-replicate-0 38: end-volume 39: 40: volume vol0-write-behind 41: type performance/write-behind 42: subvolumes vol0-dht 43: end-volume 44: 45: volume vol0-readdir-ahead 46: type performance/readdir-ahead 47: subvolumes vol0-write-behind 48: end-volume 49: 50: volume vol0-open-behind 51: type performance/open-behind 52: subvolumes vol0-readdir-ahead 53: end-volume 54: 55: volume vol0 56: type debug/io-stats 57: option log-level INFO 58: option latency-measurement off 59: option count-fop-hits off 60: subvolumes vol0-open-behind 61: end-volume 62: 63: volume meta-autoload 64: type meta 65: subvolumes vol0 66: end-volume 67: +------------------------------------------------------------------------------+ [2019-05-13 09:49:43.881243] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:43.881434] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0) [2019-05-13 09:49:43.881906] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:43.882213] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 09:49:43.882222] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:43.882249] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online. [2019-05-13 09:49:43.882360] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 09:49:43.886625] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 09:49:43.886633] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:43.890995] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 09:49:43.891049] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-05-13 09:49:43.891067] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-13 09:49:43.891625] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 10:20:38.998246] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-1: server 10.10.12.32:49154 has not responded in the last 5 seconds, disconnecting. [2019-05-13 10:20:38.998657] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2019-05-13 10:20:33.237111 (xid=0x492) [2019-05-13 10:20:38.998681] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-1: remote operation failed. 
Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected] [2019-05-13 10:20:38.998829] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f69df41fe83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f69df1e7b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f69df1e7c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f69df1e92e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f69df1e9bb4] ))))) 0-vol0-client-1: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-05-13 10:20:33.237115 (xid=0x493) [2019-05-13 10:20:38.998843] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-1: socket disconnected [2019-05-13 10:20:38.998854] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 10:20:43.355917] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 10:21:20.850030] E [socket.c:2309:socket_connect_finish] 0-vol0-client-1: connection to 10.10.12.32:24007 failed (No route to host) [2019-05-13 10:22:07.026615] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2019-05-13 10:22:07.026663] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-1: disconnected from vol0-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 10:22:10.010421] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-1: changing port to 49154 (from 0) [2019-05-13 10:22:10.011105] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:10.011558] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 10:22:10.011609] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:10.011622] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-1: 2 fds open - Delaying child_up until they are re-opened [2019-05-13 10:22:10.032258] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-1: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2019-05-13 10:22:10.032492] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 10:22:13.790586] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-0 [2019-05-13 11:12:57.300347] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 11:12:57.305284] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain) [2019-05-13 11:12:57.305712] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 11:12:57.306277] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 11:12:57.306938] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-0: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set. [2019-05-13 11:12:57.306973] I [MSGID: 114024] [client-helpers.c:99:this_fd_set_ctx] 0-vol0-client-1: /images/103/vm-103-disk-0.qcow2 (5f9490a8-ec56-410e-9c70-653e0da77174): trying duplicate remote fd set. [2019-05-13 11:12:57.310052] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2698: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) [2019-05-13 11:12:57.310137] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2697: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) [2019-05-13 11:12:57.311543] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 2699: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f69d1cba184 (Input/output error) The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 2 times between [2019-05-13 11:12:57.305712] and [2019-05-13 11:12:57.310816] The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 11:12:57.306277] and [2019-05-13 11:12:57.311184] The message "W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 4 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain)" repeated 6 times between [2019-05-13 11:12:57.305284] and [2019-05-13 11:12:57.311274] The message "E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error]" repeated 5 times between [2019-05-13 11:12:57.300347] and [2019-05-13 11:12:57.311531] Server 2: [2019-05-13 04:25:01.338790] I [MSGID: 100011] [glusterfsd.c:1396:reincarnate] 0-glusterfsd: Fetching the volume file from server... [2019-05-13 09:47:59.443328] E [socket.c:2309:socket_connect_finish] 0-glusterfs: connection to 10.10.12.31:24007 failed (Connection refused) [2019-05-13 09:48:17.426580] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-vol0-client-0: server 10.10.12.31:49155 has not responded in the last 5 seconds, disconnecting. 
[2019-05-13 09:48:17.426872] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2019-05-13 09:48:12.180579 (xid=0x5663a4) [2019-05-13 09:48:17.426899] W [MSGID: 114031] [client-rpc-fops.c:2933:client3_3_lookup_cbk] 0-vol0-client-0: remote operation failed. Path: / (00000000-0000-0000-0000-000000000001) [Transport endpoint is not connected] [2019-05-13 09:48:17.427056] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7efebd3f9e83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7efebd1c1b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7efebd1c1c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7efebd1c32e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7efebd1c3bb4] ))))) 0-vol0-client-0: forced unwinding frame type(GF-DUMP) op(NULL(2)) called at 2019-05-13 09:48:12.180591 (xid=0x5663a5) [2019-05-13 09:48:17.427067] W [rpc-clnt-ping.c:203:rpc_clnt_ping_cbk] 0-vol0-client-0: socket disconnected [2019-05-13 09:48:17.427077] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 09:48:21.479100] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 09:48:59.219302] E [socket.c:2309:socket_connect_finish] 0-vol0-client-0: connection to 10.10.12.31:24007 failed (No route to host) [2019-05-13 09:49:41.468469] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing [2019-05-13 09:49:42.505174] E [MSGID: 114058] [client-handshake.c:1534:client_query_portmap_cbk] 0-vol0-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running. [2019-05-13 09:49:42.505225] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-vol0-client-0: disconnected from vol0-client-0. Client process will keep trying to connect to glusterd until brick's port is available [2019-05-13 09:49:45.442003] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) [2019-05-13 09:49:45.442523] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 09:49:45.442802] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. 
[2019-05-13 09:49:45.442812] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 09:49:45.442820] I [MSGID: 114042] [client-handshake.c:1054:client_post_handshake] 0-vol0-client-0: 2 fds open - Delaying child_up until they are re-opened [2019-05-13 09:49:45.443244] I [MSGID: 114041] [client-handshake.c:676:client_child_up_reopen_done] 0-vol0-client-0: last fd open'd/lock-self-heal'd - notifying CHILD-UP [2019-05-13 09:49:45.443353] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 09:49:49.622255] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 10:20:06.060045] W [glusterfsd.c:1327:cleanup_and_exit] (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7efebc254494] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xf5) [0x55dba7a3b5e5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55dba7a3b444] ) 0-: received signum (15), shutting down [2019-05-13 10:20:06.068969] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/mnt/pve/GlusterVol01'. [2019-05-13 10:20:06.103235] I [fuse-bridge.c:5086:fuse_thread_proc] 0-fuse: unmounting /mnt/pve/GlusterVol01 [2019-05-13 10:22:08.842734] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --volfile-server=10.10.12.31 --volfile-id=vol0 /mnt/pve/GlusterVol01) [2019-05-13 10:22:08.853935] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2019-05-13 10:22:08.944855] W [MSGID: 108003] [afr.c:102:fix_quorum_options] 0-vol0-replicate-0: quorum-type none overriding quorum-count 1 [2019-05-13 10:22:08.946502] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2 [2019-05-13 10:22:08.972020] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-0: parent translators are ready, attempting connect on transport [2019-05-13 10:22:08.972395] I [MSGID: 114020] [client.c:2356:notify] 0-vol0-client-1: parent translators are ready, attempting connect on transport [2019-05-13 10:22:08.972832] I [rpc-clnt.c:1965:rpc_clnt_reconfig] 0-vol0-client-0: changing port to 49155 (from 0) [2019-05-13 10:22:08.973142] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:08.973231] I [MSGID: 114057] [client-handshake.c:1447:select_server_supported_programs] 0-vol0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330) [2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-1: Connected to vol0-client-1, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. [2019-05-13 10:22:08.973544] I [MSGID: 114046] [client-handshake.c:1223:client_setvolume_cbk] 0-vol0-client-0: Connected to vol0-client-0, attached to remote volume '/var/lib/glusterfs/data01/brick1/vol0'. 
[2019-05-13 10:22:08.973566] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-0: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:08.973567] I [MSGID: 114047] [client-handshake.c:1234:client_setvolume_cbk] 0-vol0-client-1: Server and Client lk-version numbers are not same, reopening the fds [2019-05-13 10:22:08.973616] I [MSGID: 108005] [afr-common.c:4382:afr_notify] 0-vol0-replicate-0: Subvolume 'vol0-client-1' came back up; going online. [2019-05-13 10:22:08.973639] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-1: Server lk version = 1 [2019-05-13 10:22:08.977940] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-vol0-client-0: Server lk version = 1 [2019-05-13 10:22:08.978055] I [fuse-bridge.c:4153:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.26 [2019-05-13 10:22:08.978075] I [fuse-bridge.c:4838:fuse_graph_sync] 0-fuse: switched to graph 0 [2019-05-13 10:22:08.978603] I [MSGID: 108031] [afr-common.c:2152:afr_local_discovery_cbk] 0-vol0-replicate-0: selecting local read_child vol0-client-1 [2019-05-13 10:53:46.573894] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:53:46.573992] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:53:46.574253] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:53:46.574949] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.575526] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1380: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.577820] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1381: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.596838] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.597759] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.598916] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] The message "W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null)" repeated 2 times between [2019-05-13 10:53:46.574949] and [2019-05-13 10:53:46.599257] [2019-05-13 10:53:46.599525] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. 
(Possible split-brain) [2019-05-13 10:53:46.599797] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.599825] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1389: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.599876] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.600149] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.600193] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.600417] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.600775] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.601071] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4. (Possible split-brain) [2019-05-13 10:53:46.601537] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 609bb8be-3ae8-470d-9f88-2b65095fbed4: split-brain observed. [Input/output error] [2019-05-13 10:53:46.601577] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1390: READ => -1 gfid=609bb8be-3ae8-470d-9f88-2b65095fbed4 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.619830] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.620701] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.621098] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.621455] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.621732] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.623509] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.624891] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. 
[Input/output error] [2019-05-13 10:53:46.625212] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:53:46.625314] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 79423c92-0338-4dc9-bafc-091172e8d845. (Possible split-brain) [2019-05-13 10:53:46.625721] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 79423c92-0338-4dc9-bafc-091172e8d845: split-brain observed. [Input/output error] [2019-05-13 10:53:46.625754] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1399: READ => -1 gfid=79423c92-0338-4dc9-bafc-091172e8d845 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:53:46.576286] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.176786] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.177684] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:56:28.178782] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:56:28.179128] W [MSGID: 108027] [afr-common.c:2491:afr_discover_done] 0-vol0-replicate-0: no read subvols for (null) [2019-05-13 10:56:28.180634] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1533: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:56:28.179439] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:56:28.180620] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.278595] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing READ on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.279517] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. (Possible split-brain) [2019-05-13 10:59:25.280605] E [MSGID: 108008] [afr-read-txn.c:80:afr_read_txn_refresh_done] 0-vol0-replicate-0: Failing FGETXATTR on gfid 5f9490a8-ec56-410e-9c70-653e0da77174: split-brain observed. [Input/output error] [2019-05-13 10:59:25.281649] W [fuse-bridge.c:2228:fuse_readv_cbk] 0-glusterfs-fuse: 1685: READ => -1 gfid=5f9490a8-ec56-410e-9c70-653e0da77174 fd=0x7f649c00e06c (Input/output error) [2019-05-13 10:59:25.281250] W [MSGID: 108008] [afr-read-txn.c:238:afr_read_txn] 0-vol0-replicate-0: Unreadable subvolume -1 found with event generation 2 for gfid 5f9490a8-ec56-410e-9c70-653e0da77174. 
(Possible split-brain)
-------------------------------------------------
What we can't explain is why server 1 does the following:

[2019-05-13 09:47:48.277650] W [socket.c:590:__socket_rwv] 0-glusterfs: readv on 10.10.12.31:24007 failed (No data available)
[2019-05-13 09:47:48.277696] E [glusterfsd-mgmt.c:1908:mgmt_rpc_notify] 0-glusterfsd-mgmt: failed to connect with remote-host: 10.10.12.31 (No data available)
[2019-05-13 09:47:48.277704] I [glusterfsd-mgmt.c:1926:mgmt_rpc_notify] 0-glusterfsd-mgmt: Exhausted all volfile servers

after which the volume is unmounted and re-mounted on another port. Server 2 then behaves in exactly the same way, which results in a split-brain condition on the disk files of the VMs.

We would be glad if someone could explain this behavior to us.

BR
René
-------------- next part --------------
An HTML attachment was scrubbed...

From sdeepugd at gmail.com Fri May 31 09:47:09 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 09:47:09 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi Kotresh
The above-mentioned workaround did not work properly.

On Fri, May 31, 2019 at 3:16 PM deepu srinivasan wrote:

> Hi Kotresh
> We have tried the above-mentioned rsync option and we are planning to have
> the version upgrade to 6.0.
>
> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
> khiremat at redhat.com> wrote:
>
>> Hi,
>>
>> This looks like the hang because stderr buffer filled up with errors
>> messages and no one reading it.
>> I think this issue is fixed in latest releases. As a workaround, you can
>> do following and check if it works.
>>
>> Prerequisite:
>> rsync version should be > 3.1.0
>>
>> Workaround:
>> gluster volume geo-replication :: config
>> rsync-options "--ignore-missing-args"
>>
>> Thanks,
>> Kotresh HR
>>
>>
>>
>>
>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>> wrote:
>>
>>> Hi
>>> We were evaluating Gluster geo Replication between two DCs one is in US
>>> west and one is in US east. We took multiple trials for different file
>>> size.
>>> The Geo Replication tends to stop replicating but while checking the
>>> status it appears to be in Active state. But the slave volume did not
>>> increase in size.
>>> So we have restarted the geo-replication session and checked the status.
>>> The status was in an active state and it was in History Crawl for a long
>>> time. We have enabled the DEBUG mode in logging and checked for any error.
>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>> process starts but the rsync did not happen in the slave volume. Every time
>>> the rsync process appears in the "ps auxxx" list but the replication did
>>> not happen in the slave end. What would be the cause of this problem? Is
>>> there anyway to debug it?
>>>
>>> We have also checked the strace of the rync program.
>>> it displays something like this
>>>
>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>
>>>
>>> We are using the below specs
>>>
>>> Gluster version - 4.1.7
>>> Sync mode - rsync
>>> Volume - 1x3 in each end (master and slave)
>>> Intranet Bandwidth - 10 Gig
>>>
>>
>>
>> --
>> Thanks and Regards,
>> Kotresh H R
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdeepugd at gmail.com Fri May 31 10:41:41 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 10:41:41 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Done.
We got the following result:

> 1559298781.338234 write(2, "rsync: link_stat
> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
> failed: No such file or directory (2)", 128

Seems like a file is missing?

On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Hi,
>
> Could you take the strace with with more string size? The argument strings
> are truncated.
>
> strace -s 500 -ttt -T -p
>
> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan
> wrote:
>
>> Hi Kotresh
>> The above-mentioned work around did not work properly.
>>
>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan
>> wrote:
>>
>>> Hi Kotresh
>>> We have tried the above-mentioned rsync option and we are planning to
>>> have the version upgrade to 6.0.
>>>
>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>> khiremat at redhat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This looks like the hang because stderr buffer filled up with errors
>>>> messages and no one reading it.
>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>> can do following and check if it works.
>>>>
>>>> Prerequisite:
>>>> rsync version should be > 3.1.0
>>>>
>>>> Workaround:
>>>> gluster volume geo-replication ::
>>>> config rsync-options "--ignore-missing-args"
>>>>
>>>> Thanks,
>>>> Kotresh HR
>>>>
>>>>
>>>>
>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>>>> wrote:
>>>>
>>>>> Hi
>>>>> We were evaluating Gluster geo Replication between two DCs one is in
>>>>> US west and one is in US east. We took multiple trials for different file
>>>>> size.
>>>>> The Geo Replication tends to stop replicating but while checking the
>>>>> status it appears to be in Active state. But the slave volume did not
>>>>> increase in size.
>>>>> So we have restarted the geo-replication session and checked the
>>>>> status. The status was in an active state and it was in History Crawl for a
>>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>>> error.
>>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>>> process starts but the rsync did not happen in the slave volume. Every time
>>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>>> there anyway to debug it?
>>>>>
>>>>> We have also checked the strace of the rync program.
>>>>> it displays something like this
>>>>>
>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>
>>>>>
>>>>> We are using the below specs
>>>>>
>>>>> Gluster version - 4.1.7
>>>>> Sync mode - rsync
>>>>> Volume - 1x3 in each end (master and slave)
>>>>> Intranet Bandwidth - 10 Gig
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks and Regards,
>>>> Kotresh H R
>>>>
>>>
>
> --
> Thanks and Regards,
> Kotresh H R
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sdeepugd at gmail.com Fri May 31 10:59:03 2019
From: sdeepugd at gmail.com (deepu srinivasan)
Date: Fri, 31 May 2019 10:59:03 -0000
Subject: [Gluster-users] Geo Replication stops replicating
In-Reply-To: 
References: 
Message-ID: 

Hi
When I change the rsync option, the rsync process doesn't seem to start.
Only a defunct process is listed in ps aux.
Only when I set the rsync option to " " and restart all the processes does
the rsync process show up in ps aux.

On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <
khiremat at redhat.com> wrote:

> Yes, rsync config option should have fixed this issue.
>
> Could you share the output of the following?
>
> 1. gluster volume geo-replication ::
> config rsync-options
> 2. ps -ef | grep rsync
>
> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan
> wrote:
>
>> Done.
>> We got the following result .
>>
>>> 1559298781.338234 write(2, "rsync: link_stat
>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\"
>>> failed: No such file or directory (2)", 128
>>
>> seems like a file is missing ?
>>
>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <
>> khiremat at redhat.com> wrote:
>>
>>> Hi,
>>>
>>> Could you take the strace with with more string size? The argument
>>> strings are truncated.
>>>
>>> strace -s 500 -ttt -T -p
>>>
>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan
>>> wrote:
>>>
>>>> Hi Kotresh
>>>> The above-mentioned work around did not work properly.
>>>>
>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan
>>>> wrote:
>>>>
>>>>> Hi Kotresh
>>>>> We have tried the above-mentioned rsync option and we are planning to
>>>>> have the version upgrade to 6.0.
>>>>>
>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <
>>>>> khiremat at redhat.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This looks like the hang because stderr buffer filled up with errors
>>>>>> messages and no one reading it.
>>>>>> I think this issue is fixed in latest releases. As a workaround, you
>>>>>> can do following and check if it works.
>>>>>>
>>>>>> Prerequisite:
>>>>>> rsync version should be > 3.1.0
>>>>>>
>>>>>> Workaround:
>>>>>> gluster volume geo-replication ::
>>>>>> config rsync-options "--ignore-missing-args"
>>>>>>
>>>>>> Thanks,
>>>>>> Kotresh HR
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan
>>>>>> wrote:
>>>>>>
>>>>>>> Hi
>>>>>>> We were evaluating Gluster geo Replication between two DCs one is in
>>>>>>> US west and one is in US east. We took multiple trials for different file
>>>>>>> size.
>>>>>>> The Geo Replication tends to stop replicating but while checking the
>>>>>>> status it appears to be in Active state. But the slave volume did not
>>>>>>> increase in size.
>>>>>>> So we have restarted the geo-replication session and checked the
>>>>>>> status. The status was in an active state and it was in History Crawl for a
>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any
>>>>>>> error.
>>>>>>> There was around 2000 file appeared for syncing candidate. The Rsync
>>>>>>> process starts but the rsync did not happen in the slave volume. Every time
>>>>>>> the rsync process appears in the "ps auxxx" list but the replication did
>>>>>>> not happen in the slave end. What would be the cause of this problem? Is
>>>>>>> there anyway to debug it?
>>>>>>>
>>>>>>> We have also checked the strace of the rync program.
>>>>>>> it displays something like this >>>>>>> >>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>> >>>>>>> >>>>>>> We are using the below specs >>>>>>> >>>>>>> Gluster version - 4.1.7 >>>>>>> Sync mode - rsync >>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>> Intranet Bandwidth - 10 Gig >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Thanks and Regards, >>>>>> Kotresh H R >>>>>> >>>>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdeepugd at gmail.com Fri May 31 12:02:35 2019 From: sdeepugd at gmail.com (deepu srinivasan) Date: Fri, 31 May 2019 12:02:35 -0000 Subject: [Gluster-users] Geo Replication stops replicating In-Reply-To: References: Message-ID: Checked the data. It remains in 2708. No progress. On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar < khiremat at redhat.com> wrote: > That means it could be working and the defunct process might be some old > zombie one. Could you check, that data progress ? > > On Fri, May 31, 2019 at 4:29 PM deepu srinivasan > wrote: > >> Hi >> When i change the rsync option the rsync process doesnt seem to start . >> Only a defunt process is listed in ps aux. Only when i set rsync option to >> " " and restart all the process the rsync process is listed in ps aux. >> >> >> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar < >> khiremat at redhat.com> wrote: >> >>> Yes, rsync config option should have fixed this issue. >>> >>> Could you share the output of the following? >>> >>> 1. gluster volume geo-replication :: >>> config rsync-options >>> 2. ps -ef | grep rsync >>> >>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan >>> wrote: >>> >>>> Done. >>>> We got the following result . >>>> >>>>> 1559298781.338234 write(2, "rsync: link_stat >>>>> \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" >>>>> failed: No such file or directory (2)", 128 >>>> >>>> seems like a file is missing ? >>>> >>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar < >>>> khiremat at redhat.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> Could you take the strace with with more string size? The argument >>>>> strings are truncated. >>>>> >>>>> strace -s 500 -ttt -T -p >>>>> >>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan >>>>> wrote: >>>>> >>>>>> Hi Kotresh >>>>>> The above-mentioned work around did not work properly. >>>>>> >>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan >>>>>> wrote: >>>>>> >>>>>>> Hi Kotresh >>>>>>> We have tried the above-mentioned rsync option and we are planning >>>>>>> to have the version upgrade to 6.0. >>>>>>> >>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar < >>>>>>> khiremat at redhat.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> This looks like the hang because stderr buffer filled up with >>>>>>>> errors messages and no one reading it. >>>>>>>> I think this issue is fixed in latest releases. As a workaround, >>>>>>>> you can do following and check if it works. 
>>>>>>>> >>>>>>>> Prerequisite: >>>>>>>> rsync version should be > 3.1.0 >>>>>>>> >>>>>>>> Workaround: >>>>>>>> gluster volume geo-replication :: >>>>>>>> config rsync-options "--ignore-missing-args" >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kotresh HR >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan < >>>>>>>> sdeepugd at gmail.com> wrote: >>>>>>>> >>>>>>>>> Hi >>>>>>>>> We were evaluating Gluster geo Replication between two DCs one is >>>>>>>>> in US west and one is in US east. We took multiple trials for different >>>>>>>>> file size. >>>>>>>>> The Geo Replication tends to stop replicating but while checking >>>>>>>>> the status it appears to be in Active state. But the slave volume did not >>>>>>>>> increase in size. >>>>>>>>> So we have restarted the geo-replication session and checked the >>>>>>>>> status. The status was in an active state and it was in History Crawl for a >>>>>>>>> long time. We have enabled the DEBUG mode in logging and checked for any >>>>>>>>> error. >>>>>>>>> There was around 2000 file appeared for syncing candidate. The >>>>>>>>> Rsync process starts but the rsync did not happen in the slave volume. >>>>>>>>> Every time the rsync process appears in the "ps auxxx" list but the >>>>>>>>> replication did not happen in the slave end. What would be the cause of >>>>>>>>> this problem? Is there anyway to debug it? >>>>>>>>> >>>>>>>>> We have also checked the strace of the rync program. >>>>>>>>> it displays something like this >>>>>>>>> >>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128" >>>>>>>>> >>>>>>>>> >>>>>>>>> We are using the below specs >>>>>>>>> >>>>>>>>> Gluster version - 4.1.7 >>>>>>>>> Sync mode - rsync >>>>>>>>> Volume - 1x3 in each end (master and slave) >>>>>>>>> Intranet Bandwidth - 10 Gig >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Thanks and Regards, >>>>>>>> Kotresh H R >>>>>>>> >>>>>>> >>>>> >>>>> -- >>>>> Thanks and Regards, >>>>> Kotresh H R >>>>> >>>> >>> >>> -- >>> Thanks and Regards, >>> Kotresh H R >>> >> > > -- > Thanks and Regards, > Kotresh H R > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.has.questions at gmail.com Thu May 23 18:10:32 2019 From: matthew.has.questions at gmail.com (Matthew B) Date: Thu, 23 May 2019 18:10:32 -0000 Subject: [Gluster-users] Geo-Replication faulty - Changelog register failed error=[Errno 21] Is a directory Message-ID: Hello - I am having a problem with geo-replication on glusterv5 that I hope someone can help me with. I have a 7-server distribute cluster as the primary volume, and a 2 server distribute cluster as the secondary volume. Both are running the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64 I was able to setup the replication keys, user, groups, etc and establish the session, but it goes faulty quickly after initializing. I ran into the missing libgfchangelog.so error and fixed with a symlink: [root at pcic-backup01 ~]# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so [root at pcic-backup01 ~]# ls -lh /usr/lib64/libgfchangelog.so* lrwxrwxrwx. 1 root root 30 May 16 13:16 /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.0 lrwxrwxrwx. 1 root root 23 May 16 08:58 /usr/lib64/libgfchangelog.so.0 -> libgfchangelog.so.0.0.1 -rwxr-xr-x. 
1 root root 62K Feb 25 04:02 /usr/lib64/libgfchangelog.so.0.0.1 But right now, when trying to start replication it goes faulty: [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful And the /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log log file contains the error: GLUSTER: Changelog register failed error=[Errno 21] Is a directory [root at gluster01 ~]# cat /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log [2019-05-23 17:07:23.500781] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:23.629298] I [gsyncd(status):308:main] : Using session config file 
path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.354005] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.483582] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.863888] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.994895] I [gsyncd(monitor):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.133888] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... [2019-05-23 17:07:33.134301] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker brick=/mnt/raid6-storage/storage slave_node=10.0.231.81 [2019-05-23 17:07:33.214462] I [gsyncd(agent /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.216737] I [changelogagent(agent /mnt/raid6-storage/storage):72:__init__] ChangelogAgent: Agent listining... [2019-05-23 17:07:33.228072] I [gsyncd(worker /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.247236] I [resource(worker /mnt/raid6-storage/storage):1366:connect_remote] SSH: Initializing SSH connection between master and slave... [2019-05-23 17:07:34.948796] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.73339] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.232405] I [resource(worker /mnt/raid6-storage/storage):1413:connect_remote] SSH: SSH connection between master and slave established. duration=1.9849 [2019-05-23 17:07:35.232748] I [resource(worker /mnt/raid6-storage/storage):1085:connect] GLUSTER: Mounting gluster volume locally... [2019-05-23 17:07:36.359250] I [resource(worker /mnt/raid6-storage/storage):1108:connect] GLUSTER: Mounted gluster volume duration=1.1262 [2019-05-23 17:07:36.359639] I [subcmds(worker /mnt/raid6-storage/storage):80:subcmd_worker] : Worker spawn successful. 
Acknowledging back to monitor [2019-05-23 17:07:36.380975] E [repce(agent /mnt/raid6-storage/storage):122:worker] : call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries) File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 29, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 21] Is a directory [2019-05-23 17:07:36.382556] E [repce(worker /mnt/raid6-storage/storage):214:__call__] RepceClient: call failed call=27412:140659114579776:1558631256.38 method=register error=ChangelogException [2019-05-23 17:07:36.382833] E [resource(worker /mnt/raid6-storage/storage):1266:service_loop] GLUSTER: Changelog register failed error=[Errno 21] Is a directory [2019-05-23 17:07:36.404313] I [repce(agent /mnt/raid6-storage/storage):97:service_loop] RepceServer: terminating on reaching EOF. [2019-05-23 17:07:37.361396] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/mnt/raid6-storage/storage [2019-05-23 17:07:37.370690] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-05-23 17:07:41.526408] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:41.643923] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.722193] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.817210] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.188499] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.258817] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.350276] I [gsyncd(monitor-status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.364751] I [subcmds(monitor-status):29:subcmd_monitor_status] : Monitor Status Change status=Stopped I'm not really sure where to go from here... [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup config | grep -i changelog change_detector:changelog changelog_archive_format:%Y%m changelog_batch_size:727040 changelog_log_file:/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/changes-${local_id}.log changelog_log_level:INFO Thanks, -Matthew -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.has.questions at gmail.com Tue May 28 15:03:11 2019 From: matthew.has.questions at gmail.com (Matthew B) Date: Tue, 28 May 2019 15:03:11 -0000 Subject: [Gluster-users] Geo-Replication faulty - changelog register failed - Is a directory Message-ID: Hello - I am having a problem with geo-replication on glusterv5 that I hope someone can help me with. I have a 7-server distribute cluster as the primary volume, and a 2 server distribute cluster as the secondary volume. Both are running the same version of gluster on CentOS 7: glusterfs-5.3-2.el7.x86_64 I was able to setup the replication keys, user, groups, etc and establish the session, but it goes faulty quickly after initializing. I ran into the missing libgfchangelog.so error and fixed with a symlink: [root at pcic-backup01 ~]# ln -s /usr/lib64/libgfchangelog.so.0 /usr/lib64/libgfchangelog.so [root at pcic-backup01 ~]# ls -lh /usr/lib64/libgfchangelog.so* lrwxrwxrwx. 1 root root 30 May 16 13:16 /usr/lib64/libgfchangelog.so -> /usr/lib64/libgfchangelog.so.0 lrwxrwxrwx. 1 root root 23 May 16 08:58 /usr/lib64/libgfchangelog.so.0 -> libgfchangelog.so.0.0.1 -rwxr-xr-x. 1 root root 62K Feb 25 04:02 /usr/lib64/libgfchangelog.so.0.0.1 But right now, when trying to start replication it goes faulty: [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup start Starting geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Initializing... 
N/A N/A [root at gluster01 ~]# gluster volume geo-replication status MASTER NODE MASTER VOL MASTER BRICK SLAVE USER SLAVE SLAVE NODE STATUS CRAWL STATUS LAST_SYNCED ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 10.0.231.50 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.54 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.56 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.55 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.53 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.51 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A 10.0.231.52 storage /mnt/raid6-storage/storage geoaccount ssh://geoaccount at 10.0.231.81::pcic-backup N/A Faulty N/A N/A [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup stop Stopping geo-replication session between storage & geoaccount at 10.0.231.81::pcic-backup has been successful And the /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log log file contains the error: GLUSTER: Changelog register failed error=[Errno 21] Is a directory [root at gluster01 ~]# cat /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.log [2019-05-23 17:07:23.500781] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:23.629298] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.354005] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.483582] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.863888] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:31.994895] I [gsyncd(monitor):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.133888] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing... [2019-05-23 17:07:33.134301] I [monitor(monitor):157:monitor] Monitor: starting gsyncd worker brick=/mnt/raid6-storage/storage slave_node=10.0.231.81 [2019-05-23 17:07:33.214462] I [gsyncd(agent /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.216737] I [changelogagent(agent /mnt/raid6-storage/storage):72:__init__] ChangelogAgent: Agent listining... 
[2019-05-23 17:07:33.228072] I [gsyncd(worker /mnt/raid6-storage/storage):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:33.247236] I [resource(worker /mnt/raid6-storage/storage):1366:connect_remote] SSH: Initializing SSH connection between master and slave... [2019-05-23 17:07:34.948796] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.73339] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:35.232405] I [resource(worker /mnt/raid6-storage/storage):1413:connect_remote] SSH: SSH connection between master and slave established. duration=1.9849 [2019-05-23 17:07:35.232748] I [resource(worker /mnt/raid6-storage/storage):1085:connect] GLUSTER: Mounting gluster volume locally... [2019-05-23 17:07:36.359250] I [resource(worker /mnt/raid6-storage/storage):1108:connect] GLUSTER: Mounted gluster volume duration=1.1262 [2019-05-23 17:07:36.359639] I [subcmds(worker /mnt/raid6-storage/storage):80:subcmd_worker] : Worker spawn successful. Acknowledging back to monitor [2019-05-23 17:07:36.380975] E [repce(agent /mnt/raid6-storage/storage):122:worker] : call failed: Traceback (most recent call last): File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in worker res = getattr(self.obj, rmeth)(*in_data[2:]) File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries) File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 45, in cl_register cls.raise_changelog_err() File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 29, in raise_changelog_err raise ChangelogException(errn, os.strerror(errn)) ChangelogException: [Errno 21] Is a directory [2019-05-23 17:07:36.382556] E [repce(worker /mnt/raid6-storage/storage):214:__call__] RepceClient: call failed call=27412:140659114579776:1558631256.38 method=register error=ChangelogException [2019-05-23 17:07:36.382833] E [resource(worker /mnt/raid6-storage/storage):1266:service_loop] GLUSTER: Changelog register failed error=[Errno 21] Is a directory [2019-05-23 17:07:36.404313] I [repce(agent /mnt/raid6-storage/storage):97:service_loop] RepceServer: terminating on reaching EOF. 
[2019-05-23 17:07:37.361396] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/mnt/raid6-storage/storage [2019-05-23 17:07:37.370690] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty [2019-05-23 17:07:41.526408] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:41.643923] I [gsyncd(status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.722193] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:45.817210] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.188499] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:46.258817] I [gsyncd(config-get):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.350276] I [gsyncd(monitor-status):308:main] : Using session config file path=/var/lib/glusterd/geo-replication/storage_10.0.231.81_pcic-backup/gsyncd.conf [2019-05-23 17:07:47.364751] I [subcmds(monitor-status):29:subcmd_monitor_status] : Monitor Status Change status=Stopped I'm not really sure where to go from here... [root at gluster01 ~]# gluster volume geo-replication storage geoaccount at 10.0.231.81::pcic-backup config | grep -i changelog change_detector:changelog changelog_archive_format:%Y%m changelog_batch_size:727040 changelog_log_file:/var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/changes-${local_id}.log changelog_log_level:INFO Thanks, -Matthew -------------- next part -------------- An HTML attachment was scrubbed... URL:
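
A quick sanity check that may help narrow down the "Changelog register failed ... [Errno 21] Is a directory" error above. This is only a minimal sketch, not a fix: the volume name "storage" and brick path /mnt/raid6-storage/storage are taken from the status output above, and the check does not pinpoint the root cause, it just rules out an obviously broken changelog setup on the master bricks. Errno 21 (EISDIR) generally means something tried to open a directory where it expected a regular file, so the changelog-related paths on the brick and the changelog_log_file shown in the config output are reasonable first places to look.

# confirm the changelog translator is enabled on the master volume
gluster volume get storage changelog.changelog

# on each master brick, the changelog store should be a directory of CHANGELOG.* files
ls -ld /mnt/raid6-storage/storage/.glusterfs/changelogs
ls /mnt/raid6-storage/storage/.glusterfs/changelogs | head

If either check looks wrong, that is worth fixing before digging further into the gsyncd traceback.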