<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Sunny, <br>
    <br>
    Where would I find the changes-&lt;brick-path&gt;.log files? Is there
    anything else I can provide to help diagnose this? <br>
    <br>
    Thanks,<br>
     -Matthew<br>
    <div class="moz-signature"><font size="-1">
        <p>--<br>
          Matthew Benstead<br>
          <font size="-2">System Administrator<br>
            <a href="https://pacificclimate.org/">Pacific Climate
              Impacts Consortium</a><br>
            University of Victoria, UH1<br>
            PO Box 1800, STN CSC<br>
            Victoria, BC, V8W 2Y2<br>
            Phone: +1-250-721-8432<br>
            Email: <a class="moz-txt-link-abbreviated" href="mailto:matthewb@uvic.ca">matthewb@uvic.ca</a></font></p>
      </font>
    </div>
    <div class="moz-cite-prefix">On 7/29/19 9:46 AM, Matthew Benstead
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:803eea7e-86d0-35cd-e18a-500803bf4fc1@uvic.ca">
      <pre class="moz-quote-pre" wrap="">Hi Sunny,

Yes, I have attached the gsyncd.log file. I couldn't find any
changes-&lt;brick-path&gt;.log files...

When I try to start replication, it goes faulty right away:

[root@gluster01 ~]# rpm -q glusterfs
glusterfs-5.6-1.el7.x86_64
[root@gluster01 ~]# uname -r
3.10.0-957.21.3.el7.x86_64
[root@gluster01 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup start
Starting geo-replication session between storage &amp; 10.0.231.81::pcic-backup has been successful
[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup status
 
MASTER NODE    MASTER VOL    MASTER BRICK                  SLAVE USER    SLAVE                       SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
--------------------------------------------------------------------------------------------------------------------------------------------------------
10.0.231.50    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.52    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.54    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.51    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.53    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.55    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
10.0.231.56    storage       /mnt/raid6-storage/storage    root          10.0.231.81::pcic-backup    N/A           Faulty    N/A             N/A
[root@gluster01 ~]# gluster volume geo-replication storage root@10.0.231.81::pcic-backup stop
Stopping geo-replication session between storage &amp; 10.0.231.81::pcic-backup has been successful

This is the primary cluster:

[root@gluster01 ~]# gluster volume info storage
 
Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet
features.quota-deem-statfs: on
changelog.changelog: on
diagnostics.client-log-level: INFO


And this is the cluster I'm trying to replicate to:

[root@pcic-backup01 ~]# gluster volume info pcic-backup
 
Volume Name: pcic-backup
Type: Distribute
Volume ID: 2890bcde-a023-4feb-a0e5-e8ef8f337d4c
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.0.231.81:/pcic-backup01-zpool/brick
Brick2: 10.0.231.82:/pcic-backup02-zpool/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet


Thanks,
 -Matthew

On 7/28/19 10:56 PM, Sunny Kumar wrote:
</pre>
      <blockquote type="cite">
        <pre class="moz-quote-pre" wrap="">HI Matthew,

Can you share geo-rep logs and one more log file
(changes-&lt;brick-path&gt;.log) it will help to pinpoint actual reason
behind failure.
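
These usually live alongside the geo-rep session log on each master node; the
path below is only an assumption pieced together from the volume names in this
thread, so adjust it to whatever exists under /var/log/glusterfs/geo-replication:

  ls /var/log/glusterfs/geo-replication/storage_10.0.231.81_pcic-backup/
  # expect gsyncd.log plus one changes-&lt;brick-path&gt;.log per local brick,
  # e.g. changes-mnt-raid6-storage-storage.log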

/sunny

On Mon, Jul 29, 2019 at 9:13 AM Nithya Balachandran <a class="moz-txt-link-rfc2396E" href="mailto:nbalacha@redhat.com">&lt;nbalacha@redhat.com&gt;</a> wrote:
</pre>
        <blockquote type="cite">
          <pre class="moz-quote-pre" wrap="">

On Sat, 27 Jul 2019 at 02:31, Matthew Benstead <a class="moz-txt-link-rfc2396E" href="mailto:matthewb@uvic.ca">&lt;matthewb@uvic.ca&gt;</a> wrote:
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">Ok thank-you for explaining everything - that makes sense.

Currently the brick file systems are pretty evenly distributed so I probably won't run the fix-layout right now.

Would this state have any impact on geo-replication? I'm trying to geo-replicate this volume, but am getting a weird error: "Changelog register failed error=[Errno 21] Is a directory"
</pre>
          </blockquote>
          <pre class="moz-quote-pre" wrap="">
It should not. Sunny, can you comment on this?

Regards,
Nithya
</pre>
          <blockquote type="cite">
            <pre class="moz-quote-pre" wrap="">
I assume this is related to something else, but I wasn't sure.

Thanks,
 -Matthew

On 7/26/19 12:02 AM, Nithya Balachandran wrote:



On Fri, 26 Jul 2019 at 01:56, Matthew Benstead <a class="moz-txt-link-rfc2396E" href="mailto:matthewb@uvic.ca">&lt;matthewb@uvic.ca&gt;</a> wrote:
</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">Hi Nithya,

Hmm... I don't remember if I did, but based on what I'm seeing it sounds like I probably didn't run rebalance or fix-layout.

It looks like folders that haven't had any new files created have a dht of 0, while other folders have non-zero values.

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/ | grep dht
[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home | grep dht
trusted.glusterfs.dht=0x00000000000000000000000000000000
[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/home/matthewb | grep dht
trusted.glusterfs.dht=0x00000001000000004924921a6db6dbc7

If I just run the fix-layout command will it re-create all of the dht values or just the missing ones?
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">
A fix-layout will recalculate the layouts entirely, so all the values will change. No files will be moved.
A rebalance will recalculate the layouts like the fix-layout does, but will also move files to their new locations based on the new layout ranges. This could take a lot of time depending on the number of files/directories on the volume. If you do this, I would recommend that you turn off lookup-optimize until the rebalance is over.
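
As a sketch (volume name taken from this thread; run on any master node):

  gluster volume set storage cluster.lookup-optimize off
  gluster volume rebalance storage fix-layout start    # layout only, no data moved
  # or, to also migrate files to their new locations:
  gluster volume rebalance storage start
  gluster volume rebalance storage status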

</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">Since the brick is already fairly size balanced could I get away with running fix-layout but not rebalance? Or would the new dht layout mean slower accesses since the files may be expected on different bricks?
</pre>
            </blockquote>
            <pre class="moz-quote-pre" wrap="">
The first access for a file will be slower. The next one will be faster as the location will be cached in the client's in-memory structures.
You may not need to run either a fix-layout or a rebalance if new file creations will be in directories created after the add-brick. Gluster will automatically include all 7 bricks for those directories.
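
A quick way to confirm that (a sketch only, reusing the ansible inventory from
earlier in this thread; /storage is the fuse mount and "layout-test" is just a
throwaway name) is to create a directory through the mount and check its layout
on every brick:

  mkdir /storage/layout-test
  ansible -i hosts gluster-servers[0:6] -m shell -a "getfattr --absolute-names -n trusted.glusterfs.dht -e hex /mnt/raid6-storage/storage/layout-test"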

Regards,
Nithya

</pre>
            <blockquote type="cite">
              <pre class="moz-quote-pre" wrap="">Thanks,
 -Matthew

On 7/24/19 9:30 PM, Nithya Balachandran wrote:



On Wed, 24 Jul 2019 at 22:12, Matthew Benstead <a class="moz-txt-link-rfc2396E" href="mailto:matthewb@uvic.ca">&lt;matthewb@uvic.ca&gt;</a> wrote:
</pre>
              <blockquote type="cite">
                <pre class="moz-quote-pre" wrap="">So looking more closely at the trusted.glusterfs.dht attributes from the bricks it looks like they cover the entire range... and there is no range left for gluster07.

The first 6 bricks range from 0x00000000 to 0xffffffff - so... is there a way to re-calculate what the dht values should be? Each of the bricks should have a gap

Gluster05 00000000 -&gt; 2aaaaaa9
Gluster06 2aaaaaaa -&gt; 55555553
Gluster01 55555554 -&gt; 7ffffffd
Gluster02 7ffffffe -&gt; aaaaaaa7
Gluster03 aaaaaaa8 -&gt; d5555551
Gluster04 d5555552 -&gt; ffffffff
Gluster07 None

If we split the range into 7 servers, that would be a gap of about 0x24924924 for each server.
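
(Rough sketch of that arithmetic, assuming 64-bit bash:)

  step=$(( (0xffffffff + 1) / 7 ))              # 0x24924924
  for i in $(seq 0 6); do
    start=$(( i * step ))
    end=$(( i == 6 ? 0xffffffff : (i + 1) * step - 1 ))
    printf 'brick %d: %08x -&gt; %08x\n' "$((i + 1))" "$start" "$end"
  done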

Now in terms of the gluster07 brick, about 2 years ago the RAID array the brick was stored on became corrupted. I ran the remove-brick force command, then provisioned a new server, ran the add-brick command and then restored the missing files from backup by copying them back to the main gluster mount (not the brick).

</pre>
              </blockquote>
              <pre class="moz-quote-pre" wrap="">Did you run a rebalance after performing the add-brick? Without a rebalance/fix-layout , the layout for existing directories on the volume will not  be updated to use the new brick as well.

That the layout does not include the new brick in the root dir is in itself is not a problem. Do you create a lot of files directly in the root of the volume? If yes, you might want to run a rebalance. Otherwise, if you mostly create files in newly added directories, you can probably ignore this. You can check the layout for directories on the volume and see if they incorporate the brick7.

I would expect a lookup on the root to have set an xattr on the brick with an empty layout range . The fact that the xattr does not exist at all on the brick is what I am looking into.
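
To check which directories already include brick7, the same getfattr can be
pointed at any directory path - a sketch, with /home just as an example taken
from this thread:

  ansible -i hosts gluster-servers[0:6] -m shell -a "getfattr --absolute-names -n trusted.glusterfs.dht -e hex /mnt/raid6-storage/storage/home"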


</pre>
              <blockquote type="cite">
                <pre class="moz-quote-pre" wrap="">It looks like prior to that event this was the layout - which would make sense given the equal size of the 7 bricks:

gluster02.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000048bfff206d1ffe5f

gluster05.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000b5dffce0da3ffc1f

gluster04.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000917ffda0b5dffcdf

gluster03.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000006d1ffe60917ffd9f

gluster01.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000245fffe048bfff1f

gluster07.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x000000010000000000000000245fffdf

gluster06.pcic.uvic.ca | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000da3ffc20ffffffff

Which yields the following:

00000000 -&gt; 245fffdf    Gluster07
245fffe0 -&gt; 48bfff1f    Gluster01
48bfff20 -&gt; 6d1ffe5f    Gluster02
6d1ffe60 -&gt; 917ffd9f    Gluster03
917ffda0 -&gt; b5dffcdf    Gluster04
b5dffce0 -&gt; da3ffc1f    Gluster05
da3ffc20 -&gt; ffffffff    Gluster06

Is there some way to get back to this?

Thanks,
 -Matthew

On 7/18/19 7:20 AM, Matthew Benstead wrote:

Hi Nithya,

No - it was added about a year and a half ago. I have tried re-mounting the volume on the server, but it didn't add the attr:

[root@gluster07 ~]# umount /storage/
[root@gluster07 ~]# cat /etc/fstab | grep "/storage"
10.0.231.56:/storage /storage glusterfs defaults,log-level=WARNING,backupvolfile-server=10.0.231.51 0 0
[root@gluster07 ~]# mount /storage/
[root@gluster07 ~]# df -h /storage/
Filesystem            Size  Used Avail Use% Mounted on
10.0.231.56:/storage  255T  194T   62T  77% /storage
[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d307baa00023ec0
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b71d5279e000000000000763e32000000000005cd53
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2

Thanks,
 -Matthew

On 7/17/19 10:04 PM, Nithya Balachandran wrote:

Hi Matthew,

Was this node/brick added to the volume recently? If yes, try mounting the volume on a fresh mount point - that should create the xattr on this brick as well.
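
A minimal sketch of that (assuming the temporary mount point below does not
already exist and the volume is reachable from the node):

  mkdir -p /mnt/storage-test
  mount -t glusterfs 10.0.231.56:/storage /mnt/storage-test
  ls /mnt/storage-test &gt; /dev/null    # force a fresh lookup on the root
  umount /mnt/storage-test
  rmdir /mnt/storage-test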

Regards,
Nithya

On Wed, 17 Jul 2019 at 21:01, Matthew Benstead <a class="moz-txt-link-rfc2396E" href="mailto:matthewb@uvic.ca">&lt;matthewb@uvic.ca&gt;</a> wrote:
</pre>
                <blockquote type="cite">
                  <pre class="moz-quote-pre" wrap="">Hello,

I've just noticed one brick in my 7-node distribute volume is missing
the trusted.glusterfs.dht xattr. How can I fix this?

I'm running glusterfs-5.3-2.el7.x86_64 on CentOS 7.

All of the other nodes are fine, but gluster07 from the list below does
not have the attribute.

$ ansible -i hosts gluster-servers[0:6] ... -m shell -a "getfattr -m . --absolute-names -n trusted.glusterfs.dht -e hex /mnt/raid6-storage/storage"
...
gluster05 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000000000002aaaaaa9

gluster03 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000aaaaaaa8d5555551

gluster04 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000d5555552ffffffff

gluster06 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000002aaaaaaa55555553

gluster02 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x00000001000000007ffffffeaaaaaaa7

gluster07 | FAILED | rc=1 &gt;&gt;
/mnt/raid6-storage/storage: trusted.glusterfs.dht: No such attribute
non-zero return code

gluster01 | SUCCESS | rc=0 &gt;&gt;
# file: /mnt/raid6-storage/storage
trusted.glusterfs.dht=0x0000000100000000555555547ffffffd

Here are all of the attr's from the brick:

[root@gluster07 ~]# getfattr --absolute-names -m . -d -e hex /mnt/raid6-storage/storage/
# file: /mnt/raid6-storage/storage/
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.6f95525a-94d7-4174-bac4-e1a18fe010a2.xtime=0x5d2dee800001fdf9
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.size.2=0x00001b69498a1400000000000076332e000000000005cd03
trusted.glusterfs.volume-id=0x6f95525a94d74174bac4e1a18fe010a2


And here is the volume information:

[root@gluster07 ~]# gluster volume info storage

Volume Name: storage
Type: Distribute
Volume ID: 6f95525a-94d7-4174-bac4-e1a18fe010a2
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: 10.0.231.50:/mnt/raid6-storage/storage
Brick2: 10.0.231.51:/mnt/raid6-storage/storage
Brick3: 10.0.231.52:/mnt/raid6-storage/storage
Brick4: 10.0.231.53:/mnt/raid6-storage/storage
Brick5: 10.0.231.54:/mnt/raid6-storage/storage
Brick6: 10.0.231.55:/mnt/raid6-storage/storage
Brick7: 10.0.231.56:/mnt/raid6-storage/storage
Options Reconfigured:
changelog.changelog: on
features.quota-deem-statfs: on
features.read-only: off
features.inode-quota: on
features.quota: on
performance.readdir-ahead: on
nfs.disable: on
geo-replication.indexing: on
geo-replication.ignore-pid-check: on
transport.address-family: inet

Thanks,
 -Matthew
_______________________________________________
Gluster-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Gluster-users@gluster.org">Gluster-users@gluster.org</a>
<a class="moz-txt-link-freetext" href="https://lists.gluster.org/mailman/listinfo/gluster-users">https://lists.gluster.org/mailman/listinfo/gluster-users</a>
</pre>
                </blockquote>
                <pre class="moz-quote-pre" wrap="">

</pre>
              </blockquote>
            </blockquote>
          </blockquote>
        </blockquote>
      </blockquote>
      <pre class="moz-quote-pre" wrap="">
</pre>
    </blockquote>
    <br>
  </body>
</html>